Lithium Battery Research, Part 5: Lifetime Prediction with RNN, LSTM, and GRU in PyTorch on the NASA and CALCE Battery Datasets

Original blog post. If you repost it, please credit the source. Thanks!

Code download links:

https://github.com/XiuzeZhou/NASA

https://github.com/XiuzeZhou/CALCE

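1. Motivation

Earlier posts analyzed the NASA and CALCE datasets and then predicted battery life with the simplest neural network, an MLP. (PS: see the references at the end of the post for those articles.)

Recently, readers have often asked why there is only an MLP and no RNN or LSTM. In fact, I finished those experiments back when I was writing my lithium battery paper. I just have not had time to write them up, since I have been rushing a paper and preparing for the TOEFL.

Battery capacity data is really a time series, and an MLP is not well suited to sequential data. The classic models for time-series analysis are RNN, LSTM, and GRU.

So the rest of this post uses these three sequence models, RNN, LSTM, and GRU, for lifetime prediction.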

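2. Model overview

I will skip the formulas, since these models are standard textbook material by now. The core idea is that the model consumes a sequence step by step and, at each step, updates its hidden state by combining the current input with the information retained from earlier steps.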
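
As a rough illustration of carrying information forward, the toy sketch below runs a scalar recurrence by hand. The function name `toy_rnn` and the weights `w_x`, `w_h`, and `bias` are made up for this example. It is not the exact computation inside `nn.RNN`, which uses weight matrices, only the same pattern of mixing the current input with the previous state.

```python
import math

def toy_rnn(sequence, w_x=0.5, w_h=0.8, bias=0.0):
    """Run the scalar recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1} + bias)."""
    h = 0.0                      # initial hidden state
    states = []
    for x in sequence:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + bias)
        states.append(h)
    return states

# Both sequences end with the same input (1.0) but have different histories,
# so their final hidden states differ.
no_history = toy_rnn([0.0, 0.0, 1.0])
with_history = toy_rnn([1.0, 1.0, 1.0])
print(no_history[-1], with_history[-1])
```

The two printed values differ even though the last input is identical, which is exactly the property an MLP fed a single point would lack.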

3. Function overview

Unlike the MLP, the network structures of RNN, LSTM, and GRU are already fully implemented in PyTorch, so all we need to do is use them correctly.

RNN: nn.RNN(input_size, hidden_size, num_layers, batch_first)

LSTM: nn.LSTM(input_size, hidden_size, num_layers, batch_first)

GRU: nn.GRU(input_size, hidden_size, num_layers, batch_first)

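Judging by their constructor signatures, the three models are used in the same way, so once one of them runs, the other two come almost for free. In order of increasing complexity: RNN < GRU < LSTM. I will therefore use nn.LSTM as the running example.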
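
One way to make the complexity ordering concrete is to count trainable parameters for the three modules built with identical arguments. This is a small sketch, with `count_params` as a helper name invented here and the sizes chosen arbitrarily. Parameter count is only one proxy for complexity.

```python
import torch.nn as nn

def count_params(module):
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# The same keyword arguments work for all three constructors.
kwargs = dict(input_size=16, hidden_size=128, num_layers=2, batch_first=True)
counts = {
    'RNN': count_params(nn.RNN(**kwargs)),
    'GRU': count_params(nn.GRU(**kwargs)),
    'LSTM': count_params(nn.LSTM(**kwargs)),
}
print(counts)
```

With the same sizes, GRU has three times and LSTM four times as many parameters as the plain RNN, because they hold three and four sets of gate weights respectively.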

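3.1 Parameters

nn.LSTM(input_size, hidden_size, num_layers, batch_first)

These parameters, especially the first three, are the most basic and important ones (the other parameters are usually left at fixed values, so I do not discuss them).

input_size: the number of features at each time step. Here it equals the sliding-window size, i.e. the length of the short segment that the window cuts from the time series each time.

hidden_size: the dimension of the hidden state.

num_layers: the number of stacked recurrent layers.

batch_first: defaults to False, so by default the input has shape (seq_length, batch_size, input_size). I single it out only because I prefer to put batch_size first (possibly just my own perfectionism...).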
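
The sliding window mentioned under input_size can be sketched in a few lines. This is a minimal version under my own assumptions: `make_windows` is a hypothetical helper, not the function used in the linked code, and the capacity values are made up.

```python
def make_windows(series, window_size):
    """Split a 1-D series into (window, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])   # window_size consecutive points
        targets.append(series[start + window_size])        # the point right after the window
    return inputs, targets

capacity = [2.00, 1.98, 1.97, 1.95, 1.92, 1.90]   # made-up capacity values
X, y = make_windows(capacity, window_size=4)
print(X)   # two windows of length 4
print(y)   # the value that follows each window
```

Each row of `X` then plays the role of one input vector of length input_size.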

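3.2 Inputs and outputs

Input: input, (h_0, c_0)

Output: output, (h_n, c_n)

The two have the same structure, so I describe only the input.

input: with batch_first set to True, the input has shape (batch_size, seq_length=1, input_size=T), where T is the sliding-window size.
h_0: the initial hidden state, with shape (num_layers*num_directions, batch_size, hidden_size). batch_first does not affect this shape.
c_0: the initial cell state, with the same shape as h_0.

If h_0 and c_0 are not provided, both default to zeros.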
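
The shapes above are easy to get wrong, so here is a short check with arbitrary sizes. It shows that batch_first=True changes only the layout of input and output, while h_0 and c_0 keep the layer dimension first, and that omitting the states gives the same result as passing zeros.

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size = 8, 1, 16
hidden_size, num_layers = 128, 2

lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)

# batch_first only affects input/output; the states keep num_layers first.
h_0 = torch.zeros(num_layers, batch_size, hidden_size)
c_0 = torch.zeros(num_layers, batch_size, hidden_size)

out_explicit, (h_n, c_n) = lstm(x, (h_0, c_0))
out_default, _ = lstm(x)          # omitted states default to zeros

print(out_explicit.shape)   # torch.Size([8, 1, 128])
print(h_n.shape)            # torch.Size([2, 8, 128])
print(torch.allclose(out_explicit, out_default))
```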

4. Code walkthrough

4.1 Network definition

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_size, hidden_dim, num_layers, n_class=1, mode='LSTM'):
        super(Net, self).__init__()
        self.hidden_dim = hidden_dim    # kept so that forward() can reshape the output
        # Default to LSTM; switch to GRU or RNN when requested.
        self.cell = nn.LSTM(input_size=input_size, hidden_size=hidden_dim,
                            num_layers=num_layers, batch_first=True)
        if mode == 'GRU':
            self.cell = nn.GRU(input_size=input_size, hidden_size=hidden_dim,
                               num_layers=num_layers, batch_first=True)
        elif mode == 'RNN':
            self.cell = nn.RNN(input_size=input_size, hidden_size=hidden_dim,
                               num_layers=num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, n_class)

    def forward(self, x):           # x shape: (batch_size, seq_len, input_size)
        out, _ = self.cell(x)       # out shape: (batch_size, seq_len, hidden_dim)
        out = out.reshape(-1, self.hidden_dim)
        out = self.linear(out)      # out shape: (batch_size * seq_len, n_class); seq_len is 1 here
        return out

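The networks come pre-packaged, so this feels even simpler than the MLP.

The main effort therefore shifts to arranging your own data in the format these modules expect.

4.2 Training function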
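
As a small sketch of that data arrangement, the snippet below turns two made-up windows into the (batch_size, seq_len=1, input_size) layout and checks that a recurrent layer plus a linear head returns one value per sample. The window size of 4, the hidden size of 8, and all variable names are chosen only for illustration. Dividing by the rated capacity mirrors the normalization used in the training code.

```python
import torch

rated_capacity = 2.0
windows = [[2.00, 1.98, 1.97, 1.95],       # made-up samples, window size 4
           [1.98, 1.97, 1.95, 1.92]]
targets = [1.92, 1.90]                     # the capacity that follows each window

# Normalize, then add a seq_len axis of length 1.
X = torch.tensor(windows, dtype=torch.float32) / rated_capacity
X = X.reshape(-1, 1, 4)                    # (batch_size, seq_len=1, input_size=4)
y = torch.tensor(targets, dtype=torch.float32).reshape(-1, 1) / rated_capacity

cell = torch.nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
head = torch.nn.Linear(8, 1)
out, _ = cell(X)                           # (2, 1, 8)
pred = head(out.reshape(-1, 8))            # (2, 1), same shape as y

print(X.shape, y.shape, pred.shape)
```

Keeping `pred` and `y` in the same shape matters because a loss such as `nn.MSELoss` would otherwise broadcast mismatched shapes.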

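import numpy as np
import torch
import torch.nn as nn

# Battery, Battary_list, Rated_Capacity, get_train_test, setup_seed, evaluation,
# and relative_error are not defined in this post; they are assumed to come
# from the accompanying code linked at the top.
def tain(lr=0.001, 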
         feature_size=16, 
         hidden_dim=128, 
         num_layers=2, 
         weight_decay=0.0, 
         mode='LSTM', 
         EPOCH=1000, 
         seed=0):
    score_list, result_list = [], []
    for i in range(4):
        name = Battary_list[i]
        train_x, train_y, train_data, test_data = get_train_test(
            Battery, name, window_size=feature_size)
        train_size = len(train_x)
        print('sample size: {}'.format(train_size))

        setup_seed(seed)
        model = Net(input_size=feature_size, 
                    hidden_dim=hidden_dim, 
                    num_layers=num_layers, 
                    mode=mode)

        optimizer = torch.optim.Adam(model.parameters(), 
                                     lr=lr, weight_decay=weight_decay)
        criterion = nn.MSELoss()

        test_x = train_data.copy()
        loss_list, y_ = [0], []
        mae, rmse, re = 1, 1, 1
        score_, score = [1], [1]    # lists, because score_[0] is read in the early-stopping check
        for epoch in range(EPOCH):
            X = np.reshape(train_x/Rated_Capacity,(-1, 1, feature_size)
                          ).astype(np.float32) #(batch_size, seq_len, input_size)
            y = np.reshape(train_y[:,-1]/Rated_Capacity,(-1,1)).astype(np.float32)

            X, y = torch.from_numpy(X), torch.from_numpy(y)
            output= model(X)
            output = output.reshape(-1, 1)
            loss = criterion(output, y)
            optimizer.zero_grad()              # clear gradients for this training step
            loss.backward()                    # backpropagation, compute gradients
            optimizer.step()                   # apply gradients

            if (epoch + 1)%100 == 0:
                test_x = train_data.copy()    # restart the rollout every 100 epochs
                point_list = []
                # Predict one cycle at a time until the whole test range is covered.
                while (len(test_x) - len(train_data)) < len(test_data):
                    x = np.reshape(np.array(test_x[-feature_size:])/Rated_Capacity,
                                   (-1, 1, feature_size)).astype(np.float32)
                    x = torch.from_numpy(x)  # shape: (batch_size, 1, input_size)
                    pred = model(x)
                    next_point = pred.detach().numpy()[0,0] * Rated_Capacity
                    test_x.append(next_point)     # feed the prediction back in to predict the next point
                    point_list.append(next_point) # keep the predicted capacity for this cycle
                y_.append(point_list)             # keep all predictions from this rollout
                loss_list.append(loss.item())
                mae, rmse = evaluation(y_test=test_data, y_predict=y_[-1])
                re = relative_error(y_test=test_data, 
                                    y_predict=y_[-1], threshold=Rated_Capacity*0.7)
                print('epoch:{:<2d} | loss:{:<6.4f} | MAE:{:<6.4f} | '
                      'RMSE:{:<6.4f} | RE:{:<6.4f}'.format(epoch, loss.item(), mae, rmse, re))
            score = [re, mae, rmse]
            if (loss < 1e-3) and (score_[0] < score[0]):
                break
            score_ = score.copy()

        score_list.append(score_)
        result_list.append(y_[-1])
    return score_list, result_list
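
The inner while loop in the training function is an autoregressive rollout: each predicted point is appended to the series and becomes part of the next input window. The sketch below isolates that idea with a stand-in predictor. `rollout` and `last_slope` are names invented here, and the linear extrapolation is only a placeholder for the trained network.

```python
def rollout(history, n_steps, window_size, predict_next):
    """Predict n_steps future points, feeding each prediction back as input."""
    series = list(history)                   # copy so the caller's data is untouched
    predictions = []
    for _ in range(n_steps):
        window = series[-window_size:]       # most recent window_size points
        next_point = predict_next(window)
        series.append(next_point)            # the prediction becomes part of the input
        predictions.append(next_point)
    return predictions

# Stand-in predictor (an assumption, not a trained network):
# extend the line through the last two points.
def last_slope(window):
    return window[-1] + (window[-1] - window[-2])

preds = rollout([2.0, 1.9, 1.8], n_steps=3, window_size=2, predict_next=last_slope)
print(preds)
```

Because every step consumes earlier predictions, errors can accumulate along the rollout, which is one reason the test metrics are recomputed on the full predicted curve rather than on single steps.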

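4.3 Main script

First set the main parameters, then run 10 times with different random seeds and average the results.

# Parameter settings for the NASA dataset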
window_size = 16
EPOCH = 1000
lr = 0.001           # learning rate
hidden_dim = 256
num_layers = 2
weight_decay = 0.0
mode = 'LSTM'        # RNN, LSTM, GRU
Rated_Capacity = 2.0

SCORE = []
for seed in range(10):
    print('seed: ', seed)
    score_list, _ = tain(lr=lr, feature_size=window_size, hidden_dim=hidden_dim,
                         num_layers=num_layers, weight_decay=weight_decay, mode=mode, 
                         EPOCH=EPOCH, seed=seed)
    print('------------------------------------------------------------------')
    for s in score_list:
        SCORE.append(s)

mlist = ['re', 'mae', 'rmse']
for i in range(3):
    s = [line[i] for line in SCORE]
    print(mlist[i] + ' mean: {:<6.4f}'.format(np.mean(np.array(s))))
print('------------------------------------------------------------------')
print('------------------------------------------------------------------')

5. Experimental results

5.1 Quantitative evaluation

I. NASA dataset

Model	RE	RMSE	MAE
RNN	0.2851	0.0848	0.0749
LSTM	0.2610	0.0966	0.0866
GRU	0.3044	0.0921	0.0905

II. CALCE dataset

Model	RE	RMSE	MAE
RNN	0.1614	0.1099	0.0938
LSTM	0.0902	0.0736	0.0542
GRU	0.1319	0.0946	0.0671
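
For readers who want to reproduce the three columns, here is a hedged sketch of how such metrics might be computed. MAE and RMSE follow their standard definitions. The relative error is one plausible definition based on the cycle at which capacity first falls to the failure threshold (the training code passes Rated_Capacity*0.7). The post's own `evaluation` and `relative_error` helpers are not shown, so their exact definitions may differ, and all names and numbers below are illustrative.

```python
import math

def mae_rmse(y_true, y_pred):
    """Mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

def first_below(series, threshold):
    """Index of the first point at or below threshold, or len(series) if none."""
    for i, value in enumerate(series):
        if value <= threshold:
            return i
    return len(series)

def relative_error_sketch(y_true, y_pred, threshold):
    """Assumed definition: relative gap between true and predicted end-of-life index.
    Requires the true end-of-life index to be nonzero."""
    true_eol = first_below(y_true, threshold)
    pred_eol = first_below(y_pred, threshold)
    return abs(true_eol - pred_eol) / true_eol

y_true = [1.60, 1.50, 1.45, 1.38, 1.30]     # made-up capacities
y_pred = [1.60, 1.52, 1.46, 1.42, 1.36]
mae, rmse = mae_rmse(y_true, y_pred)
re = relative_error_sketch(y_true, y_pred, threshold=2.0 * 0.7)
print(mae, rmse, re)
```

In this toy case the true curve crosses the threshold at index 3 and the predicted one at index 4, so the sketch reports a relative error of one third.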

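5.2 Summary

(1) In terms of datasets, results on CALCE are far better than on NASA. CALCE has more data points, sampled more densely, so the differences between neighboring sample features are smaller and the data is more "continuous", which lets the model train more thoroughly. Conversely, if neighboring samples differ a lot, predicting values that fall between them is likely to produce larger errors.

(2) In terms of models, LSTM is clearly the best on CALCE, with the lowest RE, RMSE, and MAE. On NASA the picture is mixed: LSTM has the lowest RE, but RNN has lower RMSE and MAE. GRU, which simplifies LSTM, does not outperform it overall on either dataset (its only edge is a slightly lower RMSE on NASA), although it does save some training time.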