This is an original blog post. If you repost it, please credit the source. Thanks!

代码下载地址:https://github.com/XiuzeZhou/CALCE

1. Goal

In earlier posts, I extracted the key information from the NASA and CALCE lithium-ion battery datasets and used the simplest possible MLP to predict the remaining life of the NASA batteries.

Next, we use an MLP to predict the life of the CALCE batteries.

2. The CALCE Dataset

I. Test conditions

Temperature: the tests were conducted at 1°C.

Charging: constant-current (CC) mode until the battery voltage reached 4.2 V, then constant-voltage (CV) mode until the charging current dropped to 20 mA.

Discharging: constant-current (CC) mode until the battery voltage dropped to 2.7 V.

Termination: cycling stops when the battery reaches the End of Life (EOL) criterion, defined as a 30% fade of rated capacity, i.e., the capacity drops from 1.1 Ah to 1.1 Ah x 0.7 = 0.77 Ah.

II. Dataset description and preprocessing

The raw CALCE data comes as xlsx spreadsheets with 17 columns, so some preprocessing is needed: first read the xlsx files, then extract the key information, namely capacity, current, and voltage.

For the details of this step, see my earlier post: https://snailwish.com/395/
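For reference, here is a minimal loading sketch, not the full preprocessing from the linked post. The column names ('Cycle_Index', 'Current(A)', 'Discharge_Capacity(Ah)') and the per-cycle capacity computation are my assumptions about the CS2 xlsx layout; adjust them to the actual files.

import glob
import pandas as pd

def load_battery(path_pattern):
    # Simplified loader: extract one discharge capacity value per cycle.
    cycles, capacities = [], []
    for f in sorted(glob.glob(path_pattern)):              # e.g. 'CS2_35/*.xlsx'
        for sheet in pd.read_excel(f, sheet_name=None).values():
            if 'Cycle_Index' not in sheet.columns:
                continue                                    # skip non-data sheets
            for _, group in sheet.groupby('Cycle_Index'):
                discharge = group[group['Current(A)'] < 0]  # discharge rows only
                if len(discharge) == 0:
                    continue
                cap = (discharge['Discharge_Capacity(Ah)'].max()
                       - discharge['Discharge_Capacity(Ah)'].min())
                cycles.append(len(cycles) + 1)              # running cycle counter
                capacities.append(cap)
    return pd.DataFrame({'cycle': cycles, 'capacity': capacities})

# Battery = {name: load_battery(name + '/*.xlsx') for name in Battary_list}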

Once the data is loaded, we plot the capacity degradation of each battery against the number of charge/discharge cycles:

import matplotlib.pyplot as plt

# Battery (dict of per-battery DataFrames) and Battary_list come from the preprocessing step
#Rated_Capacity = 1.1
fig, ax = plt.subplots(1, figsize=(12, 8))
color_list = ['b:', 'g--', 'r-.', 'c.']
for name, color in zip(Battary_list, color_list):
    df_result = Battery[name]
    ax.plot(df_result['cycle'], df_result['capacity'], color, label='Battery_'+name)
ax.set(xlabel='Discharge cycles', ylabel='Capacity (Ah)',
       title='Capacity degradation at ambient temperature of 1°C')
plt.legend()
plt.show()

3. Training and Test Sets

I. Sequence generation

Lithium-battery capacity forms a sequence with an overall downward trend. We use a sliding window of length window_size that moves one step at a time from the head of the sequence to the tail to cut out training samples. For example, with the sequence [1, 2, 3, 4, 5] and window_size=3, the (x, y) training pairs are ([1, 2, 3], 4) and ([2, 3, 4], 5). Note that the function below returns the whole shifted window as y; only its last element is used as the label during training.

import numpy as np

def build_sequences(text, window_size):
    # text: list of capacity values
    x, y = [], []
    for i in range(len(text) - window_size):
        sequence = text[i:i+window_size]        # input window
        target = text[i+1:i+1+window_size]      # window shifted forward by one step

        x.append(sequence)
        y.append(target)

    return np.array(x), np.array(y)
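For the toy sequence from the paragraph above, this is what the function returns; only the last column of y is used as the label later on:

x, y = build_sequences([1, 2, 3, 4, 5], window_size=3)
print(x)          # [[1 2 3]
                  #  [2 3 4]]
print(y)          # [[2 3 4]
                  #  [3 4 5]]
print(y[:, -1])   # [4 5]  <- the labels actually used during training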

II. Building the training and test sets

We use leave-one-out evaluation: one battery serves as the test set and all the others as the training set, which keeps things simple. Concretely, the full data of three batteries forms the training set, and the remaining battery is held out for testing; only its first window_size+1 points are given to the model as the starting sequence.

# Leave-one-out evaluation: one battery is the test set, all the others are used for training
def get_train_test(data_dict, name, window_size=8):
    data_sequence = data_dict[name]['capacity']
    # only the first window_size+1 points of the held-out battery are seen during training
    train_data, test_data = data_sequence[:window_size+1], data_sequence[window_size+1:]
    train_x, train_y = build_sequences(text=train_data, window_size=window_size)
    for k, v in data_dict.items():
        if k != name:
            data_x, data_y = build_sequences(text=v['capacity'], window_size=window_size)
            train_x, train_y = np.r_[train_x, data_x], np.r_[train_y, data_y]

    return train_x, train_y, list(train_data), list(test_data)
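A quick usage check, assuming the Battery dict and Battary_list have already been built by the preprocessing step:

# Hold out the first battery; train on windows from the other three
train_x, train_y, train_data, test_data = get_train_test(Battery, Battary_list[0],
                                                         window_size=8)
print(train_x.shape, train_y.shape)      # windows from the three training batteries
                                         # plus one seed window from the held-out one
print(len(train_data), len(test_data))   # 9 seed points vs. the remaining cycles to predict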

4. The MLP Network

I. Network definition

The MLP definition is simple; the main design choice is the stack of hidden layers, i.e., how many there are and how wide each one is.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, feature_size=8, hidden_size=[16, 8]):
        super(Net, self).__init__()
        self.feature_size, self.hidden_size = feature_size, hidden_size
        self.layer0 = nn.Linear(self.feature_size, self.hidden_size[0])
        # nn.ModuleList registers the hidden layers, so model.parameters()
        # and .to(device)/.cuda() see them (a plain Python list would not)
        self.layers = nn.ModuleList([nn.Sequential(
            nn.Linear(self.hidden_size[i], self.hidden_size[i+1]), nn.ReLU())
                       for i in range(len(self.hidden_size) - 1)])
        self.linear = nn.Linear(self.hidden_size[-1], 1)   # output: one capacity value

    def forward(self, x):
        out = self.layer0(x)
        for layer in self.layers:
            out = layer(out)
        out = self.linear(out)
        return out
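A quick sanity check of the network's input and output shapes:

model = Net(feature_size=8, hidden_size=[16, 8])
x = torch.rand(4, 8)                     # a batch of 4 normalized capacity windows
print(model(x).shape)                    # torch.Size([4, 1]): one predicted capacity per window
print(sum(p.numel() for p in model.parameters()))   # total number of trainable parameters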

II. Evaluation metrics

a. Root Mean Square Error (RMSE)

b. Mean Absolute Error (MAE)

c. Relative Error (RE) of the predicted remaining number of charge/discharge cycles
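The training function in the next subsection calls evaluation() and relative_error(), which are defined in the accompanying notebook. Below is a minimal sketch consistent with the three metrics above; in particular, computing RE by comparing the first cycle at which the true and predicted curves cross the EOL threshold is my assumption about the notebook's implementation.

def evaluation(y_test, y_predict):
    # MAE and RMSE between the true and predicted capacity sequences
    y_test, y_predict = np.array(y_test), np.array(y_predict)
    mae = np.mean(np.abs(y_test - y_predict))
    rmse = np.sqrt(np.mean((y_test - y_predict) ** 2))
    return mae, rmse

def relative_error(y_test, y_predict, threshold):
    # Relative error of the predicted remaining cycles: compare the first
    # cycle index at which each curve drops below the EOL threshold.
    true_eol = next((i for i, c in enumerate(y_test) if c <= threshold), len(y_test))
    pred_eol = next((i for i, c in enumerate(y_predict) if c <= threshold), len(y_predict))
    return abs(true_eol - pred_eol) / true_eol if true_eol > 0 else 0.0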

III. Training the model

The training function is defined below. Its arguments: LR, the learning rate; feature_size, the number of input features per sample (here equal to window_size); hidden_size, the hidden-layer structure; weight_decay, the regularization coefficient; window_size, the sliding-window size; EPOCH, the number of training epochs; seed, the random seed.

def tain(LR=0.01, feature_size=8, hidden_size=[16, 8], weight_decay=0.0,
         window_size=8, EPOCH=1000, seed=0):
    mae_list, rmse_list, re_list = [], [], []
    result_list = []
    for i in range(4):
        name = Battary_list[i]
        train_x, train_y, train_data, test_data = get_train_test(Battery, name, window_size)
        train_size = len(train_x)
        print('sample size: {}'.format(train_size))

        setup_seed(seed)
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        model = Net(feature_size=feature_size, hidden_size=hidden_size).to(device)

        optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=weight_decay)
        criterion = nn.MSELoss()

        loss_list, y_ = [0], []
        for epoch in range(EPOCH):
            X = np.reshape(train_x/Rated_Capacity, (-1, feature_size)).astype(np.float32)
            y = np.reshape(train_y[:, -1]/Rated_Capacity, (-1, 1)).astype(np.float32)

            X, y = torch.from_numpy(X).to(device), torch.from_numpy(y).to(device)
            output = model(X)
            loss = criterion(output, y)
            optimizer.zero_grad()              # clear gradients for this training step
            loss.backward()                    # backpropagation, compute gradients
            optimizer.step()                   # apply gradients

            if (epoch + 1) % 100 == 0:
                test_x = train_data.copy()     # restart prediction from the seed sequence every 100 epochs
                point_list = []
                while (len(test_x) - len(train_data)) < len(test_data):
                    x = np.reshape(np.array(test_x[-feature_size:])/Rated_Capacity,
                                   (-1, feature_size)).astype(np.float32)
                    x = torch.from_numpy(x).to(device)
                    pred = model(x)
                    next_point = pred.data.cpu().numpy()[0, 0] * Rated_Capacity
                    test_x.append(next_point)      # feed the prediction back in to predict the next point
                    point_list.append(next_point)  # save the predicted point
                y_.append(point_list)              # save all predictions of this evaluation round
                loss_list.append(loss.item())
                mae, rmse = evaluation(y_test=test_data, y_predict=y_[-1])
                re = relative_error(
                    y_test=test_data, y_predict=y_[-1], threshold=Rated_Capacity*0.7)
                print('epoch:{:<2d} | loss:{:<6.4f} | MAE:{:<6.4f} | RMSE:{:<6.4f} | RE:{:<6.4f}'.format(
                    epoch, loss.item(), mae, rmse, re))
            # stop early once the training loss has stopped changing
            if (len(loss_list) > 1) and (abs(loss_list[-2] - loss_list[-1]) < 1e-5):
                break

        mae, rmse = evaluation(y_test=test_data, y_predict=y_[-1])
        re = relative_error(
            y_test=test_data, y_predict=y_[-1], threshold=Rated_Capacity*0.7)
        mae_list.append(mae)
        rmse_list.append(rmse)
        re_list.append(re)
        result_list.append(y_[-1])
    return re_list, mae_list, rmse_list, result_list
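The function above also relies on a setup_seed() helper from the notebook; a minimal sketch that simply fixes all the relevant random seeds might look like this:

import random

def setup_seed(seed):
    # Fix every source of randomness for reproducible runs
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)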

5. Experimental Results

I. Quantitative evaluation

With the parameters below, we run the leave-one-out evaluation 10 times with 10 different random seeds and average the results.

window_size = 8
EPOCH = 1000
LR = 0.01    # learning rate
feature_size = window_size
hidden_size = [32,16]
weight_decay = 0.0
Rated_Capacity = 1.1

MAE, RMSE, RE = [], [], []
for seed in range(10):
    re_list, mae_list, rmse_list, _ = tain(LR, feature_size, hidden_size, weight_decay,
                                           window_size, EPOCH, seed)
    RE.append(np.mean(np.array(re_list)))
    MAE.append(np.mean(np.array(mae_list)))
    RMSE.append(np.mean(np.array(rmse_list)))
    print('------------------------------------------------------------------')

print('RE: mean: {:<6.4f} | std: {:<6.4f}'.format(
    np.mean(np.array(RE)), np.std(np.array(RE))))
print('MAE: mean: {:<6.4f} | std: {:<6.4f}'.format(
    np.mean(np.array(MAE)), np.std(np.array(MAE))))
print('RMSE: mean: {:<6.4f} | std: {:<6.4f}'.format(
    np.mean(np.array(RMSE)), np.std(np.array(RMSE))))
print('------------------------------------------------------------------')
print('------------------------------------------------------------------')
Result (mean over the 10 seeds):

MAE     RMSE    RE
0.0943  0.1253  0.2003

II. Qualitative evaluation

Next, we look at the prediction quality by plotting the fitted curve for each battery.

seed = 6
_, _, _, result_list = tain(LR, feature_size, hidden_size, weight_decay,
                            window_size, EPOCH, seed)
for i in range(4):
    name = Battary_list[i]
    train_x, train_y, train_data, test_data = get_train_test(Battery, name, window_size)

    aa = train_data[:window_size+1].copy()   # the initial input sequence
    aa.extend(result_list[i])                # append the predictions for the test portion

    battery = Battery[name]
    fig, ax = plt.subplots(1, figsize=(12, 8))
    ax.plot(battery['cycle'], battery['capacity'], 'b.', label=name)
    ax.plot(battery['cycle'], aa, 'r.', label='Prediction')
    plt.plot([-1, 1000], [Rated_Capacity*0.7, Rated_Capacity*0.7],
             c='black', lw=1, ls='--')       # EOL threshold line
    ax.set(xlabel='Discharge cycles', ylabel='Capacity (Ah)',
           title='Capacity degradation at ambient temperature of 1°C')
    plt.legend()
plt.show()




III. Summary

Judging from the results, the MLP does not perform particularly well on the CALCE dataset, especially in the later portion of the predicted curve compared with the true (blue) curve. Two likely reasons:

(1) the capacity sequences are fairly long, and an MLP does not model long-range temporal effects;

(2) the capacity fades slowly in the early cycles but much more steeply in the middle and late cycles, so the degradation rate is far from uniform.

Further reading

1. NASA lithium-ion battery dataset: battery life prediction with Python: https://snailwish.com/395/

2. CALCE lithium-ion battery dataset: data processing with Python: https://snailwish.com/437/

3. NASA lithium-ion battery dataset: MLP-based battery life prediction with Python: https://snailwish.com/427/

4. NASA and CALCE lithium-ion battery datasets: RNN, LSTM, and GRU life prediction with PyTorch: https://snailwish.com/497/

5. Transformer-based lithium battery life prediction with PyTorch: https://snailwish.com/555/

6. Lithium-battery study, part 7: fitting time-series data with Gaussian functions in PyTorch: https://snailwish.com/576/