6. PyTorch-based Transformer Network for Remaining Useful Life Prediction of Lithium-Ion Batteries
Paper: http://zhouxiuze.com/pub/Transformer.pdf
Code: https://github.com/XiuzeZhou/RUL
Reference: D. Chen, W. Hong, and X. Zhou, "Transformer Network for Remaining Useful Life Prediction of Lithium-Ion Batteries," IEEE Access, vol. 10, pp. 19621-19628, 2022.
1. Introduction
Recently, Transformer-based models have achieved great success in computer vision (CV) and natural language processing (NLP), but work applying them to Li-ion batteries is still scarce. Therefore, I wrote an article applying the Transformer to Remaining Useful Life (RUL) prediction of Li-ion batteries and released all the PyTorch code, hoping it will help others.
2. Problem Definition
As the number of charge-discharge cycles increases, the performance of Li-ion batteries generally degrades. Battery performance can be measured by capacity, so the State of Health (SOH), a health indicator for battery aging, can be defined by the following capacity ratio:

\textit{SOH}(t)=\frac{C_t}{C_0}\times 100\%,

where C_0 denotes the rated capacity and C_t denotes the measured capacity at cycle t. For a battery, End of Life (EOL), which is closely related to its capacity, is defined as the point when the remaining capacity reaches 70-80% of the initial capacity. Fig. 2 illustrates an example of RUL prediction.
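For example (with made-up numbers), a cell rated at C_0 = 2.0 Ah that measures C_t = 1.6 Ah at cycle t has \textit{SOH}(t)=\frac{1.6}{2.0}\times 100\%=80\%, i.e., it has just reached a typical EOL threshold.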
3. Proposed Model
Our proposed model consists of four parts: input and normalization, denoising, Transformer, and prediction. The architecture is shown in Fig. 3.
Input and Normalization. To reduce the influence of changes in the input data distribution on the neural network, the data must be normalized. Let \mathbf{x}=\left\{x_{1},x_{2},\dots,x_{n} \right\} denote the input sequence of capacity with length n, which is mapped to (0,1]:
\mathbf{x}'=\frac{\mathbf{x}}{C_0},
where C_0 denotes rated capacity.
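A minimal sketch of this step (names are illustrative; the capacity sequence is assumed to be a plain array and C_0 to be known):

```python
import numpy as np

def normalize_capacity(capacity, rated_capacity):
    """Scale a capacity sequence into (0, 1] by the rated capacity C_0."""
    capacity = np.asarray(capacity, dtype=np.float32)
    return capacity / rated_capacity

# Example: a short, made-up capacity sequence for a 2.0 Ah cell.
x = normalize_capacity([1.86, 1.85, 1.83], rated_capacity=2.0)
print(x)  # [0.93  0.925 0.915]
```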
Denoising. Raw input is always full of noise, especially when charge/discharge regeneration occurs. To maintain stability and robustness, the input data must be denoised before being fed into deep neural networks. Our method adopts a Denoising Auto-Encoder (DAE), an unsupervised method for learning useful features, which reconstructs the input data from a lower-dimensional representation while preserving as much information as possible.
Gaussian noise is added to the normalized input, \mathbf{x}'_t, to obtain the corrupted vector, \widetilde{\mathbf{x}}_t. The DAE encoder is defined as follows:

\mathbf{z}=a\left({W}^T\widetilde{\mathbf{x}}_t+{b} \right),

and the decoder reconstructs the input from \mathbf{z}:

\widehat{\mathbf{x}}_t=a\left({W'}\mathbf{z}+{b'} \right),

where a(\cdot) denotes the activation function.
The loss function of the DAE is:

\mathcal L_d=\frac{1}{n}\sum_{t=1}^{n}\ell\left(\widetilde{{x}}_t-\widehat{{x}}_t\right)+\lambda \left( \left\| {W} \right\|^2_{F} + \left\| {W'} \right\|^2_{F} \right),

where \ell is the reconstruction error and \lambda controls the strength of the regularization term.
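A minimal PyTorch sketch of such a denoising autoencoder (layer sizes and the noise level are illustrative, not the paper's settings; the Frobenius-norm penalty is added explicitly here, though it could equally be handled by the optimizer's weight decay):

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.encoder = nn.Linear(input_size, hidden_size)  # W, b
        self.decoder = nn.Linear(hidden_size, input_size)  # W', b'
        self.act = nn.ReLU()

    def forward(self, x, noise_std=0.01):
        x_noisy = x + noise_std * torch.randn_like(x)  # corrupted input
        z = self.act(self.encoder(x_noisy))            # z = a(W^T x~ + b)
        return self.decoder(z)                         # reconstruction x^

def dae_loss(model, x, x_hat, lam=1e-4):
    # Reconstruction error plus Frobenius-norm regularization on W and W'.
    rec = nn.functional.mse_loss(x_hat, x)
    reg = model.encoder.weight.norm(p='fro') ** 2 + model.decoder.weight.norm(p='fro') ** 2
    return rec + lam * reg
```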
Transformer. The Transformer part is a stack of Transformer encoders that extract the degradation features from the reconstructed data; each encoder has two sub-layers: Multi-Head Self-Attention and Feed-Forward.

\textit{MultiHead}\left({H}^{l-1}\right) = [head_1;head_2;\cdots;head_h]{W}^{O},

where {H}^{l-1} denotes the output of the (l-1)-th layer and {W}^{O} is the output projection matrix.
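A sketch of this part using PyTorch's built-in encoder (all dimensions are illustrative and may differ from the paper's settings):

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 64, 4, 2  # illustrative sizes
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=128, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

# A batch of 8 reconstructed capacity sequences of length 16, embedded to d_model.
h = torch.randn(8, 16, d_model)
out = encoder(h)  # multi-head self-attention + feed-forward, stacked n_layers times
print(out.shape)  # torch.Size([8, 16, 64])
```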
Prediction. Finally, to predict the unknown capacity, a fully connected layer maps the representation learned by the last Transformer layer to the final prediction \widehat{x}_t:

\widehat{x}_t=f\left({W}_p{H}^h + {b}_p\right).

Loss function. The learning procedure optimizes both tasks (denoising and prediction) simultaneously in a unified framework. Mean Squared Error (MSE) is used to evaluate the loss, and the objective function is defined as follows:
\mathcal L=\sum_{t=T+1}^{n}\left(x_t-\widehat{x}_{t} \right)^2+\alpha\sum_{i=1}^{n} \ell\left(\widetilde{\mathbf{x}}_i-\widehat{\mathbf{x}}_i\right)+\lambda\Omega \left(\Theta \right),

where \alpha balances the prediction and reconstruction losses and \Omega(\Theta) is a regularization term over all trainable parameters \Theta.
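A sketch of this joint objective in PyTorch (alpha is illustrative; the model is assumed to return both the capacity prediction and the DAE reconstruction):

```python
import torch.nn.functional as F

def joint_loss(pred, target, recon, x, alpha=0.1):
    """Prediction error plus the weighted DAE reconstruction term.

    MSE is used for both terms; the lambda * Omega(Theta) term is left to
    the optimizer's weight_decay instead of being added explicitly.
    """
    pred_loss = F.mse_loss(pred, target)  # prediction term
    recon_loss = F.mse_loss(recon, x)     # reconstruction term
    return pred_loss + alpha * recon_loss
```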
4. Experiments
4.1 Data Sets
We conducted experiments on two public data sets: NASA and CALCE. The NASA data set is available from the NASA Ames Research Center web site, and the CALCE data set is available from the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland.
4.2 Code
This module is mainly used for noise reduction: data from sensors often contain a lot of noise, which degrades the training of networks. Therefore, it is better to remove the noise before training.
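For example, the module can be used like this to clean a raw sequence (continuing the DenoisingAutoencoder sketch from Section 3; shapes are illustrative):

```python
import torch

dae = DenoisingAutoencoder(input_size=16, hidden_size=8)
raw = torch.rand(32, 16)  # 32 noisy capacity windows of length 16
with torch.no_grad():
    cleaned = dae(raw, noise_std=0.0)  # reconstruct without extra corruption
print(cleaned.shape)  # torch.Size([32, 16])
```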
4.3 Main Network
The main network combines two components: the denoising module and the Transformer.
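A minimal sketch of how the two components can be combined (all sizes are illustrative, and this is a simplified stand-in for the repository's actual architecture):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Denoising autoencoder followed by a Transformer encoder and a linear head."""
    def __init__(self, seq_len=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.dae = DenoisingAutoencoder(seq_len, seq_len // 2)
        self.embed = nn.Linear(1, d_model)  # lift each scalar capacity to d_model
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)   # prediction layer

    def forward(self, x):                            # x: (batch, seq_len)
        recon = self.dae(x)                          # denoised/reconstructed sequence
        h = self.embed(recon.unsqueeze(-1))          # (batch, seq_len, d_model)
        h = self.encoder(h)
        pred = self.head(h[:, -1, :]).squeeze(-1)    # next-capacity prediction
        return pred, recon
```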
4.4 Training
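A sketch of one possible training loop under the assumptions above (hyperparameters are illustrative; Net and joint_loss are the sketches from earlier, and train_loader is an assumed data loader):

```python
import torch

model = Net()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

model.train()
for epoch in range(200):
    for x, y in train_loader:  # sliding capacity windows and next-cycle targets
        pred, recon = model(x)
        loss = joint_loss(pred, y, recon, x, alpha=0.1)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```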
4.5 Setting and Running
The following parameters are the best ones I obtained on my machine using grid search. If your results differ from mine, you can load the randomly initialized weights generated on my device. Of course, it is better to run the grid search yourself to obtain the optimal parameters.
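A hypothetical settings block to make the idea concrete (these values are placeholders, not the grid-search results from the paper):

```python
config = {
    'seq_len': 16,         # input window length n
    'd_model': 64,         # Transformer hidden size
    'n_heads': 4,          # attention heads
    'n_layers': 2,         # encoder layers
    'lr': 1e-3,            # learning rate
    'alpha': 0.1,          # weight of the reconstruction loss
    'weight_decay': 1e-5,  # lambda, parameter regularization
}
```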
4.6 Grid Search Method
Grid search retrains the model for every combination of candidate hyperparameters and keeps the combination with the lowest validation error.
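A minimal sketch of this procedure (the candidate values are illustrative, and evaluate() is an assumed helper that trains the model from scratch with the given settings and returns a validation error such as RMSE):

```python
from itertools import product

grid = {'lr': [1e-2, 1e-3, 1e-4], 'alpha': [0.01, 0.1, 1.0]}

best_err, best_cfg = float('inf'), None
for lr, alpha in product(grid['lr'], grid['alpha']):
    err = evaluate(lr=lr, alpha=alpha)  # assumed: retrains and returns validation RMSE
    if err < best_err:
        best_err, best_cfg = err, {'lr': lr, 'alpha': alpha}
print(best_cfg, best_err)
```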