by
Wenmeng Liu
M.A.Sc., Shanghai Jiao Tong University, 2018 B.Eng., Northwestern Polytechnical University, 2015
A Report Submitted in Partial Fulfillment of the Requirements for the Degree of
MASTER OF ENGINEERING
in the Department of Electrical and Computer Engineering
© Wenmeng Liu, 2021 University of Victoria
All rights reserved. This report may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.
Early Prediction of Battery Cycle Life Using Machine Learning
Supervisory Committee
Dr. T. Aaron Gulliver, Supervisor
(Department of Electrical and Computer Engineering)
Dr. Wu-Sheng Lu, Departmental Member
(Department of Electrical and Computer Engineering)
ABSTRACT
Lithium-ion batteries are widely used in transportation and vehicle electrification because of their low cost, high energy density, and long lifetime. However, aging, complex nonlinear degradation, and diverse operating conditions degrade the performance of batteries. In addition, it often takes months to years to evaluate the cycle life of an Li-ion battery. Therefore, accurate prediction of the cycle life using the data from the first few cycles is imperative.
In this study, supervised machine learning (ML) techniques are used to develop data-driven models that can accurately predict the cycle life of lithium iron phosphate (LFP) batteries. The models are built using battery data from the first 300 cycles, at which point most batteries have yet to exhibit capacity degradation. The dataset employed consists of experimental results for 124 LFP batteries tested under 72 fast charging protocols. This dataset is one of the largest open-access datasets for Li-ion batteries. To investigate the electrochemical evolution of each battery, 14 domain-based features are considered which fall into three categories: ∆Q(V) features, discharge capacity, and physical measurements related to battery degradation (such as internal resistance). ∆Q(V) is the discharge capacity difference between the end and start cycles at a discharge voltage V. Early prediction of cycle life can be regarded as a regression problem. A subset of these features is used as input to the ML models. The ML models used in this study are a linear model with L1 and L2 norm regularization (elastic net), a tree-based model (random forest), and a linear model with L1 norm regularization (benchmark). [...] its high correlation with cycle life. The elastic net model provides the best overall prediction performance (7.28% test error), followed by the random forest model (7.85% test error) and the benchmark model (11.45% test error).
Contents
Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
Acronyms
Acknowledgements
Dedication
1 Introduction
  1.1 Objectives
  1.2 Related Work
  1.3 Report Organization
2 Lithium-ion Batteries
  2.1 Li-ion Battery Degradation
  2.2 Dataset Generation
  2.3 Data Analysis
  2.4 Correlation between Cycle Life and Discharge Capacity
3 Machine Learning
  3.1 Machine Learning Problem
  3.2 Dataset Splitting
    3.2.2 Random Splitting
  3.3 Domain-Based Feature Generation
  3.4 The Benchmark Model
  3.5 The Machine Learning Models
    3.5.1 Elastic Net Model
    3.5.2 Random Forest Model
  3.6 Evaluation Metrics
4 Results and Discussion
  4.1 Feature Selection
  4.2 Model Evaluation and Validation
    4.2.1 Batch Splitting Results
    4.2.2 Random Splitting Results
  4.3 Discussion
5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
List of Tables
Table 2.1 A summary of cathode materials [2]
Table 2.2 A summary of anode materials [2]
Table 2.3 LFP battery profile
Table 3.1 Domain-based features
Table 4.1 Feature weights for the random forest model with batch splitting
Table 4.2 Feature weights for the elastic net model with batch splitting
Table 4.3 Feature weights for the random forest model with random splitting
Table 4.4 Feature weights for the elastic net model with random splitting
Table 4.5 MPE for the training data with batch splitting
Table 4.6 MPE for the validation data with batch splitting
Table 4.7 MPE for the test data with batch splitting
Table 4.8 RMSE for the training data with batch splitting
Table 4.9 RMSE for the validation data with batch splitting
Table 4.10 RMSE for the test data with batch splitting
Table 4.11 MPE for the training data with random splitting
Table 4.12 MPE for the test data with random splitting
Table 4.13 RMSE for the training data with random splitting
List of Figures
Figure 1.1 Hybrid powertrain system model.
Figure 1.2 The charge/discharge process in an Li-ion battery [1].
Figure 1.3 An overview of early prediction of battery cycle life.
Figure 2.1 Arbin LBT battery testing cycler and environmental chamber [46].
Figure 2.2 Fast-charging protocols, (a) charging C-rate versus SOC for the two-step charging protocol and (b) charging C-rate versus SOC for the one-step charging protocol.
Figure 2.3 Fast-charging protocols for (a) batch 1, (b) batch 2, and (c) batch 3.
Figure 2.4 Discharge capacity versus the number of cycles for a representative battery.
Figure 2.5 Discharge capacity versus the number of cycles for the first 300 cycles.
Figure 2.6 Battery voltage and temperature for the (a), (c) charging process and (b), (d) discharging process.
Figure 2.7 Average temperatures over the first 300 cycles.
Figure 2.8 Correlation between the cycle life and discharge capacity at (a) cycle 100, (b) cycle 200, and (c) cycle 300.
Figure 3.1 The ML pipeline [47].
Figure 3.2 Cycle life histogram with batch splitting.
Figure 3.3 Cycle life histogram with random splitting.
Figure 3.4 Discharge capacity versus discharge voltage for a representative battery, (a) discharge capacity for cycles 10 and 100, (b) discharge capacity for cycles 10 and 200, and (c) discharge capacity for cycles 10 and 300.
Figure 3.5 ∆Q(V) versus discharge voltage for the 124 batteries, (a) ∆Q100−10(V), (b) ∆Q200−10(V), and (c) ∆Q300−10(V).
Figure 3.6 Correlation between the variance of ∆Q(V) and cycle life on a log-log scale, (a) variance of ∆Q100−10(V), (b) variance of ∆Q200−10(V), and (c) variance of ∆Q300−10(V).
Figure 3.7 ∆Q(V) versus the number of cycles for a representative battery, (a) ∆Q100−10(V), (b) ∆Q200−10(V), and (c) ∆Q300−10(V).
Figure 3.8 Discharge capacity versus the number of cycles after polynomial fitting.
Figure 4.1 Observed and predicted cycle lives for the benchmark model with batch splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.2 Observed and predicted cycle lives for the elastic net model with batch splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.3 Observed and predicted cycle lives for the random forest model with batch splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.4 Observed and predicted cycle lives for the benchmark model with random splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.5 Observed and predicted cycle lives for the elastic net model with random splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.6 Observed and predicted cycle lives for the random forest model with random splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Acronyms
C-rate Current-Rate
CC Constant Current
CC-CV Constant Current-Constant Voltage
DOD Depth of Discharge
ESS Energy Storage System
EV Electric Vehicle
HEV Hybrid Electric Vehicle
ICE Internal Combustion Engine
LCO Lithium Cobalt Oxide
LFP Lithium Iron Phosphate
Li Lithium
Li-ion Lithium-ion
LTO Lithium Titanate Oxide
ML Machine Learning
MPE Mean Percent Error
NMC Lithium Nickel Manganese Cobalt Oxide
RF Random Forest
ACKNOWLEDGEMENTS
First and foremost, I would like to thank my parents for their unconditional love and support for my dreams. They have always encouraged me to believe in myself since I was a child.
I would also like to take this opportunity to thank my supervisor, Dr. T. Aaron Gulliver, for his continuous support and guidance on my MEng study. Aaron is a very supportive and passionate supervisor. He always provided me with insightful advice throughout my educational career.
My heartfelt appreciation to committee member Dr. Wu-Sheng Lu for his amazing optimization courses. A special shoutout to Dr. Kwang Moo Yi for his immense knowledge in machine learning and deep learning.
I am grateful to realtor.com for providing me an opportunity to work as a software engineer intern, where I kicked off my career.
I thank my beloved husband Boyu Wang for his love and support throughout my studies, especially in the midst of COVID-19. I feel fortunate to work with my mentor Rich McCue from the digital scholarship commons team at UVic. Special thanks to Dr. Xiaochuan Pan and Dr. Yikun Zhang for being great life mentors.
Last but not least, I thank my friends Ziwei Wang, Jack Liu, Ever Liu, Ziyi Gan, and Lily Chen for filling my life outside schooling.
DEDICATION
To my father, mother, auntie and husband for their support, love, and encouragement.
Chapter 1
Introduction
Li-ion batteries are used in a wide range of applications, such as the hybrid powertrain systems shown in Figure 1.1, due to their low cost, high energy density, and long lifetime. Hybrid powertrain systems are an alternative to traditional electrical plants which allow operators more options for optimizing the configuration to best serve different load profiles [1]. An example is hybrid propulsion systems in vessels. By driving the propellers using both internal combustion engines (ICEs) and electric motors, the ICEs can operate at a higher efficiency and store surplus energy in a rechargeable energy storage system (ESS) [2]. An ESS makes it possible to store energy for use at a later time [1]. Unlike conventional mechanical propulsion that only relies on ICEs as the power source, hybrid powertrain systems use a rechargeable battery ESS and electric motors to partially replace the ICE [2]. Modern hybrid propulsion systems benefit from the incorporation of new technologies such as lithium-ion (Li-ion) batteries, fuel cells, and solar and wind power sources. These systems are used in hybrid electric vehicles (HEVs) and marine vessels [1].
Figure 1.1: Hybrid powertrain system model.
The cost and performance of Li-ion batteries are important considerations when designing a hybrid powertrain system. The cost of an Li-ion battery ESS includes investment, operation, replacement, and recycling costs [3]. An Li-ion battery should be replaced once it reaches the recommended cycle life. Cycle life is the number of cycles a battery goes through before its capacity falls below 80% of the nominal capacity [4]. An accurate prediction of cycle life will aid in selecting the appropriate energy management and power control strategies for Li-ion batteries.
A well-known application of Li-ion batteries is in electronic products. Apple uses Li-ion batteries in their products because they charge faster, last longer, and have a higher power density compared with traditional batteries [5]. Apple uses fast charging protocols to quickly reach 80% of battery capacity, and then switches to slow charging [5]. Fast charging protocols use a high current input to accelerate the charging process, but this can increase battery degradation [6].
Battery degradation is inevitable in practical applications, and battery capacity decreases with the number of charging and discharging cycles [7]. This is because of the irreversible electrochemical reactions inside an Li-ion battery [8]. Battery degradation under fast charging protocols cannot be captured by electrochemical models as this degradation is also due to battery materials [9]. There are four main components in an Li-ion battery, i.e., positive electrode (cathode), negative electrode (anode), conducting electrolyte, and separator as shown in Figure 1.2. An electrochemical reaction takes place to absorb energy when charging and release stored energy when discharging. Figure 1.2 shows that when an Li-ion battery is charging, lithium (Li) ions (denoted as blue dots) move from the cathode through the separator to the anode. When an Li-ion battery is discharging, Li ions (denoted as red dots) move from the anode to the cathode through the separator. This charging and discharging continues until the battery reaches its cycle life. It was reported in [10] that Li-ion batteries degrade due to the increase in internal resistance. The capacity degradation and increase in internal resistance affect the performance of these batteries [11].
It is often very time consuming to evaluate the cycle life of an Li-ion battery experimentally. Each cycle of an Li-ion battery test can take 5 hours [4], which means 100 cycles take approximately 21 days. Therefore, it is imperative to develop an early prediction model that can estimate the cycle life of Li-ion batteries.
Figure 1.2: The charge/discharge process in an Li-ion battery [1].
1.1 Objectives
Early prediction of battery cycle life can improve battery design. Cycle life prediction results can be used to optimize charging protocols and improve battery life [12]. An overview of the prediction procedure is shown in Figure 1.3. This procedure is as follows. First, battery cycling test results are used to generate an Li-ion battery dataset. For this project, early cycle data was obtained for 124 Li-ion batteries with repeated charging and discharging for 300 cycles. This data is used as the input to an early prediction model which predicts the cycle life of the Li-ion batteries. The prediction features include discharge capacity, ∆Q(V) features, and physical measurements related to battery degradation (such as internal resistance). ∆Q(V) is the discharge capacity difference between the end and start cycles at a discharge voltage V. ∆Q(V) features refer to parameters based on ∆Q(V) such as the variance of ∆Q(V). Manufacturers can utilize these prediction results during the battery design process.
Figure 1.3: An overview of early prediction of battery cycle life.
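The definition of ∆Q(V) given above can be illustrated with a minimal sketch. It assumes that the discharge capacity curves of the start and end cycles have already been sampled on the same voltage grid; the function names and the five-point curves are hypothetical and are not taken from the dataset.

```python
import math
from statistics import pvariance


def delta_q(q_end, q_start):
    """Capacity difference between an end cycle and a start cycle.

    Both inputs are discharge capacities (Ah) sampled on the SAME voltage
    grid (an assumption of this sketch).
    """
    if len(q_end) != len(q_start):
        raise ValueError("curves must share the same voltage grid")
    return [qe - qs for qe, qs in zip(q_end, q_start)]


# Hypothetical capacities (Ah) at five voltage points for cycles 10 and 100.
q_cycle_10 = [1.00, 0.90, 0.60, 0.20, 0.00]
q_cycle_100 = [0.99, 0.88, 0.57, 0.19, 0.00]

dq = delta_q(q_cycle_100, q_cycle_10)  # Delta Q(V) between cycles 100 and 10
var_dq = pvariance(dq)                 # one example of a Delta Q(V) feature
log_var_dq = math.log10(var_dq)        # log scale, as used for plotting
```

The variance is only one of the possible ∆Q(V) features; other summary statistics of `dq` could be computed in the same way.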
In order to accurately predict cycle life, the early prediction model is formulated as a machine learning (ML) problem. Three ML models are considered: the benchmark model, the elastic net model, and the random forest model. Furthermore, two metrics are used to evaluate the model performance. The objectives of this project are as follows.
• Obtain an Li-ion battery dataset created using different fast charging protocols.
• Propose domain-based features that capture battery degradation.
• Develop three ML models to predict the cycle life using the domain-based features for the first 300 cycles.
• Implement two metrics to verify the performance of the three ML models.
1.2 Related Work
Previous research has developed four types of models for Li-ion battery cycle life: empirical models (mathematical equations and equivalent circuit models), electrochemical models (Doyle-Fuller-Newman model [13]), multi-physics models (such as electrochemical thermal modelling [14]), and molecular and atomistic models [15]–[18]. Empirical models have no physical meaning [15]. Electrochemical models have high computational costs due to the need to solve partial differential equations and the large number of unknown parameters [19]. The high level of detail with multi-physics and molecular models results in high computational complexity so they are generally not an option for real-time prediction [15], [16]. In [20] and [21], semi-empirical models were obtained to predict capacity loss and battery lifetime. Since then, physical and semi-empirical models have been developed that concentrate on battery degradation mechanisms such as lithium plating [22] and increasing impedance [23]. Semi-empirical and mechanistic models are widely used for lifetime estimation [24]. Though semi-empirical models have achieved good cycle life prediction, it remains challenging to develop models that can capture the behavior of Li-ion batteries cycled under fast charging conditions given the nonlinear battery degradation behavior [25], [26]. Fast charging conditions with high charging currents can cause localized overcharging to occur, leading to decomposition of the electrolyte and a significant increase in the internal resistance [27].
With recent developments in computational power and data generation on Li-ion batteries, ML approaches to predict cycle life are becoming popular as they do not involve complex physical models [27]. Data-driven approaches extract useful features from data to model battery degradation [28]. In [29] a modified Elman neural network was proposed to predict the remaining useful life of Li-ion batteries. Battery degradation parameters were combined with a support vector regression-particle filter in [30] to predict the cycle life. Generally, data-driven approaches use prior knowledge on battery degradation [31]–[34]. However, accurate prediction using early-cycle data without prior knowledge of battery degradation is challenging due to the nonlinear degradation process [12]. A weak correlation (ρ = 0.1) between the remaining capacity at cycles 80 and 500 for 24 Li-ion batteries was found in [35], indicating the difficulty in predicting battery cycle life.
This work introduces domain-based features to build early-prediction models with three ML models: a linear model with L1 and L2 norm regularization (elastic net), a tree-based model (random forest), and a linear model with L1 norm regularization (benchmark). The 124 Li-ion batteries used in this study were tested under 72 fast charging protocols [4]. Domain-based features are obtained which can be divided into three categories: ∆Q(V) features, discharge capacity, and physical measurements related to battery degradation (such as internal resistance). These features are extracted from the first 300 cycles and used as input to the three ML models. Two metrics are used to verify model performance.
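To make the difference between the two linear models concrete, one common way to write a linear model with L1 and L2 norm regularization is sketched below. The notation is assumed for illustration only ($X$ is the feature matrix, $y$ the vector of cycle lives, $w$ the weights, and $\lambda_1, \lambda_2 \ge 0$ the regularization strengths) and may differ from the parameterization used in Chapter 3.

```latex
\hat{w} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2
          + \lambda_1 \lVert w \rVert_1
          + \lambda_2 \lVert w \rVert_2^2
```

In this form, setting $\lambda_2 = 0$ leaves only the L1 penalty, which corresponds to the kind of L1-regularized linear model used as the benchmark.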
1.3 Report Organization
This report is organized as follows.
Chapter 1 provided a brief overview of cycle life prediction for Li-ion batteries. The motivation of the project and related work were also discussed.
Chapter 2 describes Li-ion battery degradation. The factors contributing to battery degradation are discussed, and the definition of cycle life is given. A visualization of the Li-ion battery dataset is provided.
Chapter 3 introduces the ML problem and the domain-based features are given. A brief description of the ML models used in this project is provided.
Chapter 4 evaluates the usefulness of the features with the ML models. The model performance is assessed using the root mean squared error (RMSE) and mean percent error (MPE), and the results are discussed.
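As a rough illustration of the two metrics named above, the sketch below computes both for a handful of made-up cycle lives. The RMSE follows its standard definition. The MPE is assumed here to be the mean of the absolute errors relative to the observed values, expressed in percent; the exact definition used in this report is given in Section 3.6 and may differ. All names and numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs (cycles)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_percent_error(observed, predicted):
    """ASSUMED definition: mean of |observed - predicted| / observed, in percent."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / n


# Made-up observed and predicted cycle lives for three batteries.
observed = [500, 800, 1000]
predicted = [550, 760, 1000]

rmse_cycles = rmse(observed, predicted)
mpe_percent = mean_percent_error(observed, predicted)
```

RMSE penalizes large absolute errors, so it is dominated by long-lived batteries, while a percentage error weights each battery relative to its own cycle life.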
Chapter 2
Lithium-ion Batteries
Li-ion batteries play a vital role in transportation and vehicle electrification because of their low cost, high energy density, and long lifetime. In this chapter, Li-ion battery degradation and the corresponding factors are discussed. In order to understand the characteristics of Li-ion batteries, data for 124 lithium iron phosphate (LFP) A123 APR18650M1A cells under 72 fast charging conditions is analyzed.
2.1 Li-ion Battery Degradation
The charge and discharge process in an Li-ion battery can be formulated as [36]
$$\mathrm{Li}^{+} + e^{-} + \theta \rightleftharpoons \mathrm{Li}\text{-}\theta \tag{2.1}$$
where $\theta$ is the material, $\mathrm{Li}\text{-}\theta$ is the lithium inserted in $\theta$, $e^{-}$ is an electron, and $\mathrm{Li}^{+}$ is a lithium ion. The types of materials used for the anode and cathode can have a significant effect on the performance and degradation of an Li-ion battery. Therefore, a review of cathode and anode materials is given in this section. Then, the factors contributing to battery degradation are discussed.
Cathode materials in Li-ion batteries
Lithium metal oxide is used for the cathode in Li-ion batteries. LFP, lithium cobalt oxide (LCO), and lithium nickel manganese cobalt oxide (NMC) are commonly used cathode materials. Specific capacity is a measure of how much charge a material can store per unit mass, and is expressed in milliampere-hours per gram (mAh/g) [37]. Specific capacity, voltage, safety, cost, and lifetime are commonly considered in choosing cathode materials [38], [39].
LCO is one of the earliest cathode materials used due to its layered structure and large specific capacity (274 mAh/g). However, the shortcomings of LCO include poor thermal stability because of its layered structure and high cost, especially when it is used on a large scale [2]. In comparison, LFP provides greater safety with high thermal stability because of the strong P-O covalent bond in phosphate. However, the major concern with LFP is the low specific capacity compared with LCO. NMC is a nickel-based material with a layered structure. Therefore, NMC has high specific capacity and long lifetime compared to cobalt-based materials such as LCO. However, it was found in [40] that NMC batteries only work well with a low charge/discharge rate at low temperatures (−20 °C). Compared with LCO and NMC, LFP is safer under high temperatures and has a longer lifetime, as illustrated in Table 2.1. LFP batteries now dominate the HEV and electric vehicle (EV) markets because of these attributes.
Table 2.1: A summary of cathode materials [2]
| | LCO | LFP | NMC |
|---|---|---|---|
| Theoretical specific capacity (mAh/g) | 274 | 170 | 280 |
| Advantages | large specific capacity | good thermal stability; long lifetime | high specific capacity; long lifetime |
| Disadvantages | poor thermal stability; high cost | low specific capacity | poor thermal stability |
Anode materials in Li-ion batteries
The two commonly used anode materials are carbon and lithium alloy metals. Carbon-based materials such as graphite are used because of their low cost, abundant availability, and stable cycling performance [41]. The main drawback of graphite is low specific capacity (372 mAh/g) which is much less than that of lithium metal (3861 mAh/g).
Among the lithium alloy metals used in anodes, lithium titanate oxide (LTO) is commonly used. LTO has a long lifetime because the change in cell volume is less than 0.2% during the charge/discharge process [42]. Compared with graphite, LTO has a lower specific capacity (175 mAh/g), lower battery voltage, and higher cost.
Lithium metal is a promising anode material because of its high specific capacity [43]. Lithium metal inherits the light metal attribute from Li and has a high specific capacity of 3861 mAh/g [43]. However, it is expensive to produce [36]. Table 2.2 summarizes the attributes of anode materials that are commonly used in Li-ion batteries.
Table 2.2: A summary of anode materials [2]
| | Graphite | LTO | Lithium metal |
|---|---|---|---|
| Theoretical specific capacity (mAh/g) | 372 | 175 | 3861 |
| Advantages | abundant availability; low cost; stable cycling performance | long lifetime | high specific capacity; high voltage |
| Disadvantages | low specific capacity | low specific capacity; low voltage; high cost | high cost |
An understanding of the factors that contribute to battery degradation is needed to predict the cycle life. Battery degradation starts after the first charge process. The four main factors that affect this degradation are temperature, operation time, current rate (C-rate), and depth of discharge (DOD) [3]. Temperature refers to the temperature at which the battery operates. Battery operation at high temperatures can damage the structure of cathode materials [44]. Anode materials such as lithium metal have poor performance in cold environments. Both low and high temperatures can accelerate battery degradation [20]. The operation time is the sum of the charge and discharge times and is related to the charge and discharge C-rates. C-rate is a measure of the rate at which a battery is charged or discharged relative to its maximum capacity. For example, 1 C indicates that the battery is fully charged/discharged in 1 hour and 5 C means that the charge/discharge time is 1/5 of an hour. Under high C-rate conditions, Li ions quickly accumulate on the anode side. DOD is the battery capacity that has been discharged, expressed as a percentage of maximum capacity. A high DOD means that much of the energy in an Li-ion battery has been consumed and this can contribute to battery degradation [20].
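The C-rate and DOD definitions above reduce to simple arithmetic, which the following sketch spells out. It is an idealized calculation that assumes a constant current and ignores rest periods and constant-voltage phases; the function names are hypothetical, and the 1.1 Ah capacity is the nominal value listed in Table 2.3.

```python
def charge_time_seconds(c_rate, soc_fraction=1.0):
    """Idealized constant-current time to move `soc_fraction` of capacity."""
    if c_rate <= 0:
        raise ValueError("c_rate must be positive")
    return 3600.0 * soc_fraction / c_rate


def current_amps(c_rate, capacity_ah):
    """Current corresponding to a C-rate for a cell of the given capacity."""
    return c_rate * capacity_ah


def depth_of_discharge_percent(discharged_ah, max_capacity_ah):
    """Discharged capacity as a percentage of maximum capacity."""
    return 100.0 * discharged_ah / max_capacity_ah


t_1c = charge_time_seconds(1.0)               # 1 C: one hour
t_5c = charge_time_seconds(5.0)               # 5 C: one fifth of an hour
i_5c = current_amps(5.0, 1.1)                 # 5 C on a 1.1 Ah cell
dod = depth_of_discharge_percent(0.55, 1.1)   # half of the capacity discharged
```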
2.2 Dataset Generation
The dataset used in this study is available online [4]. To generate this dataset, 124 LFP batteries were cycled under 72 different fast charging protocols with varying charging C-rates. An LFP battery is a type of Li-ion battery using LFP as the cathode and graphite as the anode [45]. Table 2.3 gives the characteristics of the LFP batteries that were tested.
The LFP batteries were placed in an environmental chamber with the ambient temperature maintained at 30 °C. A 48-channel Arbin LBT battery testing cycler was connected to these batteries to control the charging and discharging as shown in Figure 2.1. Note that batteries generate heat during charging and discharging, which results in a higher battery temperature than the ambient temperature.
Table 2.3: LFP battery profile
| Nominal voltage | 3.3 V |
| Nominal capacity | 1.1 Ah |
| Manufacturer recommended fast charging protocol | 3.6 C constant current-constant voltage (CC-CV) |
| Type | A123 APR18650M1A cell |
The LFP batteries were cycled using 72 different fast charging protocols which can be classified into two categories, two-step and one-step. The number of steps denotes the steps a battery takes to charge from 0% state of charge (SOC) to 80% SOC. SOC indicates the level of charge of a battery relative to its nominal capacity. Figures 2.2a and 2.2b show examples of two-step and one-step fast charging protocols, respectively. In these figures, CC1 denotes the first constant current (CC) step and CC2 denotes the second CC step. CC-CV denotes constant current-constant voltage. In Figure 2.2a, the battery is first charged at a constant C-rate (6 C) until the SOC reaches 50%. Then, it is charged at 4 C until the SOC reaches 80%. These two steps take 660 s. The last step consists of first charging at 1 C and then at a constant voltage of 3.6 V until the SOC reaches 100%. Note that this is above the nominal voltage of 3.3 V given in Table 2.3. The one-step charging protocol maintains a charging C-rate of 5 C until the SOC reaches 80% as shown in Figure 2.2b. The rest of the charging is the same as for two-step charging.
Figure 2.2: Fast-charging protocols, (a) charging C-rate versus SOC for the two-step charging protocol and (b) charging C-rate versus SOC for the one-step charging protocol.
The 124 LFP batteries were divided into three batches and tested. Batch 1 had 41 LFP batteries and was cycled on May 12, 2017 as shown in Figure 2.3a. Batch 2 had 43 LFP batteries and was tested on June 30, 2017 as shown in Figure 2.3b. Batch 3 had 40 LFP batteries and was tested on April 12, 2018 as shown in Figure 2.3c. The x axis shows the fast charging protocols while the y axis shows the number of batteries tested. The information about the fast charging protocols is summarized at the bottom of the figures. For example, 3.6C(80%)-3.6C denotes one-step charging with a charging C-rate of 3.6 C until 80% SOC, and 5.4C(40%)-3C denotes two-step charging first with a C-rate of 5.4 C from 0% SOC to 40% SOC, and then 3 C until 80% SOC.
Figure 2.3: Fast-charging protocols for (a) batch 1, (b) batch 2, and (c) batch 3.
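The protocol labels described above, such as 3.6C(80%)-3.6C and 5.4C(40%)-3C, follow a regular pattern that can be parsed mechanically. The sketch below is one possible way to do this. The `Protocol` type, the `parse_protocol` function, and the rule that equal first and second C-rates indicate a one-step protocol are assumptions made for illustration and are not part of the dataset tooling.

```python
import re
from typing import NamedTuple


class Protocol(NamedTuple):
    first_c_rate: float        # C-rate of the first constant-current step
    switch_soc_percent: float  # SOC (%) at which the first step ends
    second_c_rate: float       # C-rate used from the switch point to 80% SOC

    @property
    def is_one_step(self) -> bool:
        # Assumption: a label with equal C-rates denotes one-step charging.
        return self.first_c_rate == self.second_c_rate


_NUMBER = r"(\d+(?:\.\d+)?)"
_LABEL = re.compile(rf"^{_NUMBER}C\({_NUMBER}%\)-{_NUMBER}C$")


def parse_protocol(label: str) -> Protocol:
    """Parse a label such as '5.4C(40%)-3C' into its three numbers."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized protocol label: {label!r}")
    c1, soc, c2 = (float(group) for group in match.groups())
    return Protocol(c1, soc, c2)


two_step = parse_protocol("5.4C(40%)-3C")
one_step = parse_protocol("3.6C(80%)-3.6C")
```

Labels that do not match the pattern raise a `ValueError` rather than being silently guessed, which is useful when labels have been damaged in transcription.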
2.3 Data Analysis
A battery cycle life test measures battery performance degradation during periodic cycling [2]. In these tests, voltage, C-rate, internal resistance, discharge capacity, cell can temperature, and cycle life are measured. Discharge capacity is the amount of charge a battery delivers during discharging. The nominal discharge capacity of the batteries in this study is 1.1 Ah, so a battery has reached the end of its cycle life when its capacity drops below 0.88 Ah (80% of the nominal capacity). Figure 2.4 shows the capacity degradation of a representative battery, which has a cycle life of 636 cycles. The cycle life of the LFP batteries used in this study ranges from 150 to 2300 cycles, with an average of 798 cycles and a standard deviation of 371 cycles.
Figure 2.4: Discharge capacity versus the number of cycles for a representative battery.
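The end-of-life criterion used in this section can be expressed as a short function. This is a minimal sketch that assumes a list of per-cycle discharge capacities starting at cycle 1 and that takes the cycle life to be the first cycle whose capacity is below the threshold; the function name and the sample curve are hypothetical.

```python
def cycle_life(capacities_ah, nominal_capacity_ah=1.1, threshold=0.8):
    """Return the first cycle whose capacity is below threshold * nominal.

    Cycles are numbered from 1. Returns None if the capacity never drops
    below the limit within the recorded data.
    """
    limit = threshold * nominal_capacity_ah  # 0.88 Ah for a 1.1 Ah cell
    for cycle, capacity in enumerate(capacities_ah, start=1):
        if capacity < limit:
            return cycle
    return None


# Hypothetical, heavily shortened capacity curve (Ah).
sample_curve = [1.08, 1.05, 0.95, 0.89, 0.87, 0.85]
life = cycle_life(sample_curve)
still_healthy = cycle_life([1.08, 1.07])
```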
Figure 2.5 shows the discharge capacity as a function of the number of cycles for the first 300 cycles of the 124 LFP batteries. In this work, three groups of early-cycle data are used for modelling, namely the first 100, 200, and 300 cycles. The starting point of all LFP batteries is cycle 2. Thus, the first 100 cycles refers to cycle data from cycle 2 to cycle 100, the first 200 cycles from cycle 2 to cycle 200, and the first 300 cycles from cycle 2 to cycle 300. Figure 2.5 also shows that there are outliers in the data and these will be considered in Section 3.3.
Figure 2.5: Discharge capacity versus the number of cycles for the first 300 cycles.
The voltage and cell can temperature under charge and discharge for the 124 LFP batteries are illustrated in Figure 2.6. A darker color represents a higher C-rate. The or-ange lines denote charging and the blue lines denote discharging. Figures 2.6a and 2.6b show that a higher charging/discharging C-rate results in a more rapid increase/decrease in battery voltage. Figures 2.6c and 2.6d show the change in cell can temperature for different charging/discharging C-rates. A higher charging/discharging C-rate causes a more rapid increase in cell can temperature. Under the same charging and discharging C-rate, the cell can temperature increase due to discharging is higher than that due to charging. Figure 2.7 shows that the average cell can temperature of the 124 LFP batter-ies ranges from 29◦C to 39◦C
[Figure 2.6 panels: (a), (b) voltage (V) versus capacity (Ah); (c), (d) cell can temperature (°C) versus capacity (Ah). Curves are labelled 1C to 8C in (a) and (c), and 2C to 12C in (b) and (d).]
Figure 2.6: Battery voltage and temperature for the (a), (c) charging process and (b), (d) discharging process.
[Plot: average temperature (°C) versus number of cycles.]
2.4
Correlation between Cycle Life and Discharge Capacity
The Pearson correlation coefficient is used to determine the correlation between cycle life and discharge capacity and is given by
\[ \rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{2.2} \]
where $X$ is the discharge capacity, $Y$ is the cycle life, $\mathrm{cov}$ is the covariance, $\sigma_X$ is the standard deviation of the discharge capacity, and $\sigma_Y$ is the standard deviation of the cycle life. The range of $\rho_{X,Y}$ is from $-1$ to $+1$.
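As a minimal sketch of (2.2), the coefficient can be computed directly from its definition with NumPy. The function name `pearson` is an illustrative choice, and the population (not sample) covariance and standard deviations are used so that the normalization factors cancel.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient, cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance; np.std also defaults to the population form.
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return cov_xy / (x.std() * y.std())

# Hypothetical toy values, not taken from the battery dataset.
capacity = [1.05, 1.06, 1.07, 1.08]
life = [400.0, 650.0, 600.0, 900.0]
print(pearson(capacity, life))
```

The result agrees with `np.corrcoef(x, y)[0, 1]`, which can be used instead when no custom implementation is needed.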
Figure 2.8 gives the Pearson correlation coefficient $\rho$ for the 124 batteries for the first 100, 200, and 300 cycles. In addition, the colored dots denote the cycle life, which ranges from 150 to 2300 cycles. These results indicate a weak correlation between cycle life and discharge capacity. The Pearson correlation coefficient is 0.32 at cycle 100, 0.41 at cycle 200, and 0.46 at cycle 300. It is reasonable that a weak relationship exists in the first 300 cycles because capacity degradation can be negligible during these early cycles. In fact, the capacity of some batteries at cycle 100 shows a slight increase compared to the initial capacity [4]. Thus, the discharge capacity alone is insufficient to predict the cycle life of a Li-ion battery.
In order to predict the lifetime of Li-ion batteries, relevant features related to battery degradation, such as cell can temperature and internal resistance, should be considered. Based on these domain-based features, early prediction models for the cycle life of Li-ion batteries are introduced in Chapter 3.
[Figure 2.8 panels: cycle life (logarithmic axis) versus discharge capacity (Ah), with $\rho = 0.32$ in (a), $\rho = 0.41$ in (b), and $\rho = 0.46$ in (c); the color bar indicates cycle life.]
Figure 2.8: Correlation between the cycle life and discharge capacity at (a) cycle 100, (b) cycle 200, and (c) cycle 300.
Chapter 3
Machine Learning
The ML pipeline consists of five steps as shown in Figure 3.1. The first step is preparing the data, which involves dataset generation and cleaning. The next step is feature engineering, which is the process of using domain-based knowledge to create features for the ML model [47]. Domain-based features fall into three categories: $\Delta Q(V)$ features, discharge capacity, and physical measurements related to battery degradation. Feature selection is the process of choosing an appropriate subset of these features. The development of an early prediction model for cycle life can be regarded as a regression problem, which belongs to the field of supervised ML. The ML model is then built, trained, and validated. Three ML models are considered, namely a linear model with $L_1$ and $L_2$ norm regularization (elastic net), a tree-based model (random forest), and a linear model with $L_1$ norm regularization (benchmark). Two metrics are used to evaluate model performance.
3.1
Machine Learning Problem
Battery cycle life can be formulated as
\[ \hat{y}_i = \hat{\mathbf{w}}^T \mathbf{x}_i \tag{3.1} \]
where $\hat{y}_i$ is the model prediction of the logarithm of the cycle life of battery $i$, $\mathbf{x}_i$ is the $m$-dimensional feature vector of battery $i$, and $\hat{\mathbf{w}}$ is the $m$-dimensional weight vector.
The reason for choosing the logarithm of the cycle life is that battery degradation is a nonlinear process. Overfitting is a common problem in ML which refers to fitting the training data more than is warranted [48]. Insufficient data size and high model complexity are two common causes of overfitting [49]. Regularization is employed in the training process to address the overfitting problem.
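To get the best weights $\hat{\mathbf{w}}$, least-squares optimization with regularization is employed
\[ \hat{\mathbf{w}} = \underset{\mathbf{w}}{\arg\min} \; \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda H(\mathbf{w}) \tag{3.2} \]
where $\mathbf{y}$ is the $n$-dimensional observed battery cycle life vector, $\mathbf{X}$ is the $n \times m$ matrix of features, $\mathbf{w}$ is the $m$-dimensional weight vector, $\lambda$ is a scalar hyperparameter with a non-negative value, and $H(\mathbf{w})$ is the regularization term. Two common regularization terms are the $L_1$ norm
\[ H(\mathbf{w}) = \|\mathbf{w}\|_1 \tag{3.3} \]
and the elastic net regularization
\[ H(\mathbf{w}) = \frac{1 - \alpha}{2} \|\mathbf{w}\|_2^2 + \alpha \|\mathbf{w}\|_1 \tag{3.4} \]
where $\alpha$ is a scalar hyperparameter with $0 \le \alpha \le 1$.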
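The objective in (3.2) with the penalties (3.3) and (3.4) can be written out directly, which makes it easy to check that setting $\alpha = 1$ in (3.4) recovers the $L_1$ penalty. The following is a minimal sketch; the function names are illustrative, and it only evaluates the objective rather than minimizing it.

```python
import numpy as np

def elastic_net_penalty(w, alpha):
    """H(w) = (1 - alpha) / 2 * ||w||_2^2 + alpha * ||w||_1, as in (3.4)."""
    w = np.asarray(w, dtype=float)
    return 0.5 * (1.0 - alpha) * np.sum(w ** 2) + alpha * np.sum(np.abs(w))

def objective(w, X, y, lam, alpha=1.0):
    """||y - X w||_2^2 + lam * H(w); alpha = 1 gives the L1 penalty of (3.3)."""
    w = np.asarray(w, dtype=float)
    residual = np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ w
    return float(residual @ residual + lam * elastic_net_penalty(w, alpha))

# Hypothetical toy problem with two features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(objective([1.0, 2.0], X, y, lam=0.1, alpha=0.5))
```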
3.2
Dataset Splitting
3.2.1
Batch Splitting
The dataset for the 124 LFP batteries was divided into training, validation, and test datasets as illustrated in Figure 3.2. The batch splitting used here is the same as in [4]. There are three batches in the dataset as mentioned in Section 2.2. Batch 1 with 41 LFP batteries is the training dataset, batch 2 with 43 LFP batteries is the validation dataset,
and batch 3 with 40 LFP batteries is the test dataset. The training dataset is used to train model features and determine the model weights. The validation dataset is used to select the best hyperparameters for the model. The test dataset is used to evaluate the model performance.
3.2.2
Random Splitting
Batch splitting is not commonly used for dataset splitting. Its drawbacks are the limited size of each dataset and the fact that the training dataset comes from only one batch. Another approach is random splitting using a splitting ratio. Here, the dataset for the 124 LFP batteries was randomly divided into training and test datasets with a splitting ratio of 0.8. The training dataset then has 99 batteries and the test dataset has 25 batteries. In addition, 5-fold cross validation is employed to tune the hyperparameters during training, so a separate validation dataset is not required. Cross validation is a resampling process used to evaluate model performance on a limited amount of data [50].
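The random split and the 5-fold partition described above can be sketched with NumPy alone. The helper names `random_split` and `k_fold_indices` and the seed are assumptions for illustration; scikit-learn's `train_test_split` and `KFold` offer equivalent functionality.

```python
import numpy as np

def random_split(n_samples, train_ratio=0.8, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(train_ratio * n_samples)  # 124 * 0.8 = 99.2 -> 99
    return order[:n_train], order[n_train:]

def k_fold_indices(train_idx, k=5):
    """Yield (fit, validation) index pairs for k-fold cross validation."""
    folds = np.array_split(train_idx, k)
    for i in range(k):
        fit = np.concatenate([folds[j] for j in range(k) if j != i])
        yield fit, folds[i]

train_idx, test_idx = random_split(124)
print(len(train_idx), len(test_idx))
for fit, val in k_fold_indices(train_idx):
    print(len(fit), len(val))
```

Each battery in the training set appears in exactly one validation fold, so every hyperparameter setting is scored on all 99 training batteries without touching the test set.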
Figures 3.2 and 3.3 show the cycle life distribution of the batteries with batch splitting and random splitting, respectively. The colors indicate the number of batteries used for training, validation, and test. For example, in Figure 3.2, 33 batteries have a cycle life of 500. Among these batteries, 15, 16, and 2 batteries are used for training, validation, and test and are shown in orange, blue, and green, respectively. Note that data for some cycle life values such as 2000 cycles only exist in the test dataset. This is done so that generalization of the model can be evaluated.
[Figure 3.2 plot: number of batteries versus cycle life for the train, valid, and test sets.]
Figure 3.2: Cycle life histogram with batch splitting.
[Figure 3.3 plot: number of batteries versus cycle life for the train and test sets.]
3.3
Domain-Based Feature Generation
This section introduces the generation of domain-based features that serve as the inputs to the model. In Section 2.4, the discharge capacity of the first 300 cycles was shown to have only a weak correlation with the cycle life. Therefore, additional features are used as inputs and they can be divided into two categories, physical measurements and $\Delta Q(V)$ features. Physical measurements include cell can temperature, operation time, current rate, and internal resistance. $\Delta Q(V)$ is the discharge capacity difference between the end and start cycles at a discharge voltage $V$. The end cycle is cycle 100, 200, or 300. The start cycle is cycle 10 as in [4].
Parameters based on $\Delta Q(V)$ such as the variance have been used for battery degradation diagnosis [51]–[53]. The parameters defined as
\[
\begin{aligned}
\Delta Q_{100-10}(V) &= Q_{100}(V) - Q_{10}(V) \\
\Delta Q_{200-10}(V) &= Q_{200}(V) - Q_{10}(V) \\
\Delta Q_{300-10}(V) &= Q_{300}(V) - Q_{10}(V)
\end{aligned}
\tag{3.5}
\]
are considered, where $\Delta Q_{100-10}(V)$ is the discharge capacity difference between cycles 100 and 10, $Q_{10}(V)$ is the discharge capacity at cycle 10, $Q_{100}(V)$ is the discharge capacity at cycle 100, $\Delta Q_{200-10}(V)$ is the discharge capacity difference between cycles 200 and 10, $Q_{200}(V)$ is the discharge capacity at cycle 200, $\Delta Q_{300-10}(V)$ is the discharge capacity difference between cycles 300 and 10, $Q_{300}(V)$ is the discharge capacity at cycle 300, and $V$ is the discharge voltage.
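A minimal sketch of how summary statistics of $\Delta Q(V)$ could be computed from two discharge curves is given below. It assumes both curves have already been sampled on the same voltage grid; the function name, the grid size, and the synthetic curves are illustrative assumptions, not the processing used in this study.

```python
import numpy as np

def delta_q_features(q_end, q_start):
    """Statistics of Delta Q(V) = Q_end(V) - Q_start(V) on a shared voltage grid."""
    dq = np.asarray(q_end, dtype=float) - np.asarray(q_start, dtype=float)
    return {
        "variance": float(np.var(dq)),
        "minimum": float(dq.min()),
        "maximum": float(dq.max()),
        "mean": float(dq.mean()),
    }

# Synthetic, hypothetical curves over the 2.0 V to 3.6 V discharge range.
voltage = np.linspace(3.6, 2.0, 1000)
q10 = 1.07 / (1.0 + np.exp((voltage - 3.2) / 0.05))
q100 = 1.05 / (1.0 + np.exp((voltage - 3.18) / 0.05))
features = delta_q_features(q100, q10)
print(features, np.log10(features["variance"]))
```

The same function applies unchanged to cycles 200 and 300 by passing the corresponding end-cycle curve.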
Figure 3.4 shows the discharge capacity versus discharge voltage for cycles 100, 200, and 300, each compared with cycle 10, for a representative battery. The discharge voltage range is 2.0 V to 3.6 V. In Figure 3.4, the x axis (discharge capacity) is the dependent variable and the y axis (discharge voltage) is the independent variable, which follows the convention in [4]. For a given discharge voltage, the discharge capacity decreases from cycle 10 to cycle 100 as shown in Figure 3.4a. The discharge capacity decrease from cycle 10 to cycle 200 is shown in Figure 3.4b and the discharge capacity decrease from cycle 10 to cycle 300 is shown in Figure 3.4c.
Figure 3.4: Discharge capacity versus discharge voltage for a representative battery, (a) discharge capacity for cycles 10 and 100, (b) discharge capacity for cycles 10 and 200, and (c) discharge capacity for cycles 10 and 300.
Figure 3.5 shows the discharge capacity difference versus discharge voltage for the 124 batteries. Each line represents the discharge capacity difference of a battery. In Figure 3.5, the x axis ($\Delta Q_{100-10}(V)$, $\Delta Q_{200-10}(V)$, and $\Delta Q_{300-10}(V)$) is the dependent variable and the y axis (discharge voltage) is the independent variable, which follows the convention in [4]. Figure 3.5a shows the discharge capacity difference between cycles 100 and 10. Figure 3.5b shows the discharge capacity difference between cycles 200 and 10. Figure 3.5c shows the discharge capacity difference between cycles 300 and 10. Note that negative values of $\Delta Q_{100-10}(V)$, $\Delta Q_{200-10}(V)$, and $\Delta Q_{300-10}(V)$ are expected because the discharge capacity at cycle 10 is close to the maximum discharge capacity.
Figure 3.5: $\Delta Q(V)$ versus discharge voltage for the 124 batteries, (a) $\Delta Q_{100-10}(V)$, (b) $\Delta Q_{200-10}(V)$, and (c) $\Delta Q_{300-10}(V)$.
A strong correlation between cycle life and the variance of $\Delta Q(V)$ on a log-log scale is shown in Figures 3.6a, 3.6b, and 3.6c for the 124 batteries. The colored dots denote the cycle life of the batteries, which ranges from 150 to 2300. The Pearson correlation coefficient $\rho$ is $-0.93$ for $\mathrm{var}(\Delta Q_{100-10}(V))$, $-0.96$ for $\mathrm{var}(\Delta Q_{200-10}(V))$, and $-0.96$ for $\mathrm{var}(\Delta Q_{300-10}(V))$. These high correlation coefficients indicate that features based
Figure 3.6: Correlation between the variance of $\Delta Q(V)$ and cycle life on a log-log scale, (a) variance of $\Delta Q_{100-10}(V)$, (b) variance of $\Delta Q_{200-10}(V)$, and (c) variance of $\Delta Q_{300-10}(V)$.
The domain-based features for the three categories, namely physical measurements, discharge capacity, and $\Delta Q(V)$, are given in Table 3.1. The features based on $\Delta Q(V)$ are statistical parameters: the variance, minimum, maximum, and mean of $\Delta Q_{100-10}(V)$, $\Delta Q_{200-10}(V)$, and $\Delta Q_{300-10}(V)$. The physical measurements used are the average operation time, the temperature integral, the minimum internal resistance, and the difference in internal resistance. The average operation time is defined as
\[ \text{average\_operation\_time} = \frac{1}{n-1} \sum_{i=2}^{n} \text{charging\_time}_i \tag{3.6} \]
where $i$ is the cycle number, $\text{charging\_time}_i$ is the charging time at cycle $i$, and $n = 100$, $200$, or $300$. The temperature integral is defined as
\[ \text{temperature\_integral} = \int_{t_2}^{t_n} T(t)\, dt \tag{3.7} \]
where $t_2$ is the initial temperature, $t_n$ is the temperature at cycle $n$, and $T(t)$ is the temperature between cycle 2 and cycle $n$. The minimum internal resistance is defined as
\[ \text{minimum\_internal\_resistance} = \min_i \, IR(i) \tag{3.8} \]
where $IR(i)$ is the internal resistance at cycle $i$, with $i$ ranging over cycles 2 to 100, cycles 2 to 200, or cycles 2 to 300. The difference in internal resistance is defined as
\[ \text{difference\_in\_internal\_resistance} = IR(n) - IR(2) \tag{3.9} \]
where $IR(n)$ is the internal resistance at cycle $n = 100$, $200$, or $300$ and $IR(2)$ is the initial internal resistance. These four physical measurement features have been shown to be related to cycle life [4].
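The four physical measurement features can be sketched as follows. This is a hedged illustration: the function name and argument layout are hypothetical, the temperature integral is assumed to be the integral of the temperature trace $T(t)$ over time, and it is approximated here with the trapezoidal rule.

```python
import numpy as np

def physical_features(charging_time, internal_resistance, time, temperature):
    """Physical measurement features for cycles 2 to n.

    charging_time and internal_resistance hold one value per cycle (cycle 2 first).
    time and temperature sample the temperature trace T(t) over the same cycles.
    """
    ct = np.asarray(charging_time, dtype=float)
    ir = np.asarray(internal_resistance, dtype=float)
    t = np.asarray(time, dtype=float)
    temp = np.asarray(temperature, dtype=float)
    # Trapezoidal approximation of the integral of T(t) dt.
    t_integral = float(np.sum(0.5 * (temp[1:] + temp[:-1]) * np.diff(t)))
    return {
        "average_operation_time": float(ct.mean()),  # (1 / (n - 1)) * sum over cycles 2..n
        "temperature_integral": t_integral,
        "minimum_internal_resistance": float(ir.min()),
        "difference_in_internal_resistance": float(ir[-1] - ir[0]),
    }

# Hypothetical toy values for three cycles.
print(physical_features([10.0, 12.0, 14.0], [0.017, 0.016, 0.018],
                        [0.0, 1.0, 2.0], [30.0, 32.0, 30.0]))
```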
Table 3.1: Domain-based features

| Category | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| $\Delta Q(V)$ features | variance of $\Delta Q_{100-10}(V)$ | variance of $\Delta Q_{200-10}(V)$ | variance of $\Delta Q_{300-10}(V)$ |
| | minimum of $\Delta Q_{100-10}(V)$ | minimum of $\Delta Q_{200-10}(V)$ | minimum of $\Delta Q_{300-10}(V)$ |
| | maximum of $\Delta Q_{100-10}(V)$ | maximum of $\Delta Q_{200-10}(V)$ | maximum of $\Delta Q_{300-10}(V)$ |
| | mean of $\Delta Q_{100-10}(V)$ | mean of $\Delta Q_{200-10}(V)$ | mean of $\Delta Q_{300-10}(V)$ |
| discharge capacity [4] | slope of the linear fit to curve for cycles 2 to 100 | slope of the linear fit to curve for cycles 2 to 200 | slope of the linear fit to curve for cycles 2 to 300 |
| | intercept of the linear fit to curve for cycles 2 to 100 | intercept of the linear fit to curve for cycles 2 to 200 | intercept of the linear fit to curve for cycles 2 to 300 |
| | slope of the linear fit to curve for cycles 91 to 100 | slope of the linear fit to curve for cycles 191 to 200 | slope of the linear fit to curve for cycles 291 to 300 |
| | intercept of the linear fit to curve for cycles 91 to 100 | intercept of the linear fit to curve for cycles 191 to 200 | intercept of the linear fit to curve for cycles 291 to 300 |
| | discharge capacity for cycle 2 | discharge capacity for cycle 2 | discharge capacity for cycle 2 |
| | discharge capacity for cycle 100 | discharge capacity for cycle 200 | discharge capacity for cycle 300 |
| physical measurements | average operation time | average operation time | average operation time |
| | the temperature integral for cycles 2 to 100 | the temperature integral for cycles 2 to 200 | the temperature integral for cycles 2 to 300 |
| | minimum internal resistance for cycles 2 to 100 | minimum internal resistance for cycles 2 to 200 | minimum internal resistance for cycles 2 to 300 |
| | difference in internal resistance between cycle 2 and cycle 100 | difference in internal resistance between cycle 2 and cycle 200 | difference in internal resistance between cycle 2 and cycle 300 |
Figure 3.7 shows $\Delta Q(V)$ versus the number of cycles of a representative battery before polynomial fitting (blue lines) and after polynomial fitting (orange lines), where a fourth degree polynomial is used to smooth discrepancies. These discrepancies are due to temperature variations [4]. Figure 3.7a indicates a linear relationship between the number of cycles and $\Delta Q_{100-10}(V)$. A linear relationship also emerges in Figure 3.7b (cycles 2 to 200) and Figure 3.7c (cycles 2 to 300). Figure 3.8 shows discharge capacity versus the number of cycles with fourth degree polynomial fitting for all 124 batteries.
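Fourth degree polynomial smoothing of a per-cycle curve can be sketched with NumPy's `polyfit` and `polyval`. The function name `smooth_curve` and the synthetic data are illustrative assumptions; the cycle index is centred and scaled before fitting, which is a numerical convenience added here and not something stated in the text.

```python
import numpy as np

def smooth_curve(cycles, values, degree=4):
    """Fit a polynomial of the given degree and return the fitted values."""
    x = np.asarray(cycles, dtype=float)
    y = np.asarray(values, dtype=float)
    # Centre and scale the cycle index so the least-squares fit is well conditioned.
    z = (x - x.mean()) / x.std()
    coeffs = np.polyfit(z, y, degree)
    return np.polyval(coeffs, z)

# Synthetic, hypothetical capacity trace with small fluctuations.
rng = np.random.default_rng(0)
cycles = np.arange(2, 301)
capacity = 1.07 - 1e-4 * cycles + 0.002 * rng.normal(size=cycles.size)
smoothed = smooth_curve(cycles, capacity)
print(float(np.std(capacity - smoothed)))
```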
Figure 3.7: $\Delta Q(V)$ versus the number of cycles for a representative battery, (a)
[Figure 3.8 plot: discharge capacity (Ah) versus number of cycles.]
Figure 3.8: Discharge capacity versus the number of cycles after polynomial fitting.
3.4
The Benchmark Model
The benchmark model [4] uses only the log variance of $\Delta Q(V)$ as a feature for prediction. As discussed in Section 3.3, the variance of $\Delta Q(V)$ has a strong correlation with cycle life. It was found that the log variance of $\Delta Q_{100-10}(V)$ can be used to predict cycle life [4]. The best weights $\hat{\mathbf{w}}$ for the benchmark model are obtained as
\[ \hat{\mathbf{w}} = \underset{\mathbf{w}}{\arg\min} \; \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda \|\mathbf{w}\|_1 \tag{3.10} \]
where $\mathbf{y}$ is the $n$-dimensional vector of the logarithm of the observed battery cycle life and $\mathbf{X}$ is the $n \times m$ matrix of the log variance of $\Delta Q_{100-10}(V)$, $\Delta Q_{200-10}(V)$, and $\Delta Q_{300-10}(V)$. $L_1$ norm regularization means that one of the features will be selected and the others will be ignored [9]. For batch splitting, grid search is used for hyperparameter tuning with the validation dataset. In the scikit-learn Python implementation of the benchmark model, stochastic gradient descent is employed with grid search to optimize $\lambda$. A value of $\lambda = 0.01$ was obtained for this model using the data for 100, 200, and 300 cycles. For random splitting, the scikit-learn Python implementation of the benchmark model employs stochastic gradient descent with 5-fold cross validation to optimize $\lambda$. A value of $\lambda = 0.001$ was obtained for this model using the data for 100, 200, and 300 cycles.
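A grid search over $\lambda$ against a held-out validation set can be sketched as below. This is not the implementation used in this work: it swaps in scikit-learn's `Lasso` estimator, which is fitted by coordinate descent rather than stochastic gradient descent, and it runs on synthetic stand-in data. Note also that scikit-learn scales the squared-error term by $1/(2n)$, so its `alpha` argument is not numerically identical to $\lambda$ in (3.10). The helper `select_lambda` and the grid values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_lambda(X_train, y_train, X_val, y_val, grid):
    """Return the grid value, error, and model with the lowest validation RMSE."""
    best_lam, best_err, best_model = None, np.inf, None
    for lam in grid:
        model = Lasso(alpha=lam, max_iter=10000).fit(X_train, y_train)
        err = float(np.sqrt(np.mean((model.predict(X_val) - y_val) ** 2)))
        if err < best_err:
            best_lam, best_err, best_model = lam, err, model
    return best_lam, best_err, best_model

# Synthetic stand-in for (log variance of Delta Q, log cycle life); not real data.
rng = np.random.default_rng(0)

def make_batch(n):
    x = rng.uniform(-5.0, -2.0, size=(n, 1))
    y = 1.5 - 0.4 * x[:, 0] + 0.05 * rng.normal(size=n)
    return x, y

X_train, y_train = make_batch(41)
X_val, y_val = make_batch(43)
grid = [0.001, 0.01, 0.1]
best_lam, best_err, best_model = select_lambda(X_train, y_train, X_val, y_val, grid)
print(best_lam, round(best_err, 3), best_model.coef_)
```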
3.5
The Machine Learning Models
The 14 domain-based features in Table 3.1 were considered for training the elastic net and random forest models. However, a smaller set of features can help to prevent overfitting by eliminating correlated features. Therefore, a subset of 9 features is selected for the elastic net and random forest models.
3.5.1
Elastic Net Model
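The best weights $\hat{\mathbf{w}}$ for the elastic net model are obtained as
\[ \hat{\mathbf{w}} = \underset{\mathbf{w}}{\arg\min} \; \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda \left( \frac{1 - \alpha}{2} \|\mathbf{w}\|_2^2 + \alpha \|\mathbf{w}\|_1 \right) \tag{3.11} \]
where $\mathbf{y}$ is the $n$-dimensional observed battery cycle life vector, $\mathbf{X}$ is the $n \times m$ matrix of features, $\lambda$ is a scalar hyperparameter with a non-negative value, and $\alpha$ is a scalar hyperparameter with $0 \le \alpha \le 1$, as in (3.4).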
The elastic net model combines the benefits of the $L_1$ and $L_2$ norms. The advantage of the $L_1$ norm lies in feature selection. Adding the $L_2$ norm can improve model performance [54]. For batch splitting, grid search is used for hyperparameter tuning with the validation dataset. In the scikit-learn Python implementation of the elastic net model, stochastic gradient descent with grid search is employed to optimize $\lambda$ and $\alpha$. The values obtained are $\lambda = 0.001$ and $\alpha = 0.001$ using the data for 100 cycles, $\lambda = 0.072$ and $\alpha = 0.001$ using the data for 200 cycles, and $\lambda = 0.100$ and $\alpha = 0.001$ using the data for 300 cycles. For random splitting, the scikit-learn Python implementation of the elastic net model employs stochastic gradient descent with 5-fold cross validation to optimize $\lambda$ and $\alpha$. The values obtained are $\lambda = 0.0005$ and $\alpha = 0.0001$ using the data for 100 cycles, $\lambda = 0.009$ and $\alpha = 0.0001$ using the data for 200 cycles, and $\lambda = 0.008$ and $\alpha = 0.0001$ using the data for 300 cycles.
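Tuning the two elastic net hyperparameters with 5-fold cross validation can be sketched with scikit-learn's `GridSearchCV`. This is an illustration on synthetic data and not the implementation used here: it uses the `ElasticNet` estimator (coordinate descent, not stochastic gradient descent). In that estimator, `alpha` plays the role of $\lambda$ (up to the $1/(2n)$ scaling of the squared-error term) and `l1_ratio` plays the role of $\alpha$ in (3.11). The grid values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for 99 training batteries with 9 features; not real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(99, 9))
true_w = np.array([0.5, -0.3, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0])
y = 2.8 + X @ true_w + 0.05 * rng.normal(size=99)

param_grid = {
    "alpha": [0.001, 0.01, 0.1],   # corresponds to lambda in (3.11)
    "l1_ratio": [0.1, 0.5, 1.0],   # corresponds to alpha in (3.11)
}
search = GridSearchCV(
    ElasticNet(max_iter=50000),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
print(np.round(search.best_estimator_.coef_, 2))
```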
3.5.2
Random Forest Model
The random forest model trains a number of decision trees and averages their predictions [9]. For batch splitting, grid search is used for hyperparameter tuning with the validation dataset. In the scikit-learn Python implementation of the random forest model, stochastic gradient descent with grid search is employed to optimize the number of trees $n$ and the maximum depth of the trees $d$. The values obtained are $n = 5$ and $d = 100$ using the data for 100 cycles, $n = 20$ and $d = 1000$ using the data for 200 cycles, and $n = 20$ and $d = 100$ using the data for 300 cycles. For random splitting, the scikit-learn Python implementation of the random forest model employs stochastic gradient descent with 5-fold cross validation to optimize $n$ and $d$. The values obtained are $n = 1000$ and $d = 20$ using the data for 100 cycles, and $n = 100$ and $d = 20$ using the data for 200 cycles and 300 cycles.
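A cross-validated search over the number of trees and the maximum depth can be sketched with scikit-learn's `RandomForestRegressor` and `GridSearchCV`. The data below are synthetic and the grid is deliberately small; both are assumptions for illustration, and this sketch makes no claim about how the values reported above were obtained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for 99 training batteries with 9 features; not real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(99, 9))
y = 2.8 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.05 * rng.normal(size=99)

param_grid = {
    "n_estimators": [5, 20],  # number of trees, n in the text
    "max_depth": [5, 20],     # maximum depth of trees, d in the text
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
print(search.best_estimator_.predict(X[:3]))
```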
3.6
Evaluation Metrics
Two metrics are used to evaluate model performance, namely root mean squared error (RMSE) and mean percent error (MPE). They are used to measure how far the predicted cycle life is from the observed cycle life. The RMSE is defined as
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{3.12} \]
where $y_i$ is the observed cycle life of battery $i$, $\hat{y}_i$ is the corresponding predicted cycle life of battery $i$, and $n$ is the number of batteries. A lower RMSE indicates better model performance. The MPE is given by
\[ \mathrm{MPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i - \hat{y}_i}{y_i} \times 100 \tag{3.13} \]
where $y_i$ is the observed cycle life of battery $i$, $\hat{y}_i$ is the corresponding predicted cycle life of battery $i$, and $n$ is the number of batteries. The MPE is the average of the differences between the observed cycle life and predicted cycle life, relative to the observed cycle life and expressed as a percentage. A lower MPE indicates better model performance.
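Both metrics are short to compute. In the sketch below the function names are illustrative, and one assumption is made loudly: (3.13) is written without absolute values, whereas this code takes the absolute value of each relative error so that positive and negative errors do not cancel and a lower value is always better. Predictions of the logarithm of the cycle life would first have to be converted back to cycles.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error between observed and predicted cycle life."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mpe(y, y_hat):
    """Mean percent error, using absolute relative errors (assumption, see text)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat) / y) * 100.0)

# Hypothetical observed and predicted cycle lives.
observed = [500.0, 800.0, 1200.0]
predicted = [450.0, 880.0, 1140.0]
print(rmse(observed, predicted), mpe(observed, predicted))
```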
Chapter 4
Results and Discussion
This chapter presents the prediction of cycle life with the three ML models. The benchmark model uses only the log variance of $\Delta Q(V)$ as a feature. The elastic net and random forest models use a subset of features, as will be discussed in Section 4.1. The evaluation metrics are used to measure model performance in Section 4.2.
4.1
Feature Selection
Nine of the domain-based features in Table 3.1 are used with the elastic net and random forest models based on [4], [9]. These features are discharge capacity at cycle 2 (Q_cycle2), log minimum of $\Delta Q(V)$ (log_DeltaQ_min), log variance of $\Delta Q(V)$ (log_DeltaQ_var), log intercept of the linear fit to discharge capacity (log_Qlinfit_int), log slope of the linear fit to discharge capacity (log_Qlinfit_slope), log minimum of internal resistance (log_IR_min), log difference in internal resistance (log_IR_diff), average operation time (charging_time), and the temperature integral (Tintegral).
Table 4.1 gives the feature weights for 100, 200, and 300 cycles for the random forest model with batch splitting. A higher value indicates the feature is more important. The log_DeltaQ_min feature weight increases from 26.5 using the data for 100 cycles to 30.7 using the data for 300 cycles. The log_DeltaQ_var feature weight also increases from 25.1 using the data for 100 cycles to 28.4 using the data for 300 cycles. These two features related to $\Delta Q(V)$ have the highest weights, which means they have a greater effect on the prediction. The log_Qlinfit_int feature weight increases from 4.1 using the data for 100 cycles to 9.6 using the data for 300 cycles while the log_Qlinfit_slope feature weight decreases from 2.9 using the data for 100 cycles to 1.4 using the data for 300 cycles. For features related to physical measurements, the charging_time feature weight decreases from 17.9 using the data for 100 cycles to 13.3 using the data for 300 cycles.
Table 4.1: Feature weights for the random forest model with batch splitting

| Feature | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| Q_cycle2 | 3.3 | 2.4 | 2.0 |
| charging_time | 17.9 | 16.2 | 13.3 |
| Tintegral | 6.1 | 5.3 | 4.2 |
| log_DeltaQ_min | 26.5 | 27.8 | 30.7 |
| log_DeltaQ_var | 25.1 | 26.7 | 28.4 |
| log_Qlinfit_int | 4.1 | 7.5 | 9.6 |
| log_Qlinfit_slope | 2.9 | 1.6 | 1.4 |
| log_IR_min | 4.9 | 3.6 | 3.2 |
| log_IR_diff | 6.4 | 4.7 | 4.3 |
Table 4.2 gives the feature weights for 100, 200, and 300 cycles for the elastic net model with batch splitting and shows that the log_DeltaQ_var and log_DeltaQ_min features have the highest weights. The log_DeltaQ_min feature weight increases from 5.6 using the data for 100 cycles to 6.9 using the data for 300 cycles and the log_DeltaQ_var feature weight increases from 6.3 using the data for 100 cycles to 6.8 using the data for 300 cycles. Similarly, the log_Qlinfit_int feature weight increases from 0.2 using the data for 100 cycles to 0.7 using the data for 300 cycles, but the log_Qlinfit_slope feature weight decreases from 0.6 using the data for 100 cycles to 0.2 using the data for 300 cycles. The charging_time feature weight decreases from 3.5 using the data for 100 cycles to 1.4 using the data for 300 cycles.
Table 4.2: Feature weights for the elastic net model with batch splitting

| Feature | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| Q_cycle2 | 2.3 | 1.5 | 1.3 |
| charging_time | 3.5 | 1.9 | 1.4 |
| Tintegral | 0.7 | 0.3 | 0.2 |
| log_DeltaQ_min | 5.6 | 6.2 | 6.9 |
| log_DeltaQ_var | 6.3 | 6.6 | 6.8 |
| log_Qlinfit_int | 0.2 | 0.4 | 0.7 |
| log_Qlinfit_slope | 0.6 | 0.3 | 0.2 |
| log_IR_min | 4.3 | 3.4 | 2.6 |
| log_IR_diff | 1.1 | 0.5 | 0.3 |
Table 4.3 gives the feature weights for 100, 200, and 300 cycles for the random forest model with random splitting. The log_DeltaQ_min feature weight increases from 30.3 using the data for 100 cycles to 33.2 using the data for 300 cycles. The log_DeltaQ_var feature weight also increases from 32.8 using the data for 100 cycles to 35.9 using the data for 300 cycles. These two features related to $\Delta Q(V)$ have the highest weights. Compared with the random forest feature weights with batch splitting, the weights of the log_DeltaQ_var and log_DeltaQ_min features increase with random splitting. The log_Qlinfit_int feature weight increases from 3.5 using the data for 100 cycles to 16.3 using the data for 300 cycles while the log_Qlinfit_slope feature weight decreases from 2.0 using the data for 100 cycles to 1.8 using the data for 300 cycles. For features related to physical measurements, the charging_time feature weight decreases from 7.8 using the data for 100 cycles to 5.5 using the data for 300 cycles. The charging_time feature weight using the data for 100 cycles decreases from 17.9 with batch splitting to 7.8 with random splitting.
Table 4.3: Feature weights for the random forest model with random splitting

| Feature | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| Q_cycle2 | 2.1 | 1.7 | 1.3 |
| charging_time | 7.8 | 6.1 | 5.5 |
| Tintegral | 2.7 | 2.4 | 2.1 |
| log_DeltaQ_min | 30.3 | 31.6 | 33.2 |
| log_DeltaQ_var | 32.8 | 34.2 | 35.9 |
| log_Qlinfit_int | 3.5 | 8.0 | 16.3 |
| log_Qlinfit_slope | 2.0 | 1.9 | 1.8 |
| log_IR_min | 8.9 | 7.2 | 4.9 |
| log_IR_diff | 6.8 | 5.2 | 3.8 |
Table 4.4 gives the feature weights for 100, 200, and 300 cycles for the elastic net model with random splitting and shows that the log_DeltaQ_var and log_DeltaQ_min features have the highest weights. The log_DeltaQ_min feature weight increases from 6.2 using the data for 100 cycles to 18.4 using the data for 300 cycles and the log_DeltaQ_var feature weight increases from 7.6 using the data for 100 cycles to 10.0 using the data for 300 cycles. Similarly, the log_Qlinfit_int feature weight increases from 1.0 using the data for 100 cycles to 4.1 using the data for 300 cycles, but the log_Qlinfit_slope feature weight decreases from 2.1 using the data for 100 cycles to 0.3 using the data for 300 cycles. The charging_time feature weight decreases from 1.1 using the data for 100 cycles to 0.8 using the data for 300 cycles. The charging_time feature weight using the data for 100 cycles decreases from 3.5 with batch splitting to 1.1 with random splitting. The log_DeltaQ_var feature weight using the data for 300 cycles increases from 6.8 with batch splitting to 10.0 with random splitting. The log_DeltaQ_min feature weight using the data for 300 cycles also increases from 6.9 with batch splitting to 18.4 with random splitting.
Table 4.4: Feature weights for the elastic net model with random splitting

| Feature | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| Q_cycle2 | 3.0 | 2.8 | 2.6 |
| charging_time | 1.1 | 1.0 | 0.8 |
| Tintegral | 0.6 | 0.4 | 0.2 |
| log_DeltaQ_min | 6.2 | 17.3 | 18.4 |
| log_DeltaQ_var | 7.6 | 8.0 | 10.0 |
| log_Qlinfit_int | 1.0 | 2.6 | 4.1 |
| log_Qlinfit_slope | 2.1 | 1.9 | 0.3 |
| log_IR_min | 3.3 | 2.0 | 1.8 |
| log_IR_diff | 0.9 | 0.4 | 0.3 |
4.2
Model Evaluation and Validation
Three models are considered in this section: the benchmark model using the log variance of $\Delta Q(V)$, and the elastic net and random forest models with the same nine features. The evaluation metrics MPE and RMSE are used to evaluate model prediction performance.
4.2.1
Batch Splitting Results
Tables 4.5, 4.6, and 4.7 present the MPE for the three models for 100, 200, and 300 cycles with batch splitting into training, validation, and test data, respectively. The elastic net model achieves the lowest test error (9.58%) using the data for 300 cycles, followed by the random forest model (13.35%) and benchmark model (13.84%). Therefore, the elastic net model has the best model performance. The benchmark model performance improves from a test error of 15.69% using the data for 100 cycles to a test error of 13.84% using the data for 300 cycles.
Table 4.5: MPE for the training data with batch splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 16.94% | 12.91% | 12.63% |
| elastic net | 7.57% | 6.30% | 5.97% |
| random forest | 4.92% | 4.68% | 4.09% |

Table 4.6: MPE for the validation data with batch splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 19.57% | 13.29% | 12.84% |
| elastic net | 11.38% | 7.58% | 6.42% |
| random forest | 13.65% | 8.95% | 7.98% |

Table 4.7: MPE for the test data with batch splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 15.69% | 15.32% | 13.84% |
| elastic net | 11.96% | 11.27% | 9.58% |
| random forest | 14.67% | 13.52% | 13.35% |
The observed and predicted cycle life results for the benchmark, elastic net, and random forest models with batch splitting are illustrated in Figures 4.1, 4.2, and 4.3, respectively. In these figures, the training, validation, and test results are shown in blue, orange, and green, respectively. A dot on the dashed black line means the predicted cycle life is the same as the observed cycle life. The histograms show the distribution of the residuals (observed cycle life - predicted cycle life) for the test data. These results show that increasing the number of cycles moves the dots closer to the dashed black line, i.e. the residuals decrease so the predicted results are closer to the observed results.
Figure 4.1: Observed and predicted cycle lives for the benchmark model with batch splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.2: Observed and predicted cycle lives for the elastic net model with batch splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.3: Observed and predicted cycle lives for the random forest model with batch splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
The RMSE with the three ML models for 100, 200, and 300 cycles and batch splitting into training, validation, and test data is given in Tables 4.8, 4.9, and 4.10, respectively. The elastic net model achieves the lowest test RMSE (199) using the data for 300 cycles, followed by the random forest model (224) and benchmark model (266). In terms of model performance, MPE is a more important metric for battery lifetime prediction than RMSE [9].
Table 4.8: RMSE for the training data with batch splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 168 | 157 | 173 |
| elastic net | 62 | 56 | 81 |

Table 4.9: RMSE for the validation data with batch splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 219 | 210 | 223 |
| elastic net | 123 | 85 | 102 |
| random forest | 148 | 140 | 128 |

Table 4.10: RMSE for the test data with batch splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 283 | 291 | 266 |
| elastic net | 207 | 206 | 199 |
| random forest | 243 | 234 | 224 |
4.2.2
Random Splitting Results
Tables 4.11 and 4.12 present the MPE for the three models for 100, 200, and 300 cycles with random splitting into training and test data, respectively. The elastic net model achieves the lowest test error (7.28%) using the data for 300 cycles, followed by the random forest model (7.85%) and benchmark model (11.45%). The use of random splitting results in a lower test MPE for the three ML models. For example, the test error with the random forest model is 7.85%, which is lower than the 13.35% with batch splitting.
Table 4.11: MPE for the training data with random splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 12.99% | 9.72% | 9.37% |
| elastic net | 8.98% | 7.33% | 6.69% |
| random forest | 3.63% | 3.20% | 3.09% |

Table 4.12: MPE for the test data with random splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 15.29% | 12.18% | 11.45% |
| elastic net | 11.59% | 7.93% | 7.28% |
| random forest | 14.28% | 9.51% | 7.85% |
The observed and predicted cycle life results for the benchmark, elastic net, and random forest models with random splitting are illustrated in Figures 4.4, 4.5, and 4.6, respectively. In these figures, the training and test results are shown in blue and green, respectively. A dot on the dashed black line means the predicted cycle life is the same as the observed cycle life. The histograms show the distribution of the residuals (observed cycle life - predicted cycle life) for the test data. These results show that random splitting results in a lower test RMSE for the three ML models.
Figure 4.4: Observed and predicted cycle lives for the benchmark model with random splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.5: Observed and predicted cycle lives for the elastic net model with random splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
Figure 4.6: Observed and predicted cycle lives for the random forest model with random splitting for (a) the first 100 cycles, (b) the first 200 cycles, and (c) the first 300 cycles. The inset is the histogram of the residuals (observed cycle life - predicted cycle life) for the test data.
The RMSE with the three ML models for 100, 200, and 300 cycles and random splitting into training and test data is given in Tables 4.13 and 4.14, respectively. The elastic net model achieves the lowest test RMSE (95) using the data for 300 cycles, followed by the random forest model (107) and benchmark model (132).
Table 4.13: RMSE for the training data with random splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 150 | 132 | 136 |
| elastic net | 130 | 109 | 104 |
| random forest | 52 | 50 | 48 |

Table 4.14: RMSE for the test data with random splitting

| Model | 100 cycles | 200 cycles | 300 cycles |
|---|---|---|---|
| benchmark | 173 | 155 | 132 |
| elastic net | 152 | 132 | 95 |