Master Thesis Chemistry

Machine learning towards prediction of conceptual density functional theory properties

By

Wendy Kossen

27th of May 2021

Student Number: 11002182
Research Institute: Amsterdam Institute of Molecular and Life Sciences
Research Group: Theoretical Chemistry
Supervisor: prof. dr. L. Visscher
Daily supervisors: Bas van Beek MSc, Jelena Belić MSc
Second Examiner: dr. ir. B. Ensing

Abstract

Fossil fuels have a significant impact on the environment, and a new, sustainable energy source needs to be found. Solar energy can be that source. Dye-sensitized solar cells can help use solar energy to split water and store the energy as hydrogen. Excitation energy calculations tell us whether a dye is able to absorb light in the visible region. Quantum dots can be useful for converting solar energy into an electric current, with the advantage of a bandgap that is tuneable through their size. For this purpose they need to be stabilised. Conceptual DFT calculations provide useful information about the ligand-core interaction and thus the ability of a ligand to stabilise the quantum dot. A machine learning model is developed that aims to predict the properties that are normally calculated with conceptual DFT.

Contents

1. Introduction
1.1 Density Functional Theory
1.2 Conceptual Density Functional Theory
1.3 Machine Learning
2. Computational Details
2.1 Calculations of Excitation Energies
2.2 Sensitivity Analysis of conceptual DFT
2.3 Machine Learning
3. Results and Discussion
3.1 Calculations of Excitation Energies
3.2 Sensitivity Analysis of conceptual DFT
3.2.1 Results Basis set Sensitivity
3.2.2 Results Functional Sensitivity
3.3 Machine Learning
3.3.1 First approach: all at once
3.3.2 Second approach: one at a time
3.3.3 Third approach: grouping of the properties
4. Conclusion and Outlook
References
5. Acknowledgements
A. cDFT sensitivity analysis tables
B. cDFT sensitivity analysis graphs

1. Introduction

Currently, our energy is provided mostly by fossil fuels. However, these are depleting rapidly, and their combustion produces greenhouse gases known to have a significant impact on global warming and climate change. These environmental concerns have led to the public awareness that we must urgently explore alternative, renewable and carbon-neutral energy sources to replace fossil fuels. One of these energy sources is the sun. The supply of solar energy to our planet is gigantic: harvesting less than 0,02% of this energy would already satisfy our current energy needs.1 One approach is to convert the solar energy directly into storable energy-rich compounds known as solar fuels.2 Hydrogen is the most desirable energy carrier, since its combustion only produces water. Dye-sensitized photo-electrochemical cells are promising candidates for this efficient transformation of solar energy.3 These cells consist of a semiconductor sensitized with a light-absorbing molecule (dye) that is coupled to a water oxidation catalyst.4 Both the dye and the catalyst need to be optimal for the most efficient transformation. An optimal dye has characteristics such as a wide absorption range for visible light, affinity toward fast electron injection into the semiconductor, and a low charge recombination rate. The absorption of visible light can be determined using excitation energy calculations. We will perform these quantum chemical calculations focussing on time-dependent methods, as these are capable of quickly evaluating the position and intensity of absorption peaks in the visible part of the solar spectrum.5,6 We will focus on how the excitation energy is affected by adding ligands to the dye core, the hypothesis being that the addition of ligands lowers the excitation energy into the visible light region.

Another approach would be to convert the energy of the sunlight into an electric current. This can be achieved with the help of quantum dots: semiconductor colloidal nanocrystals. Quantum dots have the advantage of a tuneable bandgap as a result of size variation.7 The energy gap increases with a decrease in size of the quantum dot, allowing for the construction of nanostructured solar cells that are able to harvest a wide range of the solar spectrum. However, the quantum dot needs to be stabilized, which can be done with the addition of ligands. For the stabilisation to occur, the interaction of the ligands with each other, the core and the solvent are important.8 If the ligand-solvent interaction is stronger than the ligand-core interaction, the ligand will favour dissociation and destabilize the quantum dot. If a ligand is too bulky, it simply won’t fit on the surface, which also leads to dissociation. The property we are interested in the most is the ligand-core interaction. We can calculate this explicitly with standard DFT calculations; however, these calculations are both expensive and specific to the core-ligand pairs. We want to have a general idea of the ligand-core interaction, independent of the core. With the help of conceptual DFT, we can calculate properties of the ligand that will give us a reasonable indication of this interaction.

We want to be able to screen many ligands for the quantum dot stabilisation. Performing conceptual DFT calculations on the ligands is no longer feasible when the list exceeds a hundred thousand. However, we can use the conceptual DFT calculations for the development and training of a machine learning model. This model might be able to predict the properties of future ligands to good accuracy, making for a quicker and cheaper procedure of approximating the properties without having to calculate them explicitly. Eventually, this model might be extended to be able to predict the influence of a ligand on the excitation energy of the dye.

1.1 Density Functional Theory

The goal of quantum chemical approaches is to find an approximate solution of the time-independent, non-relativistic Schrödinger equation, shown in Equation 1 in its simplest form.9

Equation 1: Simple notation of the Schrödinger equation

Ĥ𝜓 = 𝐸𝜓

where Ĥ is the Hamiltonian operator for a molecular system, representing the total energy, E is the ground state energy of the system and ψ is the wavefunction. The Hamiltonian consists of the kinetic and potential energy. Since the electrons move much faster than the nuclei, we can approximate the nuclei as fixed points, which is called the Born-Oppenheimer approximation.10 This means that the nuclear kinetic energy is zero and the nuclear potential energy is merely a constant, reducing the Hamiltonian to the so-called electronic Hamiltonian. Once the wavefunction is determined, all properties of interest can be obtained by applying the appropriate operator to the wavefunction.9 However, this is not a simple task; in practice it is impossible to solve the Schrödinger equation exactly. Nevertheless, there is a recipe for approaching the wavefunction of the ground state, i.e. the state which delivers the lowest energy. This recipe is the variational principle: we minimize the energy by searching through all acceptable N-electron trial wavefunctions to find the energy and corresponding wavefunction that are closest to the ground state energy and wavefunction.9 Since searching over all acceptable functions is impossible, a subset is chosen, and the best approximation of the exact wavefunction is sought within that subset.

One important example of such a subset is the Hartree-Fock approximation, which consists of all antisymmetric products composed of N spin orbitals.11 The N-electron wavefunction is approximated by an antisymmetrized product of N one-electron wavefunctions χ. This product is referred to as a Slater determinant, Φ_SD, shown in the short-hand notation in Equation 2, where only the diagonal elements are given.

Equation 2: Short-hand notation of the Slater determinant

ψ₀ ≈ Φ_SD = (1/√N!) det{χ₁(x₁) χ₂(x₂) … χ_N(x_N)}

Density functional theory (DFT) is based on the idea that the electron density determines the ground state properties of the system. The electron density is defined as the multiple integral over the spin coordinates (s_N) of all electrons and over all but one of the spatial variables (x_N), shown in Equation 3.9

Equation 3: Definition of the electron density

ρ(r) = N ∫…∫ |ψ(x₁, x₂, …, x_N)|² ds₁ dx₂ … dx_N

ρ(r) determines the probability of finding any of the N electrons within the volume element at r with arbitrary spin, while the other N-1 electrons have arbitrary positions and spins in the state represented by ψ.9 Strictly speaking ρ(r) is a probability density, but it is common practice to call it the electron density. Unlike the wavefunction, the electron density can be measured experimentally. The concept of the electron density can be extended to the probability of finding not one but two electrons within two volume elements, while the remaining N-2 electrons have arbitrary positions and spins. This is called the pair density.9 The probability of finding two electrons with the same spin at the same point in space is exactly zero; thus, electrons of like spin do not move independently of each other. This effect is known as exchange or Fermi correlation. This correlation is included in the Hartree-Fock approach due to the antisymmetry of the Slater determinant.12

The first attempt to use the electron density rather than the wavefunction to obtain information about atomic and molecular systems dates back to the early work of Thomas and Fermi.13,14 They formulated a quantum statistical model of electrons that takes into account only the kinetic energy and treats the nuclear-electron and electron-electron contributions in a classical way. This leads to an expression for the energy of an atom, shown in Equation 4. The importance lies not in how well this equation describes the energy, because it is a very coarse approximation, but in the fact that the energy is given completely in terms of the electron density.

Equation 4: Expression of the energy of an atom defined by Thomas and Fermi. Z = number of protons

E_TF = (3/10)(3π²)^(2/3) ∫ ρ^(5/3)(r) dr − Z ∫ ρ(r)/r dr + (1/2) ∫∫ ρ(r₁)ρ(r₂)/r₁₂ dr₁ dr₂

Density functional theory as we know it today was formed after the theorems provided by Hohenberg and Kohn.15 These theorems represent the major theoretical pillars on which all modern-day density functional theories are erected. The first theorem proves that the electron density uniquely determines the Hamilton operator and thus all properties of the system. The second theorem states that the functional for the ground state energy of the system delivers the lowest energy only if the input density is the true ground state density. This is analogous to the variational principle, where the functional delivers the lowest energy only if the input wavefunction is the true ground state wavefunction.9

Kohn and Sham put the Hohenberg and Kohn theorems to use and devised a way to approximate the functional that determines the ground state energy of the system.16 They realized that orbital-based approaches, such as the Hartree-Fock method, perform better at determining the kinetic energy than the method provided by Thomas and Fermi. Kohn and Sham therefore introduced the concept of a non-interacting reference system, which has a Slater determinant as its exact solution. This system is built from a set of orbitals such that the major part of the kinetic energy can be computed with good accuracy.16 The remainder is merged with the non-classical contributions to the electron-electron repulsion, which are unknown but usually fairly small. With this method, as much information as possible is computed exactly, leaving only a small part of the total energy to be determined by an approximate functional.

1.2 Conceptual Density Functional Theory

Conceptual DFT (cDFT), developed by Parr, is a branch of DFT.17 cDFT concentrates on the extraction of chemically relevant concepts and principles from DFT, based on derivatives of the energy with respect to the number of electrons and the external potential.18 This allows for the quantification of chemical properties, such as hardness and softness, that are also widely used outside the scope of computational chemistry.19 In Figure 1 the hierarchy of cDFT properties is shown.

Figure 1: Definitions of conceptual DFT properties, derived from the energy with respect to the number of electrons and external potential.20

The calculated properties are conceptual and can therefore not be directly confirmed with experimental values; e.g. there is no experiment that measures the hardness of a molecule. However, it is possible to calculate the properties by hand from experimental data, specifically the energy. Looking at Figure 2, the energy of the molecule can only be experimentally determined for integer numbers of electrons, i.e. the neutral, n+1 and n-1 states.21 This can be plotted, with the slope representing the electronic chemical potential. Treating the values in between as a smooth curve makes it possible to determine the electronic chemical potential for each molecule, whether or not the number of electrons is an integer. The aforementioned states can furthermore be approximated with Koopmans' theorem, which states that the ionisation energy of the molecule is approximately the negative of the HOMO energy.22 This provides the baseline for two methods that can calculate the electronic chemical potential, and all other cDFT properties related to the number of electrons.

The first method is Finite Difference Linearization (FDL), which depends on the ionisation energy, electron affinity and neutral energy of the molecule.21 The second is the Frontier Molecular Orbital (FMO) method, which depends on the energies of the HOMO and LUMO orbitals.23 The FDL method is more accurate than FMO, since the relevant derivatives are explicitly calculated using the n+1 and n-1 states, whereas FMO merely approximates these derivatives via a combination of HOMO and LUMO energies.

Figure 2: Plot of the energy against the number of electrons for an arbitrary molecule.21

Before cDFT can be properly used, it is important to know how sensitive it is to the level of theory. Benchmarks are scarce in the literature, and the level of theory used in one of the few available examples is furthermore unavailable to us.24 We performed cDFT calculations on a range of molecules, including one charged molecule and seven net-neutral molecules, using multiple functionals and basis sets to determine the sensitivity. The goal is to compare the values calculated with the different functionals using the FMO method to the values calculated with the FDL method. This way it is possible to determine which basis set and functional give the most accurate results.

1.3 Machine Learning

In the last decade, machine learning has been applied widely throughout computer science and beyond. It is used for many applications, such as spam filters, web searches, ad placement and drug design.25 A machine learning algorithm can perform important tasks by generalizing from examples; the more examples provided, the more complicated the problems the machine can tackle. This is often more feasible and cost-effective than manual programming.25 However, developing a successful machine learning algorithm in practice is difficult, and no textbook approach is available. The two types of problems that can be solved with machine learning are classification and regression. In classification, the output is one of a fixed set of answers: the input can be divided into certain groups and the output determines which group the input best belongs to. For example, a spam filter determines whether an email belongs to the group 'spam' or 'no spam'. In regression problems there are no set answers; the output can be any value. For example, for the prediction of chemical properties, the machine outputs the value of a property when given a molecule as input. We will be using a neural network as the model to solve our regression problem.

Figure 3: Schematic of a machine learning neural network model with 2 hidden layers

A neural network, depicted in Figure 3, consists of an input layer, an output layer and layers in between called hidden layers or activation units.26 Each layer can be seen as a column vector consisting of a number of nodes. As is visible in Figure 3, each node is connected to every node in the next layer. The significance of each connection is determined by its weight; a higher weight means a higher significance. This means that every layer has its own matrix of weights, where the dimensions of the matrix are determined by the sizes of the current and next layer. To control the function mapping between the layers there is also an activation function. To produce an output, the input vector is multiplied by the weight matrix and the result is passed through the activation function; the resulting vector is the next layer. This can be written as an equation, as shown in Equation 5.

Equation 5: Function mapping from the input layer to the first hidden layer of a neural network

a⁽¹⁾ = f(W⁽¹⁾x)

where x is the input vector, W⁽¹⁾ is the weight matrix of the first layer and f is the activation function.
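As a sketch of this layer-by-layer mapping, a forward pass can be written in a few lines of plain Python (no ML library; the layer sizes and random weights below are arbitrary illustrative values, not those of the actual model):

```python
import random

def relu(vec):
    # Activation function applied element-wise: max(0, x).
    return [max(0.0, v) for v in vec]

def forward_layer(inputs, weights):
    # One layer: multiply the input vector by the weight matrix,
    # then run the result through the activation function.
    # weights[i] holds the incoming weights of node i in the next layer.
    return relu([sum(w * v for w, v in zip(row, inputs)) for row in weights])

random.seed(0)
x = [0.5, -1.0, 2.0]                                                # input layer (3 nodes)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # maps 3 -> 4
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(1)]  # maps 4 -> 1
hidden = forward_layer(x, W1)       # first hidden layer (4 nodes)
output = forward_layer(hidden, W2)  # output layer (1 predicted property)
print(len(hidden), len(output))
```

Predicting more properties at once would simply mean a wider final weight matrix and thus a larger output vector.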

In our case, the neural network takes in a circular fingerprint, which represents the connectivity information of the molecule.27 Going through the hidden layers, it produces an output, in this case the value of the property we are trying to predict. Predicting multiple properties at once results in a larger output layer.

Before its use, the model has to be trained. We divide the computed data into two parts: the machine learning model takes in 80% of the computed data for training,26 and the other 20% is used for validation. This division is required to reduce the effect of over- and underfitting, illustrated in Figure 4.
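The 80/20 division described above can be sketched as a simple shuffled split (plain Python; the molecule identifiers are placeholders, not the real data set):

```python
import random

def train_val_split(data, train_frac=0.8, seed=42):
    # Shuffle the data, then split it into a training and a validation set.
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

molecules = [f"mol_{i}" for i in range(1000)]  # placeholder identifiers
train, val = train_val_split(molecules)
print(len(train), len(val))  # 800 200
```

Shuffling before splitting matters: if the data were ordered (e.g. amines first, then carboxylic acids), an unshuffled split would validate on molecules unlike anything seen in training.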

Figure 4: Graphical representation of over- and underfitting of a machine learning model

When overfitting, the model can predict the properties correctly for the pre-existing training data, but it will have trouble predicting properties of unseen molecules. When underfitting, the model will have trouble correctly predicting the properties for both the training and new data. After training we will be predicting the properties of the molecules we kept apart, which the model has not seen before. Since we know what the output should be, we can compare the predictions with the expected values and validate the model.

For training, the backpropagation method is used; a schematic of this method is shown in Figure 5. The weights matrix of each layer is randomly initialized.26 After a single run, the error between the predicted and expected values is computed and the model looks back at the weights it initialized. Using the derivative of the error with respect to the weights, the model can determine better values for the weights. Once it has determined new values, it runs again, determines the error, and so on. Each time the model runs, the weight matrices of the layers are adjusted and the prediction changes.

Figure 5: Schematic of the backpropagation training of a machine learning model

To find the best setup, we will be varying a few parameters, namely:

- the number of hidden layers
- the sizes of the hidden layers
- the number of epochs
- the batch size
- the activation function
- the number of properties trained at a time

The number of epochs is the number of times the model runs over all the data. The batch size is the number of data points the model uses during a single run. If the batch size is smaller than the total amount of data, it will take multiple runs for the model to complete one epoch. Two activation functions are available to us: the rectified linear unit (ReLU, Equation 6) and the sigmoid function (Equation 7).

Equation 6: Rectified linear activation function

ReLU(x) = max(0, x)

Equation 7: Sigmoid activation function

Sigmoid(x) = 1 / (1 + e⁻ˣ)

The ReLU function is more commonly used in linear regression models and sets the output of the vector-matrix product to 0 if it is negative. The sigmoid function is more commonly used in logistic regression models and maps this value to a number between 0 and 1.26 To determine the best values for these hyperparameters and the best activation function, we look at the correlation of the predicted against the expected values, expressed as the R² value. The closer this number is to 1, the better the model performs.
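Both activation functions and the R² score are one-liners; a minimal sketch (here R² is computed as the coefficient of determination, one common way to score predicted against expected values):

```python
import math

def relu(x):
    # Equation 6: negative inputs become 0, positive inputs pass through.
    return max(0.0, x)

def sigmoid(x):
    # Equation 7: squashes any input into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def r_squared(expected, predicted):
    # Coefficient of determination; 1.0 means a perfect prediction.
    mean = sum(expected) / len(expected)
    ss_res = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    ss_tot = sum((e - mean) ** 2 for e in expected)
    return 1.0 - ss_res / ss_tot

print(relu(-2.0), sigmoid(0.0))  # 0.0 0.5
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```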

2. Computational Details

2.1 Calculations of Excitation Energies

With the help of the QMFlows and CAT programs, three dye cores were taken and monosubstituted with seven different ligands.28,29 Next, the light-absorbing dyes were optimised with DFTB3 using the 3-ob parameter set.30 Excitation energy calculations were performed with sTDDFT, using the Coulomb-attenuated B3LYP functional (CAM-B3LYP) and the DZP basis set, matching the level of theory of an article by J. Belic and B. van Beek.4,6,31 The calculations were repeated with TDDFTB, using the 3-ob parameter set.5

2.2 Sensitivity Analysis of conceptual DFT

Data has been created using packages from AMS, following the FMO method. All molecules were optimised using the DFTB method with the GFN1-xTB parameter set. For the basis set sensitivity analysis, BP86 was chosen as the functional, with DZP, TZ2P and augmented TZ2P (aug-TZ2P) as the tested basis sets, since these are progressively more accurate. For the functional analysis, the DZP basis set was chosen and a range of functionals was selected from LDA and the groups GGA, meta-GGA, hybrid, meta-hybrid and range-separated hybrid. Specifically: LDA, PBE, BLYP, M06-L, PBE0, B3LYP, M06 and CAM-B3LYP.11,31–34

The formulas needed to calculate the properties with the FDL and the FMO methods are given in Table 1, derived from the exact definition given in Figure 1.

Table 1: Formulas to calculate cDFT properties using the FDL and FMO methods. Ea = electron affinity, En = energy of the neutral molecule, Ei = ionisation energy.

| Property                           | FDL formula23               | FMO formula23               |
| Electronic chemical potential (µ)  | (Ea − Ei) / 2               | (ε_HOMO + ε_LUMO) / 2       |
| Electronegativity                  | −µ                          | −µ                          |
| Hardness (η)                       | Ea − 2·En + Ei              | ε_LUMO − ε_HOMO             |
| Softness                           | 1 / η                       | 1 / η                       |
| Electrophilicity index             | µ² / (2η)                   | µ² / (2η)                   |
| Dissociation energy nucleofuge     | (µ + η)² / (2η)             | (µ + η)² / (2η)             |
| Dissociation energy electrofuge    | (µ − η)² / (2η)             | (µ − η)² / (2η)             |
| Electrodonating power              | (3µ⁻ + µ⁺)² / (16(µ⁺ − µ⁻)) | (3µ⁻ + µ⁺)² / (16(µ⁺ − µ⁻)) |
| Electroaccepting power             | (µ⁻ + 3µ⁺)² / (16(µ⁺ − µ⁻)) | (µ⁻ + 3µ⁺)² / (16(µ⁺ − µ⁻)) |
| Electronic chemical potential (µ⁺) | Ea − En                     | ε_LUMO                      |
| Electronic chemical potential (µ⁻) | En − Ei                     | ε_HOMO                      |
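The FMO column of Table 1 translates directly into code, since every property follows from just the two frontier orbital energies (a sketch; the orbital energies used below are illustrative values, not results from this work):

```python
# FMO-method cDFT properties from the HOMO/LUMO energies, following Table 1.
# All energies in hartree. In the FMO approximation mu- = eps_HOMO and
# mu+ = eps_LUMO, so eta = mu+ - mu- = eps_LUMO - eps_HOMO.
def cdft_fmo(e_homo, e_lumo):
    mu = (e_homo + e_lumo) / 2.0  # electronic chemical potential
    eta = e_lumo - e_homo         # hardness
    return {
        "chemical_potential": mu,
        "electronegativity": -mu,
        "hardness": eta,
        "softness": 1.0 / eta,
        "electrophilicity": mu ** 2 / (2.0 * eta),
        "diss_energy_nucleofuge": (mu + eta) ** 2 / (2.0 * eta),
        "diss_energy_electrofuge": (mu - eta) ** 2 / (2.0 * eta),
        "electrodonating_power": (3.0 * e_homo + e_lumo) ** 2 / (16.0 * eta),
        "electroaccepting_power": (e_homo + 3.0 * e_lumo) ** 2 / (16.0 * eta),
        "mu_plus": e_lumo,
        "mu_minus": e_homo,
    }

props = cdft_fmo(e_homo=-0.25, e_lumo=-0.05)  # illustrative orbital energies
print(round(props["chemical_potential"], 3), round(props["hardness"], 3))  # -0.15 0.2
```

The FDL column has the same structure, only with µ and η built from Ea, En and Ei instead of the orbital energies.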

A range of carboxylic acids and similar structures of different sizes was chosen, specifically: acetic acid, butyric acid, crotonic acid, formic acid, nitric acid, cytosine, nitrobenzene and acetate (shown in Figure 6).

Figure 6: Overview of all the molecules the cDFT sensitivity analysis was performed with. The Acetate molecule was only used for the basis set sensitivity calculations and excluded from the functional sensitivity analysis.

2.3 Machine Learning

Calculations were performed on a total of 8000 amines and carboxylic acids, taken from the GDB13 database. All molecules were optimised using the DFTB method with the GFN1-xTB parameter set. cDFT calculations were performed with the DZP basis set and the CAM-B3LYP functional. The machine learning model is a neural network implemented with the PyTorch library.

3. Results and Discussion

3.1 Calculations of Excitation Energies

The lowest excitation energies were calculated for the core naphthalene diimide molecule (NDI, Figure 7) and the monosubstituted NDI using the sTDDFT and TDDFTB methods. The energies provided in an article by J. Belic and B. van Beek are used as a reference for the sTDDFT calculations, since these were calculated at the same level of theory.

The results of the different methods are shown in Figure 8. The two data sets correlate well and the existing data was successfully reproduced. The small difference between the calculated values and the values from the article stems from a difference in the geometries of the molecules, assumed to be a consequence of developments in the software used. In the molecule-generation process, the whole molecule is optimized after the ligands are attached to the core. During the development of CAT, a different universal force field (UFF) method was introduced, which can lead to different geometries with a newer version of CAT. In addition, there has been an update of RDKit (version 2020.03) related to the geometry optimisation methods, which may also lead to a difference in geometry.

The DFTB results are lower in energy than the sTDDFT ones. The systematic overestimation by sTDDFT and systematic underestimation by TDDFTB of the excitation energy are expected relative to the experimental values, as described by J. Belic and B. van Beek.4 However, both methods follow the same general trend barring a constant offset. When comparing the unsubstituted core with the monosubstituted cores, it is visible that the ligands lower the excitation energies into the preferred region, the lower limit being the threshold for the water oxidation reaction (1,35 eV) and the upper limit being set by the intensity of UV light at the earth's surface (3,2 eV).

Figure 8: First excitation energies of the bare NDI core and monosubstituted NDI core, calculated with the TDDFTB method with 3-ob parameter set and the sTDDFT method with CAM-B3LYP functional.


Figure 7: The NDI dye core, highlighted atom is replaced during substitution.


The calculations were repeated on two more dye cores using the same ligands and methods: naphthalene monoimide (NMI, Figure 9) and phthaldiimide (phDI, Figure 10). The graphs of the first excitation energies are shown in Figure 11 and Figure 12, respectively.

Looking at these graphs, we can again see that the sTDDFT method systematically overestimates the energies compared to the DFTB calculations. Just like for NDI, the data sets follow the same trends. Comparing the unsubstituted cores with the monosubstituted cores, we see that the ligands again lower the excitation energies. For the NMI core, the energy is not lowered enough for all the monosubstituted cores to be in the visible light region based on the sTDDFT calculations. For the phDI core, the monosubstituted cores are all lowered to the preferred region.

Figure 11: First excitation energies of the bare NMI core and monosubstituted NMI core, calculated with the TDDFTB method with 3-ob parameter set and the sTDDFT method with CAM-B3LYP functional.


Figure 9: The NMI dye core, the highlighted atom is replaced during substitution.

Figure 10: The phDI dye core, the highlighted atom is replaced during substitution.


Figure 12: First excitation energies of the bare phDI core and monosubstituted phDI core, calculated with the TDDFTB method with 3-ob parameter set and the sTDDFT method with CAM-B3LYP functional.

In Figure 13 the excitation energies of all the cores with the different ligands are shown, calculated with the sTDDFT method. This method was chosen for the comparison as it had control values that could be replicated. The unsubstituted NDI core has a slightly lower excitation energy than the NMI and phDI cores. There are small differences between the energies of the ligands on the different cores; the substituted NMI core has slightly higher excitation energies than the substituted NDI and phDI cores. The different cores all lie in the same energy region and follow the same trend of the ligands lowering the excitation energy.

Figure 13: First excitation energies of the different bare cores and monosubstituted cores, calculated with the sTDDFT method with CAM-B3LYP functional.



To conclude, there is a visible difference between the excitation energies obtained with the different methods, with sTDDFT and TDDFTB systematically over- and underestimating the energies, respectively. Substitution of the cores leads to a decrease in excitation energy, the magnitude depending on the ligand the core was substituted with. Overall, bithiophene lowers the excitation energy the most. This ligand is also the most promising for further investigation, since it is the only tested ligand that lowers the excitation energy of all tested cores into the desirable range of 1,35-3,2 eV.
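The 1,35-3,2 eV acceptance window used throughout this comparison amounts to a simple screening filter (a sketch; the ligand names and energies below are hypothetical placeholders, not the computed results):

```python
# Screening filter for the usable excitation-energy window: above the
# water-oxidation threshold (1.35 eV) and below the UV cutoff (3.2 eV).
LOWER_EV, UPPER_EV = 1.35, 3.2

def in_window(energy_ev):
    return LOWER_EV <= energy_ev <= UPPER_EV

# Hypothetical ligand -> first-excitation-energy pairs, for illustration only.
candidates = {"ligand_a": 2.4, "ligand_b": 3.6, "ligand_c": 1.1}
suitable = [name for name, e in candidates.items() if in_window(e)]
print(suitable)  # ['ligand_a']
```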

3.2 Sensitivity Analysis of conceptual DFT

A sensitivity analysis was performed with a range of functionals and basis sets on a set of carboxylic acids and similar structures. To improve readability, not all functionals are shown in the graphs; functionals with similar values were filtered out. The calculated values for all functionals and basis sets are given in tables in the appendix.

3.2.1 Results Basis set Sensitivity

To determine the basis set sensitivity, we look at acetic acid and acetate, since these are similar except for their charge. Figure 14 depicts a range of properties calculated for the acetic acid molecule. The calculations are clearly not sensitive to the basis set; there is therefore no need to compare them to the FDL method.

Figure 14: Comparison of basis sets with the BP86 functional on the Acetic Acid molecule.

The basis set sensitivity calculations were repeated with acetate, which is a charged molecule; a range of the calculated properties is depicted in Figure 15. The net charge of the acetate influences the calculations in such a way that there is a high sensitivity to the basis set. This sensitivity is due to the LUMO orbital, which is significantly more diffuse in charged molecules and thus needs a basis set that includes diffuse functions. For this reason, further calculations are only performed on molecules with a net charge of 0; for these, the cDFT properties are already effectively converged with the DZP basis set and no larger basis set is needed.


(19)

Figure 15: Comparison of basis sets with the BP86 functional on the Acetate molecule.

3.2.2 Results Functional Sensitivity

To determine the functional sensitivity, calculations were performed using all molecules depicted in Figure 6 except acetate. In Figure 16, a range of calculated properties is given for the acetic acid molecule. There is a clear difference between the properties calculated with the GGA and the hybrid functionals. The FDL values are closest to the values calculated with the hybrid functionals.

Figure 16: Comparison of different functionals calculated using FMO to FDL calculations with the DZP basis set and Acetic Acid molecule, based on formulas in Table 1.

The calculations for butyric acid, crotonic acid, formic acid, nitric acid, cytosine and nitrobenzene are given in appendix B; the same conclusions can be drawn for all molecules.


The difference between calculations with different kinds of functionals lies in the underlying theory. The HOMO-LUMO gap calculated with DFT, also called the optical gap, is a good approximation of the lowest excitation energy.35 However, this optical gap is often much smaller than the fundamental gap, which is the difference between the ionisation energy and the electron affinity. The fundamental gap is more accurately approximated with Hartree-Fock. With GGA exchange-correlation functionals, the HOMO and LUMO are pure DFT orbitals. Hybrid functionals include a fraction of Hartree-Fock exchange energy in the calculation of the HOMO and LUMO energies, with the amount of Hartree-Fock exchange varying per functional.6 Range-separated hybrids contain different amounts of Hartree-Fock exchange for short- and long-range interactions. Including the HF exchange makes hybrid functionals more accurate in determining the fundamental gap. Therefore, the results of FDL calculations, which are based on the fundamental gap, should correspond better to results obtained with hybrid functionals, which is indeed what we found. The downside of using hybrid functionals is their higher computational cost.

To conclude, cDFT is not sensitive to the basis set when neutral molecules are used, since the smallest tested basis set (DZP) is enough to effectively converge the properties of interest. There is, however, a significant difference between calculations with hybrid and non-hybrid functionals, as Hartree-Fock exchange is taken into account by hybrid functionals. The FDL calculations are generally more accurate than the FMO calculations. Comparing the FDL results to the FMO results for the different functionals shows that the hybrid functionals, especially range-separated hybrids, will most likely give the most accurate results. Therefore, we will use the CAM-B3LYP functional with the DZP basis set for characterising the training set for the machine learning model, as this combination is most likely to be successful in predicting the needed properties.


3.2 Machine Learning

We will train a machine learning model, varying multiple parameters, the main one being the number of properties trained at a time. First, we will train and predict all the properties at once. Second, we will try to improve the model by training and predicting the properties one at a time. Lastly, we will check whether multiple properties prefer the same settings and train those together. One parameter was changed at a time; the model was run five times and the mean correlation was taken for comparison. The model needs to be run multiple times because the randomly initialised weights do not always converge to a global minimum within the set number of epochs; averaging over multiple runs mitigates this effect.
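The averaging procedure above can be sketched as follows. The `train_and_score` stand-in below (a noisy linear fit scored with R2) is a hypothetical placeholder for one neural-network training run, not the thesis code; only the seed-averaging pattern is the point.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def train_and_score(seed):
    """Stand-in for one training run with randomly initialised weights:
    fit a noisy line and return its R^2 on the same data."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, 200)
    y = 2.0 * x + rng.normal(scale=0.1, size=x.size)  # noisy target
    slope, intercept = np.polyfit(x, y, 1)            # the "training" step
    return r_squared(y, slope * x + intercept)

# Run five times with different seeds and average, mirroring the
# five repeated runs described above.
scores = [train_and_score(seed) for seed in range(5)]
mean_r2 = float(np.mean(scores))
```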

3.2.1 First approach: all at once

We will begin with training and predicting all the properties at once. Since we want to know which parameter values are best suited for all properties, we will look at the average correlation over all the properties for each setup.

First, looking at the number of hidden layers, we can plot the expected values against the predicted values and calculate the R2 value. The closer the R2 value is to one, the better the prediction. In Figure 17, the correlation is plotted for all properties and for different numbers of hidden layers, all of size 100, with the number of epochs = 100, the batch size = 64 and the ReLU activation function.
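As a concrete illustration of the R2 metric used throughout this section (the numbers are made up for the example, not taken from the thesis data):

```python
import numpy as np

# Expected (DFT-calculated) values and model predictions for one property.
expected  = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8])

ss_res = np.sum((expected - predicted) ** 2)        # residual sum of squares
ss_tot = np.sum((expected - expected.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot                          # closer to 1 is better

print(round(r2, 3))  # → 0.98
```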

Increasing the number of hidden layers initially increases the correlation, but as we can see, this is no longer the case when switching from three to four hidden layers. This is the result of overfitting the training data, which makes it harder to predict the validation data correctly. Overall, three layers give the most accurate results with these settings, resulting in a correlation of 48%.

For assessing the optimal number of hidden layers, all layers were kept the same size. However, we can change these sizes to try and improve the model further. Since we do not know the impact of changing these sizes, we need to look at all the hidden layers again. We will not look at one and five layers, since these are unlikely to perform best based on the aforementioned results. The different numbers of nodes were tested in a trial and error approach, varying the size and order of the nodes.

We will first look at two hidden layers, shown in Figure 18. In the case of two layers, there is a formula to determine the optimal number of nodes in each layer for m output nodes and N training samples.36 The resulting sizes should be able to train the model with negligibly small error when combined with the sigmoid activation function. The number of nodes in the first hidden layer is given by Equation 8, and for the second by Equation 9.

Equation 8: Number of nodes in the first layer of a 2-layer model. N = training set size. m = number of output nodes

nodes1 = √((m + 2)N) + 2√(N / (m + 2))

Equation 9: Number of nodes in the second layer of a 2-layer model. N = training set size. m = number of output nodes

nodes2 = m√(N / (m + 2))


Figure 17: Average correlation of all properties for different numbers of hidden layers. Epochs = 100, batch size = 64, ReLU activation function.


Since we will be calculating a total of 15 properties at once, we have 15 output nodes. In addition, we have a total of 8000 training samples, leading to layer sizes of 412 and 325, respectively.
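Equations 8 and 9 can be evaluated directly; the sketch below reproduces the layer sizes quoted above for 8000 training samples and 15 output nodes:

```python
from math import sqrt

def two_layer_sizes(n_samples, n_outputs):
    """Hidden-layer sizes from Equations 8 and 9 (Huang, ref. 36)."""
    nodes1 = sqrt((n_outputs + 2) * n_samples) + 2 * sqrt(n_samples / (n_outputs + 2))
    nodes2 = n_outputs * sqrt(n_samples / (n_outputs + 2))
    return round(nodes1), round(nodes2)

# N = 8000 training samples, m = 15 output properties.
print(two_layer_sizes(8000, 15))  # → (412, 325)
```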

We can see that the impact of the different sizes is limited. There is no clear 'best' combination of sizes; however, the combination 50;100 seems to perform best, with an average correlation of 46%. The calculated sizes do not perform better than the trial and error results. That could be because those values are derived for a model using the sigmoid activation function, whereas these results were gathered from a model using the ReLU activation function.

Next, we will look at three layers. Three layers gave the most accurate results when all layers were of equal size; however, no formula is available to help find the optimal number of nodes. The extra layer gives us more options for varying the sizes of the layers in a trial and error approach. The results are shown in Figure 20. Again, the impact of changing the sizes of the hidden layers is limited. Many more combinations could lead to better results, but in practice it is not possible to test all of them. From the tested data, we can conclude that for three layers the model reaches an average correlation of about 50% if the layers are of sizes 100, 300 and 500, respectively. This is a small improvement over the two-layer model.

Finally, we will look at the four-layer model, shown in Figure 19. For the four-layer model, only a limited number of node combinations were tested. However, we can already see a clear difference from the two- and three-layer models. Having a small first hidden layer and a large fourth layer increases the average correlation to 52%.
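The thesis does not name the machine-learning framework used; as a minimal sketch, the best four-layer architecture and the hyperparameters above could be expressed with scikit-learn's MLPRegressor (an assumed stand-in, trained here on toy data, not the actual implementation):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Best-performing four-layer architecture (25, 50, 100, 300) with the
# hyperparameters used in the text; scikit-learn is an assumption here.
model = MLPRegressor(
    hidden_layer_sizes=(25, 50, 100, 300),
    activation="relu",
    batch_size=64,
    max_iter=100,          # plays the role of the epoch count
    random_state=0,
)

# Toy data standing in for the molecular descriptors and cDFT targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10)
model.fit(X, y)

# input + 4 hidden + output = 6 layers in total
print(model.n_layers_)  # → 6
```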

In previous attempts the batch size and the number of epochs were set to 64 and 100, respectively. Increasing or decreasing these numbers might improve the machine learning model. Since the four-layer model showed the best correlation we will try to find the best values for these parameters with this model. The correlation for the different number of epochs, with the four-layer model, is shown in Figure 21 and for different batch sizes in Figure 22.

Figure 18: Average correlation for two hidden layers of different sizes. Epochs = 100. Batch size = 64. ReLU activation function.

Figure 20: Average correlation for three hidden layers of different sizes. Epochs = 100. Batch size = 64. ReLU activation function.

Figure 19: Average correlation for four hidden layers of different sizes. Epochs = 100. Batch size = 64. ReLU activation function.


It is visible that the correlation decreases when we perform more than 100 epochs, due to overfitting of the training data. The best results are acquired with 100 epochs, reaching a correlation of 52%.

The correlation decreases with a clear trend as the batch size increases. This is the effect of underfitting: if the model takes in more examples at a time, it needs to generalise more, leaving less room for details. The original batch size of 64 seems to be the best fit.

Finally, we can change the activation function of the model. We will investigate whether changing the activation function to the sigmoid function, or a combination of the two, improves the model. A combination of the two means that the first function connects the input layer to the first hidden layer, and the second function connects the hidden layers to each other and the final hidden layer to the output layer. The results are shown in Figure 23.
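For reference, the two activation functions compared here can be written as a minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through, zeroes out negative ones."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Sigmoid: squashes any input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))     # → [0. 0. 2.]
print(sigmoid(0))  # → 0.5
```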

For four layers, using only the ReLU activation function gives the most accurate results. However, as mentioned before, we were able to calculate the optimal sizes of the hidden layers for the case of two hidden layers with a sigmoid activation function.

Changing the activation function in the two-layer model to the sigmoid function indeed improves the model, from an average correlation of 44% to 58%. We can conclude that the model with two layers of sizes 412 and 325, 100 epochs, a batch size of 64 and a sigmoid activation function produces the most accurate predictions when all the properties are calculated at once.


Figure 23: Average correlation for different activation functions. Epochs = 100. Batch size = 64. Hidden layers = 4 of sizes 25, 50, 100, 300 and 2 of sizes 412, 325.

Figure 21: Average correlation for different number of epochs. Batch size = 64. Hidden layers = 4 of sizes 25, 50, 100, 300. ReLU activation function

Figure 22: Average correlation for different batch sizes. Number of epochs = 100. Hidden layers = 4 of sizes 25, 50, 100, 300. ReLU activation function.



3.2.2 Second Approach: One at a time

When we trained all the properties at once, we came to a final average correlation of 58%. We can try increasing this number by training each of the properties separately. When we trained all the properties at once, we found that the two-layer model in combination with the sigmoid activation function gave the best results. We will first investigate if training the model using these settings, but training each property separately, will improve the model. The results are depicted in Figure 24.

The difference between training and predicting all the properties at once and one at a time is minimal, resulting in correlations of 58,6% and 58,9%, respectively. However, we can now tune the model for each property separately. Whereas previously changing a parameter would improve the average correlation over all properties, we are now able to improve the correlation for each property individually. We will test the performance of the model in the same order as previously, beginning with epochs = 100, batch size = 64, activation function = ReLU, hidden layers = 2, and varying the size of the hidden layers. We need to recalculate the sizes of the hidden layers of the two-layer model, since we now have only one output node instead of 15. Recalculating following Equation 8 and Equation 9, we come to the sizes 152 and 51. The results of all runs are given in appendix C; the final 'best' settings for each property and the correlation reached are given in Table 2.

Table 2: Best settings of the model for each property and the correlation reached when each property is trained separately. R = ReLU. S = Sigmoid

Property               Layers  Nodes              Epochs  Batch size  Activation function  Correlation (%)
Diss. E (nucleofuge)   3       100, 300, 500      100     64          R                    78
Diss. E (electrofuge)  3       300, 100, 300      100     64          R                    73
Electroacc. power      4       25, 50, 100, 300   50      64          R                    72
Electrodon. power      2       152, 51            100     64          S                    68
Electroneg.            4       300, 100, 50, 25   100     64          R                    76
Elec. Chem. Pot.       3       300, 100, 300      50      64          R                    76
Elec. Chem. Pot. +     4       500, 300, 100, 50  300     64          R                    70
Elec. Chem. Pot. -     3       500, 300, 100      50      64          R                    81
Electrophil. Index     3       50, 100, 300       100     64          R                    68
Deltaf+                3       300, 100, 300      100     512         R                    48
Deltaf-                3       100, 300, 500      100     64          R/S                  46
Hardness               3       300, 100, 300      100     64          R                    75
Hyperhardness          3       500, 300, 100      100     64          R                    61
Net Electrophil.       3       50, 100, 50        100     32          R                    62
Softness               4       25, 50, 100, 300   100     64          R                    31

The best correlation reached is 81%, for the negative electronic chemical potential, and the worst correlation is 31%, for the softness. Overall, the average correlation increases from 58% when training all the properties at once to 66% when training each separately.
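Per-property tuning amounts to building one model per property from its own settings. The sketch below mirrors a few rows of Table 2; scikit-learn and the `build_model` helper are illustrative assumptions, not the thesis code:

```python
from sklearn.neural_network import MLPRegressor

# A subset of the per-property settings from Table 2.
best_settings = {
    "Diss. E (nucleofuge)": {"nodes": (100, 300, 500),    "epochs": 100, "batch": 64, "act": "relu"},
    "Electrodon. power":    {"nodes": (152, 51),          "epochs": 100, "batch": 64, "act": "logistic"},
    "Softness":             {"nodes": (25, 50, 100, 300), "epochs": 100, "batch": 64, "act": "relu"},
}

def build_model(cfg):
    """One regressor per property, configured from its Table 2 row."""
    return MLPRegressor(
        hidden_layer_sizes=cfg["nodes"],
        activation=cfg["act"],   # "logistic" is scikit-learn's sigmoid
        batch_size=cfg["batch"],
        max_iter=cfg["epochs"],
    )

models = {prop: build_model(cfg) for prop, cfg in best_settings.items()}
```

Each model in `models` would then be fitted on the training set for its own property only.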


Figure 24: Comparing predictions running the model with all properties and with one at a time. Epochs = 100. Batch size = 64. Hidden layers = 2 of sizes 412 and 325. Sigmoid activation function.


3.2.3 Third approach: Grouping of the properties

When training each property separately, the correlation increased greatly. We want to know if we can reach the same, or a better, correlation by grouping properties together based on their preferred settings. From the previous results, obtained when training each of the properties separately, we can tell that changing anything other than the number and sizes of the hidden layers often results in only a small increase of the correlation. Keeping this in mind, we can group properties together considering only the number and sizes of the hidden layers. This results in the following groups:

- Dissociation Energy (nucleofuge) and Delta f-
- Dissociation Energy (electrofuge), Electronic chemical potential, Delta f+ and Hardness
- Electro accepting power and Softness
- Electronic chemical potential - and Hyperhardness
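This grouping can be derived mechanically from the preferred hidden-layer sizes in Table 2; a minimal sketch (the property-to-sizes mapping is copied from Table 2, the code itself is illustrative):

```python
from collections import defaultdict

# Preferred hidden-layer sizes per property, taken from Table 2
# (only the properties that end up in a group are listed).
preferred_nodes = {
    "Diss. E (nucleofuge)":  (100, 300, 500),
    "Deltaf-":               (100, 300, 500),
    "Diss. E (electrofuge)": (300, 100, 300),
    "Elec. Chem. Pot.":      (300, 100, 300),
    "Deltaf+":               (300, 100, 300),
    "Hardness":              (300, 100, 300),
    "Electroacc. power":     (25, 50, 100, 300),
    "Softness":              (25, 50, 100, 300),
    "Elec. Chem. Pot. -":    (500, 300, 100),
    "Hyperhardness":         (500, 300, 100),
}

groups = defaultdict(list)
for prop, nodes in preferred_nodes.items():
    groups[nodes].append(prop)  # properties sharing an architecture train together

for nodes, props in groups.items():
    print(nodes, "->", props)
```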

We will rerun the model, training the properties of each group at once. The results are compared to the ‘all at once’ and ‘one at a time’ training in Figure 25.

Figure 25: Correlation reached when using different training methods. Not all properties could be grouped; only properties using similar settings, derived from Table 2, were combined.

We were not able to divide all the properties into groups. Overall, the average accuracy gets worse if we train the properties in groups instead of all at once. In addition, we can see that training the properties separately increases the correlation for every property compared to training them all at once, not just the average correlation.

The downside of machine learning is that it can be hard to determine beforehand what will work best for the problem you are trying to solve, and there is a lot of trial and error involved.26 In addition, there is no clear reason why some settings work better than others, apart from over- and underfitting of the data. With the settings we tested, the best correlation is reached when we train each property separately.


4. Conclusion and Outlook

We want to harvest sunlight in order to replace fossil fuels. One approach would be to convert the harvested energy directly into storable energy-rich compounds with the help of dye-sensitized photo-electrochemical cells. For optimal results, the excitation energy of the dye should be in the visible-light region. We looked at the excitation energies of multiple dye cores, bare and monosubstituted, calculated using sTDDFT. We observed that the ligands lowered the excitation energy of all investigated cores and that there is a trend based on the type of ligand. Specifically, monosubstitution of the dye core with bithiophene lowers the excitation energy of all tested cores to the preferred region.

Another approach would be to convert the harvested sunlight into an electric current with the help of quantum dots. To stabilise the quantum dots, ligands have to be added. The interaction of the ligand with the core should be stronger than the interaction of the ligand with the solvent; otherwise, the ligand would favour dissociation, resulting in degradation of the quantum dot. In addition, the ligand should not be too bulky, otherwise it will not fit on the surface of the quantum dot. With conceptual DFT (cDFT) calculations we can predict properties of the ligand related to the ligand-core interaction, giving us a general idea of this interaction independent of the core. Before the cDFT calculations were performed, a sensitivity analysis was done to identify the optimal level of theory. For this analysis, we took several molecules and calculated the properties with different basis sets and functionals. In the end, we concluded that the smallest tested basis set (DZP) is sufficiently diffuse when calculations are performed on net-neutral molecules. Of the functionals, CAM-B3LYP performed best. Thus, calculations were resumed at the CAM-B3LYP/DZP level of theory.

However, the database of known ligands contains half a million molecules; thus, we need an effective way to obtain their properties in a reasonable timeframe. Machine learning could be a solution to this problem. We were able to train a neural network using cDFT data from only a fraction of the total database. Many parameters were varied during the optimisation process, including the number of hidden layers, the batch size, the activation function and the number of properties trained at a time. First, all properties were trained at once, and the best setup found reached an average correlation of 58%. Next, the properties were trained separately, which led to a consistent improvement of the correlation of the properties, reaching an average correlation of 66%. Finally, properties were grouped together based on the settings preferred when trained separately, to try to combine the lower computational cost of training all at once with the higher correlation of training one at a time. This led to an average correlation of 56%. We concluded that training each of the properties separately, and tuning the parameters for each property, leads to the best performance of the model. Before the model can be used in practice, the correlation needs to be improved. This can be done by enlarging the training set: more training points should lead to a better trained model and thus more accurate predictions. In addition, other machine learning models could be investigated.


References

(1) Hammarström, L.; Hammes-Schiffer, S. Acc. Chem. Res. 2009, 42 (12), 1859–1860.
(2) Yu, Z.; Li, F.; Sun, L. Energy Environ. Sci. 2015, 8 (3), 760–775.
(3) Youngblood, W. J.; Lee, S.-H. A.; Kobayashi, Y.; Hernandez-Pagan, E. A.; Hoertz, P. G.; Moore, T. A.; Moore, A. L.; Gust, D.; Mallouk, T. E. J. Am. Chem. Soc. 2009, 131 (3), 926–927.
(4) Belić, J.; Van Beek, B.; Menzel, J. P.; Buda, F.; Visscher, L. J. Phys. Chem. A 2020, 124 (31), 6380–6388.
(5) Rüger, R.; van Lenthe, E.; Lu, Y.; Frenzel, J.; Heine, T.; Visscher, L. J. Chem. Theory Comput. 2015, 11 (1), 157–167.
(6) Bannwarth, C.; Grimme, S. Comput. Theor. Chem. 2014, 1040–1041, 45–53.
(7) Jasim, K. E. Sol. Cells-New Approaches Rev. 2015, 303–331.
(8) Bera, D.; Qian, L.; Tseng, T. K.; Holloway, P. H. Materials (Basel) 2010, 3 (4), 2260–2345.
(9) Kaupp, M. Book Review: A Chemist's Guide to Density Functional Theory by Wolfram Koch and Max C. Holthausen; 2001; Vol. 40.
(10) Born, M.; Oppenheimer, R. Ann. Phys. 1927, 389 (20), 457–484.
(11) Zhao, Y.; Truhlar, D. G. Theor. Chem. Acc. 2008, 120 (1–3), 215–241.
(12) Slater, J. C. Phys. Rev. 1951, 81 (3), 385–390.
(13) Thomas, L. H. Math. Proc. Cambridge Philos. Soc. 1927, 23 (5), 542–548.
(14) Fermi, E. Zeitschrift für Phys. 1928, 48 (1–2), 73–79.
(15) Hohenberg, P.; Kohn, W. Phys. Rev. 1964, 136 (3B), B864–B871.
(16) Kohn, W.; Sham, L. J. Phys. Rev. 1965, 140 (4A), A1133–A1138.
(17) Parr, R. G.; Yang, W. Annu. Rev. Phys. Chem. 1995, 46 (1), 701–728.
(18) Geerlings, P.; De Proft, F.; Langenaeker, W. Chem. Rev. 2003, 103 (5), 1793–1873.
(19) Koch, E.-C. Propellants, Explos. Pyrotech. 2005, 30 (1), 5–16.
(20) ChemTools. Conceptual Density Functional Theory. https://chemtools.org/sci_doc_conceptual.html.
(21) Pearson, R. G. J. Chem. Sci. 2005, 117 (5), 369–377.
(22) Koopmans, T. Physica 1934, 1 (1), 104–113.
(23) Hoffmann, G.; Tognetti, V.; Joubert, L. J. Mol. Model. 2018, 24 (10).
(24) Frau, J.; Glossman-Mitnik, D. J. Mol. Model. 2018, 24 (6), 138.
(25) Domingos, P. Commun. ACM 2012, 55 (10), 79–88.
(26) Goh, A. T. C. Artif. Intell. Eng. 1995, 9 (3), 143–151.
(27) Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Adv. Neural Inf. Process. Syst. 2015, 2224–2232.
(28) Zapata, F.; Ridder, L.; Hidding, J.; Jacob, C. R.; Infante, I.; Visscher, L. J. Chem. Inf. Model. 2019, 59 (7), 3191–3197.
(29) Van Beek, B.; Belić, J. Nlesc-Nano/CAT: CAT 0.8.7. Zenodo 2020.
(30) Gaus, M.; Cui, Q.; Elstner, M. J. Chem. Theory Comput. 2011, 7 (4), 931–948.
(31) Yanai, T.; Tew, D. P.; Handy, N. C. Chem. Phys. Lett. 2004, 393 (1–3), 51–57.
(32) Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.; Frisch, M. J. J. Phys. Chem. 1994, 98, 46.
(33) Zhao, Y.; Truhlar, D. G. J. Chem. Phys. 2006, 125 (19).
(34) Adamo, C.; Barone, V. J. Chem. Phys. 1999, 110 (13), 6158–6170.
(35) Baerends, E. J.; Gritsenko, O. V.; Van Meer, R. Phys. Chem. Chem. Phys. 2013, 15 (39), 16408–16425.
(36) Huang, G.-B. IEEE Trans. Neural Networks 2003, 14 (2), 274–281.
(37) NIST Chemistry WebBook.
(38) Rienstra-Kiracofe, J. C.; Tschumper, G. S.; Schaefer, H. F.; Nandi, S.; Ellison, G. B. Chem. Rev.


5. Acknowledgements

I would like to thank Luuk Visscher for giving me the opportunity to do this project and Bernd Ensing for agreeing to be my second corrector. Of course, I want to thank Bas van Beek and Jelena Belic for supporting me, giving me feedback and for the nice talks on Friday afternoon during my project. I would like to thank Felipe Zapata for helping me understand machine learning and for putting up with my continuous stupid questions. Next, I would like to thank Ivan Infante and Francesco Zaccaria for the many, many filtered SMILES to do my computations and for trusting me to do them well. I would like to thank everyone from the Theoretical Chemistry group for all the talks and discussions on Thursday morning and the coffee breaks on Tuesday. Finally, I would like to thank Rens, who didn’t have a direct impact on my project, but stood by me whenever I was annoyed being home all day and sometimes made me dinner.


A. cDFT sensitivity analysis tables

Table 3: Calculated properties of Acetic Acid - Basis set Comparison

Property                              aug-TZ2P   TZ2P     DZP
Electronic chemical potential         -0,156     -0,156   -0,158
Electronegativity                      0,156      0,156    0,158
Hardness                               0,152      0,153    0,154
Softness                               6,567      6,522    6,496
Hyperhardness                          0,099      0,099    0,100
Electrophilicity index                 0,079      0,079    0,081
Dissociation energy (nucleofuge)       0,000      0,000    0,000
Dissociation energy (electrofuge)      0,311      0,311    0,316
Electrodonating power                  0,246      0,245    0,251
Electro accepting power                0,091      0,090    0,093
Net Electrophilicity                   0,337      0,335    0,344
Global Dual Descriptor Deltaf+         0,762      0,759    0,756
Global Dual Descriptor Deltaf-        -0,762     -0,759   -0,756
Electronic chemical potential (mu+)   -0,079     -0,079   -0,081
Electronic chemical potential (mu-)   -0,232     -0,232   -0,235

Table 4: Calculated Properties of Acetate - Basis set Comparison

Property                              aug-TZ2P   TZ2P     DZP
Electronic chemical potential          0,058      0,093    0,116
Electronegativity                     -0,058     -0,093   -0,116
Hardness                               0,062      0,116    0,137
Softness                              16,241      8,635    7,302
Hyperhardness                          0,030      0,084    0,104
Electrophilicity index                 0,027      0,038    0,049
Dissociation energy (nucleofuge)       0,115      0,189    0,233
Dissociation energy (electrofuge)      0,000      0,002    0,002
Electrodonating power                  0,029      0,036    0,048
Electro accepting power                0,086      0,129    0,164
Net Electrophilicity                   0,115      0,165    0,213
Global Dual Descriptor Deltaf+         0,973      0,931    0,926
Global Dual Descriptor Deltaf-        -0,973     -0,931   -0,926
Electronic chemical potential (mu+)    0,088      0,151    0,184


Table 5: Calculated values of Acetic Acid - Functional Comparison. For the FDL calculations the following values were used: Ea = 0. En = 0. Ei = 0,391 Hartree. Columns: B3LYP (hybrid), PBE0 (hybrid), CAM-B3LYP (range-separated), M06 (meta hybrid), LDA, M06-L (meta GGA), PBE (GGA), BLYP (GGA), FDL calculated.37

Property                              B3LYP    PBE0     CAM-B3LYP  M06      LDA      M06-L    PBE      BLYP     FDL
Electronic chemical potential         -0,145   -0,150   -0,152     -0,149   -0,146   -0,140   -0,139   -0,133   -0,196
Electronegativity                      0,145    0,150    0,152      0,149    0,146    0,140    0,139    0,133    0,196
Hardness                               0,289    0,310    0,411      0,314    0,201    0,229    0,204    0,205    0,391
Softness                               3,454    3,224    2,435      3,185    4,963    4,371    4,901    4,887    2,555
Electrophilicity index                 0,036    0,036    0,028      0,036    0,053    0,043    0,047    0,043    0,049
Dissociation energy (nucleofuge)       0,036    0,041    0,082      0,043    0,008    0,017    0,010    0,012    0,049
Dissociation energy (electrofuge)      0,326    0,341    0,385      0,342    0,299    0,297    0,288    0,279    0,440
Electrodonating power                  0,163    0,167    0,158      0,165    0,191    0,170    0,176    0,166    0,220
Electro accepting power                0,018    0,017    0,006      0,016    0,045    0,030    0,038    0,033    0,024
Electronic chemical potential (mu+)    0,000    0,005    0,053      0,008   -0,045   -0,026   -0,037   -0,031    0,000
Electronic chemical potential (mu-)   -0,290   -0,305   -0,357     -0,306   -0,246   -0,254   -0,241   -0,235   -0,391

Table 6: Calculated values of Butyric Acid - Functional Comparison. For the FDL calculations the following values were used: Ea = 0. En = 0. Ei = 0,375 Hartree. Columns as in Table 5.37

Property                              B3LYP    PBE0     CAM-B3LYP  M06      LDA      M06-L    PBE      BLYP     FDL
Electronic chemical potential         -0,142   -0,147   -0,151     -0,147   -0,144   -0,137   -0,136   -0,131   -0,187
Electronegativity                      0,142    0,147    0,151      0,147    0,144    0,137    0,136    0,131    0,187
Hardness                               0,282    0,302    0,399      0,306    0,195    0,223    0,198    0,199    0,375
Softness                               3,551    3,315    2,506      3,269    5,130    4,479    5,051    5,031    2,668
Electrophilicity index                 0,036    0,036    0,028      0,035    0,053    0,042    0,047    0,043    0,047
Dissociation energy (nucleofuge)       0,034    0,039    0,077      0,041    0,007    0,017    0,010    0,012    0,047
Dissociation energy (electrofuge)      0,319    0,334    0,379      0,335    0,294    0,291    0,282    0,273    0,422
Electrodonating power                  0,161    0,165    0,157      0,163    0,190    0,166    0,174    0,164    0,211
Electro accepting power                0,018    0,017    0,006      0,016    0,046    0,030    0,038    0,033    0,023
Electronic chemical potential (mu+)   -0,002    0,003    0,049      0,006   -0,046   -0,025   -0,037   -0,031    0,000
Electronic chemical potential (mu-)   -0,283   -0,298   -0,350     -0,300   -0,241   -0,249   -0,235   -0,230   -0,375


Table 7: Calculated values of Crotonic Acid - Functional Comparison. For the FDL calculations the following values were used: Ea = 0. En = 0. Ei = 0,370 Hartree. Columns as in Table 5.37

Property                              B3LYP    PBE0     CAM-B3LYP  M06      LDA      M06-L    PBE      BLYP     FDL
Electronic chemical potential         -0,173   -0,179   -0,179     -0,177   -0,172   -0,168   -0,165   -0,158   -0,185
Electronegativity                      0,173    0,179    0,179      0,177    0,172    0,168    0,165    0,158    0,185
Hardness                               0,230    0,247    0,340      0,251    0,148    0,171    0,150    0,152    0,370
Softness                               4,352    4,042    2,943      3,985    6,744    5,864    6,654    6,596    2,700
Electrophilicity index                 0,065    0,065    0,047      0,063    0,099    0,083    0,090    0,082    0,046
Dissociation energy (nucleofuge)       0,007    0,009    0,038      0,011    0,002    0,000    0,001    0,000    0,046
Dissociation energy (electrofuge)      0,352    0,368    0,396      0,366    0,345    0,337    0,330    0,316    0,417
Electrodonating power                  0,230    0,235    0,205      0,230    0,294    0,261    0,272    0,253    0,208
Electro accepting power                0,058    0,056    0,026      0,052    0,122    0,093    0,108    0,095    0,023
Electronic chemical potential (mu+)   -0,058   -0,056   -0,009     -0,052   -0,097   -0,083   -0,090   -0,082    0,000
Electronic chemical potential (mu-)   -0,288   -0,303   -0,349     -0,303   -0,246   -0,254   -0,240   -0,234   -0,370

Table 8: Calculated values of Formic Acid - Functional Comparison. For the FDL calculations the following values were used: Ea = 0. En = 0. Ei = 0,416 Hartree. Columns as in Table 5.37

Property                              B3LYP    PBE0     CAM-B3LYP  M06      LDA      M06-L    PBE      BLYP     FDL
Electronic chemical potential         -0,160   -0,165   -0,167     -0,165   -0,161   -0,156   -0,154   -0,149   -0,208
Electronegativity                      0,160    0,165    0,167      0,165    0,161    0,156    0,154    0,149    0,208
Hardness                               0,290    0,310    0,412      0,313    0,200    0,227    0,204    0,204    0,416
Softness                               3,454    3,221    2,430      3,193    5,001    4,409    4,912    4,894    2,402
Electrophilicity index                 0,044    0,044    0,034      0,043    0,065    0,054    0,058    0,054    0,052
Dissociation energy (nucleofuge)       0,029    0,034    0,073      0,035    0,004    0,011    0,006    0,007    0,052
Dissociation energy (electrofuge)      0,350    0,364    0,407      0,365    0,326    0,324    0,314    0,305    0,468
Electrodonating power                  0,187    0,190    0,177      0,189    0,223    0,200    0,206    0,196    0,234
Electro accepting power                0,027    0,025    0,010      0,024    0,062    0,044    0,052    0,047    0,026
Electronic chemical potential (mu+)   -0,016   -0,010    0,039     -0,008   -0,061   -0,043   -0,052   -0,047    0,000
Electronic chemical potential (mu-)   -0,305   -0,320   -0,373     -0,322   -0,261   -0,270   -0,256   -0,251   -0,416


Table 9: Calculated values of Nitric Acid - Functional Comparison. For the FDL calculations the following values were used: Ea = 0,021. En = 0. Ei = 0,440 Hartree. Columns as in Table 5.37,38

Property                              B3LYP    PBE0     CAM-B3LYP  M06      LDA      M06-L    PBE      BLYP     FDL
Electronic chemical potential         -0,207   -0,213   -0,215     -0,213   -0,201   -0,202   -0,194   -0,189   -0,209
Electronegativity                      0,207    0,213    0,215      0,213    0,201    0,202    0,194    0,189    0,209
Hardness                               0,265    0,287    0,387      0,288    0,175    0,199    0,178    0,177    0,460
Softness                               3,780    3,480    2,581      3,471    5,721    5,020    5,625    5,649    2,172
Electrophilicity index                 0,081    0,079    0,060      0,079    0,116    0,103    0,106    0,101    0,048
Dissociation energy (nucleofuge)       0,006    0,010    0,038      0,010    0,002    0,000    0,001    0,000    0,069
Dissociation energy (electrofuge)      0,420    0,435    0,469      0,436    0,404    0,404    0,388    0,378    0,487
Electrodonating power                  0,282    0,282    0,252      0,282    0,343    0,319    0,319    0,307    0,229
Electro accepting power                0,075    0,069    0,036      0,069    0,142    0,117    0,125    0,118    0,019
Electronic chemical potential (mu+)   -0,075   -0,069   -0,022     -0,069   -0,114   -0,103   -0,105   -0,100    0,021
Electronic chemical potential (mu-)   -0,339   -0,356   -0,409     -0,357   -0,288   -0,302   -0,283   -0,277   -0,440

Table 10: Calculated conceptual DFT values for cytosine, compared across functionals. For the FDL calculations the following values were used: Ea = 0.003, En = 0, Ei = 0.316 Hartree.

| Property | B3LYP (hybrid) | PBE0 (hybrid) | CAM-B3LYP (RS) | M06 (meta-hybrid) | LDA | M06-L (meta-GGA) | PBE (GGA) | BLYP (GGA) | FDL calc.^37,38 |
|---|---|---|---|---|---|---|---|---|---|
| Electronic chemical potential | -0.146 | -0.152 | -0.150 | -0.152 | -0.154 | -0.148 | -0.147 | -0.139 | -0.156 |
| Electronegativity | 0.146 | 0.152 | 0.150 | 0.152 | 0.154 | 0.148 | 0.147 | 0.139 | 0.156 |
| Hardness | 0.185 | 0.200 | 0.285 | 0.204 | 0.128 | 0.141 | 0.129 | 0.129 | 0.319 |
| Softness | 5.403 | 5.009 | 3.511 | 4.899 | 7.813 | 7.070 | 7.747 | 7.773 | 3.133 |
| Electrophilicity index | 0.057 | 0.058 | 0.040 | 0.057 | 0.093 | 0.078 | 0.084 | 0.075 | 0.038 |
| Dissociation energy (nucleofuge) | 0.004 | 0.006 | 0.032 | 0.007 | 0.003 | 0.000 | 0.001 | 0.000 | 0.041 |
| Dissociation energy (electrofuge) | 0.295 | 0.309 | 0.332 | 0.310 | 0.311 | 0.297 | 0.295 | 0.278 | 0.354 |
| Electrodonating power | 0.199 | 0.204 | 0.172 | 0.202 | 0.271 | 0.238 | 0.249 | 0.228 | 0.175 |
| Electroaccepting power | 0.053 | 0.052 | 0.022 | 0.050 | 0.117 | 0.090 | 0.102 | 0.089 | 0.018 |
| Electronic chemical potential (mu+) | -0.053 | -0.052 | -0.008 | -0.050 | -0.090 | -0.078 | -0.082 | -0.075 | 0.003 |
| Electronic chemical potential (mu-) | -0.238 | -0.252 | -0.293 | -0.254 | -0.218 | -0.219 | -0.212 | -0.203 | -0.316 |
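The FDL reference columns in Tables 8-10 follow directly from the tabulated Ei and Ea via the standard finite-difference working equations of conceptual DFT (Mulliken chemical potential, Parr-Pearson hardness, Parr electrophilicity, Gazquez electrodonating/electroaccepting powers, and the nucleofugality/electrofugality dissociation energies). The sketch below reproduces them; it assumes the tabulated Ea is the energy of the anion above the neutral (so the electron affinity is A = -Ea, consistent with mu+ = 0.021 for nitric acid), and the function name and dictionary keys are illustrative, not taken from the thesis code.

```python
def cdft_descriptors(I, A):
    """Finite-difference conceptual DFT descriptors from the vertical
    ionisation energy I and electron affinity A (both in Hartree)."""
    mu = -(I + A) / 2    # electronic chemical potential (Mulliken)
    eta = I - A          # chemical hardness (Parr-Pearson, I - A convention)
    return {
        "mu": mu,
        "chi": -mu,                                  # electronegativity
        "eta": eta,
        "S": 1.0 / eta,                              # softness
        "omega": mu**2 / (2 * eta),                  # electrophilicity index
        "dE_nucleofuge": (I - 3 * A)**2 / (8 * eta),
        "dE_electrofuge": (3 * I - A)**2 / (8 * eta),
        "omega_minus": (3 * I + A)**2 / (16 * eta),  # electrodonating power
        "omega_plus": (I + 3 * A)**2 / (16 * eta),   # electroaccepting power
        "mu_plus": -A,   # chemical potential towards electron acceptance
        "mu_minus": -I,  # chemical potential towards electron donation
    }

# Formic acid (Table 8): Ei = 0.416 Hartree, Ea = 0, so I = 0.416 and A = 0.
d = cdft_descriptors(0.416, 0.0)
print(round(d["mu"], 3))           # -0.208, matching the FDL column
print(round(d["omega_minus"], 3))  # 0.234
```

Running the same function with I = 0.440, A = -0.021 (nitric acid) or I = 0.316, A = -0.003 (cytosine) recovers the FDL columns of Tables 9 and 10 to the tabulated precision.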
