Literature Thesis

The Potential of Artificial Intelligence in Analytical Sciences

- Contemporary applications and a future outlook -

By

Chris Lukken

October 2019

Student number: 11019433
Responsible Teacher: Prof. dr. P.J. Schoenmakers
Research Institute: Van ’t Hoff Institute for Molecular Sciences
Daily Supervisor: drs. P. Breuer
Research Group:
Second Reviewer:

Abstract

The increasing amount of data produced by measurements has become a problem in analytical chemistry, as the data can no longer be fully analyzed. As such, new methods are required in order to extract the useful information. This review has focused on the potential of such a method, namely artificial intelligence, in the field of analytical chemistry. First, an overview of artificial systems in general was given. Then, different processing and optimization algorithms were explained. From there, different application areas were considered for utilization of artificial systems. For structure elucidation, expert systems are able to predict possible structures from nuclear magnetic resonance (NMR) or mass spectrometry (MS) data in somewhat general applications, and the incorporation of database searching with this knowledge-based approach proved effective. Neural networks are also applied with similar accuracies. For method optimization, specifically of chromatographic methods, model- or statistics-based approaches are currently more effective in providing method parameters that result in good separations, both with isocratic and gradient-elution methods. The increased requirements of artificial systems result in a preference for statistical approaches. For classification and prediction, expert systems, decision-tree models, neural networks, and support vector machines all show improved performance over conventional methods, with varying accuracies depending on the specific application. For complex many-variable relations, artificial neural networks and support vector machines are preferred. However, for simple linear relations, the increased requirements of artificial systems compared to statistical approaches do not justify their utilization and, as such, statistical methods are preferred in this case. Finally, artificial systems can be utilized to improve the classification accuracy of spectral imaging. All in all, artificial intelligence shows great potential in several applications in the analytical sciences when sufficient data is present.

Table of Contents

1 Introduction ... 3

2 Introduction to artificial intelligence ... 5

2.1 The history of AI ... 5

2.2 Artificial Intelligence in general ... 6

2.2.1 Basic Artificial Agents ... 6

2.2.2 Artificial Learning agents... 7

2.3 Information processing methods of artificial intelligence ... 8

2.3.1 Expert systems ... 8

2.3.2 Probabilistic reasoning ... 9

2.3.3 Decision tree learning ... 10

2.3.4 Fuzzy logic ... 12

2.3.5 Artificial Neural Networks ... 12

2.3.6 Support Vector Machine ... 14

3 Applications of Artificial Intelligence in Analytical Chemistry ... 17

3.1 Structure Elucidation ... 17

3.1.1 Expert systems ... 17

3.1.2 Decision-tree learning ... 20

3.1.3 Artificial neural networks ... 21

3.2 Experimental optimization ... 23

3.2.1 Expert systems ... 23

3.2.2 Artificial neural networks ... 24

3.3 Classification & Prediction ... 26

3.3.1 Expert systems ... 26

3.3.2 Decision-tree models ... 29

3.3.3 Artificial neural networks ... 31

3.3.4 Support vector machines ... 34

3.3.5 Imaging ... 37

4 Conclusions ... 44

1 Introduction

In the field of Analytical Sciences, the analysis time of a single sample from insertion to detection has decreased to at most a few hours in most cases, while the time required to analyze the resulting data has increased dramatically, sometimes to months for a single dataset [1]. One of the main reasons for this change is the development of multi-dimensional analytical methods, such as Liquid Chromatography coupled to tandem MS (LC-MS/MS), which measure an increasing number of variables across an increasing number of samples, sometimes fully automated. This not only increases the amount of data, but also its complexity, making it more difficult for the analytical chemist to interpret the results completely. This emergence of “big data” has become an ever-increasing problem in the field of Analytical Sciences and calls for the solutions of data science.

Data science is a field of science aimed at optimally obtaining information from chemical systems in order to gain new insights [1–4]. By employing several different techniques, such as chemometrics or general statistics, redundant information can be left out, leaving the chemist with only the most important data, see Figure 1. As a result, chemometric techniques have become commonplace in multiple applications, such as method development, imaging, and general classification of data. Furthermore, chemometric techniques are even being applied in court cases involving forensic evidence, where statistics are used to describe the strength of the evidence found [5]. However, this extraction of relevant information requires knowledge of both the analytical methods and the chemometric methods that can be used, as data science often employs computer algorithms that require specific parameters. Choosing the wrong method or setting these parameters sub-optimally may cause loss of relevant data, inclusion of irrelevant data, or generation of “nonsense data”. Furthermore, discarding the irrelevant data is often based on certain assumptions, which may lead to some loss of relevant data, even when the optimal parameters are chosen. Also, there are cases where statistical techniques do not provide enough selectivity or sensitivity to allow for validation. A different type of technique that can be applied for chemometrics and may solve these issues is “artificial intelligence”.

Artificial intelligence (AI) refers to programs that are capable of independently making decisions, based on how the program was designed. For example, an AI program may be trained to choose the correct algorithm and parameters for the aforementioned computer algorithms. By automating this procedure, it is possible to expedite the overall data analysis greatly with a reduced risk of incorporating noise or other unwanted data, as a properly trained artificial system may be able to detect noise better than general statistical methods. Furthermore, AI may be integrated into other techniques, such as imaging, to possibly improve accuracy, see Figure 2. However, AI does have shortcomings: the creation and ‘training’ of an artificial program may take significant amounts of data, and without such data the accuracy of artificial systems is limited. Furthermore, while some fields of chemistry have started studying the utilization of AI in their respective fields, Analytical Sciences seems to have fallen behind, even though artificial intelligence in chemistry was highly popular in the 1960s, with the DENDRAL project [6]. The DENDRAL project was an AI system designed for a specific task in science, namely to aid in elucidating organic molecules using mass spectra and a general knowledge of chemical bonds [7,8]. It was considered to be the first artificial system capable of making decisions specifically designed for chemistry. The full project lasted for several decades and showed as early as the 1960s that AI might have a future in Analytical chemistry.

Figure 2: A general overview of Mass Spectrometry Imaging [9].

Despite the popularity of AI from the 1960s to the 1980s, development became scarce in the 1990s. However, partially due to the success of artificial neural network systems in modern-day society, artificial intelligence has become a hype once again [10]. Yet, even though the popularity of AI has been increasing for some time, there has not been a clear summary of the possible applications and successes of AI in analytical sciences, specifically compared to other data-analysis techniques, such as statistics. Therefore, this thesis aims to study the potential of artificial intelligence in analytical sciences. To do this, a general overview of artificial intelligence and several subfields of AI will be examined, followed by historical and present-day examples of the utilization of artificial intelligence in analytical sciences. These applications will also be compared to more well-known data-analysis methods, such as statistics. Finally, a conclusion will be given on the current and future potential of artificial intelligence in analytical sciences.

2 Introduction to artificial intelligence

Artificial intelligence is a specific area in computer science, in which the goal is to develop a machine that would behave and make decisions as if it were human [11,12]. There are many different applications in which AI could be utilized to either support or independently tackle problems, such as prediction, facial recognition, and imaging. A recent example that can be found in society is the use of driverless cars, which make use of AI to analyze features on the road, predict what to do next, and act accordingly.

2.1 The history of AI

While driverless cars are a very recent use of AI, the foundations of the field originated in the 1930s, when Alan Turing questioned whether machines were capable of thinking, see Figure 3 [13]. This led him to devise his famous Turing Test, a series of questions asked to an unknown responder, be it computer or human, with the questioner tasked to determine what the responder is. At that time, the only form of AI utilized so-called first-order logic, which converts sentences into simple variables. Later on, in the 1940s, artificial neural network models were designed based on neuroscience; these became the most well-known type of AI system. These models, however, were limited by the primitive computing power of the early 1950s and were not developed fully until the 1980s. Yet, the pivotal point for AI did not occur until the Dartmouth conference, now known as “the birth of AI”, in 1956. During that conference the term artificial intelligence was coined, followed shortly by the introduction of Lisp, a programming language developed to write first-order logic AI. From that point on, multiple new methods of reasoning for AI have been designed and are still being designed today. Since 2011, “machine learning” has made a breakthrough with “neural networks”, or “deep learning”, and has since become one of the dominant ways in which AI is utilized today [12].

Figure 3: The history of AI from its beginning to the early 2000s with the width of the bars indicating the pervasiveness of each technique [13].

Since the beginning of the field of AI, many methods have been introduced and developed using different means of input, processing, and output [13]. This chapter aims to explain several points on how AI operates in general and how specific methods of AI, such as deep learning, function. Then, pros and cons of these methods will be compared and finally, these methods will be compared to other techniques that analytical chemistry employs to see if AI can provide a meaningful difference. Although not all types of artificial systems will be explained as this would not fit within the scope of this thesis, the most utilized systems will be examined.

2.2 Artificial Intelligence in general

2.2.1 Basic Artificial Agents

Although every type of AI is significantly different from the others, all artificial systems end with an agent [14]. An agent is any device or software that perceives its environment and acts upon it, using sensors to acquire the data and actuators to react to the data [14]. Any behavior between acquiring the data and reacting to it is performed by the agent function. This function contains all processing of the data and the decision whether to react to the data or not, but may also contain previous perceptions or learning elements that greatly influence how the agent will respond. The agent program, in turn, implements the agent function; the agent program can be seen as the artificially intelligent component, as it is the concrete implementation that is designed and run. As an example, driverless cars can be seen as agents that use GPS, speedometers, cameras, and many other sensors to acquire data on the environment, in this case the roads and cars. Next, the agent program processes this data using the agent function and chooses how to respond by activating or deactivating its actuators, namely the accelerator, brake, signal lights, steering wheel, et cetera.

There are four basic kinds of agents that employ fundamental elements in almost all artificial intelligence systems [14]. These are:

• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents;
• Utility-based agents.

Simple reflex agents contain only basic actions based on what is currently perceived and respond directly to those perceptions using “condition-action rules”, or if-then rules, see Figure 4. In code, this would be written as a simple if-then function, as sketched below. Model-based reflex agents go one step further by memorizing previous observations and basing their actions not on what is perceived, but on how the perception has changed between observations. In the case of the driverless car, this could be the difference between a car in front having its rear lights on and a car in front suddenly braking. Next, the car requires information on its destination. Goal-based agents contain “goal information”: final situations that the agent strives for. These agents base their actions on reaching this goal. An advanced goal-based agent is the utility-based agent. With utility-based agents, instead of a goal, a score is assigned to every expected outcome and the agent will strive toward achieving the highest score. When there are multiple ways to reach one goal, a goal-based agent may perform randomly, while a utility-based agent will choose the way that achieves the highest possible score. Also, when there are multiple goals that may conflict with one another, a utility-based agent is able to choose a tradeoff between the outcomes.
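As an illustration of such condition-action rules, the sketch below implements a minimal simple reflex agent in Python. The percepts and actions are hypothetical examples inspired by the driverless-car scenario and are not taken from the referenced literature.

# A minimal sketch of a simple reflex agent using condition-action (if-then) rules.
# The percept keys and the returned actions are hypothetical illustrations.

def simple_reflex_agent(percept):
    """Map the current percept directly to an action via if-then rules."""
    if percept.get("car_ahead_braking"):
        return "apply_brake"
    if percept.get("traffic_light") == "red":
        return "apply_brake"
    return "maintain_speed"

print(simple_reflex_agent({"car_ahead_braking": True}))  # -> apply_brake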

Figure 4: A schematic of a simple reflex agent [14].

2.2.2 Artificial Learning agents

The four basic agents show how most intelligent agents are able to respond to their environment. However, they are unable to adapt their rules and preferred actions without additional elements. A learning agent contains additional elements beside its standard rules that allow the agent to change based on what is perceived, see Figure 5 [14]. This form of artificial intelligence is widely known as machine learning (ML). It consists of four elements: a critic, a learning element, a performance element, and a problem generator. Data from the sensors is sent to both the performance element and the critic. The performance element can be seen as the entire basic agent; it takes in the data and decides what actions to take. The critic compares the data to an optimal performance standard and sends feedback to the learning element based on how well the agent has performed. The learning element then makes adjustments to the performance element in the hope of improving the performance. Finally, the last component, the problem generator, suggests actions to the performance element that will ultimately lead to the acquisition of new data from which the agent can “learn”. Ultimately, by designing each of the four elements in a specific way, an agent is created that will slowly learn to adapt and behave in order to obtain the optimal performance. To create the first iteration of the desired learning agent, the algorithms are given labeled data with outcomes that are considered correct, or true, and attempt to configure the learning agent in such a way that the correct outcome is obtained most of the time. These learning agents are the most essential systems when discussing applications of artificial intelligence.
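The sketch below gives a minimal, purely illustrative implementation of this loop in Python: a performance element with a single adjustable parameter, a critic that compares each percept to a performance standard, and a learning element that nudges the parameter based on the critic's feedback. The numbers and the update rule are hypothetical placeholders, not part of the cited agent framework.

# A minimal sketch of the learning-agent loop (critic, learning element,
# performance element). The numeric "environment" and the update rule are
# hypothetical placeholders chosen only to make the loop runnable.

threshold = 0.0             # parameter of the performance element
performance_standard = 5.0  # what the critic considers optimal

def performance_element(percept, threshold):
    # decides what action to take based on the current percept
    return "act" if percept > threshold else "wait"

def critic(percept):
    # feedback: how far the percept is from the performance standard
    return performance_standard - percept

def learning_element(threshold, feedback):
    # nudge the performance element's parameter based on the critic's feedback
    return threshold + 0.1 * feedback

for percept in [2.0, 4.0, 6.0, 8.0]:            # data from the sensors
    action = performance_element(percept, threshold)
    feedback = critic(percept)
    threshold = learning_element(threshold, feedback)
    print(percept, action, round(threshold, 2))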

2.3 Information processing methods of artificial intelligence

Although almost all learning agents follow this general schematic, the application of different agents differs greatly depending on how the learning or performance element is designed. Some examples that will be shown in section 3.1.1 will not contain any learning elements at all, but will be based on highly advanced goal or utility agents instead. Although there are many methods and algorithms of artificial intelligence besides the ones mentioned below, explaining them all is outside the scope of this thesis. Instead, this chapter focusses on the algorithms that have been utilized the most in general applications. One of the most influential differences in design that allows for specific applications is how input is processed. One example of processing is to convert all input into propositional logic and derive rules from this input, a form of artificial intelligence known as expert systems [15,16].

2.3.1 Expert systems

Expert systems are types of artificial intelligence often used to obtain new relations between data or to simulate expert decision-making [17]. They often incorporate vast amounts of knowledge input by experts that the algorithms use in combination with inference in order to obtain the desired results. As such, they are often based on rule-type systems or logic-type systems and the most utilized types of expert systems are propositional logic systems and first-order logic systems.

2.3.1.1 Propositional logic

Propositional logic is based around Boolean functions that assign truth-values of either zero or one (false or true) to any variable, based on the input given [15]. This input can be any event that can be considered true or false, such as: “It is raining today”, or “if it rains, people tend to drive more”. Once sufficient variables are given, they can be connected through connectives [18]. Some examples of connectives for “rule a with propositions p and q” are:

• Conjunction (if p AND q, then a is true), denoted with the symbol ∧ [16];
• Alternative (if p OR q, then a is true), denoted with the symbol ∨;
• Implication (IF p THEN q), denoted with the symbol ⊃;
• Negation (if a, NOT p), denoted with the symbol ~;
• Equivalence (p ONLY IF q), denoted with the symbol ≡.

Once all input is stored, artificial intelligence systems can be applied to find new connectives between all variables by using inferences, such as deduction, abduction, or induction [15]. For example, the two statements mentioned above can be combined into: “People tend to drive more today”, through deduction. These inferences will be explained more thoroughly in section 2.3.1.2.

2.3.1.2 First-order logic

With first-order logic, also known as predicate logic, input is given in the form of simple sentences that contain statements, similar to propositional logic [16,18,19]. However, unlike propositional logic, these statements may contain simple declarations, such as: “Copper is a metal”, but also more complex numerical statements with multiple variables, such as: “Copper has an atomic line at 324.8 nm”. These statements then get converted into predicates and variables: “atomic line(copper, 324.8 nm) and metal(copper)”. Generally, the statements are stored as: “predicate(Attribute 1,Attribute 2, …)” [18]. Finally, when a predicate contains attributes that may be contained within other predicates, functions are created that contain these predicates and their relations.

Once the information is stored, the artificial agent attempts to find connections between the functions and predicates through inferences of deduction, abduction and induction [18]. While deduction is considered a legal inference, meaning the conclusions drawn are true, abduction and induction are not considered legal inference, as they often do not contain the full range of all examples possible. An example of a deduction is:

“X is a metal:
IF the density is above value Y
AND the electrical conductivity is high
AND has metallic bonds”

To allow the system to understand this deduction, it is converted into logical symbols. In this case, the deduction is converted into:

“((Density > Y) ∧ Conductivity(high) ∧ metallic bonds(true)) ⊃ X(metal)”

With deductions, all propositions are considered true. Abduction, however, provides conclusions that may not be true in every case. For example:

“Given: X has a high electrical conductivity
And: All metals have a high conductivity
Inference: X is a metal”

This abduction is untrue in the case of conducting materials other than metals, but such inferences are useful nonetheless. In the case of a medical diagnosis, symptoms may be given as input and the most likely (but not always correct) diagnosis may be given as an abduction.

Finally, inductive inferences contain possible new rules that can be utilized to obtain new relations. As such, these inferences are most often used by learning agents in order to obtain new insights on possible connections [18]. An example of inductive inference is:

“Given: Metallic copper is solid
And: Metallic iron is solid
Inference: All metals are solid”

Even though this relation is untrue for warm gallium or mercury, other examples of inductive inference might provide new insights and offer predictions that were previously unknown. These new relations are designed and utilized by artificial systems and expert systems to predict outcomes, search databases through data-mining, and solve problems at an expert level. Some examples will be shown in section 3.1.1.
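A minimal sketch of how such stored facts and a deductive rule could be represented and applied in code is given below; the predicate names and the simple forward-chaining step are illustrative assumptions, not a description of any particular expert system.

# A minimal sketch of rule-based deduction over stored facts, in the spirit of the
# metal example above. The facts, rule, and predicate names are illustrative only.

facts = {
    ("density_above_threshold", "X"),
    ("conductivity_high", "X"),
    ("metallic_bonds", "X"),
}

# IF density > Y AND conductivity is high AND metallic bonds THEN X is a metal
rule = {
    "if": [("density_above_threshold", "X"),
           ("conductivity_high", "X"),
           ("metallic_bonds", "X")],
    "then": ("metal", "X"),
}

if all(condition in facts for condition in rule["if"]):
    facts.add(rule["then"])          # deduction: the conclusion is added as a fact

print(("metal", "X") in facts)       # -> True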

2.3.2 Probabilistic reasoning

While expert systems work with input that is considered certain (true or false), probabilistic reasoning focusses on probability and has been centered almost entirely around Bayesian networks since the 1980s, see Figure 3 [13,20]. Bayesian networks apply Bayes’ theorem, in which the probability of an event or outcome depends on other information that may be related to the event. An example would be guessing an unknown person’s gender. Initially, there would be only a 50 percent certainty of the person being either male or female. Then, it becomes known that the person is a smoker and a previous survey shows that 70 percent of males are smokers while only 20 percent of females smoke. Now, it becomes more likely that the person is male.

In general, Bayes’ theorem is mathematically written as [20,21]:

P(A|B) = P(B|A) · P(A) / P(B)    (1)

where P(A|B) is the quantity in question, the likelihood of A occurring when B is considered to be true; P(B|A) is the probability of event B occurring when A is true; and P(B) and P(A) are the probabilities of independently observing B or A. If the previous example is given as input, the probability of the random smoker being male becomes 78 percent instead of 50 percent. This theorem is being applied in many fields, one of which is the Dutch Forensic Institute, where likelihood ratios are calculated based on forensic evidence in combination with events that have occurred in the criminal cases [22]. Bayesian networks apply this theorem in combination with logical connectives, similar to first-order logic, in order to obtain the most accurate certainty of particular events occurring. Because of this, Bayesian networks are usually applied as modeling tools when uncertainty is present [23].
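Applying equation (1) to the smoker example can be written out in a few lines of code, as sketched below; the survey percentages are the ones quoted above and the function itself is only an illustration of the theorem.

# A minimal sketch of the smoker example using Bayes' theorem (equation 1).
# The survey numbers are those quoted in the text; the function is illustrative.

def posterior(prior_a, likelihood_b_given_a, prob_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / prob_b

p_male = 0.5
p_smoker_given_male = 0.7
p_smoker_given_female = 0.2
# total probability of observing a smoker
p_smoker = p_smoker_given_male * p_male + p_smoker_given_female * (1 - p_male)

print(round(posterior(p_male, p_smoker_given_male, p_smoker), 2))  # -> 0.78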

Bayesian networks are visually represented as joint probability distributions, in which the highest events are considered independent, while all lower events are considered dependent on the events above them, see Figure 6 [21]. One characteristic of these networks is that the distribution is acyclic: events cannot be dependent on events lower in the distribution. The networks can be determined either by hand or by an automated data system, in which the nodes and relations are created manually and the conditional probabilities are defined by the system. These probabilities can be either inferred from data or taken from randomized numbers [23].

Figure 6: A digraph of a Bayesian Network [21].

Once a distribution is produced, probabilistic inferences can be made in the same way that logical inferences are made with first-order logic. However, instead of discovering new relations, probabilistic inferences compute the probabilities of unobserved events, given the observed events [23]. For example, looking at Figure 6, V1 and V2 can be observed events, such as symptoms of a patient, while V3 may be a possible disease, which cannot be observed directly. While these examples are relatively small, Bayesian networks produced entirely by artificial systems can encompass entire databases of information and attempt to find any possible relations between any two variables [23].

2.3.3 Decision tree learning

Decision tree learning is another type of machine learning that, as the name implies, focusses on decision-making or prediction based on the data given. As with first-order logic and probabilistic reasoning, it makes use of conjunctions and is capable of using inductive inference in order to gain insights into the problems it is applied to [24]. Today, these algorithms are being applied with high accuracy in several essential fields, such as medical case diagnoses or credit risk assessment of loan applicants. Figure 7 shows an example of a decision tree for deciding whether a person should play a tennis match [24]. The decision is classified by sorting through the tree from the Outlook node toward the final leaf nodes that contain the results, in this case yes or no. As with the other logic systems, the tree is processed by the system as:

“return Yes if: Outlook(Overcast) ∨
(Outlook(Sunny) ∧ Humidity(Normal)) ∨
(Outlook(Rain) ∧ Wind(Weak))
Else return No”

Even though this example returns Boolean values of 1 (yes) or 0 (no), in general, decision tree algorithms are capable of learning functions that output more than two values [24].

Figure 7: An example of a decision tree on whether a person should go to a tennis match [24].

Multiple algorithms exist that build decision trees using only the raw data and some training data [25]. As raw data, all variables and their possible values are given. Looking at the previous example, the input would be: “Outlook{Overcast, Sunny, Rain}, Humidity{High, Normal}, Wind{Strong, Weak}”. Next, training data is given, in the form of example outputs for given combinations of the variables, see Table 1. From that data, different algorithms apply different strategies in order to obtain the optimal decision tree, which can then be utilized on new data [13,25].

Table 1: An example of training data for a decision tree algorithm [26].

Day      Outlook    Humidity   Wind    Play
Day 1    Sunny      High       Low     No
Day 2    Sunny      High       High    No
Day 3    Overcast   High       Low     Yes
Day 4    Rain       High       Low     Yes
Day 5    Rain       Normal     Low     Yes
Day 6    Rain       Normal     High    No
Day 7    Overcast   Normal     High    Yes
Day 8    Sunny      High       Low     No
Day 9    Sunny      Normal     Low     Yes
Day 10   Rain       Normal     Low     Yes
Day 11   Sunny      Normal     High    Yes
Day 12   Overcast   High       High    Yes
Day 13   Overcast   Normal     Low     Yes

A variant of tree learning is the random forest (RF) algorithm, which employs decision trees at its core [27]. Instead of designing one decision tree, the AI constructs a multitude of decision trees, each with slightly altered “importance values” assigned to each variable. Then, instead of relying on one output value, all outputs are combined: numerical values are averaged and, in the case of a yes or no result, a majority vote takes place and the value output most often is taken.
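As an illustration, the sketch below fits both a single decision tree and a random forest to the first seven days of Table 1 using scikit-learn; the library choice and the one-hot encoding of the categorical variables are assumptions made for the sake of a runnable example and are not the specific algorithms used in the cited studies [25–27].

# A minimal sketch of fitting a decision tree and a random forest to the kind of
# training data shown in Table 1. The encoding and library are illustrative choices.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal"],
    "Wind":     ["Low", "High", "Low", "Low", "Low", "High", "High"],
    "Play":     ["No", "No", "Yes", "Yes", "Yes", "No", "Yes"],
})
X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]])  # one-hot encode categories
y = data["Play"]

tree = DecisionTreeClassifier().fit(X, y)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# classify an unseen day by encoding it with the same columns as the training data
new_day = pd.get_dummies(pd.DataFrame(
    {"Outlook": ["Sunny"], "Humidity": ["Normal"], "Wind": ["Low"]}
)).reindex(columns=X.columns, fill_value=0)
print(tree.predict(new_day), forest.predict(new_day))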

2.3.4 Fuzzy logic

Fuzzy logic, as the name suggests, is another type of machine-learning artificial intelligence based on logic. However, unlike first-order logic or propositional logic, fuzzy logic systems are designed to deal with any input given, namely with so-called linguistic variables [28]. Linguistic variables are characterized by imprecise numeric information, such as cheap, fast, or near zero [29]. With fuzzy logic, input is assigned a score from zero to one based on how close the value is to the linguistic variable, see Figure 8. Once the meaning of the linguistic variable is determined, logical inferences, such as deduction or induction, can be applied, just as with other logical systems [28]. Due to the nature of fuzzy logic machines, they are often incorporated into other types of machine learning, such as artificial neural networks, as pre-processing methods in order to allow the second artificial system to interpret the data [30].

Figure 8: A standardized triangular fuzzy variable with the input “near a” [29]. The values of α and β depict the limits of the variable; outside these limits the fuzzy number is considered 0.
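A minimal sketch of such a triangular membership function is given below; it follows the common convention of a left spread α and a right spread β around the value a, and the specific numbers are illustrative only.

# A minimal sketch of a triangular membership function for the linguistic
# variable "near a" (cf. Figure 8). All numbers are illustrative.

def triangular_membership(x, a, alpha, beta):
    """Degree (0 to 1) to which x is 'near a', with left/right spreads alpha, beta."""
    if a - alpha < x <= a:
        return (x - (a - alpha)) / alpha
    if a < x < a + beta:
        return ((a + beta) - x) / beta
    return 0.0

for x in [3.0, 4.5, 5.0, 5.8, 7.0]:
    print(x, round(triangular_membership(x, a=5.0, alpha=1.0, beta=1.0), 2))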

2.3.5 Artificial Neural Networks

Another specific type of machine-learning artificial intelligence that will be discussed in this thesis is the Artificial Neural Network (ANN), also referred to as deep learning when many hidden layers are used. ANNs take inspiration from the synapses in the human brain and consist of layers of nodes, connected by artificial synapses, see Figure 9 [31]. A general neural network consists of three types of layers:

• An input layer that contains all initial data;

• One or multiple hidden layers that perform all the computation;

• An output layer that shows results based on the input given by the last layer of hidden nodes.

Generally, every node in a hidden layer obtains the value of every input, multiplied by a “weight” value of the synapse that connects the two nodes, together with an optional bias. These values are then summed up, see Figure 10 [32]. Furthermore, both the input as well as the output can consist of different types of data, such as single numbers, vectors, or even matrices, depending on the application [33]. Once the value is calculated, the hidden node inputs this value into what is called an activation function that defines whether the node should be “activated”, based on the summation value. Multiple types of activation functions exist and these functions are one of the main variables in manually designing the composition of the network. Activation functions use the weighted sum of the hidden node as input for a mathematical function. For example, a step function, shown in Figure 10, would output 1 if the weighted sum is above a certain threshold and output 0 if not. More complex functions, such as linear or sigmoid functions, are able to output any value between 0 and 1 based on how the function is implemented, even allowing for a probability interpretation. Because of this benefit, sigmoid functions are among the most utilized activation functions. As the system is trained, these activation functions are altered in such a way that specific neurons are activated by particular types of input, allowing for classification and prediction. Finally, the results of these functions are once again multiplied by a weight and summed together into the final output nodes (if only one hidden layer exists) as the final results. The entire functioning of a neural network from input to output is called forward propagation.

Figure 9: An example of an Artificial Neural Network, consisting of a layer of three input nodes, a layer of four hidden nodes, and a layer of three output nodes [31].

Figure 10: An example of how a single hidden node processes its input [34]. The circle containing the summation mark is the hidden node, while the circles containing x are the input nodes. The node containing 1 can be considered a bias value.
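The sketch below illustrates this forward step for a single hidden node: the inputs are multiplied by their synapse weights, a bias is added, and the weighted sum is passed through either a step or a sigmoid activation function. All numbers are illustrative.

# A minimal sketch of forward propagation for one hidden node (cf. Figure 10):
# weighted sum of the inputs plus a bias, passed through an activation function.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(z, threshold=0.0):
    return 1 if z > threshold else 0

inputs  = [0.5, 0.1, 0.9]        # values from the input layer
weights = [0.4, -0.6, 0.2]       # synapse weights into the hidden node
bias    = 0.1

weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
print(step(weighted_sum), round(sigmoid(weighted_sum), 3))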

In essence, once the network has been fully trained, the hidden nodes explain the relations between certain features in the data and the expected output. Because of this, weights of certain hidden nodes can also be negative toward certain outputs. For example, if a neural network for facial recognition has a hidden node that is related to features found only in males, then the synapse from that node toward the “male output” would have a positive weight, while the synapse from that node toward the “female output” would have a negative weight. This in turn decreases the chance of the “female feature” hidden nodes to activate. The final result is an increased chance of the system predicting the picture to be of a male.

While the composition of the neural network is created manually, the weights of the synapses and the functions after the hidden nodes are altered as the system learns over time [35]. This alteration can be done in multiple ways, but the most utilized methods are backward propagation and genetic alteration. As the name suggests, backward propagation is the opposite of forward propagation and refers to the training of the neural network from output to input. With this algorithm, the error between the predicted output and the true solution is calculated first in what is known as a cost function. The cost function can be simply described as the sum of all errors squared, see equation (2). In this equation, C is the error for a single data input, n is the total number of output nodes, ŷ is the predicted output and y is the true output. With, for example, a facial-recognition neural network, a composition with two output nodes can be created, one for a “male output” and one for a “female output”. Next, the system is given input with a label, such as “male”. If, in this case, the output turns out to be 0.88 for male and 0.45 for female, the total error for this input would be (0.88 − 1)² + (0.45 − 0)², as 1 would be considered true and 0 would be considered false. This calculation is then repeated for all of the training data in order to calculate the total “cost” of the system.

C = Σ_{m=1}^{n} (ŷ(m) − y(m))²    (2)
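For the worked example above, the cost of equation (2) can be computed as sketched below; the two output nodes and the label are those used in the text, while the function itself is only an illustration.

# A minimal sketch of the cost calculation in equation (2) for the worked example
# above (predicted outputs 0.88 and 0.45 against the label "male").

def cost(predicted, true):
    """Sum of squared errors over all output nodes for a single training input."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true))

predicted = [0.88, 0.45]   # [male output, female output]
true      = [1.0, 0.0]     # label "male"
print(round(cost(predicted, true), 4))   # -> 0.2169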

As explained earlier, these values of 0.88 and 0.45 are calculated using the sum of the activation-function results multiplied by the weights of the previous layer of nodes. By differentiating the cost of one training input with respect to all individual components in the previous layer, one can calculate how much each component of the previous layer contributes to the results and change them to reduce the cost. For example, functions and weights of the previous layer that contribute a highly positive number to 0.88 while contributing a highly negative number to 0.45 would be given increased influence, while functions and weights that reduce 0.88 or increase 0.45 would be given decreased influence. Next, this differentiation step is repeated for that layer, in order to determine the contributions of the layer before it. This is repeated until the first layer is reached, as the input data itself cannot be altered. This process ultimately leads to an array of negative and positive numbers that correspond to the adjustment that each individual weight and function of all layers should undergo in order to achieve the desired output of (1, 0). However, this is only for a single training example. To obtain the adjustments that allow the system to improve its prediction on the entire dataset, the process of obtaining this array of numbers is repeated for all training examples and all adjustment values for the individual weights and functions are averaged. The final result is an array of averaged adjustments to the weights and functions, allowing the system to predict all training examples somewhat better. The specific process of determining component contributions from output to input is called backward propagation and the process of applying the averaged adjustments to the system in order to incrementally improve its performance is called gradient descent.

Nowadays, a second algorithm besides back-propagation is also being used more frequently, namely the genetic algorithm [36]. With a genetic algorithm, the number of synapses and nodes stays the same, but multiple copies of the network with different weights and functions are generated at the same time; this set of copies is called a population. The population is then tested using the same data and the best-performing networks, those with the lowest cost, are selected and copied multiple times with slight mutations, such that the weights of the selected networks are altered. This new generation is tested in the same way as the first and once again, the best networks are selected. This process is repeated until a proper result is achieved with one of the networks. The algorithm can also be applied to different networks with varying numbers of layers and nodes within each layer, for even more optimization, although this often takes more time as well. Because both learning algorithms focus on creating systems that perform exceptionally well at generating the exact output, these networks are often used for prediction.
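The sketch below illustrates the genetic approach in its simplest form: a population of weight vectors is scored, the best are selected, and mutated copies form the next generation. The fitness function and all numbers are hypothetical placeholders and not tied to any specific network from the literature.

# A minimal sketch of a genetic algorithm: evaluate a population of weight vectors,
# keep the best, and copy them with small mutations. Everything here is illustrative.
import random

random.seed(0)
target = [0.2, -0.5, 0.8]                       # "ideal" weights, for illustration

def fitness(weights):
    # lower is better: squared distance from the target weights
    return sum((w - t) ** 2 for w, t in zip(weights, target))

population = [[random.uniform(-1, 1) for _ in target] for _ in range(20)]

for generation in range(30):
    population.sort(key=fitness)                # best (lowest cost) individuals first
    parents = population[:5]                    # selection
    population = [
        [w + random.gauss(0, 0.05) for w in random.choice(parents)]  # mutation
        for _ in range(20)
    ]

print([round(w, 2) for w in min(population, key=fitness)])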

2.3.6 Support Vector Machine

The support vector machine (SVM) is a machine-learning algorithm that is often used in classification, as the objective of the system is to find a plane in a space with an axis for every variable, such that all data points are distinctly classified, see Figure 11 [37]. These systems are given labeled data points with every measured variable and then attempt to generate a plane such that the margin between the two (or more) groups is largest. These variables can include almost any measurable parameter, such as mass spectrometric or ultraviolet data, from which the machine specifically learns to determine differences between groups in a multi-dimensional space.

Figure 11: An example of a plane (in this case a line) generated by the SVM algorithm such that the margins between the circles and crosses are highest.

This separation is fairly simple when there are two distinct groups that a single straight line can pass between. However, more complex data requires data transformation, which is performed by SVM kernels. SVM kernels attempt to find new planes that contain the original axes, such as a new plane z = x² + y², or, in the case of Figure 12, a new plane y = x² [38]. If a straight line is not sufficient to classify correctly, regardless of any transformations, then a different SVM parameter can be adjusted: the regularization factor (or C factor) [39]. With a high C factor, the algorithm will allow for smaller-margin hyperplanes, while a low C value will cause larger-margin hyperplanes to be generated, even if misclassification starts occurring, see Figure 13. This causes a trade-off: a low C value tends to misclassify but generates the model much more swiftly, while a high C value increases the time needed to generate the model and risks overfitting the data. In addition, low C values tend to allow misclassified data points, or outliers, to be ignored. The final parameter, gamma, determines which data points have influence on the classification line that the SVM generates. Looking at Figure 13, a low gamma value would only consider the margin of the data points nearest to the line, while a high gamma value would consider all margins. While this parameter is insignificant for the sample dataset in Figure 13, highly complex data with many variables often require kernel transformations into higher dimensions in order to allow a single line for classification. In this case, the gamma value controls how much of the spread in the data is accounted for, with lower gamma values accounting for almost no variance and higher gamma values accounting for a high variance in the data.

Figure 13: An example of an SVM optimization using a low C value compared to a high C value [40].

Furthermore, different SVM algorithms are capable of transforming the data in such a way that the differences become clearer and a single axis becomes sufficient for classification, see Figure 14c [38]. Although the procedure of SVM limits the application range of these systems to classification and prediction, examples in section 3.3.4 will show that SVM can become a powerful tool in analytical chemistry. These examples also show the possibility of combining SVM with other tools, such as chemometrics, in order to increase accuracy.

Figure 14: An example of an SVM algorithm defining a plane through a dataset containing two gene expressions and the subsequent conversion of the dataset into a single axis for simplified analysis [38].
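As an illustration of where these parameters enter in practice, the sketch below trains an SVM with an RBF kernel on a synthetic two-variable data set using scikit-learn; the data set, kernel choice, and parameter values are assumptions made only for this example.

# A minimal sketch of training an SVM with an RBF kernel, showing where the
# regularization factor C and the gamma parameter enter. Data is synthetic.
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# two groups that cannot be separated by a straight line in the original space
X, y = make_circles(n_samples=200, factor=0.4, noise=0.08, random_state=0)

# the RBF kernel performs the implicit transformation to a higher dimension;
# C trades margin width against misclassification, gamma sets the reach of each point
model = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
print(round(model.score(X, y), 2))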

3 Applications of Artificial Intelligence in Analytical Chemistry

This chapter will describe multiple applications of artificial intelligence that have already been utilized in analytical chemistry, starting with structure elucidation. Next, experimental optimization will be discussed. Finally, classification and prediction will be discussed together with imaging.

3.1 Structure Elucidation

3.1.1 Expert systems

One of the oldest examples of artificial intelligence being used in analytical chemistry is the development and utilization of the DENDRAL project as a knowledge-based expert system, designed for the purpose of structure elucidation [41]. In general, such artificial systems are called Computer-Aided Structure Elucidation (CASE) systems. The project consisted of multiple programs and spanned several decades [7,42]. The name itself stands for Dendritic Algorithm, the algorithm that stands at the center of the project. The algorithm is based on first-order logic and contains vast amounts of chemical knowledge that the program utilizes to determine the set of possible arrangements when mass spectrum (MS) abundances are given as input. This set of possible compounds is then used as input for the heuristic DENDRAL program, which combines the program's knowledge of chemistry with the mass spectra of the unknown compound to determine the most likely compound. Furthermore, it allows for the input of other experimental data on the unknown compound to increase the accuracy of the method. A second program within the project, named meta-DENDRAL, is given the solved mass spectrum/structure combinations as input and further utilizes first-order logic with inference to obtain new relations and insights that the heuristic DENDRAL program can use to improve itself. In terms of the learning agent from section 2.2.2, the heuristic program can be considered the performance element, while the meta program can be considered the learning element.

The DENDRAL project showed as early as the 1960s how, over many learning iterations, an expert system can become a useful asset for the analytical chemist, even capable of finding new relations that experts had not considered. The project did, however, have shortcomings [7,41–43]. Most notably, the system was only designed for structures containing fewer than 100 atoms, as computer processing power was limited at that time [7]. The usage of heuristics in combination with first-order logic also meant that numerous assumptions were required in order to properly filter and process the data, which could lead to a loss of accuracy when larger molecules are considered, regardless of the processing power. Furthermore, the project required over a decade of research in order to become a viable elucidation strategy, a duration that is often not available for general research in analytical chemistry. However, in the end the system was able to resolve unknown structures without the use of a database comparison, illustrating the potential expert systems may have in aiding analytical chemists with structure elucidation and automated hypothesis formation [7].

After the DENDRAL project, more artificial systems were developed for mass spectrometry structure elucidation, either based on rule-based fragmentation models, like the DENDRAL project, or based on combinatorial fragmentation models [44]. While the rule-based fragmentation models work similarly to the DENDRAL project and combine knowledge with the data to generate fragment possibilities, combinatorial models attempt to generate so-called fragmentation trees from MS/MS spectra, see Figure 15. With these models, important peaks are first sorted on their relative intensities compared to the base peak. Then, the model attempts to arrange the fragments into groups and finally, the software calculates the molecular formula of every peak in order to obtain the final formula. Of both types of systems, multiple models have been developed and are being utilized today. A study by Böcker et al. from 2017 describes such a rule-based expert system that collects fragmentation reactions from literature, named Mass Frontier [45]. From these reactions and the
knowledge built into its logic system, the artificial machine teaches itself fragmentation rules. These rules are then applied to predict fragmentations of unknown compounds that are given as input to the system [46]. So far, the system supports both electron ionization and collision-induced dissociation and has been commercially available for some time. Furthermore, the system is capable of searching its internal database of reactions for direct comparison. However, the authors do not mention the accuracy of the method when new compounds are measured, making it unclear how well the system can perform. Nonetheless, the Mass Frontier software has been continually updated and is currently one of the most utilized commercial software packages for rule-based fragmentation [47]. An example of a fragmentation-tree model, named Multistage Elemental Formula (MEF), is described by Rojas-Chertó et al. and utilizes multi-stage mass spectrometry (MSn) to assign the compositions of unknown compounds, their fragment ions, and all neutral losses [48]. The system was applied to real samples of several metabolites and correctly generated the composition of the parent ion, its fragment ions, and all neutral losses, although the system does show the need for an exponential increase in accuracy when the mass-to-charge ratio of the compounds increases. As such, the system seems incapable of reliably resolving compounds with mass-to-charge ratios over 500.

Figure 15: A schematic illustrating the steps utilized to automatically generate a fragmentation tree from MS/MS spectra [44].

Besides artificial intelligence, other strategies were also developed to identify compounds by mass spectrometry, specifically MS databases [44]. Many databases have been developed, depending on the ionization techniques and, for MS/MS spectra, the collision energies used and these databases are the primary comparison technique being used today to identify unknown compounds by MS. It is important to note that while some of these databases contain search algorithms, they are often not considered artificial intelligence unless they contain features utilizing AI. However, there has been a recent development to replace spectral libraries with molecular structure databases that combine either rule-based systems or other machine learning systems with database searching [44]. One study by Dührkop et al. from 2015 describes the combination of fragment tree models with such a molecular structure database named PubChem [49]. In this study, the machine was taught how to generate fragmentation trees by processing compounds in a molecular structure library with known references. Once the machine had learned the method, unknown compounds could be given as input and the model would compute the fragment tree of that compound and utilize it to compute a “fingerprint” that classifies certain characteristics of the compound based on the tree generated. These characteristics can range from aromatic groups to alcohols or acids being present. The fragment tree is then compared with the molecular structure database in order to obtain plausible structures and the system then finally converts the plausible structures into similar fingerprints for comparison, with the least amount of difference in fingerprints scoring highest.
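The principle of ranking candidates by fingerprint similarity can be sketched generically as below; the fingerprint positions, candidate names, and the simple difference count are illustrative assumptions and do not reproduce the actual scoring of the cited study.

# A generic sketch of the fingerprint-comparison idea described above: candidate
# structures are scored by how closely their predicted property fingerprints match
# the fingerprint computed from the fragmentation tree. Illustration only.

unknown_fp = {"aromatic_ring": 1, "alcohol": 1, "carboxylic_acid": 0, "amine": 1}

candidates = {
    "candidate_A": {"aromatic_ring": 1, "alcohol": 1, "carboxylic_acid": 0, "amine": 0},
    "candidate_B": {"aromatic_ring": 1, "alcohol": 1, "carboxylic_acid": 0, "amine": 1},
    "candidate_C": {"aromatic_ring": 0, "alcohol": 0, "carboxylic_acid": 1, "amine": 1},
}

def difference(fp_a, fp_b):
    """Number of fingerprint positions on which the two structures disagree."""
    return sum(fp_a[key] != fp_b[key] for key in fp_a)

# the candidate with the smallest fingerprint difference scores highest
ranking = sorted(candidates, key=lambda name: difference(unknown_fp, candidates[name]))
print(ranking)   # -> ['candidate_B', 'candidate_A', 'candidate_C']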

While this study shows considerable improvements over other elucidation models with similar strategies, the model itself is still in its infancy. When using the PubChem database, only a roughly 35 percent accuracy was achieved. Furthermore, when searching novel compounds in the database, only 17.7 percent correct identifications are reached. Compared to the earlier models, which only reached 19 percent and 8.35 percent, respectively, the improvements are significant, but nonetheless insufficient for automated independent identification. When observing the 10 or 20 highest scores, however, the system contains the correct compound between 70 and 80 percent of the time. Compared to previous models, which only reached 45 to 55 percent identification rates with 10 to 20 compound considerations, this is yet again a significant improvement. Also, once the correct compound had been identified, the molecular structure of the compound was added to the library, increasing the amount of data the system can learn from, which in turn leads to an increase in accuracy over time.

Overall, the implementation of expert systems utilizing mass spectrometry for structure elucidation has seen noteworthy improvements over the years and offers a reasonable alternative to database searching, more so if the specific problem cannot be solved using databases. However, none of the systems mentioned are capable of providing automated structure elucidation without confirmation and evaluation by an expert. Nevertheless, the systems were capable of significantly reducing the number of possible structures, which helps the expert reach a full identification more rapidly and reliably [44].

While the DENDRAL project and the expert systems mentioned above focus heavily on spectral data from a mass spectrometer, other rule-based CASE systems employ two-dimensional nuclear magnetic resonance (2D-NMR) data in combination with a one-dimensional 13C NMR database in order to obtain the possible structures [50–54]. Studies by Elyashberg et al. describe the investigation of the expert system Structure Elucidator (StrucEluc), one of the most reviewed CASE systems, which primarily focusses on an NMR fragment database [50,51,55]. However, if the fragments are not known in the database, the system automatically attempts to extract connectivity information from the 2D-NMR data and utilizes this data in combination with one-dimensional NMR data to generate Molecular Connectivity Diagrams (MCDs), which gradually build the chemical structure from a starting point in the NMR spectrum, see Figure 16 [44]. These MCDs are then used to generate the plausible structures and fragments. Finally, if successful, the system attempts to determine the stereochemistry of the molecule, providing a 3D model of the most likely structure and adding the (possibly) newly acquired fragments to the database. Later, as the system was improved, a fuzzy logic element was added, capable of providing approximations when nonstandard correlations are present in the spectra [54].

Even though the StrucEluc system was developed in 2002, it contains several elements that allow it to outperform even manual elucidation at times [56]. Furthermore, the system was capable of performing with limited NMR data, such as for proton-deficient molecules that lack connectivity information [51]. Overall, the system has become a viable automatic elucidation strategy with NMR data [57]. Besides this commercial system, freeware CASE techniques using NMR have also been developed, such as the Logic Structure Determination (LSD) system [58]. This freeware software, developed in 1991, first assigns correlations in both of the 2D-NMR dimensions for bond generation [59]. The system takes into consideration how many bonds every atom can have and considers those atoms first. Once all correlations are made, atoms without their preferred number of bonds are paired together. Even if the structure is not completely assigned, the substructure can be searched in a database for identification. So far, the system has shown selective success with specific types of molecules, namely flavonoids, alkaloids, and steroids, and has shown more recent success with cembranoid molecules and more general small molecules [57,60–62]. Several other freeware methods, such as FOCUS, NMRShiftDB, or ACD, have also been developed. However, due to the incorporation of chemical-shift knowledge, the StrucEluc system remains the most successful and most utilized system [44]. One problem that limits the accuracy of these freeware systems is that they lack the vast amounts of data usually required by learning systems for comparison. Because of this, studies have described creating new databases or even methods to obtain a universal data format that allows the merging of the numerous databases currently available, which in turn makes it possible for the systems to acquire much more learning data and increase their accuracy [63,64].

3.1.2 Decision-tree learning

Although decision-tree learning is not used as a complete CASE system, decision-tree systems are utilized as modules inside CASE systems in order to predict spectra for hypothetical structures [27]. The CASE system NMRshiftDB, mentioned above, is one of several systems that utilize prediction modules in order to predict spectra of hypothetical structures. These spectra can then be compared to those of the unknown compound, and a high overlap would strongly suggest that the correct compound was predicted. In the study by Kuhn et al. from 2008, several prediction models were compared, among them both random forests and regular decision-tree learning; overall, the mean absolute error of prediction was 0.18 ppm in the NMR shift range of 0 to 11 ppm, with the random forest algorithm displaying a slightly improved performance compared to the normal decision-tree algorithm, see Figure 17. Another study, from 2018 by Lim et al., describes a CASE system that tries to predict substructures from mass spectra and the chemical formula using several algorithms, one of which is decision-tree selection [65]. This algorithm is then combined with a neural network that scores candidate structures in order to obtain the complete structure.

Overall, however, decision-tree algorithms and their derivatives are generally suited toward prediction and are not capable of providing complete structure elucidation on their own, as they require additional systems to compare their predictions to the spectra of interest. Furthermore, to derive the possible compound structures for prediction, other systems are also required. Other prediction-type AI will be discussed in section 3.3.

Figure 17: The actual versus predicted 1H shift (in ppm) of the random forest algorithm [27].

3.1.3 Artificial neural networks

As mentioned in the previous section, a study by Lim et al. from 2018 examined a CASE system that primarily utilizes neural networks in order to elucidate compounds from MS spectra [65]. The system applies another CASE system called MOLGEN, which has already shown successful results in identifying structures from MS spectra of relatively small molecules, although only semi-automatically [66]. In this case, the MOLGEN system is used to generate candidate structures of large molecules from mass spectra, with which MOLGEN alone would most likely fail, as the highest-scored molecule is often not the correct structure; however, the correct structure is always present among the candidate structures. The system itself attempts to generate substructures from the mass spectra using a neural network classifier and compares the probability of every substructure being present in a candidate structure. This is then repeated for all structures. Finally, the system ranks all structures, with the highest probability ranking first. A different study from 2018, by Qiu et al., describes MetExpert, a neural-network-based expert system for the structure elucidation of metabolites using Gas Chromatography-Mass Spectrometry (GC-MS) data [67]. The system consists of four modules that are combined to elucidate structures with or without a database. The first module generates derivatized versions of the metabolites based on the peaks present in the mass spectra, which allows easier comparison with database searching. The second module is able to differentiate metabolites from synthetic molecules, which reduces the overall number of candidate structures and thus the computation time. The third module utilizes neural networks in order to predict the retention index of a metabolite, which brackets the metabolite with the closest straight-chain hydrocarbons in terms of retention time. The fourth and final module contains features similar to those of the study mentioned above and also attempts to predict substructures from the EI-MS spectra, which are utilized in combination with the predicted retention indices to rank the candidate structures. Another example of neural networks is described in a study from 2018 by Sandak et al., in which several different expert systems, one of which utilized neural networks, were applied to Near-Infrared (NIR) spectral classification of particleboards. In this case, both the neural network and the statistical approach reached over 98.9 percent correct classification for four possible particleboards.


The neural network classifier described by Lim et al. shows that, at its most optimal settings, the system was able to rank the correct metabolite in the top 20 in 88.8 to 89.8 percent of cases, with over 250 thousand candidates. Using more unsupervised settings, the system reached 62.8 to 77.6 percent for a top-5 ranking and 70.8 to 87.8 percent for a top-20 ranking, whereas MetExpert places the correct compound in the top 5 in 75 percent of cases and in the top 20 in 85 percent of cases. Furthermore, when the derivatization module was used, MetExpert placed the correct compound in the top 5 in 95 percent of cases and in the top 20 in nearly 100 percent of cases. The results of the study also show that the derivatization module could be used in combination with current spectral-comparison systems, as it increased the rank-1 identification rate of spectral comparison from 88 percent to 93 percent and allowed spectral comparison to include the correct compound within the top 3 in 100 percent of cases. Compared to the fragment-tree model system, which reached 70 to 80 percent, both systems show a direct improvement, but they are still unable to compete with spectral database comparison. The overall improvement compared to the other MS models is partially due to the vast amount of data in the database from which the neural networks can learn, as neural networks are known to perform much better in the presence of large amounts of data. Conversely, these types of artificial intelligence often underperform when the amount of data is insufficient. Nonetheless, these results show that, over time, structure elucidation with AI has improved even further with the inclusion of neural-network-based expert systems.

In short, expert systems have been developed almost as early as AI itself, and many examples of rule-based systems for structure elucidation have been developed over time. However, because rule-based expert systems rely on knowledge bases that have been compiled by experts and that can change significantly over time, general-purpose expert systems require constant adaptation. Fortunately, expert systems that incorporate database searching have been developed with built-in learning elements that feed their own results back into their training data in order to improve themselves as knowledge changes over time, although not all learning expert systems show success [49,50]. So, while not all systems can be designed for general applications, some models are becoming increasingly accurate in general-purpose applications, provided the required analytical method is feasible. The incorporation of decision-tree prediction, for comparison between generated and measured spectra, has proven a useful addition to structure-elucidation systems, with relatively high accuracy, and the recent development of neural networks has improved the performance of structure-elucidation systems even further, but only when large amounts of training data are present. All in all, artificial intelligence for structure elucidation shows potential for implementation and improvement, and although these systems may never operate fully independently, they may become capable of providing valuable assistance to analytical experts.


3.2 Experimental optimization

3.2.1 Expert systems

Besides the problem solving of structure elucidation, expert systems have also been utilized to optimize experimental parameters for chromatographic separations. A larger project centered on optimization is the Expert System in Chemical Analysis (ESCA), which was finalized in 1993 [17,68]. The project comprised several stand-alone and integrated expert systems, designed to build general expert systems capable of aiding in and suggesting methods for all stages of chromatographic method development, including retention optimization, selectivity optimization, system optimization, and validation [69]. One example of an expert system used as a component of ESCA is the System Optimization System (SOS), which can select the optimal column from any set of columns given by the user, provided the user also supplies constraints such as the minimum required resolution. Using these constraints and the set of columns, SOS recommends the optimal column and flow rate. One result obtained with SOS is shown in Figure 18.

Figure 18: Chromatograms of a sample before and after optimization using the ESCA expert system [17].
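The rule base of SOS is not publicly documented, but the kind of constraint-based selection it performs can be sketched as follows. The column set, the flow-rate penalty on efficiency, and the fixed selectivity and retention factor are illustrative assumptions, and the resolution estimate uses the simplified Purnell equation rather than the actual SOS rules.

import math

candidate_columns = [   # hypothetical user-supplied column set (name, plate count)
    {"name": "C18, 150 x 4.6 mm, 5 um", "plates": 10000},
    {"name": "C18, 100 x 4.6 mm, 3 um", "plates": 12000},
    {"name": "C8, 250 x 4.6 mm, 5 um", "plates": 18000},
]
flow_rates = [0.5, 1.0, 1.5, 2.0]   # mL/min
alpha, k2 = 1.08, 3.0               # assumed selectivity and retention factor
min_resolution = 1.5                # user constraint

def resolution(plates, flow):
    # Purnell equation with a crude efficiency penalty at higher flow rates.
    n_eff = plates * (1.0 / flow) ** 0.3
    return (math.sqrt(n_eff) / 4) * ((alpha - 1) / alpha) * (k2 / (1 + k2))

# Keep only combinations that satisfy the resolution constraint, then
# recommend the fastest one (highest flow rate, i.e. shortest analysis time).
options = [
    (col["name"], flow, resolution(col["plates"], flow))
    for col in candidate_columns
    for flow in flow_rates
    if resolution(col["plates"], flow) >= min_resolution
]
best = max(options, key=lambda option: option[1])
print(f"Recommended: {best[0]} at {best[1]} mL/min (estimated Rs = {best[2]:.2f})")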

Overall, the ESCA project implemented over a dozen subsystems that allowed for the construction of expert systems with different applications in chromatography [68]. However, despite the project being covered in over 25 papers, the ambition of its authors, and subsequent feasibility studies, use of the ESCA system to develop novel High-Performance Liquid Chromatography (HPLC) methods remains to be seen [17,36,68,70]. Although the project did produce several prototypes that generated parameters capable of efficient separation, follow-up research on the system was scarce, and beyond 1994 no coverage of the system has been found [71]. This is possibly caused by one of the problems described in a feasibility study by Hamoir et al., which deemed it nearly impossible to create a machine that could cover the whole area of chromatography, or even HPLC alone [70]. The study describes that, due to the rapid changes in technology, the ESCA system would need constant incorporation of new information on how method development is performed by chemists themselves, as this also changes over time, and the ESCA system itself does not contain a learning element.

These limitations are further emphasized in a study of a different expert system for method development, EluEx [72]. In the 1994 study by Fekete et al., the EluEx system was examined for method development of a sample containing neutral, acidic, and basic compounds. By inputting the structures of the compounds, the system can predict the required organic content for each molecule and the pKa values of the acidic and basic compounds. In most cases, the system was able to generate a gradient composition that sufficed as a good starting point, and after three experiments the system was usually provided with enough information to generate parameters for complete separation. However, in some cases the elution order differed from that predicted from hydrophobicity, which may cause incorrect identification. Furthermore, pKa values in mixed solvents were often incorrect, leading to suggestions that resulted in incomplete separations. Ultimately, the design of a general expert system capable of covering all of chromatography, or even HPLC, seems impossible.

Fortunately, the design of expert systems for optimization of a single application does seem manageable, and such systems have been designed relatively recently [73,74]. In 2009, a study by Yu-Lei et al. described an expert system designed to establish methods for Gas Chromatography (GC) fingerprint analysis of volatile oils from a specific plant species [73]. These methods were then utilized to reliably separate the volatile oils. A different expert system, described by Li-Li et al. in 2008, was used to recommend the optimal sample preparation and experimental conditions to separate analytes in liquorice samples, and the methods suggested by the system were found to be effective [74]. Unfortunately, the time required to develop these systems is not described; if this exceeds the time needed to optimize the method by conventional procedures, such systems offer no practical benefit.

3.2.2 Artificial neural networks

In recent years, the use of artificial neural networks for method optimization has also been explored. Two studies by Golubovic et al., from 2015 and 2016, describe the use of artificial neural networks in combination with quantitative structure-retention relationships (QSRRs) for method optimization in liquid chromatography [75,76]. Although there are multiple QSRR methods, the main approach employs a large number of molecular descriptors, which are characteristics of the molecular structure of the compound of interest. These descriptors can describe any property related to the molecule, such as pH or electronegativity. Normally, statistical methods are applied to these descriptors to obtain linear relations between certain parameters, which can then be used to predict the retention of the molecule under certain instrumental settings. These studies instead utilized artificial neural networks to obtain nonlinear relations and to allow the instrumental parameters to be varied. This new approach, QSRR-ANN, was then used to optimize the separation methods of two samples containing compounds with highly similar structures.
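A minimal sketch of the QSRR-ANN idea is given below; it is not the network used by Golubovic et al., and the descriptor choice, data, and network size are hypothetical. The point is merely that molecular descriptors and adjustable instrumental parameters are fed into one regression model that predicts the retention factor.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Hypothetical training data: [logP, polar surface area, mobile-phase pH,
# organic-modifier fraction] -> measured retention factor k.
X = np.column_stack([
    rng.uniform(0.0, 5.0, 300),     # logP
    rng.uniform(20.0, 120.0, 300),  # polar surface area
    rng.uniform(2.5, 7.5, 300),     # mobile-phase pH
    rng.uniform(0.2, 0.8, 300),     # organic-modifier fraction
])
k = 0.5 + 2.0 * X[:, 0] * (1.0 - X[:, 3]) + 0.01 * X[:, 1] + rng.normal(0.0, 0.1, 300)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=2),
)
model.fit(X, k)

# Predict the retention factor of a compound under new instrumental settings.
print(model.predict([[2.3, 60.0, 4.5, 0.45]]))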

The two studies utilize different networks, with different compositions and different input descriptors, as the choice of descriptors for the network is vital for the accuracy of the method and often depends on the analytes being measured [75]. The first study developed a method for separating a drug molecule and its degradation products [76]. Nine descriptors were selected and used to train the network to accurately predict the retention factor. Response surface plots were then predicted using the ANN in order to find instrumental parameters at which all compounds would have significantly different retention factors, see Figure 19. The system successfully predicted an isocratic method that fully separated all compounds. In addition, the correlation coefficient between predicted and measured retention factors was 0.9984 with the ANN, whereas linear statistical methods achieved only 0.715. However, while the system performed better than the statistical approach, the use of an isocratic method resulted in a run time of 27 minutes for a separation of only seven compounds. Unfortunately, this specific network was unable to generate a gradient-elution method.


Figure 19: A response surface plot predicted by the neural network for two compounds under different instrumental parameters [76].
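The response-surface step can be sketched in a similar, much-simplified way: a stand-in retention model is evaluated over a grid of instrumental settings, and the settings that maximize the smallest gap between adjacent predicted retention factors are selected. The placeholder model, the compound set, and the grid ranges below are all assumptions.

import itertools
import numpy as np

def predict_k(logp, ph, organic):
    # Placeholder for a trained QSRR-ANN model (hypothetical relation).
    return 0.5 + 2.0 * logp * (1.0 - organic) + 0.1 * np.sin(ph)

compounds = {"drug": 2.3, "impurity A": 2.1, "impurity B": 2.6}   # hypothetical logP values

ph_grid = np.linspace(2.5, 7.5, 26)
organic_grid = np.linspace(0.2, 0.8, 31)

def min_separation(ph, organic):
    # Smallest gap between adjacent predicted retention factors, used here
    # as a crude surrogate for the resolution of the critical pair.
    ks = sorted(predict_k(logp, ph, organic) for logp in compounds.values())
    return min(later - earlier for earlier, later in zip(ks, ks[1:]))

best = max(itertools.product(ph_grid, organic_grid),
           key=lambda point: min_separation(*point))
print(f"pH = {best[0]:.2f}, organic fraction = {best[1]:.2f}, "
      f"minimum delta k = {min_separation(*best):.2f}")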

The second study performed similarly to the first, but was able to generate a gradient-elution method for the separation of six drugs [75]. Again, nine descriptors were selected to train the system, and a correlation coefficient between predicted and measured values of 0.985 was obtained for the test set. The trained network was then used to select the optimal parameters for the separation, and a full separation was achieved in 15.5 minutes, a shorter run time than other published methods for separating these compounds. The authors were, however, limited to altering only a select few parameters, such as gradient time, buffer pH, and buffer molarity; other important parameters, such as column type and length, type of buffer, and type of organic solvent, were fixed. Overall, these studies demonstrate the feasibility of using neural networks in method optimization, even though there is much room for improvement.

The method-optimization approaches that are currently most used, and that show more success, all rely on model-based or statistical techniques. The 2018 book "Software-Assisted Method Development in High Performance Liquid Chromatography" does mention the use of intelligent systems for retention prediction, referring to expert systems, but otherwise covers only model-based or statistical methods [77]. Of the approaches mentioned in the book, the most widely used are Chromsword® and Drylab, of which Drylab is the more popular. As both are commercially available software packages, little is publicly known about how they operate, beyond the fact that their operation is based on models and statistics. The Drylab software has been available since 1992 and has been updated numerous times since [78]. The newest version, Drylab4, is currently being used to develop various HPLC methods [79–81]. The software requires as few as four exploratory runs with varying gradient times and flow rates in order to produce an optimized method with all peaks separated (where possible). However, with the growing use of two-dimensional liquid chromatography (2D-LC), these systems have not yet proven useful. Instead, a new model-based approach that can incorporate gradient elution in both dimensions was developed in 2016 by Pirok et al. [82–84]. This approach proved its worth by successfully separating a complex mixture of 54 synthetic dye components using ion-exchange and ion-pair chromatography, and was also successfully applied to a mixture of 72 reference standards using the same combination of techniques.
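The internal models of Drylab and Chromsword® are proprietary, but retention modeling of this kind is commonly based on the linear solvent strength (LSS) relationship, ln k = ln(kw) - S * phi, where phi is the organic-modifier fraction. The sketch below fits this relationship per analyte from two hypothetical isocratic scouting runs and predicts retention at a new composition; it is a simplified illustration of model-based retention prediction, not the actual gradient-based Drylab procedure.

import numpy as np

scouting = {  # analyte: {organic fraction phi: measured retention factor k}
    "analyte 1": {0.40: 8.5, 0.60: 1.9},
    "analyte 2": {0.40: 10.2, 0.60: 2.1},
}

def fit_lss(points):
    # Fit ln k = ln(kw) - S * phi by linear regression on (phi, ln k).
    phis = np.array(list(points.keys()))
    ln_k = np.log(np.array(list(points.values())))
    slope, intercept = np.polyfit(phis, ln_k, 1)
    return intercept, -slope          # ln(kw), S

def predict_k(ln_kw, S, phi):
    return float(np.exp(ln_kw - S * phi))

for name, points in scouting.items():
    ln_kw, S = fit_lss(points)
    print(f"{name}: predicted k at phi = 0.50 is {predict_k(ln_kw, S, 0.50):.2f}")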

All in all, statistical and model-based approaches are currently the most common for method optimization, and software such as Chromsword® and Drylab has enabled method development with minimal time and cost. Furthermore, determining the optimal parameters with either package requires only the minimal effort of performing four runs of the analytes under differing conditions. Compared to AI, which often requires a database of training data, these model-based approaches currently seem superior, as they are ready to use as supplied. However, AI built for specific applications is close to offering a reasonable alternative, provided a suitable database already exists.
