Characterizing Exoplanet Transits and Stellar Activity with Scalable Gaussian Processes

Academic year: 2021



Universiteit van Amsterdam

MSc Artificial Intelligence

Master Thesis

Characterizing Exoplanet Transits and Stellar

Activity with Scalable Gaussian Processes

by

Victoria Foing

11773391

October 17, 2020

36 ECTS February 2020 - October 2020

Main Supervisor:

Ana M. Heras (European Space Agency)

Co-Supervisor:


Abstract

Exoplanets are planets orbiting stars other than our Sun. The field of exoplanet science has boomed since the 1990s [1]. Space missions like CoRoT, Kepler and TESS, together with ground-based observatories, have collected millions of lightcurves and detected more than 3,200 exoplanets using transit photometry, an approach that involves measuring the brightness of a star over time and looking for periodic dips in the brightness caused by a planet repeatedly passing in front of the star. Once a transit is detected, astronomers seek to characterize the transit signal in the lightcurve to learn about properties of the planet. This can be challenging due to noise from stellar activity, namely rotational modulation patterns in the lightcurve caused by starspots and plages appearing in and out of view as the host star rotates. In the past decade, Gaussian Processes have become popular models for modelling both instrumental and astrophysical noise in lightcurves to aid the detection and characterization of exoplanet transits. Gaussian Processes are well-suited to this task because they are flexible and interpretable, and because probabilistic estimates of the model parameters can be obtained with Markov Chain Monte Carlo sampling. However, they have been disadvantaged by slow computation speeds, which have precluded their application to the vast datasets provided by Kepler and TESS. Recently, a new software package, celerite, has introduced special kernel functions that are not only well-suited to modelling stellar activity but also scale linearly with the size of the dataset by taking advantage of semi-separable matrices [2].

In this work, we apply scalable Gaussian Process models using the software celerite in order to characterize exoplanet transits and stellar activity in Kepler and TESS lightcurves. By "characterizing", we mean the retrieval of accurate transit and rotation parameters, with a focus on the radius of the planet and the rotation period of the star. We develop a pipeline for preprocessing the lightcurves, defining prior distributions using Physics techniques, building a Gaussian Process model with a transit mean function, a quasi-periodic Rotation kernel, and a noise kernel, and sampling from the model with Markov Chain Monte Carlo. To assess the benefits of jointly modelling the rotation and the transits, we compare the performances of three Gaussian Process models: RotGP (only rotation), ExoGP (only transit), and ExoRotGP (rotation and transit). The models are applied to 15 TESS stars and 9 Kepler stars with confirmed planets and rotation periods, and the parameter estimates obtained are compared to those from refereed publications. For the lightcurves classified as having unambiguous rotation periods by previous authors, our best GP models obtain rotation periods within one day of the reference rotation periods. For four of the TESS lightcurves classified as having ambiguous rotation periods by previous authors, the GP models are able to find rotation periods, outperforming the analysis of other researchers using traditional techniques. For the recovery of planet radii, our best GP models are able to estimate the planet radii within one standard deviation of the reference radii for all Kepler planets and all but two TESS planets. Moreover, we highlight the importance of preprocessing decisions, demonstrating that the pipeline correction method can worsen the retrieval of the rotation period and that temporal binning can dramatically worsen the retrieval of planet radii. Finally, we discover that joint modelling improves rotation period estimates for a few stars, due to the extra noise kernel, but only marginally improves the planet radii estimates, suggesting that celerite’s noise kernel can be effective for removing stellar activity. Our results further indicate that the joint model could be a promising tool for the characterisation of small exoplanets, with sizes between Neptune and Earth. Our method provides a solid basis for a future application and extension of the model to these objects.

Code: https://github.com/victoriafoing/ExoRotGP


Acknowledgements

I would like to express my sincere gratitude to my supervisor at the European Space Agency, Ana M. Heras, for her wonderful guidance and excellent support throughout the research project. I would also like to thank my supervisor at the University of Amsterdam, Patrick Forre, for the great feedback about the Artificial Intelligence side of the project. I would like to thank my family for their unwavering support and advice, in particular Bernard Foing. Furthermore, I would like to thank the authors of the celerite software, which I used throughout the project, in particular Dan Foreman-Mackey, who was very helpful in answering my questions and guiding me in the right direction. Lastly, I would like to thank my boyfriend Adam, my friends, and my classmates in the AI master’s programme for teaching me a lot throughout the degree and making this experience very enjoyable.


Contents

1 Executive Summary
2 Introduction
  2.1 Problem Statement
  2.2 Motivation
  2.3 Research questions
  2.4 History of Exoplanet Science
    2.4.1 Exoplanet Transits
    2.4.2 Stellar activity
  2.5 Challenges
    2.5.1 Detection of Signals
    2.5.2 Noise: photon, instrumental, astrophysical
    2.5.3 Stellar Limb Darkening
3 Literature Review
  3.1 Traditional Methods in Physics
    3.1.1 Transit Fitting
    3.1.2 Recovering Rotation Periods
  3.2 Applications in Machine Learning
    3.2.1 Gaussian Processes
  3.3 Position in the State of the Art
4 Methodology
  4.1 Bayesian Inference
  4.2 Gaussian Processes
  4.3 Celerite
    4.3.1 Stochastically-Driven Damped Harmonic Oscillator (SHO) Kernel
    4.3.2 Rotation Kernel
    4.3.3 Semi-Separable Matrices
  4.4 Transit models
  4.5 Markov Chain Monte Carlo
5 Algorithm
  5.1 Algorithm
    5.1.1 Initial estimates
    5.1.2 Three Models: ExoGP, RotGP, ExoRotGP
    5.1.3 Sampling
6 Experiments
  6.1 Data
  6.2 Selection of Targets
  6.3 Preprocessing
  6.4 Experimental Design
  6.5 Experimental Evaluation
7 Results and Discussion
  7.1 Simulated Experiment
  7.2 Rotation Periods
    7.2.1 Kepler
    7.2.2 TESS
    7.2.3 Observation Window
  7.3 Transit Characterization
    7.3.1 Kepler
    7.3.2 TESS
  7.4 Computation Time
  7.5 Discussion
    7.5.1 Kepler vs. TESS
    7.5.2 How could our method be improved?
    7.5.3 Challenges
8 Conclusion and Future Work
  8.1 Contributions
  8.2 Future work
  8.3 Concluding Remarks
References
Appendix A
  A.1 Summary Statistics
  A.2 Power spectrum of Celerite kernel and SHO

CHAPTER 1

Executive Summary

In this work, we apply scalable Gaussian Process models using the software celerite in order to characterize exoplanet transits and stellar activity in Kepler and TESS lightcurves. Our goal is to use this machine learning method to obtain accurate characteristics of the planet and its orbit, and of the stellar light variations, with a focus on the radius of the planet and the rotation period of the star.

We begin by presenting our research questions: (1) How accurately can we model the stellar activity with Gaussian processes in Kepler and TESS lightcurves? (2) How accurately can we model the exoplanet transits with Gaussian processes in Kepler and TESS lightcurves? (3) Does joint modelling improve the characterization of rotation and transit parameters in Kepler and TESS lightcurves?

In the motivation section, we outline the motivation for studying the transit signals and rotation signals in Kepler and TESS lightcurves. The motivation is to help astronomers study and understand the planetary systems observed by these missions and to investigate one of the most fundamental questions in science: is our solar system unique? We further highlight how important finding accurate transit parameters is for deriving the properties of planets, which can give clues about the habitability of the planet. Modelling stellar activity can provide valuable insights about stellar physics (i.e. the evolution, age, and magnetic fields of stars) and the potential habitability of orbiting planets. Stellar activity models can also be used to detrend lightcurves, which can improve the detection of smaller Earth-sized planets, which are more likely to be habitable, and improve the characterization of transit parameters.

Next, we explore the motivation for using Gaussian Processes to solve the problem of finding accurate transit and rotation parameters. In the past decade, Gaussian Processes have become popular models for modelling both instrumental and astrophysical noise in lightcurves because they are flexible and interpretable. However, they have been largely limited by slow computation speeds. Recently, a new software package, celerite, has introduced special kernel functions that are not only well-suited to modelling stellar activity but also scale linearly with the size of the dataset by taking advantage of semi-separable matrices [2].

We provide a brief summary of the history of exoplanet science and an introduction to the two astronomy concepts that we will study with our machine learning method: the exoplanet transits and the stellar activity. We discuss the boom in exoplanet science and the importance of space missions like Kepler and TESS, which have collected millions of lightcurves. Furthermore, we emphasize how the lightcurves produced by the Kepler and TESS missions can be studied to provide valuable insights about the detected exoplanets and their host stars. In particular, the transit signals in the lightcurves can be characterized to derive the radii of the planets, while the stellar activity can be characterized to infer the rotation periods of the stars.

In the literature review section, we outline traditional techniques in Physics for estimating rotation periods, such as the Lomb-Scargle periodogram, and transit models for fitting lightcurves. We give an overview of the machine learning literature on the application of Gaussian Processes to stellar lightcurve analysis and explain why they are well-suited to this problem. Finally, we define our position in the state of the art, highlighting that our approach is innovative in that we work with novel TESS data, explore the joint modelling of transit and stellar signals, and develop a general pipeline that can be applied to many targets.
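As a concrete illustration of the traditional approach, the Lomb-Scargle periodogram scores trial frequencies by least-squares fitting sinusoids to unevenly sampled flux. The sketch below is a minimal NumPy implementation of the classical periodogram (a real analysis would use a library routine such as astropy's LombScargle); the 5-day signal, noise level, and sampling pattern are invented for illustration only.

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classical Lomb-Scargle periodogram (Scargle 1982 form); a minimal
    illustrative implementation, not the routine used in the thesis."""
    y = y - y.mean()
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Time offset tau that makes the sine and cosine terms orthogonal
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# Recover an injected 5-day rotation period from an irregularly sampled lightcurve
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 27.0, 500))      # roughly one TESS sector in days
y = 0.01 * np.sin(2 * np.pi * t / 5.0) + 0.002 * rng.standard_normal(500)
freqs = np.linspace(1 / 27.0, 1.0, 2000)    # trial frequencies in cycles per day
best_period = 1.0 / freqs[np.argmax(lomb_scargle(t, y, freqs))]
print(round(best_period, 2))                # the peak lands at the injected 5-day period
```

The periodogram peak gives the initial rotation-period estimate that traditional pipelines report, and that a GP analysis can use as a prior.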

In the method section, we explain the theory behind our approach. We explain how Gaussian Processes work, how they can be used to model instrumental and stellar astrophysical noise, how parameter estimates are obtained in a Bayesian setting, and how the models can be optimized. Following that, we break down the kernel functions of the software celerite and explain how they can be used to model stellar variability and why they are fast to compute [2]. We further describe the transit model we used to fit the transits and the various features that are used to define it: limb darkening, orbit types, and exposure time integration. Finally, we describe Markov Chain Monte Carlo sampling and how it can be used to derive uncertainty estimates from the Gaussian Process posterior [3].
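To make the kernel discussion concrete: celerite's stochastically-driven damped harmonic oscillator (SHO) kernel has a closed-form covariance for Q > 1/2, and the GP marginal log-likelihood built from it can be evaluated naively with a Cholesky factorization. The sketch below does exactly that in O(N^3) NumPy; celerite's contribution is evaluating the same quantity in O(N) via semi-separable matrices. The hyperparameter values and toy lightcurve are illustrative assumptions, not fitted values.

```python
import numpy as np

def sho_kernel(tau, S0, w0, Q):
    """Covariance of the stochastically-driven damped harmonic oscillator
    (SHO) kernel for Q > 1/2 (Foreman-Mackey et al. 2017)."""
    tau = np.abs(tau)
    eta = np.sqrt(1.0 - 1.0 / (4.0 * Q**2))
    decay = np.exp(-w0 * tau / (2.0 * Q))
    return S0 * w0 * Q * decay * (np.cos(eta * w0 * tau)
                                  + np.sin(eta * w0 * tau) / (2.0 * eta * Q))

def gp_log_likelihood(t, y, yerr, S0, w0, Q):
    """Naive O(N^3) GP marginal log-likelihood via a Cholesky factorization;
    celerite computes the same quantity in O(N)."""
    K = sho_kernel(t[:, None] - t[None, :], S0, w0, Q)
    K[np.diag_indices_from(K)] += yerr**2          # add measurement noise
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * (y @ alpha) - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2 * np.pi))

# Toy lightcurve: quasi-periodic variability with a 5-day characteristic period
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 27.0, 200))
yerr = np.full_like(t, 0.002)
y = 0.01 * np.sin(2 * np.pi * t / 5.0) + yerr * rng.standard_normal(len(t))
ll = gp_log_likelihood(t, y, yerr, S0=1e-4, w0=2 * np.pi / 5.0, Q=10.0)
print(np.isfinite(ll))  # True: the SHO covariance matrix is positive definite
```

The cubic cost of the Cholesky step is exactly what makes naive GPs impractical for Kepler-length lightcurves, and what the semi-separable structure removes.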

In the algorithm section, we explain how we implemented the theory using the software packages exoplanet, pymc3, starry, and celerite [4][5][6], with the help of pseudocode examples. Essentially, we develop a pipeline for processing the lightcurves, defining prior distributions using traditional Physics techniques, building a Gaussian Process model with a transit mean function and a quasi-periodic Rotation kernel, and sampling from the model with Markov Chain Monte Carlo. With certain arguments, this flexible pipeline can run experiments for three Gaussian Process models: RotGP (only rotation), ExoGP (only transit), and ExoRotGP (rotation and transit). We summarize the prior distributions that we used for our models and the mean functions and kernel functions used for each Gaussian Process model.

In the experiments section, we begin by giving an overview of the data from the Kepler and TESS missions and how they differ. Next, we describe the selection of targets and visualize them so the reader can refer to them when interpreting the results. We describe the preprocessing steps, in particular the lightcurve correction methods, and we set up the experiments for investigating the research questions. To assess the benefits of jointly modelling the rotation and the transits, we compare the performances of three Gaussian Process models: RotGP (only rotation), ExoGP (only transit), and ExoRotGP (rotation and transit). We evaluate our method by first performing a simulated experiment, and then applying the models to 9 Kepler stars and 15 TESS stars. For the real data, we calculate the mean and quantiles of the parameter estimates and compare them to values from refereed publications. Moreover, we assess convergence of the model using the number of effective samples and the Gelman-Rubin statistic, and assess the efficiency of the model by looking at the computation time.
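For reference, the Gelman-Rubin statistic compares within-chain and between-chain variance; values near 1 indicate that the MCMC chains have converged to the same distribution. A minimal NumPy version (a sketch, not the pymc3 implementation used in the thesis) looks like this:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an (m, n) array of m
    chains with n samples each; values near 1 indicate convergence."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(1)
converged = rng.standard_normal((4, 1000))        # four chains sampling the same target
stuck = converged + np.arange(4)[:, None]         # chains centred at different values
print(round(float(gelman_rubin(converged)), 2),   # close to 1.0
      round(float(gelman_rubin(stuck)), 2))       # well above 1: no convergence
```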

In the results and discussion section, we present the results of our experiments on simulated and real lightcurves. Our model is validated using a simulated experiment, in which it recovers the injected rotation period and the injected planet radius in a synthetic lightcurve. With our GP models, we were able to estimate rotation periods within 1 day of the reference values for all the Kepler targets and for all the TESS targets with unambiguous rotation, ranging from 1.2 days to 28.5 days. For four TESS lightcurves which were deemed to have “dubious rotation” or “ambiguous variability” by Martins et al., we find rotation periods, one of which is a new rotation period [7]. Our method was able to recover planet radii ranging from small super-Earths (0.1 R_jup) to large Jupiters (1.5 R_jup). We highlight the decrease in performance when the correction method flattens the rotation signal and when temporal binning smears the transit curve. Moreover, we conclude that the joint ExoRotGP model performs only marginally better than the modular RotGP and ExoGP models, at the expense of much longer MCMC computation time. Thus, we conclude that the joint model should only be used when there is a clear rotation signal present and it strongly interferes with the transit signal (i.e. when the stellar rotation period and the planet orbital period overlap, or when the planet is very small).

In the conclusion, we summarize the contributions of the thesis, which are the answers to our research questions and the development of a software tool which automates multiple stages of data analysis: correcting lightcurves, detecting transits, defining prior distributions based on the data, modelling the stellar activity and exoplanet transits with a Gaussian Process, and sampling from the model with MCMC. For each star, our code outputs a document with figures and tables summarizing different stages of the pipeline. The tool we developed and the documents produced will be used by astronomy students for the analysis of more stars and planets. We have also published results at two conferences: the first, a virtual poster at the European Astronomical Society (EAS), and the second, a recorded talk at the Europlanet Science Congress (EPSC) (see Appendix).

CHAPTER 2

Introduction

2.1 Problem Statement

“Three decades ago, astronomers could not say reliably whether there were planets around other stars”, and now we know that “the universe is home to more planets than stars, with billions of potentially habitable planets just in our own galaxy” [8]. Since the first exoplanet discovery in 1992, the field of exoplanet science has boomed and over 4,000 exoplanets have been discovered [1]. These discoveries have been facilitated by improved instrumentation of ground-based telescope observations and the launch of exoplanet-hunting missions, which have collected vast amounts of lightcurve data. Lightcurves are time series data which measure the brightness ("flux") of a star over time and can be used to detect exoplanet transits, periodic dips in the brightness caused by a planet passing in front of the star as it orbits [9]. To date, the most successful exoplanet-hunting mission is Kepler/K2, which observed 530,000 stars and was responsible for discovering three-quarters of the known exoplanets using the transit method [8]. In 2018, the baton was passed to a new space mission called the Transiting Exoplanet Survey Satellite (TESS). In contrast to the Kepler mission, TESS is performing a transit survey of the full sky and is observing 200,000 stars that are closer and brighter. To date, TESS has discovered 79 exoplanets and is expected to discover several thousand more [10]. The lightcurves produced by the Kepler and TESS missions can provide valuable insights about the detected exoplanets and their host stars. In particular, the transit signals in the lightcurves can be characterized to derive various properties of the planet, such as the radius, while the stellar activity in the lightcurves can be characterized to infer the rotation periods of the stars. There is a flood of astronomy data, and a need to develop reliable, scalable and automated methods for characterizing the stars and planets observed by these missions.
The characterization of exoplanet transits in lightcurves is a non-trivial problem and is plagued by instrumental and astrophysical noise. Thus, we need methods capable of modelling noise and disentangling signals. In the past decade, Gaussian Processes have become popular models for modelling both instrumental and astrophysical noise in lightcurves because they are flexible and interpretable, but they have been largely limited by slow computation speeds. Recently, a new software package, celerite, has introduced special kernel functions that are not only well-suited to modelling stellar activity, but also scale linearly with the size of the dataset by taking advantage of semi-separable matrices. Few studies have worked with real TESS data and no studies have applied Gaussian Processes for modelling stellar activity in TESS data. Thus, the objective of this research is to jointly characterize exoplanet transits and stellar activity in TESS and Kepler lightcurves using scalable Gaussian Processes. Simply put, we want to demonstrate that this machine learning method can obtain accurate estimates of the transit and rotation parameters, with a focus on the planet radii and stellar rotation periods, and that it can be applied to many star-planet systems.

2.2 Motivation

The motivation for characterizing exoplanet transits and stellar activity is to help astronomers study and understand the planetary systems observed by these missions and to continue the "quest to end humankind’s cosmic loneliness" [8]. The study of exoplanets allows us to investigate one of the most fundamental questions in science: is our solar system unique? By studying other exoplanets, we can learn about the evolution of other planetary systems and better understand the history of our own solar system and its place in the universe. Finding accurate transit parameters is important for deriving the properties of planets. When the radius and the mass of a planet are obtained, astronomers can calculate the density of the planet, which can provide insights about the composition and the atmosphere of the planet and whether it is habitable [11]. Modelling stellar activity can provide valuable insights about stellar physics (i.e. the evolution, age, and magnetic fields of stars) and the potential habitability of orbiting planets. According to Martins et al. (2020), it is valuable to study stellar rotation because it is "a fundamental observable driving stellar and planetary evolution, including planetary atmospheres and impacting on habitability conditions and the genesis of life around stars" [7]. Stellar activity models can also be used to detrend the lightcurves, which can improve the detection of Earth-sized planets, which are more likely to be habitable, and improve the characterization of their transit parameters.

The motivation for using scalable Gaussian Processes is that they are well-equipped to deal with the problem of characterizing stellar activity and exoplanet transits without the disadvantage of being limited to small datasets. First, they are non-parametric and flexible, allowing them to model both instrumental and astrophysical processes, which can help separate signals. Second, they provide probabilistic uncertainty estimates for parameters, which is useful for hierarchical modelling in astronomy (i.e. making "scientific inferences about a population" of stars and planets) [12]. Typically, the rotation and the transits are treated as two separate problems, and one signal is removed to amplify the other. Rotation periods can be obtained by first removing the transit signals and then studying the parts of the lightcurve that are out of transit. Transit parameters can be obtained by first detrending the lightcurve to remove long-term astrophysical signals and then fitting the transit parameters. Though these approaches have proven to be successful, they remove information from the data which can be important for characterization. Few studies have jointly modelled the exoplanet transits and stellar activity. We explore the joint modelling of stellar activity, noise, and exoplanet transits, as it has been suggested to give better, more holistic results [13]. By jointly modelling the components, the model can work with all the data available and learn how the rotation and transit parameters influence each other, ultimately leading to better characterization. Additionally, joint modelling permits the study of two objects at once: the planet and the star.

The motivation for applying this method to TESS lightcurves is that the data from the TESS mission are extremely novel. Few studies have worked with real TESS data, and no studies have applied Gaussian Processes for modelling stellar activity in TESS data, because the data are new and have been released in increments. This presents an opportunity to test new methods on the data and make new scientific discoveries which can be published. For example, the stellar rotation periods of the majority of the stars observed by TESS have not been established, and a method which can accurately derive them would be useful for studying the stars and detrending the lightcurves. The motivation for working with Kepler lightcurves is to validate and finetune our method. Though less novel, Kepler lightcurves are less noisy and have been studied extensively, leading to many reference radii and rotation periods in the literature for comparison.

The motivation for turning to machine learning is that there is a strong need for reliable, scalable and automated methods for analyzing these large datasets. The datasets provided by both Kepler and TESS are vast, and there will be many more missions launched in the near future. Manually searching through these datasets will be a challenge for astronomers. Few studies have worked on developing a generalizable method for transit characterization which can be applied to multiple planets. From a machine learning perspective, we want to develop a pipeline which can be applied to many targets, not just a handful. The reduction in computation time offered by the scalable kernels introduced by the celerite software offers an opportunity to characterize multiple star-planet systems.

All in all, with the advent of new exoplanet-hunting missions and the flood of astronomy data, we are truly living at one of the peaks of exoplanet science. Our aim is to contribute to the field of astronomy during these exciting times, using artificial intelligence as a tool.

2.3 Research questions

To the best of our knowledge, no studies have applied Gaussian Process methods to TESS data for the task of modelling stellar activity, and few studies have examined the joint modelling of exoplanet transits and stellar activity. Considering the novelty of the TESS data and the research gaps in the scientific literature, this thesis will focus on characterizing exoplanet transits and stellar activity in TESS and Kepler lightcurves with scalable Gaussian Processes. We propose three research questions:

• RQ1: How accurately can we model the stellar activity with Gaussian processes in Kepler and TESS lightcurves?

• RQ2: How accurately can we model the exoplanet transits with Gaussian processes in Kepler and TESS lightcurves?

• RQ3: Does joint modelling improve the characterization of rotation and transit parameters in Kepler and TESS lightcurves?


2.4 History of Exoplanet Science

We provide a brief summary of the history of exoplanet science and a brief overview of the two astronomy concepts that we will study with our machine learning method: the exoplanet transits and the stellar activity. We discuss the boom in exoplanet science and the importance of space missions like Kepler and TESS, which have collected millions of lightcurves. Furthermore, we emphasize how the lightcurves produced by the Kepler and TESS missions can be studied to provide valuable insights about the detected exoplanets and their host stars. In particular, the transit signals in the lightcurves can be characterized to derive the radii of the planets, while the stellar activity can be characterized to infer the rotation periods of the stars.

Figure 2.1: Total number of exoplanet detections from 1989 to 2016 by different methods. Source: NASA Exoplanet Archive [14]

The field of exoplanet science has boomed since the 1990s and has revolutionized our view of the universe. Since the first exoplanet discovery in 1992 (around a pulsar), over 4,000 exoplanets have been discovered. These discoveries have been facilitated by improved instrumentation of ground-based telescope observations and the launch of exoplanet-hunting missions, which have collected vast amounts of lightcurve data.

Kepler and TESS

The most successful mission was Kepler/K2, a space satellite in operation from 2009 to 2018. The Kepler mission observed 530,000 stars in a distant and narrow section of the sky using the transit method and was responsible for discovering three-quarters of the known exoplanets, namely 2,662 exoplanets [8]. In 2018, the baton was passed to a new space mission called the Transiting Exoplanet Survey Satellite (TESS). In contrast to the Kepler mission, TESS is performing a transit survey of the full sky and is observing 200,000 stars that are closer and brighter. To date, TESS has discovered 79 exoplanets and is expected to discover several thousand more [10]. In the upcoming decade, several new missions will go into operation. Following the JWST, the PLAnetary Transits and Oscillations of stars (PLATO) mission will be launched in 2026. The PLATO mission will not only study exoplanets that appear in the habitable zone of solar-like stars but also study the host stars to learn more about planetary systems [10].


2.4.1 Exoplanet Transits

Figure 2.2: The transit method measures the brightness of a star over time and checks for periodic dips in the brightness caused by an exoplanet passing in front of it. Source: PyPi Exotic [15]

Exoplanets are lightyears away and their signals can be very faint. Researchers have devised a variety of indirect methods to detect exoplanets from the ground and in space. The most successful method to date is the transit method, which measures the brightness of a star over time and checks if there is a periodic dip caused by an exoplanet transiting in front of it. Once a transit signal is detected, astronomers seek to characterize the signal as accurately as possible using transit parameters. Some transit parameters can be directly measured from the geometry of the transit: the transit period (how often it appears), the transit duration (how long it is visible), and the transit depth (how much of the star’s light it blocks). Other, more complex parameters can be derived using physical equations and fitting models: the radius of the planet, the distance to the star, the eccentricity of the orbit, and the impact parameter, which indicates whether the planet is crossing the center of the star or grazing the edge of the star. Based on the transit depth, the radius of the planet can be determined if the radius of the star is known. Once the radius is found, the transit method is often followed up by the Radial Velocity method, which determines the mass of the planet and allows the density of the planet to be calculated [16].
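The depth-to-radius relation is simple enough to state in a few lines. This hypothetical helper (names and coefficient values are our own, for illustration) assumes the depth is the fractional flux deficit (Rp/Rs)^2 and uses the IAU nominal solar and Jovian radii:

```python
import numpy as np

R_SUN = 6.957e8   # metres (IAU nominal solar radius)
R_JUP = 7.1492e7  # metres (IAU nominal equatorial Jupiter radius)

def planet_radius(depth, r_star_rsun):
    """Transit depth is the flux deficit (Rp/Rs)^2, so the planet radius
    follows from the stellar radius: Rp = Rs * sqrt(depth).
    Returns the radius in Jupiter radii."""
    return r_star_rsun * R_SUN * np.sqrt(depth) / R_JUP

# A 1% transit depth around a Sun-sized star implies Rp = 0.1 Rsun
print(round(planet_radius(0.01, 1.0), 3))  # 0.973, i.e. a Jupiter-sized planet
```

This is why an accurate stellar radius, and an undistorted transit depth, are prerequisites for an accurate planet radius.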

2.4.2 Stellar activity

Figure 2.3: Rotational modulation patterns in a lightcurve caused by a starspot on the surface of the star appearing in and out of view as the star rotates. Source: Armagh Observatory [17]

Stellar activity refers to magnetic field activity on the surface of stars shown by the emergence and decay of active regions such as starspots (dark spots), faculae (bright spots), and plages (bright regions). When the star rotates, these active regions appear in and out of view, resulting in variations in the stellar flux called rotational modulation. Active regions vary in size, location, and lifetime, leading to a variety of complex rotational modulation patterns. If rotational modulation is treated as a wave, it can be described by the following parameters: rotation period, amplitude of rotation, and quality factor, which indicates how quickly the modulating signal dies out.
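As a toy illustration of these three parameters (deterministic, and so not the stochastic rotation model used later in the thesis), a damped oscillation shows how the quality factor sets the timescale on which the modulation dies out:

```python
import numpy as np

def rotational_modulation(t, period, amplitude, Q):
    """Toy rotational-modulation signal: a sinusoid at the rotation period
    whose envelope decays on a timescale set by the quality factor
    (tau = 2Q / omega0 for a damped harmonic oscillator).
    A hypothetical illustration with invented parameter values."""
    w0 = 2 * np.pi / period
    return amplitude * np.exp(-w0 * t / (2 * Q)) * np.cos(w0 * t)

t = np.linspace(0, 50, 1000)  # days
coherent = rotational_modulation(t, period=5.0, amplitude=0.01, Q=50.0)  # decays slowly
decaying = rotational_modulation(t, period=5.0, amplitude=0.01, Q=2.0)   # dies out quickly
# With Q = 2 the envelope falls to 1/e of its start after 2Q/w0 ~ 3.2 days,
# while Q = 50 keeps the modulation coherent over the full 50-day window.
```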


2.5 Challenges

2.5.1 Detection of Signals

The problem of recovering accurate transit and rotation parameters is non-trivial and comes with many challenges. To begin with, the transit signal and/or the rotation signal must be detected in the data. An exoplanet can have a size ranging from smaller than the Earth to larger than Jupiter (which is 11.2 Earth radii), and the orbital period of an exoplanet can range from a few hours to thousands of years. A transit signal may not be detected if the signal is too faint (i.e. the planet is too small and/or the instrumental noise is too high) or if the transit does not appear in the observation window (i.e. the orbital period is too long). The situation is similar for the rotation signal, which can have a period ranging from shorter than one day to longer than 100 days. A rotation signal may not be detected in the data if the amplitude of rotation is too low (i.e. the starspots on the surface of the star are small) or if the rotation period is too long for the time frame studied. In general, rotation signals are hard to characterize because, even though every star has a rotation period, a rotation signal is not always present or consistent. There may be no starspots, or many starspots covering both sides of the star, leading to the lack of a rotational modulation signal. The starspots are also constantly growing and shrinking, leading to changes in the pattern of rotational modulation. Furthermore, since stars are fluid and not solid, they exhibit differential rotation, which means that different parts of the surface of the star rotate at different velocities. There is thus a huge diversity of cases if we consider the range of sizes and periods, which makes it difficult to develop a robust method for finding rotation and transit parameters that works for all stars.

2.5.2 Noise: Photon, Instrumental, Astrophysical

The main challenge in detecting and characterizing signals in lightcurves, however, is the presence of noise in the data, which interferes with the signals. There are various kinds of noise: 1) photon noise, caused by varying numbers of photons being collected by the telescope at different intervals, 2) instrumental noise, caused by instability of the instrument (i.e. spacecraft drift or jumps), and 3) astrophysical noise (stellar activity), caused by the magnetic activity of the star. All of these appear in lightcurves and affect the retrieval of both transit and rotation parameters. For transit characterization, noise is a problem because it can mimic exoplanet signals, leading to false positives, and can also disguise exoplanet signals, in particular the signals of smaller Earth-sized planets with longer orbital periods. Ford (2016) argues that with the improvement of instruments, the main obstacle for transit characterization is stellar activity [18]. All in all, noise and the entanglement of many signals make it difficult to recover accurate transit and rotation parameters [19][20]. In our case, stellar activity can be either signal, when we want to detect it in the joint model to determine the rotation, or noise, when we want to eliminate it in the transit-only model.

2.5.3 Stellar Limb Darkening

Figure 2.4: (a) Limb darkening; (b) effects of limb darkening on the shape of the transit curve; (c) limb-darkening laws. Sources: left: Richmond [21]; middle: University of Toronto [22]; right: Kreidberg [23].

In order to obtain accurate planetary parameters, researchers have to account for stellar limb darkening. Limb darkening is a phenomenon that causes the outer edge of the star (called the limb) to appear darker than the inner part of the disk. From the perspective of the viewer, the light is not uniform across the surface of the star, because the light near the limb of the star is emitted from cooler regions. Limb darkening affects the shape of the downwards and upwards slopes of the transit curve, resulting in a transit shape that is rounded instead of rectangular. There are several laws for capturing limb-darkening effects in lightcurves (i.e. uniform, linear, quadratic, non-linear) which account for different transit shapes [22].


Chapter 3: Literature Review

In the literature review section, we outline the traditional techniques in Physics for estimating rotation periods and for detecting and fitting transits. We give an overview of the machine learning literature on the application of Gaussian Processes to stellar lightcurve analysis and explain why they are well-suited to this problem. Finally, we define our position in the state of the art, highlighting that our approach is innovative in that we work with novel TESS data, explore the joint modelling of transit and stellar signals, and develop a general pipeline that can be applied to many targets.

3.1 Traditional Methods in Physics

3.1.1 Transit Fitting

There are several approaches in Physics which are used to detect and characterize transits in lightcurves.

Figure 3.1: Left: Transit Least Squares; Right: Box Least Squares; Source: https://github.com/hippke/tls

Box Least Squares (BLS)

Since 2002, the Box Least Squares (BLS) method has been commonly used to detect transit signals in lightcurves [24]. The BLS method detects transit signals by fitting a periodic box shape (or an "upside down top hat") to the data [25]. The box shape inherently describes four parameters: the orbital period (time between boxes), the duration of the transit (width of the box), the depth of the transit (height of the box), and the reference time (the first time the box appears) [25]. A disadvantage of the BLS method is that the box shape does not accurately capture the curved shape of a transit signal caused by limb darkening of the star.
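The box-fitting idea can be sketched in a few lines of numpy. This is a deliberately naive grid search over trial periods and epochs with a fixed transit duration, not the actual BLS algorithm of the references above, but it illustrates how the four box parameters are scanned. The synthetic lightcurve and all numbers are hypothetical.

```python
import numpy as np

def box_depth(t, y, period, dur, t0):
    """Mean out-of-transit flux minus mean in-transit flux for one trial box."""
    in_transit = ((t - t0) % period) < dur
    if not in_transit.any() or in_transit.all():
        return 0.0
    return y[~in_transit].mean() - y[in_transit].mean()

def box_search(t, y, trial_periods, dur):
    """Grid search over trial periods and epochs; returns (period, t0, depth)."""
    best = (0.0, 0.0, 0.0)
    for period in trial_periods:
        for t0 in np.arange(0.0, period, dur / 2):
            depth = box_depth(t, y, period, dur, t0)
            if depth > best[2]:
                best = (period, t0, depth)
    return best

# Synthetic lightcurve: flat star with 1%-deep, 0.2-day transits every 3 days.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 30.0, 3000)
y = 1.0 + 1e-4 * rng.standard_normal(t.size)
y[(t % 3.0) < 0.2] -= 0.01

period, t0, depth = box_search(t, y, np.arange(1.0, 5.0, 0.05), dur=0.2)
```

The search recovers the injected 3-day period, zero epoch, and ~1% depth; the real BLS additionally scans durations and uses a weighted least-squares statistic rather than a plain difference of means.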

Transit Least Squares (TLS)

In 2018, Hippke et al. introduce the Transit Least Squares (TLS) method, an improved version of BLS which uses a curved transit shape instead of a box shape to search for transits. The TLS method retrieves the limb darkening coefficients of the star from a catalog and then calculates the impact of the coefficients on the shape of the transit. As a result, TLS is more reliable than BLS for detecting transit signals and obtaining accurate transit parameters, especially for those belonging to smaller planets [26].


Transit Fitting Models

The derivation of parameters describing the geometry of the transit (i.e. depth, shape, duration) is a straightforward task. However, the derivation of other planet parameters (e.g. radius, impact parameter, orbital inclination) requires an advanced transit fitting model based on analytical solutions. Furthermore, in order to get an accurate estimate of the area of light that is blocked out by the planet transiting the star, limb-darkening laws need to be incorporated in the analytical solutions [27]. In 2002, Mandel and Agol presented "exact analytic formulae for the eclipse of a star described by quadratic or nonlinear limb darkening" [28]. In 2006, Gimenez et al. also presented equations to compute transit lightcurves based on the "fractional radii of the planet and the parent star, the inclination of the orbit, and the limb-darkening coefficients of the star" [27]. Since then, many tools for modelling and fitting exoplanet lightcurves have been developed by the astronomy community based on the analytical solutions of Mandel or Gimenez. These include but are not limited to: batman (BAsic Transit Model cAlculatioN in Python) [29], pytransit [30], and starry [31].
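As a toy illustration of why limb darkening must enter the transit model, the sketch below brute-force integrates a quadratically limb-darkened stellar disk occulted by an opaque planet. It assumes a circular, central-crossing orbit (impact parameter b = 0) and hypothetical parameter values; in practice one would use the analytic Mandel & Agol formulae via packages such as batman or pytransit rather than this grid integration.

```python
import numpy as np

def limb_darkened_transit(times, t0, per, rp, a_rs, u1, u2, n_grid=301):
    """Relative flux of a star with quadratic limb darkening
    I(mu) = 1 - u1*(1 - mu) - u2*(1 - mu)**2, occulted by an opaque planet
    of radius rp (in stellar radii). Circular, central-crossing orbit
    (b = 0) assumed; only the transit side of the orbit is modelled."""
    x = np.linspace(-1, 1, n_grid)
    xx, yy = np.meshgrid(x, x)
    r2 = xx ** 2 + yy ** 2
    on_disk = r2 <= 1.0
    mu = np.sqrt(np.clip(1.0 - r2, 0.0, None))
    inten = np.where(on_disk, 1.0 - u1 * (1 - mu) - u2 * (1 - mu) ** 2, 0.0)
    total = inten.sum()
    flux = np.ones(len(times))
    for i, t in enumerate(times):
        # Projected x-position of the planet centre (stellar radii)
        px = a_rs * np.sin(2 * np.pi * (t - t0) / per)
        blocked = on_disk & ((xx - px) ** 2 + yy ** 2 <= rp ** 2)
        flux[i] = (total - inten[blocked].sum()) / total
    return flux

# Mid-transit vs out-of-transit flux for a hot-Jupiter-like configuration
flux = limb_darkened_transit([0.0, 1.0], t0=0.0, per=4.0, rp=0.1,
                             a_rs=8.0, u1=0.3, u2=0.2)
```

The mid-transit depth (~1.2%) exceeds the geometric ratio rp² = 1% because the planet crosses the brighter disk centre, exactly the limb-darkening effect that rounds the transit shape.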

3.1.2 Recovering Rotation Periods

There are several traditional methods in Physics for recovering rotation periods from lightcurves.

Figure 3.2: (a) Lomb-Scargle (LS) periodogram; (b) autocorrelation function (ACF)

Fourier Transform

The Fourier transform converts the lightcurve from the time domain to the frequency domain and analyzes the spectral characteristics of the lightcurve. When the lightcurve is converted to the frequency domain, the rotation period can be found by identifying peaks in the power spectrum.

Lomb-Scargle (LS) Periodogram

The Lomb-Scargle (LS) periodogram is a method for "detecting and characterizing periodic signals in unevenly sampled data" [32]. This method can find the rotation period by using least squares to find the sinusoid that best fits the lightcurve.
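Since the LS periodogram is equivalent to least-squares sinusoid fitting, its core idea can be sketched directly with numpy. This is a toy implementation on synthetic, unevenly sampled data with a hypothetical 7-day signal, not the optimized LS algorithm found in libraries such as astropy.

```python
import numpy as np

def sinusoid_power(t, y, periods):
    """Fraction of variance explained by the best-fit sinusoid at each trial period."""
    resid = y - y.mean()
    total = (resid ** 2).sum()
    power = np.empty(len(periods))
    for i, p in enumerate(periods):
        w = 2 * np.pi / p
        X = np.column_stack([np.sin(w * t), np.cos(w * t)])  # design matrix
        coef, *_ = np.linalg.lstsq(X, resid, rcond=None)
        power[i] = 1.0 - ((resid - X @ coef) ** 2).sum() / total
    return power

# Unevenly sampled synthetic lightcurve with a 7-day rotation signal
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 60, 400))
y = 1.0 + 5e-3 * np.sin(2 * np.pi * t / 7.0) + 1e-3 * rng.standard_normal(400)

periods = np.linspace(2, 20, 500)
power = sinusoid_power(t, y, periods)
best_period = periods[np.argmax(power)]
```

Because the sinusoid is fitted by least squares rather than computed from an FFT, the uneven time sampling poses no problem, which is precisely the strength of the LS periodogram.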

Autocorrelation Function (ACF)

The autocorrelation function can find the rotation period by measuring the covariance between different time points in the lightcurve "at a distance or lag k, for all different values of k" [33].

ACF(k) = cov(xn, xn+k) = E[(xn − µ)(xn+k − µ)]
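A minimal numpy version of this estimator, on synthetic evenly sampled data with a hypothetical 7-day period, picks the rotation period as the location of the strongest ACF peak away from zero lag:

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelation of y at lags 0..max_lag."""
    y = y - y.mean()
    var = (y * y).mean()
    return np.array([(y[: len(y) - k] * y[k:]).mean() / var
                     for k in range(max_lag + 1)])

# Evenly sampled synthetic rotation signal (hypothetical 7-day period)
dt = 0.02                                   # cadence [days]
t = np.arange(0.0, 60.0, dt)
y = 1.0 + 5e-3 * np.sin(2 * np.pi * t / 7.0)

rho = acf(y, max_lag=500)                   # lags up to 10 days
lags = np.arange(501) * dt
k0 = 150                                    # skip lags below 3 days
period_est = lags[k0 + np.argmax(rho[k0:])]
```

Skipping the shortest lags is necessary because the ACF is always 1 at zero lag; in published pipelines the first significant peak (or a fit to several harmonics of it) is used instead of a plain argmax.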

Rotation Periods in Literature

In the past decade, many papers have been published about finding the rotation periods of many Kepler stars at once using the traditional physics methods mentioned above. In 2013, Nielsen et al. measure the rotation periods of 12,151 stars observed by the Kepler mission using the LS Periodogram [34]. They confirm rotation periods by ensuring the rotation period for the majority of the sections of the lightcurve are within one day of the median rotation period [34]. In the same year, McQuillan et al. release three important papers about finding the rotation periods in stars observed by the Kepler mission using the autocorrelation function. In the first, they find rotation periods for 34,030 Kepler stars, which is the "largest sample of stellar rotation periods" to date [35]. In the second, they find the rotation periods for 737 main sequence Kepler stars that are known


to host exoplanets [36]. In the third paper, McQuillan et al. find the rotation periods of 1570 Kepler stars and observe that the distribution of rotation periods is bimodal, where the most common rotation periods occur at 19 and 33 days [37]. More recently, in 2020, Martins et al. try to find the rotation periods of 1000 TESS stars by comparing rotation periods retrieved from Fast Fourier, LS Periodogram, and wavelet techniques [7]. Out of the 1000 stars, they find 131 stars with "unambiguous rotation" periods and 31 stars with "dubious rotation" periods, while the remaining 838 stars are classified as displaying ambiguous variability or being too noisy. The rotation periods found are noticeably short, ranging from 0.321 to 13.219 days, which is likely due to the fact that the sectors of TESS are short (27 days) and many stars have only one or two sectors of data available [7].

3.2 Applications in Machine Learning

3.2.1 Gaussian Processes

A Gaussian Process (GP) is a probability distribution over functions. A GP is defined by a mean function and a covariance ("kernel") function which models the covariance structure of the data [38]. A GP is non-parametric and flexible, so it can be suitable for modelling various kinds of processes. The most important decision when creating a GP model is which kernel function to use and this depends on the data you have and the process you want to model. One of the most well-known kernels is the squared exponential kernel, which assumes "data points that are nearby each other in input space are highly correlated, and data points that are far from each other in input space are relatively uncorrelated", making it good for modelling instrumental noise [39]. Another well-known kernel is the quasi-periodic kernel, which is good for modelling periodic processes [12].
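The "nearby points are correlated" intuition behind the squared exponential kernel can be made concrete with a minimal numpy sketch; the variance and length-scale values below are arbitrary.

```python
import numpy as np

def squared_exponential(x1, x2, variance=1.0, length=1.0):
    """SE kernel: points close in input space are strongly correlated."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length ** 2)

t = np.linspace(0, 10, 50)                  # observation times (arbitrary)
K = squared_exponential(t, t, variance=1.0, length=2.0)
```

Neighbouring times (`K[0, 1]`) come out far more correlated than distant ones (`K[0, -1]`), which is why this kernel suits smooth instrumental trends, while periodic or quasi-periodic kernels are needed for rotational modulation.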

In the last decade, many astronomers have used GPs to model instrumental noise or stellar activity in lightcurves. In 2012, Gibson et al. were the first to apply GPs to transit lightcurves for instrumental correction, by using a squared exponential kernel [39]. In 2016, Aigrain et al. use GPs to jointly model the instrumental systematics and astrophysical variability in K2 stars using squared-exponential and quasi-periodic kernels. They show that joint modelling of instrumental and stellar systematics leads to better overall systematics correction and that their detrending method leads to more transit detections than other correction methods [13]. Angus et al. (2017) use GPs with a quasi-periodic kernel function to infer stellar rotation periods for 1102 Kepler stars [12]. They demonstrate that the GP method performs well on simulated and real data and performs better than the LS periodogram and the autocorrelation function. The disadvantage of all these studies is that the GPs used are not scalable.

In 2017, there was an important development for GPs in the exoplanet community. Foreman-Mackey et al. (2017) introduce the software celerite, which enables fast and scalable GP modelling with applications to lightcurve data, including the probabilistic inference of stellar rotation periods, stellar oscillations, and transiting planet parameters [2]. Their novel method for GP modelling scales linearly with the size of the dataset rather than as the cube of the number of datapoints. This works because they restrict the form of the kernels to a mixture of exponentials, which produces semi-separable covariance matrices that can be exploited for faster computation. Foreman-Mackey et al. define celerite kernels to model a variety of stellar signals, from granulation to oscillation to rotation [2]. Since then, celerite has been adopted by many astronomers to aid the modelling of stellar activity in transit lightcurves. In 2019, David et al. use the celerite GP model to jointly model the stellar variability and the transit signals of a specific star. Using the celerite rotation kernel and a transit mean model, they recover transit parameters for the small planets orbiting the star [40]. In 2019, Pereira et al. use the celerite GP model to model granulation and oscillations in artificial TESS lightcurves and real Kepler lightcurves of red-giant stars. They use celerite's granulation, oscillation, and white noise kernels and show that their method retrieves a good estimate of the oscillation frequency and improves the recovery of the "planetary and stellar radius", highlighting the advantages of celerite [41]. In 2020, Barros et al. use the celerite GP to model stellar variability in transit lightcurves from the CoRoT mission [42]. They combine multiple celerite kernels to capture different types of variability (i.e. oscillation, granulation, and rotational modulation) and then remove the variability model from the lightcurve to retrieve transit parameters.
They show that the multi-component GP model (with oscillation, granulation, and rotation kernels) outperforms single component GPs and non-GP methods, and that transit parameters are more precise when the stellar variability model is better [42].

Random Forests and Neural Networks

Other machine learning methods such as random forests and deep learning methods have been applied to stellar lightcurves in the past two years. Hinners et al. (2018) apply machine learning to predict and classify stellar properties from noisy and sparse time series data [43]. They are one of the first studies to work with real lightcurves and compare a representation learning approach with a feature engineering approach. They demonstrate that a Recurrent Neural Network (RNN) struggles due to data quality while the feature engineering manages to make good predictions for some parameters when using a Random Forest ensemble model [43]. Breton et al. (2019) use a Random Forest to classify Kepler targets into different star categories, and a second


classifier on the targets categorized as rotating main sequence (sun-like) stars to determine the best instrumental filter for recovering the rotation periods [44]. In the last month of the thesis, two articles were released on the application of machine learning to the problem of finding rotation periods in Kepler and TESS lightcurves. In the first article, Lu and Angus et al. use Astraea, a machine learning method involving random forests, which tries to predict long rotation periods from short lightcurves. They train a Random Forest classifier to first determine whether a rotation period is "measurable" or "unmeasurable" in a lightcurve, and if it is, they then use a Random Forest regressor to predict rotation periods up to 60 days on TESS's short sectors by learning the relationship between stellar parameters and changes in brightness of the star [45]. They apply the method to Kepler and TESS data and get 9% uncertainty in Kepler and 55% uncertainty in TESS, pointing out that it is hard to measure rotation periods in TESS with traditional physics methods because the sectors are too short [45]. In the second paper, Angus et al. use a convolutional neural network on Kepler photometric data to recover stellar properties, such as the rotation period for main sequence stars (with an uncertainty of 5.2 days) [46]. Though these are exciting applications, in this thesis we focus on Gaussian Processes as they allow us to explore the joint characterization of stellar activity and exoplanet transits and provide probabilistic estimates of the rotation and transit parameters.

3.3 Position in the State of the Art

Despite the success of the aforementioned applications, there are research gaps and opportunities for improvement in the scientific literature that we seek to address with our research. First, few studies have worked with real TESS data and no studies have applied Gaussian Processes for modelling stellar activity in TESS data. This is because the data from the TESS mission is extremely novel, currently being released in increments. This presents an opportunity to test new methods on the data and make new scientific discoveries which can be published. For example, the stellar rotation periods of the majority of the stars observed by TESS have not been established, and testing a method which can derive them would be useful for studying the stars and detrending the lightcurves. In this work, we apply our method to real TESS data.

Second, few studies have jointly modelled the exoplanet transits and stellar activity. Typically, these are treated as two separate problems and one signal is removed to amplify the other. Rotation periods can be obtained by first removing the transit signals and then studying the parts of the lightcurve that are out of transit. Transit parameters can be obtained by first detrending the lightcurve to remove long-term astrophysical signals and then fitting the transit parameters. Though these approaches have proven to be successful, they remove information from the data which can be important for the task. By jointly modelling the components, the model can work with all the data available and learn how the rotation and transit parameters influence each other, ultimately leading to better characterization. However, joint modelling leads to more parameters, which can increase computation time. In this work, we test the joint method and investigate the trade-off between speed and accuracy.

Third, few studies have worked on developing a generalizable method for transit characterization which can be applied to multiple stars. Many papers in astrophysics will focus on one star-planet system and all steps of the method (i.e. instrumental correction, initialization, sigma clipping) are specifically tuned to that star. From a machine learning perspective, we want to develop a pipeline which can work on a handful of targets. The reduction in computation time offered by the scalable kernels introduced by the celerite software offers an opportunity to characterize multiple star-planet systems. In this work, we develop a pipeline for automating the method. Using the pipeline, we apply the method to 24 targets and assess how well the method can be generalized.

In summary, no studies have applied Gaussian Process methods to TESS data for the task of modelling stellar activity and few studies have examined the joint modelling of exoplanet transits and stellar activity. Considering the novelty of the TESS data and the research gaps in scientific literature, this thesis will focus on characterizing exoplanet transits and stellar activity in TESS and Kepler lightcurves with scalable Gaussian Processes. The final research questions are:

• RQ1: How accurately can we model the stellar activity with Gaussian processes?

• RQ2: How accurately can we model the exoplanet transits with Gaussian processes?

• RQ3: Does joint modelling improve the characterization of rotation and transit parameters?

To address these questions, we use three Gaussian Process models: (1) ExoGP, which uses a transit mean function and a noise kernel, (2) RotGP, which uses a zero mean function and a rotation kernel, and (3) ExoRotGP, which uses a transit mean function and a rotation and noise kernel. We compare the parameter results obtained by the models to those from refereed publications.


Chapter 4: Methodology

In the method section, we explain the theory behind our approach. We provide background information on Gaussian Processes, how they can be used to model instrumental and stellar astrophysical noise, how they use Bayesian settings to find parameter estimates, and how the models can be optimized to maximize the posterior. Following that, we break down the kernel functions of the software celerite and explain how they can be used to model stellar variability and why they are fast to compute. We move on to describing the transit model that we use to fit the transits and the various features that define it: limb darkening, orbit types, exposure time integration. Finally, we explain how Markov Chain Monte Carlo Sampling can be used to provide uncertainty estimates for the transit and rotation parameters.

4.1 Bayesian Inference

Bayesian inference involves making probabilistic estimates of "unobserved quantities condition on observed data" using probabilistic models [3]. We can make estimates of parameters, or fit our model to the data, by "infer[ring] the joint probability distribution for the model parameters" based on the data and the prior beliefs of the parameters [3]. The joint probability distribution is "the posterior distribution (posterior) and it is obtained by updating a prior distribution (prior) with a sampling distribution (likelihood)" [3]. The posterior is "the information about the model parameters given the prior information and the likelihood from observations."[3].

P(θ|y) = P(θ) P(y|θ) / P(y)

where P(θ) is the prior distribution, P(y|θ) is the likelihood, and P(y) is the model evidence.

The prior represents our "assumptions about a model parameter" and the likelihood is the probability that the observed data follows from our model given the model parameters [3]. When observations are passed to the model, the prior is "updated by the likelihood to produce a posterior distribution" [3]. Priors can be "informative", meaning they are "based on previous research" and are tightly constrained. For example, for the rotation period parameter, we can define the prior as a normal distribution where the mean and the standard deviation are based on estimates by other researchers [3]. Priors can also be uninformative, which is useful if we have no previous knowledge about the parameter. Uninformative priors can be defined so that they do not have a strong effect on the posterior, "allowing the data to ’speak for itself’" [3]. If we want to estimate the posterior distribution for a specific model parameter θi (e.g. the radius or the rotation period), we have to integrate the

joint posterior "over all other parameters" (i.e. marginalization) [3]:

P(θi|y) = ∫ P(θ|y) dθj≠i

In our case, the model is a Gaussian Process (GP), and we can perform parameter estimation using the following steps. First, we specify the GP model with a mean function and a kernel function, which gives us the prior distributions. Second, we condition the GP model on the data (i.e. lightcurve), which gives us the likelihood. Third, we fit the model to the data (i.e. find the posterior distributions of the parameters of interest). Since our model is complex, it is not feasible to "calculate the posterior estimates analytically" and we have to rely on other methods [47]. The first step of model fitting involves finding the maximum a posteriori (MAP) parameters "using optimization methods" [47]. The MAP parameters will represent the "mode of the posterior distribution", which may be biased and can only give a point estimate of each parameter [47]. In our case, we use L-BFGS-B optimization to find the MAP parameters. The second step of model fitting involves drawing samples from the posterior distribution with Markov Chain Monte Carlo (MCMC) [47]. This allows us


to quantify the uncertainty of the model [47]. In our case, we use the No-U-Turn Sampler (NUTS) to approximate the posterior.
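The two fitting stages can be illustrated on a deliberately simple one-parameter problem: estimating a Gaussian mean under a broad normal prior. All values are hypothetical, and scipy's L-BFGS-B stands in for the optimizer used in the thesis pipeline.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
y = rng.normal(3.0, 0.5, size=100)        # observations with unknown mean

def neg_log_posterior(theta):
    mu = theta[0]
    log_lik = -0.5 * np.sum((y - mu) ** 2) / 0.25   # Gaussian likelihood, sigma = 0.5
    log_prior = -0.5 * mu ** 2 / 100.0               # broad N(0, 10) prior
    return -(log_lik + log_prior)

# Step 1: maximum a posteriori (MAP) point estimate with L-BFGS-B
res = minimize(neg_log_posterior, x0=[0.0], method="L-BFGS-B")
mu_map = res.x[0]
```

Step 2 (MCMC with NUTS) would then start near `mu_map` and draw samples from the full posterior; for this conjugate toy problem the MAP is simply the sample mean, very lightly shrunk toward the prior mean, which illustrates why the MAP alone gives a point estimate but no uncertainty.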

4.2 Gaussian Processes

A Gaussian Process is a stochastic model which consists of a mean function µθ(x) and a covariance function kα(xn, xm), where θ and α are parameters of the model and xn and xm are the coordinates of two datapoints in the dataset [2][38]. When we perform GP regression, we want to choose a kernel that describes the covariance structure of the data and find "the set of parameters that best represent the observed data" [41]. Given a dataset y = (y1, ..., yN) with coordinates X = (x1, ..., xN), we can define the following log likelihood equation:

ln L(θ, α) = ln p(y|X, θ, α) = −(1/2) rθᵀ Kα⁻¹ rθ − (1/2) ln det Kα − (N/2) ln(2π)

where r is a vector of residuals and K is the covariance matrix. The covariance (kernel) function k maps the time points to a covariance value and defines the elements of the covariance matrix:

Ki,j = k(xi, xj) + σ² δij

where σ² is white noise and δ is the Kronecker delta function (1 if the inputs are equal, 0 if they are not), which

essentially means that white noise is added to the diagonal of the covariance matrix [3]. Once we have the log likelihood equation, we can maximize it to get the values of the model parameters θ and α that maximize the probability of drawing the dataset. We can then quantify "the uncertainties on θ and α ... by multiplying the likelihood by a prior p(θ,α) and using a Markov Chain Monte Carlo (MCMC) algorithm to sample the joint posterior probability density" [2].
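For a small dataset, this log likelihood can be evaluated directly in numpy via a Cholesky factorization, a naive O(N³) sketch of exactly the computation whose cost celerite later reduces. With K equal to the identity it reduces to the familiar iid Gaussian log likelihood, which makes it easy to check.

```python
import numpy as np

def gp_log_likelihood(r, K):
    """ln L = -(1/2) r^T K^{-1} r - (1/2) ln det K - (N/2) ln(2*pi),
    evaluated stably via the Cholesky factorization K = L L^T."""
    N = len(r)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))   # K^{-1} r
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * r @ alpha - 0.5 * log_det - 0.5 * N * np.log(2 * np.pi)

# Sanity check on white noise: K = I recovers the iid standard normal likelihood
rng = np.random.default_rng(0)
r = rng.standard_normal(200)
ll = gp_log_likelihood(r, np.eye(200))
```

The Cholesky route avoids forming K⁻¹ explicitly and gives ln det K for free from the factor's diagonal; both steps are the O(N³) bottleneck discussed below.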

Typically, Gaussian Process models are trained on a set of training data to perform classification and regression on new inputs. In our case, we use the Gaussian Process to describe the data directly, which in our case is the lightcurve of the star, for the task of parameter estimation. Instead of computing a predictive posterior distribution based on new inputs, we use the posterior defined by the data and prior beliefs to obtain estimates of the model parameters and their uncertainty intervals. We are interested in the model parameters because we can define the Gaussian Process model in such a way that the model parameters summarize information about the stellar activity and the exoplanet transits in the lightcurve. We define the mean function of the model as the transit model, which is the noise free part of the lightcurve that captures the transits of the planet. We set up the covariance function to model stellar activity.

Advantages and Disadvantages

The advantages of Gaussian Process models are that they can directly quantify the uncertainty in predictions and that they are non-parametric, which means "we do not have to worry about whether it is possible for the model to fit the data" [38]. Gaussian Processes are versatile and flexible, so we can adapt them to the data and the problem by choosing special priors and kernel functions. As a result, they can model a variety of processes, ranging from "aperiodic, periodic, and quasiperiodic behavior" to various kinds of noise signals [3]. The disadvantage of Gaussian Process models is that the "computation time scales with the number of observations cubed" [2]. As a result, Gaussian Processes are "generally limited to small datasets", which is a problem because many of the datasets produced by space missions such as TESS and Kepler are enormous [2]. In order for Gaussian Processes to be useful for this application, they need to be scalable. Luckily, special covariance functions and solvers have been designed to address the problem of scaling.

4.3 Celerite

Normally, the "cost of computing the inverse and determinant of a general matrix Kα" in the log likelihood equation is O(N³) [2]. Foreman-Mackey et al. introduce the software package celerite, which addresses the "cubic scaling" by limiting the choice of kernels to certain forms that are faster to compute [2]. However, a drawback of these kernel functions is that they only work with "one-dimensional datasets" [2]. In our case, this is not a problem, since we are working with one-dimensional time series data.

What is the celerite kernel?

The celerite kernel is a "mixture of exponential functions" [41]:

kα(τnm) = σ² δnm + Σj=1..J aj exp(−cj τnm)


where α = {aj, cj}

Given the relations between the exponential, cosine and sine functions, we can rewrite the exponentials in the celerite kernel as "sums of sines and cosines" [41]. If we also incorporate complex kernel parameters aj → aj ± i·bj and cj → cj ± i·dj, then the celerite kernel becomes "a mixture of quasi-periodic oscillators" [41]:

kα(τnm) = Σj=1..J [aj exp(−cj τnm) cos(dj τnm) + bj exp(−cj τnm) sin(dj τnm)]

where α = {aj, bj, cj, dj}. The power spectrum of the celerite kernel is:

S(ω) = Σj=1..J √(2/π) · [(aj cj + bj dj)(cj² + dj²) + (aj cj − bj dj) ω²] / [ω⁴ + 2(cj² − dj²) ω² + (cj² + dj²)²]

Relation between celerite kernel and stochastically-driven, damped harmonic oscillator

Foreman-Mackey et al. demonstrate that the power spectrum of the celerite kernel function matches the power spectrum of a "stochastically-driven, damped harmonic oscillator" (SHO) [41][2].

Figure 4.1: Source: https://www.matlab-monkey.com/ODE/resonance/resonance.html

A stochastically-driven damped harmonic oscillator (SHO) is a system that exhibits repetitive variations around some central point, which are driven by a external force that is stochastic by nature, and the variations slowly dissipate over time due to energy being lost from the system. This SHO can be used to model all kinds of sinusoidal vibrations and waves. The power spectrum of the SHO is as follows:

S(ω) = √(2/π) · S0 ω0⁴ / [(ω² − ω0²)² + ω0² ω² / Q²]

As we can see, it is characterized by the following parameters:

• ω0: the angular frequency of the undamped oscillator (2π divided by the period)

• Q: the quality factor of the oscillator (which describes how little the oscillator is damped)

• S0: the normalization constant of the power spectrum, which "is proportional to the power at ω = ω0". When the time lag τ is 0, the kernel is SHO(0) = S0 ω0 Q, which represents the variance [2]

The power spectrum of the SHO matches the power spectrum of the celerite kernel if we set the following values (see calculation in Appendix):

aj = S0 ω0 Q,  bj = S0 ω0 Q / √(4Q² − 1),  cj = ω0 / (2Q),  dj = (ω0 / (2Q)) √(4Q² − 1)
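This correspondence is easy to verify numerically: substituting the SHO-to-celerite parameter mapping into the celerite power spectrum reproduces the SHO power spectrum for any choice of (S0, ω0, Q). The values below are arbitrary test values.

```python
import numpy as np

def celerite_psd(omega, a, b, c, d):
    """Power spectrum of a single celerite term."""
    num = (a * c + b * d) * (c ** 2 + d ** 2) + (a * c - b * d) * omega ** 2
    den = omega ** 4 + 2 * (c ** 2 - d ** 2) * omega ** 2 + (c ** 2 + d ** 2) ** 2
    return np.sqrt(2 / np.pi) * num / den

def sho_psd(omega, S0, w0, Q):
    """Power spectrum of the stochastically-driven damped harmonic oscillator."""
    return np.sqrt(2 / np.pi) * S0 * w0 ** 4 / (
        (omega ** 2 - w0 ** 2) ** 2 + w0 ** 2 * omega ** 2 / Q ** 2)

S0, w0, Q = 1.3, 2.0, 3.0                     # arbitrary test values
a = S0 * w0 * Q
b = S0 * w0 * Q / np.sqrt(4 * Q ** 2 - 1)
c = w0 / (2 * Q)
d = w0 * np.sqrt(4 * Q ** 2 - 1) / (2 * Q)

omega = np.linspace(0.01, 10.0, 1000)
```

Expanding the celerite numerator and denominator with these substitutions gives a·c + b·d = S0 ω0², a·c − b·d = 0, and c² + d² = ω0², which collapses the celerite spectrum to the SHO spectrum term by term.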


How is the celerite kernel related to stellar variability?

4.3.1 Stochastically-Driven Damped Harmonic Oscillator (SHO) Kernel

Celerite provides an SHO kernel, which represents the stochastically-driven, damped harmonic oscillator and is characterized by the same parameters mentioned above:

• Angular frequency (ω0)

• Quality factor (Q)

• Normalization constant (S0) [2]

When the quality factor is equal to 1/√2, the power spectrum of the SHO can be reduced to:

S(ω) = √(2/π) · S0 / ((ω/ω0)⁴ + 1)

And the kernel function becomes:

k(τ) = S0 ω0 exp(−ω0τ/√2) cos(ω0τ/√2 − π/4)

In a seminal astrophysics paper by Harvey (1985), it was demonstrated that this equation can model stellar granulation (the effect of motions of convective cells at the surface of the star that are hot and brighter at their center and darker at their downflow edges) [48]. The SHO kernel with a quality factor of 1/√2 can model stellar granulation, but it can also detect other types of variability or noise coming from the star and the instrument, making it very versatile [42].

4.3.2

Rotation Kernel

The Rotation kernel in celerite is "a mixture of two SHO terms that can be used to model stellar rotation" [49]. The kernel looks at two waves "in Fourier space: one at period and one at 0.5 * period" [49]. Why look at two waves? In Fourier space, a strong periodic signal will have a peak at its period, but there will also likely be a peak at the first harmonic of the signal, at 1/2 of the period. By looking at two waves, where one period is the harmonic of the other, we can infer a stellar rotation period more robustly.

The parameters of the Rotation Kernel are as follows:

• Amplitude (amp): the amplitude of the rotation, reflecting the size of the starspot

• Rotation period (period): the rotation period of the main wave

• Quality factor (Q0): the quality factor of the second wave, which indicates how quickly the signal dies out

• Difference between quality factors (deltaQ): the difference between the quality factors of the first and second waves

• Fractional amplitude (mix): the fractional amplitude of the second wave compared to the first wave. The second wave should have an amplitude that is greater than 0 but less than that of the first wave.

With these parameters, we can model two oscillations, which can be represented as SHO kernels. One SHO term has a period of period:

Q1 = 0.5 + Q0 + deltaQ
ω1 = 4π Q1 / (period · √(4Q1² − 1))
S1 = amp / (ω1 Q1)

Another SHO term at half the period

\[
Q_2 = \frac{1}{2} + Q_0, \qquad
\omega_2 = \frac{8\pi Q_2}{\mathrm{period}\,\sqrt{4Q_2^2 - 1}}, \qquad
S_2 = \frac{\mathrm{mix}\cdot\mathrm{amp}}{\omega_2 Q_2}
\]

The RotationTerm is then the sum of these two SHO terms.
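The parameter mapping above can be sketched in plain Python (illustrative only; the parameterization follows the equations above, and the actual implementation lives in celerite):

```python
import math

def rotation_term_params(amp, period, Q0, deltaQ, mix):
    """Map the Rotation kernel parameters (amp, period, Q0, deltaQ, mix)
    to the (S, omega, Q) parameters of the two underlying SHO terms."""
    # Primary oscillation at the rotation period
    Q1 = 0.5 + Q0 + deltaQ
    w1 = 4.0 * math.pi * Q1 / (period * math.sqrt(4.0 * Q1 ** 2 - 1.0))
    S1 = amp / (w1 * Q1)
    # Secondary oscillation at half the rotation period
    Q2 = 0.5 + Q0
    w2 = 8.0 * math.pi * Q2 / (period * math.sqrt(4.0 * Q2 ** 2 - 1.0))
    S2 = mix * amp / (w2 * Q2)
    return (S1, w1, Q1), (S2, w2, Q2)

# With deltaQ = 0 the two modes share a quality factor and w2 = 2 * w1,
# i.e. the second term oscillates at exactly half the period.
```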


4.3. CELERITE CHAPTER 4. METHODOLOGY

Why is the celerite kernel faster to compute?

4.3.3

Semi-Separable Matrices

Foreman-Mackey et al. (2017) introduce a direct solver for the celerite covariance matrices which "exploits the semiseparable structure of these matrices to compute the Cholesky factorization in O(NJ²)" operations, where J refers to the number of kernel terms [2]. To understand the solver, let us break down the underlying concepts.

What are semi-separable matrices?

Semi-separable matrices are matrices that can be described by a diagonal matrix with dimension N × N and two smaller matrices with dimensions N × R. R refers to the rank of the semi-separable matrix, which means that "all submatrices which can be taken out of the lower (upper) triangular part of the matrix have rank ≤ r" [50]. If K is a semi-separable matrix with rank R, then the elements of the matrix are:

\[
K_{n,m} =
\begin{cases}
\sum_{r=1}^{R} U_{n,r} V_{m,r}, & \text{if } m < n \\
A_{n,n}, & \text{if } m = n \\
\sum_{r=1}^{R} U_{m,r} V_{n,r}, & \text{otherwise}
\end{cases}
\tag{4.1}
\]

An easier way to visualize this may be as the sum of a diagonal matrix A, a strictly lower triangular matrix tril(UVᵀ), and a strictly upper triangular matrix triu(VUᵀ) (Note: tril (triu) is a function for selecting the strictly lower (upper) triangular part of a matrix):

\[
K = A + \mathrm{tril}(UV^T) + \mathrm{triu}(VU^T)
\]

For example, if R = 1 and N = 3, we have the following matrices:

A =   a1,1 0 0 0 a2,2 0 0 0 a3,3   U =   u1 u2 u3   V =   v1 v2 v3   U VT =  

u1v1 u1v2 u1v3 u2v1 u2v2 u2v3 u3v1 u3v2 u3v3

And a semi-separable matrix K can be defined as a sum of the following matrices:

K =   0 0 0 u2v1 0 0 u3v1 u3v2 0  +   a1,1 0 0 0 a2,2 0 0 0 a3,3  +   0 u2v1 u3v1 0 0 u3v2 0 0 0  

The advantage of semi-separable matrices is that they allow for linear "matrix transformation algorithms", meaning that "a system of linear equations Ax = b can be solved with linear complexity in the size of the matrix" [51]. One of the matrix transformation algorithms that can be computed in O(N) operations is Cholesky factorization.
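Equation 4.1 can be made concrete with a few lines of plain Python that assemble the dense matrix from its semi-separable generators (a sketch for checking the definition, not how celerite stores K):

```python
def semiseparable_matrix(diag, U, V):
    """Build the dense N x N matrix K = A + tril(U V^T) + triu(V U^T)
    from its semi-separable generators (Eq. 4.1)."""
    N, R = len(U), len(U[0])
    K = [[0.0] * N for _ in range(N)]
    for n in range(N):
        for m in range(N):
            if m == n:
                K[n][m] = diag[n]
            elif m < n:
                K[n][m] = sum(U[n][r] * V[m][r] for r in range(R))
            else:
                K[n][m] = sum(U[m][r] * V[n][r] for r in range(R))
    return K

# Rank-1, N = 3 example from the text (with invented numbers):
K = semiseparable_matrix([1.0, 2.0, 3.0],
                         [[0.5], [0.6], [0.7]],
                         [[0.1], [0.2], [0.3]])
# K is symmetric by construction: K[n][m] == K[m][n].
```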

What is Cholesky factorization?

Cholesky factorization is the "decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose" [52]. Here we focus on the LDL Cholesky factorization, which breaks down a matrix K into a product of a lower triangular matrix with a unit diagonal L, a diagonal matrix D, and the conjugate transpose of L:

\[
K = LDL^T
\]
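For a small dense matrix, this factorization can be computed with the textbook recursion below (a generic numerical-linear-algebra sketch, unrelated to the semi-separable speed-up that follows):

```python
def ldl(K):
    """Dense LDL^T factorization of a symmetric positive-definite matrix,
    shown here only to illustrate the decomposition K = L D L^T."""
    N = len(K)
    L = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
    D = [0.0] * N
    for j in range(N):
        D[j] = K[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, N):
            L[i][j] = (K[i][j]
                       - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D

L, D = ldl([[4.0, 2.0], [2.0, 3.0]])
# L = [[1, 0], [0.5, 1]], D = [4, 2], and L D L^T reproduces the input.
```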

If we assume that L can be written in terms of the identity matrix I, the matrix U defining the semi-separable matrix, and a new unknown matrix W with dimensions N × R, then we get:

\[
L = I + \mathrm{tril}(UW^T)
\]

Now we have two equations defining K and we can set them equal to each other to determine the elements in the unknown matrix W.

\[
A + \mathrm{tril}(UV^T) + \mathrm{triu}(VU^T) = \left[ I + \mathrm{tril}(UW^T) \right] D \left[ I + \mathrm{tril}(UW^T) \right]^T
\]

From this equation, we can derive a recursive algorithm which computes the Cholesky factorization of a semi-separable matrix in O(NR²) operations [2].



Cholesky Factorization Algorithm

To calculate the elements of D and W, the algorithm introduces another matrix, Sₙ, a symmetric R × R matrix for each data point. The base case of the recursive algorithm is n = 1, where every element S_{1,j,k} = 0. For n > 1:

\[
S_{n,j,k} = S_{n-1,j,k} + D_{n-1,n-1} W_{n-1,j} W_{n-1,k}
\]

Elements of the diagonal matrix D can be calculated using:

\[
D_{n,n} = A_{n,n} - \sum_{j=1}^{R} \sum_{k=1}^{R} U_{n,j} S_{n,j,k} U_{n,k}
\]

Elements of the matrix W can be calculated using:

\[
W_{n,j} = \frac{1}{D_{n,n}} \left[ V_{n,j} - \sum_{k=1}^{R} U_{n,k} S_{n,j,k} \right]
\]

Calculating the elements for each n requires O(R²) operations. Thus, when the algorithm is applied to the whole dataset, it has a run time of O(NR²).
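The full recursion can be written out in a few lines of plain Python (a rank-R sketch of the algorithm above; celerite's production solver is heavily optimized compiled code):

```python
def semiseparable_cholesky(A, U, V):
    """O(N R^2) LDL^T factorization of the semi-separable matrix
    K = diag(A) + tril(U V^T) + triu(V U^T), following the recursion above.
    Returns D (diagonal) and W (so that L = I + tril(U W^T))."""
    N, R = len(U), len(U[0])
    S = [[0.0] * R for _ in range(R)]   # base case: S_1 = 0
    D = [0.0] * N
    W = [[0.0] * R for _ in range(N)]
    for n in range(N):
        if n > 0:
            # S_n = S_{n-1} + D_{n-1} W_{n-1} W_{n-1}^T
            for j in range(R):
                for k in range(R):
                    S[j][k] += D[n - 1] * W[n - 1][j] * W[n - 1][k]
        D[n] = A[n] - sum(U[n][j] * S[j][k] * U[n][k]
                          for j in range(R) for k in range(R))
        for j in range(R):
            W[n][j] = (V[n][j]
                       - sum(U[n][k] * S[k][j] for k in range(R))) / D[n]
    return D, W
```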

Inverse of K

The inverse of a semi-separable matrix can be applied to a vector in O(NR) operations.

\[
z = K^{-1} y = (L^T)^{-1} D^{-1} L^{-1} y
\]

Forward substitution can be used for lower triangular matrices. Let us apply it here to solve z' = L⁻¹y, with the base case f_{0,j} = 0:

\[
f_{n,j} = f_{n-1,j} + W_{n-1,j} z'_{n-1}, \qquad
z'_n = y_n - \sum_{j=1}^{R} U_{n,j} f_{n,j}
\]

Back substitution can be used for upper triangular matrices. Let us apply it here to solve the final equation z = (Lᵀ)⁻¹D⁻¹z', using the solution z' = L⁻¹y, with the base case g_{N+1,j} = 0:

\[
g_{n,j} = g_{n+1,j} + U_{n+1,j} z_{n+1}, \qquad
z_n = \frac{z'_n}{D_{n,n}} - \sum_{j=1}^{R} W_{n,j} g_{n,j}
\]

For each n, the run time is O(R). As a result, we can apply the inverse in O(NR) operations.
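Putting the two substitutions together gives an O(NR) solve. The plain-Python sketch below assumes D and W come from the semi-separable factorization described above:

```python
def semiseparable_solve(U, D, W, y):
    """Apply K^{-1} y = (L^T)^{-1} D^{-1} L^{-1} y in O(N R) operations,
    where L = I + tril(U W^T) and D come from the LDL^T factorization."""
    N, R = len(U), len(U[0])
    # Forward substitution: z' = L^{-1} y
    zp = [0.0] * N
    f = [0.0] * R
    for n in range(N):
        if n > 0:
            for j in range(R):
                f[j] += W[n - 1][j] * zp[n - 1]
        zp[n] = y[n] - sum(U[n][j] * f[j] for j in range(R))
    # Back substitution: z = (L^T)^{-1} D^{-1} z'
    z = [0.0] * N
    g = [0.0] * R
    for n in range(N - 1, -1, -1):
        if n < N - 1:
            for j in range(R):
                g[j] += U[n + 1][j] * z[n + 1]
        z[n] = zp[n] / D[n] - sum(W[n][j] * g[j] for j in range(R))
    return z
```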

Log Determinant of K

The log determinant of a semi-separable matrix can be computed in O(N) operations.

\[
\ln \det K = \ln \det(LDL^T) = \ln \det(L) + \ln \det(D) + \ln \det(L^T)
\]

The determinant of a triangular matrix, like that of a diagonal matrix, is the product of its diagonal elements, and L has a unit diagonal, so:

\[
\ln \det K = \ln(1) + \ln \det(D) + \ln(1) = \sum_{n=1}^{N} \ln D_{n,n}
\]

Celerite Solver

Recall that the celerite kernel function "can be written as a mixture of quasi-periodic oscillators" [41]:

\[
k_{\alpha}(\tau_{nm}) = \sum_{j=1}^{J} \left[ a_j e^{-c_j \tau_{nm}} \cos(d_j \tau_{nm}) + b_j e^{-c_j \tau_{nm}} \sin(d_j \tau_{nm}) \right]
\]

where \(\alpha = \{a_j, b_j, c_j, d_j\}\) and \(\tau_{nm} \equiv |t_n - t_m|\). The covariance matrix K associated with this kernel function can be written as a semi-separable matrix with rank R = 2J if:
