Physics of Behavior : Quantifying the Dynamics of Body Postures



MSc Physics
Theoretical Physics

Master Thesis

Physics of Behavior

Quantifying the Dynamics of Body Postures

by

António Carlos Costa (10687742)

Supervisor: Greg J Stephens

Examiner: Bernard Nienhuis

Department of Physics of Living Systems
Theoretical Physics of Complex Systems


Abstract

Understanding the connections between the machinery behind gene expression, the amazing computational power of the brain, and the way animals behave has always fascinated us. In this thesis we aim to provide tools to improve our knowledge, using the nematode C. elegans as a model organism. The understanding of its neuronal network, for which the complete wiring diagram is known, combined with the available genetic tools, makes it the perfect framework for our study. However, although we have clear quantitative measures of gene expression and neuronal activity, studies of behavior are remarkably less quantitative. The goal of our research is therefore to provide a better data-driven understanding of behavior.

Using a top-down approach, recent studies have shown that the postures of the worm can be reconstructed using a four-dimensional representation: the eigenworms. This gives a static description of the nematode's body shapes, which are dynamically represented as a four-dimensional time series. Our understanding of behavior is based on the analysis of these time series, which translate static postures into movement.

In this thesis we make the link between the postural dynamics and the worm's behavior by building linear dynamical systems locally in time. A novel algorithm is developed in which locality is derived directly from the data, by adaptively choosing the windows that can be described by the same linear dynamics. This method is applied to a known behavioral setting, in which a heat shock is administered to the worm's head at a specific time, and it accurately captures the known dynamical breaks. Furthermore, discrete behavioral motifs are found by clustering these local linear dynamics across worms, thus obtaining a fully data-driven description of the behavioral landscape of C. elegans.


Acknowledgements

First of all, I would like to thank my supervisor, Dr. Greg Stephens, for the amazing support throughout the research. His expertise, combined with his personal engagement, was definitely the foundation that made the results of this thesis possible. I'd also like to thank my second supervisor, Dr. Bernard Nienhuis, for his availability. Special thanks to Dr. Stephen Helms for his kindness and willingness to help and discuss the topics of the research. Thanks also to Onno, Mehran, Abel and Arno for all the discussions and interesting input.

I'd also like to thank all my previous teachers, from primary school up to now, who provided me with the right tools to tackle the technical and conceptual difficulties associated with scientific research.

My MSc experience wouldn't have been the same without my fellow students. Special thanks to Davide, Giulio, Zan and Lefkos, to name a few, for all the goat-related breaks and nonsense moments of joy, but also for the sudden bursts of intelligence and serious teamwork.

The personal gratitude list is possibly endless. However, I'd like to thank everyone who supported me over these past two years: the palpitas, the bilderdijkers, the conservas, everyone... I'd also like to give special thanks to Tozé, without whom these studies would have been much more difficult. Furthermore, I'd like to thank my "family" back in Coimbra, especially the people at SF/AAC and Tio Nelo, for making me the Caco I am today. Thanks Anilha, Trinca and Inês for bringing some of Coimbra to Amsterdam. Last, but not least, I'd like to thank my family. It's hard to put into words the importance you have in my life. The respectful mockery of my brother, combined with the friendly pushiness of my mother and my father's peace of mind, makes my life easier every single day. Avó Delfina, I never stopped feeling your support and kindness, and it definitely guided me towards my goal. Thanks!


Contents

Acknowledgements

1 Introduction
2 From Postures to Behavior
3 Linear Postural Dynamics
  3.1 Linear Dynamics
  3.2 Autoregressive Model
  3.3 Linear Dynamics of the Worm
  3.4 Faithfulness of the Model
4 Linearity
  4.1 Surrogate Methods
    4.1.1 Univariate Amplitude Adjusted Fourier Transformed Surrogates (AAFT)
    4.1.2 Improved Univariate Amplitude Adjusted Fourier Transformed Surrogates (IAAFT)
    4.1.3 Multivariate Phase Randomization
    4.1.4 Multivariate IAAFT (MIAAFT)
  4.2 Testing the Method Using a Known System
  4.3 Linearity of the Postural Time Series
5 Nonlinear Worm
6 Local Linear Model
  6.1 Adaptive Windowing
  6.2 Toy Model
  6.3 Worm Behavior: Escape Response
7 Conclusion

A Least Squares Estimation of AR Parameters
B Higher Order Models
  B.1 Optimal Lag Order
  B.2 Dynamics
  B.3 Model Accuracy
C Local Linear Fit
D Metrics and Test Statistics
  D.1 Jensen-Shannon Divergence
  D.2 Shapiro-Wilk Test
  D.3 Ljung-Box Test
E Order Misspecification
F Robustness Check
  F.1 Window Sizes
  F.2 Percentiles
  F.3 Rolling Windows
G Methods
  G.1 Worm Preparation
    G.1.1 First Data Set
    G.1.2 Second Data Set
    G.1.3 Third Data Set
  G.2 Tracking
    G.2.1 First Data Set
    G.2.2 Second Data Set
    G.2.3 Third Data Set
  G.3 Image Analysis
H Clustering
  H.1 t-SNE
  H.2 Peak Finding


Chapter 1

Introduction

Physicists have long been interested in questions that arise in living systems. As Morowitz [1] points out, it was especially after the Second World War that physicists returning to peacetime thought about applying their skills to biology. Influential books, such as "What is Life?" by Erwin Schrödinger [2] or "Mathematical Biophysics" by Nicolas Rashevsky [3], are examples of this phenomenon. This interest in biology is completely understandable, especially because modern physics is more and more about analyzing and quantifying emergent phenomena, and life can be thought of as emerging from more fundamental mechanisms that, put together, build the complexity and beauty of life as we see it.

Besides all the technical breakthroughs that physics has brought into biology over the past few decades, such as X-ray spectroscopy applied to biomolecules or magnetic methods applied to neuroimaging, the physicist's way of thinking and the development of new physics, especially in statistical and quantum mechanics, have changed the way biology has developed. However, this symbiosis goes both ways, as biology provides an astonishing number of complex systems to be studied by physicists, who strive to find the unifying and fundamental principles behind biological phenomena. New physics is also built from the study of biological systems, which gives new and interesting insights into the understanding of inanimate systems for which data is scarce. This is especially the case in non-equilibrium statistical physics.

Studies in molecular biology, such as the discovery of conformational substates in proteins [4] (fundamental for their motion); the work on bacteria such as Bacillus subtilis [5], a perfect example of a system that is stochastic at the level of individual cells but deterministic at the population level; or the research on bird flocking [6], in which a maximum entropy model with local correlations between birds (equivalent to the Heisenberg model of magnetism) is used to predict order propagation throughout flocks of starlings, are just a few examples in which theoretical physics played a major role in the development and understanding of biological systems.

In our research, we will focus on motor behavior at the level of the individual, hoping to gain insight into the underlying principles behind the complexity of movements and shapes that such a system can produce. With this goal, random walk models have been applied to the motion of different organisms. At short time scales, for instance, the motility of E. coli bacteria is particularly interesting [7], revealing a "run-and-tumble" strategy, in which straight paths (runs) are interrupted by sudden reorientations of the body (tumbles). On the other hand, over long time scales, random walk models unraveled new non-Brownian diffusive dynamics, also known as Lévy flights, in which the distribution of flight segments follows a power law, instead of the exponential distribution of random Brownian walkers [8].

This top-down approach was also applied to more complex living systems, driven by a neuronal network. For instance, the "pirouette model" [9] described the motility of the nematode C. elegans (which will be studied in more depth in this thesis) in a way similar to the "run-and-tumble" model, with the tumbles becoming reversals or Ω turns² that reorient the worm. The problem with such an analysis is that, even though data-driven methods are being used to quantify motile behavior, as in [10–12], there are still assumptions regarding the stereotyped classes of behavior that are qualitative and driven by subjective reasoning. The fact that the behavioral categories were user-defined means that we were assuming, without showing it from the data, that the organism's movements can be categorized in a discrete manner. In order to get around this problem, the behavioral categories should emerge directly from observation.

² Ω turns usually occur when the worm is subject to outside stimuli, as part of an escape response. The worm changes its direction by performing a large body bend, in which the shape of the body resembles the Greek letter Ω.

In [13], Stephens et al. proposed to look at the body shapes of C. elegans as the means to unravel its behavioral states by purely quantitative methods. In our research, we will also use a top-down approach to characterize the behavior of C. elegans. This living system has long been studied by scientists [14, 15], and it is considered by many as the "hydrogen atom of neuroscience". Its neuronal network has been fully mapped and knowledge about its genetics and phylogenetic tree is deep. On the other hand, it is perfectly suited for studies of motile behavior: its slow average speed (∼100 µm s⁻¹) and small size (∼1 mm in length) make effective sampling of its locomotion possible in the laboratory, and its short generation time (∼2 days), combined with other features of the worm, makes it easy to grow and keep under laboratory conditions. Besides these characteristics, which make it the ideal organism for this kind of study, C. elegans also opens up the possibility to understand the links between what goes on in the brain, how genes are expressed, and how the animal behaves. In fact, the only missing piece of the puzzle is a standardized, fully quantitative and unbiased description of its behavior, so that we can start to unravel the connections between these three fundamental parts of life.

In this thesis, we will achieve such a description of the worm's behavior. First of all, previous attempts at characterizing C. elegans' motile behavior will be explored and linked to the current work, focusing especially on the postural approach of Stephens et al. [13, 16, 17]. This will be the groundwork for our behavioral analysis, in which we use the postural description to study the dynamics of the worm. Finding the equations of motion that govern these postural changes in time will be the key to going from a static, postural description to behavior. Linear and nonlinear dynamical systems will be described and an innovative adaptive local linear model will be built that has the power to characterize the behavioral landscape of the worm and, possibly, of other organisms.


Chapter 2

From Postures to Behavior

The first step towards a fully unbiased and data-driven description of behavior was taken by Stephens et al. [13], who were able to obtain a static description of the body postures using a purely quantitative method. In [13], the complexity of shapes of the worm was decoupled into only 4 modes, the eigenworms, which capture 95% of the full postural variance of the worm. Using an ethological approach, based on video microscopy of freely moving worms on an agar plate, a low-dimensional yet complete description of the worm's shapes was achieved.

Using computational methods, the 2D shape of the worm was extracted from the images at each time step, and an artificial backbone of the worm was drawn (see Appendix G), Figure 2.1. This was made possible by the fact that the thickness of the worm does not change much while it is moving, meaning that we can approximate its 2D motile behavior by the movement of its skeleton.


Figure 2.1: A - Image from the tracking microscopy. B - 2D shape of the worm (extracted from the background) and corresponding backbone (the black dot indicates the head). C - Decomposition of the backbone into 100 subsequent points and measurement of the angles, θ. D - θ for each of the subsequent points, as a function of position along the backbone of the worm, after rotation.

This backbone was then subdivided into 100 equally spaced points and the angles between these subsequent points were measured, Figure 2.1(c). By rotating these angles so that $\langle \theta(s) \rangle = 0$, we obtain a worm-centered description that fully characterizes the shape of this backbone at each time step. Using Principal Component Analysis (PCA) on this angular description, a rotation of this angle space was found that uses only 4 vectors to capture around 95% of the variance of the worm postures. Since only 4 eigenvalues of the covariance matrix of the angles, $C(s, s') = \langle (\theta(s) - \langle\theta\rangle)(\theta(s') - \langle\theta\rangle) \rangle$, are significant, we can reconstruct the postural landscape of the worm by a superposition of only 4 shapes,

$$\theta(s) \approx \sum_{\mu=1}^{4} a_\mu u_\mu \qquad (2.1)$$

where $u_\mu$ are the eigenvectors of the new space and $a_\mu$ are the amplitudes of each of the modes. However, this is only a static description, which gives the complete set of postures at each time step. Our goal is to go from postures to behavior, and the key to doing so will be to understand the time series that describes the dynamics of these postures in time, Figure 2.2.
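To make the decomposition concrete, the following is a minimal sketch of the eigenworm construction; the array `thetas` stands in for the real (T × 100) matrix of worm-centered tangent angles, and the random placeholder data, variable names and 4-mode cutoff are illustrative assumptions, not the thesis code.

```python
import numpy as np

rng = np.random.default_rng(0)
thetas = rng.standard_normal((5000, 100))   # placeholder for real angle data

theta_mean = thetas.mean(axis=0)
centered = thetas - theta_mean
C = np.cov(centered, rowvar=False)          # covariance matrix C(s, s')
evals, evecs = np.linalg.eigh(C)            # ascending eigenvalues
order = np.argsort(evals)[::-1]             # sort descending
evals, evecs = evals[order], evecs[:, order]

n_modes = 4
u = evecs[:, :n_modes]                      # eigenworms u_mu
a = centered @ u                            # mode amplitudes a_mu(t)
theta_recon = theta_mean + a @ u.T          # 4-mode reconstruction (eq. 2.1)

print(f"variance captured: {evals[:n_modes].sum() / evals.sum():.2%}")
```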


Figure 2.2: From postures to behavior. In this figure you can see a sketch of our behavioral analysis. The dynamics of each of the postures (a) is described in terms of a four-dimensional time series (b). A parametrization of this time series represents a quantitative description of the postural dynamics or, in other words, of behavior (c). Figure (c) was obtained from a snapshot of Data Set 3 (see Appendix G for more information).

Figure 2.3: Joint probability distribution of the first two modes, $\rho(a_1, a_2)$. The phase angle $\phi$ is defined by $\phi = \tan^{-1}(a_2/a_1)$.

First of all, let's explore some of the discoveries of this postural description. In [13], Stephens et al. focused on the oscillation captured by the first two modes and, by fitting a Langevin equation to the phase angle, φ (see Figure 2.3), they were able to describe the deterministic dynamics of this oscillation and reveal 4 discrete attractors that correspond to defined classes of behavior that the real worm visits, with the noise term being responsible for the transitions between these attractors. Later work focused on the reversal of the worm's direction. In [16], the decay time, τ, of the survival probability, P(τ), that a reversal has not occurred after a delay τ was computed for a Langevin model, showing high correspondence with the real worm. In 2010, [17] pointed out that "run-and-tumble"-like models are inadequate for C. elegans. Focusing on the regime in which the body of the worm does not bend much, Stephens et al. established the link between the body movement and the postural modes via a nonlinear model. This resulted in the finding that, besides the discrete changes in direction, including the Ω turns, continuous re-orientations between these sharp turns are equally important. Therefore, C. elegans seems to control more aspects of its motion than just the abrupt events captured by "run-and-tumble"-like models.

Thus, a more complete description, as an extension of the previous work based on the postural decomposition, needs to be found, so that we finally have a fully unbiased and quantitative way of discovering the classes of stereotyped behavior of an organism. One such approach was taken by Berman et al. [18], who used the body postures of Drosophila melanogaster to understand its behavioral modes. In this work, PCA was used to decompose the fly shapes into eigenpostures and then, by wavelet spectrogram analysis and nonlinear dimensionality reduction techniques, stereotyped behaviors of Drosophila were found.

In our research, an alternative approach is described, in which dynamical systems for the postures are built in order to find stereotyped behaviors. We could also have conceived of using non-parametric methods to characterize some features of the time series, such as its dimensionality. However, the use of parametric methods allows for a much more complete understanding of behavior. Once a parametric model is found, we can unleash its quantitative power by, for instance, making simulations, comparing different individuals and strains, finding stereotyped behaviors by purely data-driven methods, linking our model of behavior with models of the neuronal network, etc. The possibilities are immense.


Chapter 3

Linear Postural Dynamics

In order to parametrize the postural time series, we can use either linear or nonlinear methods. We will start in the realm of linear dynamical systems, which lays the groundwork for more complex analyses.

3.1 Linear Dynamics

In this section we will explore the dynamics captured using linear models, of the kind of equation 3.1.

$$\frac{dx(t)}{dt} = A x(t), \qquad x_{m+1} = A x_m \qquad (3.1)$$

This framework only allows for trajectories of the form

$$x(t) = e^{\lambda t} u \qquad (3.2)$$

where $\lambda$ and $u \neq 0$ need to be determined. In fact, $u$ is an eigenvector of $A$, with corresponding eigenvalue $\lambda$. This can be found by substituting equation 3.2 into equation 3.1,

$$\lambda e^{\lambda t} u = A e^{\lambda t} u \;\Rightarrow\; \lambda u = A u \qquad (3.3)$$


Therefore, the eigenvalues and eigenvectors of $A$ are sufficient to characterize the dynamics of linear systems, which consist only of growing or decaying exponential solutions. An extensive analysis of this type of dynamical system is made in [19] and a summary of the possible linear dynamics is drawn in Figure 3.1.

Figure 3.1: Linear dynamics according to the eigenvalues.

3.2 Autoregressive Model

In the time series framework, understanding the linear dynamics is effectively the same as fitting an autoregressive model. The work of Fan & Yao [20] and Box & Jenkins [21] on time series analysis was the foundation for our understanding of the postural dynamics of the worm.

Autoregressive models of arbitrary order, AR(p), have been deeply studied and applied in many different fields of research, ranging from geosciences to finance [22, 23]. In simple terms, each point in time is modeled as a function of the p previous time steps, plus a noise term, assumed to be Gaussian and white. In our case, since we have a multidimensional system, we will need a vector autoregressive process of arbitrary order, VAR(p). In 2001, Neumaier and Schneider [24] extended the results of the first order vector autoregressive process to arbitrary orders, building the dim-variate AR(p) model of equation 3.4,

$$a_t = c + \sum_{l=1}^{p} A_l a_{t-l} + \eta_t \qquad (3.4)$$

where $a$ is a dim-dimensional vector of dim time series, the $A_l$ are dim × dim coefficient matrices, $\eta$ is a dim-dimensional error term with zero mean and covariance matrix $\Sigma \in \mathbb{R}^{dim \times dim}$, and $c$ is the intercept vector, which accounts for the nonzero mean of the time series.
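As a concrete illustration, here is a minimal sketch of a least-squares VAR(1) fit (the estimation procedure itself is detailed in Appendix A); the placeholder array `a` and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((1000, 4))   # placeholder for the postural time series

X = np.hstack([np.ones((len(a) - 1, 1)), a[:-1]])   # regressors (1, a_{t-1})
Y = a[1:]                                           # targets a_t
W, *_ = np.linalg.lstsq(X, Y, rcond=None)           # stacked [c; A^T]
c, A = W[0], W[1:].T                                # intercept and coefficients

eta = Y - X @ W                                     # residuals eta_t
Sigma = np.cov(eta, rowvar=False)                   # noise covariance matrix
```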

Information about the dynamical properties of the model is stored in the eigenvalues of the coefficient matrices $A_l$, $l \in \{1, ..., p\}$. Consider a dim-variate AR(1) model in which the mean is subtracted from the vectors (so that the intercept vector can be assumed to be zero),

$$a_t = A a_{t-1} + \eta_t \qquad (3.5)$$

$A$ can be diagonalized as $A = V \Lambda V^{-1}$, where $V$ is the matrix with the eigenvectors $V_k$ as columns and $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_k$. In terms of the eigenvectors, the state and noise vectors can be written as

$$a_t = \sum_{k=1}^{dim} a_t^{(k)} V_k = V a'_t, \qquad \eta_t = \sum_{k=1}^{dim} \eta_t^{(k)} V_k = V \eta'_t \qquad (3.6)$$

Substituting this into 3.5 yields

$$a'_t = \Lambda a'_{t-1} + \eta'_t \qquad (3.7)$$

This can be thought of as a system of univariate models coupled only by the covariances of the noise coefficients,

$$a_t^{(k)} = \lambda_k a_{t-1}^{(k)} + \eta_t^{(k)}, \qquad k = 1, ..., dim \qquad (3.8)$$

Since the noise term is assumed to have zero mean, the dynamics of the averaged coefficients,

$$\left\langle a_t^{(k)} \right\rangle = \lambda_k \left\langle a_{t-1}^{(k)} \right\rangle \qquad (3.9)$$

decouples completely. At a time $t'$ in the future we have

$$\left\langle a_{t+t'}^{(k)} \right\rangle = \lambda_k^{t'} \left\langle a_t^{(k)} \right\rangle \qquad (3.10)$$


In the complex plane we can write $\lambda_k = |\lambda_k| e^{i \arg \lambda_k}$. Therefore we have

$$\lambda_k^{t'} = e^{\log \lambda_k^{t'}} = e^{t' \log|\lambda_k| + i t' \arg \lambda_k} \qquad (3.11)$$

so we can rewrite equation 3.10 as a spiral in the complex plane,

$$\left\langle a_{t+t'}^{(k)} \right\rangle = e^{-t'/\tau_k} \, e^{i(\arg \lambda_k) t'} \left\langle a_t^{(k)} \right\rangle \qquad (3.12)$$

where

$$\tau_k \equiv \frac{-1}{\log |\lambda_k|}, \qquad T_k \equiv \frac{2\pi}{|\arg \lambda_k|} \qquad (3.13)$$

are the damping time and period, respectively, measured in units of the sampling rate of the time series, and $\arg \lambda_k = \mathrm{Im}(\log \lambda_k)$. To ensure that we have a single period for a given pair of complex conjugate eigenvalues, we demand that $-\pi \le \arg \lambda_k \le \pi$. This way, the linear dynamics can be completely described by the eigenvalues $\lambda_k$, as in Figure 3.1.
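A hedged sketch of how equation 3.13 translates into code; `A` is assumed to be a fitted AR(1) coefficient matrix such as the one estimated above.

```python
import numpy as np

def damping_and_period(A):
    """Damping times and periods (eq. 3.13) from the eigenvalues of a fitted
    AR(1) coefficient matrix, in units of the sampling interval."""
    lam = np.linalg.eigvals(A)
    tau = -1.0 / np.log(np.abs(lam))
    with np.errstate(divide="ignore"):
        T = 2 * np.pi / np.abs(np.angle(lam))   # inf for purely real modes
    return tau, T
```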

This result can be extended to arbitrary orders. In fact, equation 3.4 can be rewritten in the same form as an AR(1) process,

$$a_t = \tilde{a}_{t-p} \, \omega_p + \eta_t \qquad (3.14)$$

where

$$\tilde{a}_{t-p} = \begin{pmatrix} 1 & a_{t-1} & a_{t-2} & \cdots & a_{t-p} \end{pmatrix} \qquad (3.15)$$

$$\omega_p = \begin{pmatrix} c \\ A_1 \\ A_2 \\ \vdots \\ A_p \end{pmatrix} \qquad (3.16)$$


According to [24], after obtaining the coefficient matrices $A_l$, $l \in \{1, 2, ..., p\}$, we need to combine them in the Frobenius (companion) form, equation 3.17, in order to get the eigenvalues $\lambda_k$ and the respective dynamics,

$$\hat{A}_p = \begin{pmatrix} A_1 & A_2 & \cdots & A_{p-1} & A_p \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{pmatrix} \qquad (3.17)$$

For a stable AR process with nonsingular coefficient matrices $A_p$, the absolute values of the eigenvalues lie between 0 and 1, $0 < |\lambda_k| < 1$. This implies that all damping times $\tau_k$ are positive and bounded. On the other hand, depending on the real and imaginary parts of the eigenvalues we will have different dynamics, as you can see in Figure 3.1.
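As a sketch of how the companion form can be assembled in practice (the function name and list-of-matrices interface are illustrative assumptions):

```python
import numpy as np

def companion(As):
    """Companion (Frobenius) matrix of eq. 3.17 for coefficient matrices
    As = [A_1, ..., A_p], each of shape (dim, dim)."""
    p, dim = len(As), As[0].shape[0]
    top = np.hstack(As)                                 # [A_1 A_2 ... A_p]
    sub = np.hstack([np.eye(dim * (p - 1)),
                     np.zeros((dim * (p - 1), dim))])   # shifted identities
    return np.vstack([top, sub])

# the eigenvalues of companion(As) give tau_k and T_k exactly as for AR(1)
```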

3.3 Linear Dynamics of the Worm

In this section we will build a linear dynamical system from the worm's time series, by least-squares fitting of equation 3.14; see Appendix A. For this analysis, data from an N2 strain sampled at 32 Hz is used; see Appendix G for a description of the methods used in Data Set 2.

Just to get a sense of what the linear dynamics of the worm looks like, we will examine the result of fitting a first order autoregressive process, AR(1), to the data. The use of higher order models may seem tempting but, according to Appendix B, there is not much to be gained by increasing the order of the model. First of all, let's look at the eigenvalues of the obtained coefficient matrix, Figure 3.2.


Figure 3.2: Dynamics of the AR(1) model of the worm postural data. a - Eigenvalues in the complex plane; b - Periods (T) vs damping times (τ).

As you can see, we obtain two oscillatory modes that seem to be close to the instability boundary, represented by the unit circle in Figure 3.2(a). As explained in Appendix B, increasing the order of the model only adds short-lived modes, so the most important dynamics is still captured by these eigenvalues, which represent oscillations. This behavior is particularly interesting, since it links to other research in biophysics pointing to the fact that biological systems have a tendency to be near a critical state [25]. Even though we do not have a clear measure of closeness, the fact that the eigenvalues seem to be close to the unit circle means that there might be some sort of self-regulated dynamical criticality, promoted by evolution to make the worm more capable of reacting to outside stimuli and therefore surviving. In Figure 3.2(b) we can see the obtained periods and damping times of the oscillations. One of them has a wave period of T = 13.06 s and is damped after τ = 10.63 s, while the other is damped after τ = 6.24 s and has a period of T = 2.61 s. Therefore, the first wave is not even completed, while the second is still capable of making at least two complete oscillations. It should be noted that, even though we are using one of the simplest models possible, the oscillation period of the second oscillation agrees well with the average frequency of the locomotor wave of the worm.

3.4 Faithfulness of the Model

One way to check if our model is correctly capturing the real dynamics is to ensure its self-consistency by looking at one of the assumptions of the AR process itself, namely the properties of the noise (Gaussian and white). In Appendix B.3, we have examined the whiteness and Gaussianity of the noise for higher order models, and we have shown that, even though the noise becomes whiter as we increase the order of the model, we cannot get Gaussian white noise even for higher orders. This means that we are not able to reproduce one of the assumptions of the AR process itself.

Another method to check the faithfulness of the model is to check whether it accurately reconstructs the oscillatory pattern of the first two modes, Figure 2.3. In Figure 3.3, you can see the reconstructed joint probability distribution of the first two modes, obtained from a simulation of an AR(1) process fitted to the data. This simulation is done by obtaining the AR(1) coefficients, θ = (c, A, Σ), for different individuals and then averaging across the worms to obtain the average AR(1) dynamics. Using these averaged parameters in equation 3.4, with η generated by a random number generator with covariance matrix Σ, we are able to make the simulations.
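A minimal sketch of such a simulation, assuming `theta = (c, A, Sigma)` holds the averaged coefficients; the function name and interface are illustrative, not the thesis code.

```python
import numpy as np

def simulate_var1(theta, n, rng=None):
    """Simulate n steps of a_t = c + A a_{t-1} + eta_t with Gaussian noise."""
    c, A, Sigma = theta
    rng = np.random.default_rng(rng)
    dim = len(c)
    a = np.zeros((n, dim))
    eta = rng.multivariate_normal(np.zeros(dim), Sigma, size=n)
    for t in range(1, n):
        a[t] = c + A @ a[t - 1] + eta[t]
    return a
```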

Figure 3.3: Joint probability distribution of the first two modes for an AR(1) simulation.

It is clear that the oscillation of Figure 2.3 disappears. Since the residuals are not Gaussian white and the simulations do not capture one of the most characteristic features of the worm's motion, we are inclined to conclude that the global linear model is not a faithful representation of the worm's behavior. Thus, it seems as if nonlinear analysis is needed.


Chapter 4

Linearity

Before delving into nonlinear methods, it is useful to understand whether our data is genuinely nonlinear. In order to do that, we have developed surrogate methods based on Fourier analysis that provide the best possible linear approximation of the data. In this section, we will explore the methods used to generate linear worms (surrogates of our observations that keep all of their linear properties) and then we will use test statistics to understand whether our data is nonlinear or not.

4.1 Surrogate Methods

In order to create linearized surrogates we need to preserve the second order properties of the observed data: the mean and the autocovariance. According to the Wiener-Khinchin theorem, the power spectrum and the autocorrelation function of a stationary random process are Fourier transforms of each other. Therefore, if we take the Fourier transform of the data and keep the squared amplitudes (power spectrum) while randomizing the phases, we obtain surrogates that keep the mean and autocovariance function of the original data: a linearized surrogate of the observations. Consider $x = (x_1, ..., x_N)$, a time series with mean 0 and length $N$. Its discrete Fourier transform is

$$f(\omega) = \frac{1}{\sqrt{2\pi N}} \sum_{t=1}^{N} e^{-i\omega t} x_t, \qquad -\pi < \omega < \pi \qquad (4.1)$$

Assuming that the transform is obtained at the discrete frequencies $\omega_k = 2\pi k/N$, $k = 1, 2, ..., N$, the original time series can be obtained by performing the inverse transformation,

$$x_t = \sqrt{\frac{2\pi}{N}} \sum_{k=1}^{N} e^{i\omega_k t} f(\omega_k), \qquad t = 1, 2, ..., N \qquad (4.2)$$

The FT-surrogates method constructs a new time series, $y_t$, with the same power spectrum as $x_t$, by fixing the Fourier amplitudes while randomizing the phases,

$$y_t = \sqrt{\frac{2\pi}{N}} \sum_{k=1}^{N} |f(\omega_k)| \, e^{i\omega_k t} \, e^{i\phi_{rand}(\omega_k)} \qquad (4.3)$$

Therefore, the FT-surrogates method generates linearized time series with the same linear properties as the observed time series by taking the Fourier transform, keeping the amplitudes and transforming back while randomizing the phases. However, this method generates samples for the null hypothesis that the time series is a stationary Gaussian process, which might not be the case for the worm data. Thus, in order to build the most general linearized surrogates, without these assumptions regarding the nature of the time series, we need to extend these results to allow for more general processes.
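A hedged sketch of the FT-surrogates method using NumPy's real FFT; the function name and the handling of the zero-frequency and Nyquist terms are implementation assumptions.

```python
import numpy as np

def ft_surrogate(x, rng=None):
    """Phase-randomized surrogate with the same power spectrum as x (1D)."""
    rng = np.random.default_rng(rng)
    X = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=X.shape)
    phases[0] = 0.0                    # keep the zero-frequency term real
    if len(x) % 2 == 0:
        phases[-1] = 0.0               # keep the Nyquist term real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))
```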

4.1.1 Univariate Amplitude Adjusted Fourier Transformed Surrogates (AAFT)

One approach that deals with more general time series was proposed by Theiler et al. (1992) [26]. This algorithm generates surrogate data sets under the assumption that the observed time series is a monotonic nonlinear transformation of a linear Gaussian process. Therefore, there exists an underlying time series, $y_t$, consistent with the null hypothesis of linear Gaussian noise, and an observed time series, $x_t$, given by

$$x_t \approx h(y_t) \qquad (4.4)$$

Thus, we shuffle the time order of the data, but in such a way as to preserve the linear correlations of the underlying time series, $y_t = h^{-1}(x_t)$.

The algorithm goes as follows,


1. Generate a time series of Gaussian white noise with the same length as the original time series

2. Reorder the time sequence of the Gaussian noise to match the ranks¹ of the original time series

3. Phase-randomize the reordered time series, obtaining $y'_t$

4. Reorder the original time series so that it follows $y'_t$, obtaining $x'_t$

¹ Matching the ranks between two time series means that the i-th smallest value of the first time series will be in the same position as the i-th smallest value of the second time series.

Asymptotically, for N → ∞, the second step is equivalent to the application of the true inverse function $h^{-1}$. This way, we are applying the FT-surrogates method only to Gaussian stationary time series, while keeping the original amplitude distribution.
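A possible realization of these four steps in code, reusing the `ft_surrogate` sketch above; the rank-matching idiom via a double `argsort` is an implementation choice, not prescribed by the thesis.

```python
import numpy as np

def aaft_surrogate(x, rng=None):
    """AAFT surrogate of the 1D time series x (steps 1-4 above)."""
    rng = np.random.default_rng(rng)
    ranks = np.argsort(np.argsort(x))
    # steps 1-2: Gaussian noise, rank-matched to the original data
    g = np.sort(rng.standard_normal(len(x)))[ranks]
    # step 3: phase-randomize the rank-matched Gaussian series
    y = ft_surrogate(g, rng)
    # step 4: reorder the original data to follow the ranks of y
    return np.sort(x)[np.argsort(np.argsort(y))]
```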

4.1.2 Improved Univariate Amplitude Adjusted Fourier Transformed Surrogates (IAAFT)

The AAFT algorithm should be correct asymptotically in the limit N → ∞. For finite N, however, $x'_t$ and $x_t$ have the same distributions of amplitudes by construction, but they do not usually have the same power spectra. This happens mainly because the estimator of the inverse function, $\hat{h}^{-1}$, does not match $h^{-1}$ exactly. The differences $\delta(x_t) = \hat{h}^{-1}(x_t) - h^{-1}(x_t)$ are independent of $t$ and possess a white spectrum. Therefore, finiteness adds uncorrelated random numbers to the original data,

$$\hat{h}^{-1}(x_t) = h^{-1}(x_t) + \delta(x_t) \qquad (4.5)$$

that flatten the power spectrum of the AAFT surrogates. Therefore, the two rescalings of the previous algorithm will lead to an altered spectrum. In order to solve this issue, we need to perform these steps alternately in an iterative process. The algorithm goes as follows [27]:

1. Store a sorted list of the original time series, $x^s_t$, and the squared amplitudes of the Fourier transform of $x_t$, $|S_k|^2$. At each iteration step $(i)$, we have a sequence $d^{(i)}_n$ that has the correct distribution and a sequence $a^{(i)}_n$ that has the correct Fourier amplitudes

2. Start the iterative process with a random shuffle of the data, $d^{(0)}_n$:

(a) Take the Fourier transform of $d^{(i)}_n$

(b) Transform it back, replacing the amplitudes $\hat{S}_k$ with the ones of the original data, but keeping the phases, obtaining $a^{(i)}_n$

(c) Reorder $a^{(i)}_n$ in order to match the temporal sequence of the original time series, obtaining $d^{(i+1)}_n$. A fixed point, in which $d^{(i+1)}_n \approx d^{(i)}_n$, is heuristically expected to be reached for large $(i)$

(d) Check the remaining discrepancy of the spectrum and iterate until a given accuracy is reached. The discrepancy test is given by

$$D = \sum_{k=0}^{N-1} \left( \hat{S}^{(i)}_k - S_k \right)^2 \Big/ \sum_{k=0}^{N-1} S_k^2 \qquad (4.6)$$

where $\hat{S}^{(i)}_k$ is the power spectrum at each iteration $(i)$, $S_k$ is the power spectrum of the original time series and $k$ labels the Fourier modes.
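A compact sketch of this iteration; the tolerance, the iteration cap and the rank-ordering idiom are implementation assumptions.

```python
import numpy as np

def iaaft_surrogate(x, tol=1e-8, max_iter=1000, rng=None):
    """IAAFT surrogate: alternate spectrum and amplitude adjustment."""
    rng = np.random.default_rng(rng)
    x_sorted = np.sort(x)
    amp = np.abs(np.fft.rfft(x))            # target Fourier amplitudes
    S = amp ** 2                            # target power spectrum S_k
    d = rng.permutation(x)                  # start from a random shuffle
    for _ in range(max_iter):
        # (a)-(b): impose the target amplitudes, keep the current phases
        phases = np.angle(np.fft.rfft(d))
        a = np.fft.irfft(amp * np.exp(1j * phases), n=len(x))
        # (c): rank-order to restore the original amplitude distribution
        d = x_sorted[np.argsort(np.argsort(a))]
        # (d): spectral discrepancy of eq. 4.6
        S_hat = np.abs(np.fft.rfft(d)) ** 2
        if np.sum((S_hat - S) ** 2) / np.sum(S ** 2) < tol:
            break
    return d
```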

4.1.3 Multivariate Phase Randomization

In the case of a multidimensional time series, besides keeping all the linear properties of each of the individual time series, we also need to reproduce the linear correlations between them. Theiler and Prichard (1994) [28] have shown that this can be achieved by adding the same random phase to each dimension. To see this, suppose that we have M time series, $\{x_1(t), x_2(t), ..., x_M(t)\}$, and their respective Fourier transforms, $\{X_1(\omega), X_2(\omega), ..., X_M(\omega)\}$. By an extension of the Wiener-Khinchin theorem, the Fourier transform of the cross-correlation function is the cross spectrum,

$$X_j^*(\omega) X_k(\omega) = A_j(\omega) A_k(\omega) \, e^{i[\phi_k(\omega) - \phi_j(\omega)]} \qquad (4.7)$$

Therefore, in order to preserve all the linear auto and cross correlations, we need to fix $X_j^*(\omega) X_k(\omega)$ for all pairs $j, k$. This is achieved by adding the same random sequence $\psi(\omega)$ to the phases $\phi_j(\omega)$, for all $j$. The multivariate phase-randomized surrogate is then given by

$$\tilde{x}_j(t) = \mathcal{F}^{-1}\{X_j(\omega) \, e^{i\psi(\omega)}\} \qquad (4.8)$$

where $\mathcal{F}^{-1}$ represents the inverse Fourier transform.
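A sketch of equation 4.8 in code, drawing the same phase sequence for every column; the endpoint handling is an implementation assumption.

```python
import numpy as np

def multivariate_ft_surrogate(x, rng=None):
    """Multivariate phase randomization (eq. 4.8); x has shape (T, M)."""
    rng = np.random.default_rng(rng)
    X = np.fft.rfft(x, axis=0)
    psi = rng.uniform(0, 2 * np.pi, size=X.shape[0])   # one sequence for all M
    psi[0] = 0.0
    if x.shape[0] % 2 == 0:
        psi[-1] = 0.0                                  # keep Nyquist term real
    return np.fft.irfft(X * np.exp(1j * psi)[:, None], n=x.shape[0], axis=0)
```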

4.1.4 Multivariate IAAFT (MIAAFT)

In this section, we focus on an extension of the improved algorithm of Schreiber and Schmitz [27] to multiple dimensions. In order to do that, we look at the cross spectra in addition to the power spectrum of each of the dimensions. Theiler and Prichard (1994) [28] have shown that the cross spectrum reflects relative phases only. However, they did not discuss the possibility of combining multivariate phase randomization with amplitude adjustment.

Recall from section 4.1.2 that the iterative algorithm consists of two steps which are applied alternately until a given discrepancy is achieved. The amplitude adjustment procedure is readily applied to each dimension separately. The spectral adjustment, however, has to be modified.

Let us introduce a second index, $m$, to denote the M different dimensions of the time series. The change to the IAAFT algorithm described previously comes in step 2(b), where besides replacing the amplitudes by $\hat{S}_{k,m}$ we also need to replace the phases $\psi_{k,m}$ with phases $\phi_{k,m}$ having certain properties (for convenience, we do not explicitly write the superscript denoting the iteration).

The replacement should be minimal in the least squares sense, i.e. it should minimize

$$h_k = \sum_{m=1}^{M} \left| e^{i\phi_{k,m}} - e^{i\psi_{k,m}} \right|^2 \qquad (4.9)$$

On the other hand, as we have seen, the new phases must reproduce the phase differences in order to preserve the cross spectrum. Therefore,

$$e^{i(\phi_{k,m_2} - \phi_{k,m_1})} = e^{i(\rho_{k,m_2} - \rho_{k,m_1})} \qquad (4.10)$$

where $\rho_{k,m}$ are the phases of the original data. The last equation can be satisfied by setting $\phi_{k,m} = \rho_{k,m} + \alpha_k$. Substituting in equation 4.9 yields

$$h_k = \sum_{m=1}^{M} \left( 2 - 2\cos(\alpha_k - \psi_{k,m} + \rho_{k,m}) \right) \qquad (4.11)$$

which is extremal when

$$\tan(\alpha_k) = \frac{\sum_{m=1}^{M} \sin(\psi_{k,m} - \rho_{k,m})}{\sum_{m=1}^{M} \cos(\psi_{k,m} - \rho_{k,m})} \qquad (4.12)$$

The minimum is selected by taking $\alpha_k$ in the right quadrant. In order to make everything concrete, the full algorithm goes as follows:


1. Make a copy of the original data, take its Fourier transform and save the amplitudes and phases

2. Starting with a random shuffle of the data, start an iterative process:

(a) Fourier transform the surrogate and get the amplitudes and phases

(b) Compute $\alpha_k$ using the new and original phases

(c) Compute the inverse Fourier transform with the replaced amplitudes and the newly computed phases

(d) Match the rank order of the generated surrogate with that of the original data

(e) Compute the discrepancy

$$D = \sum_{m=1}^{M} \frac{1}{M} \left( \sum_{k=0}^{N-1} \left( \hat{s}^{(i)}_{k,m} - s_{k,m} \right)^2 \Big/ \sum_{k=0}^{N-1} s_{k,m}^2 \right) \qquad (4.13)$$

where $\hat{s}^{(i)}_{k,m}$ are the log amplitudes of the Fourier transform at each iteration $(i)$ and $s_{k,m}$ are the log amplitudes of the Fourier transform of the original observations, for each of the Fourier modes, $k$, and dimensions, $m$.
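The quadrant selection in equation 4.12 maps naturally onto a two-argument arctangent; a small sketch (the array shapes are assumptions):

```python
import numpy as np

def optimal_alpha(psi, rho):
    """Common phase shift alpha_k of eq. 4.12 minimizing h_k.

    psi, rho: (K x M) arrays of surrogate and original phases."""
    num = np.sin(psi - rho).sum(axis=1)
    den = np.cos(psi - rho).sum(axis=1)
    return np.arctan2(num, den)   # arctan2 picks the minimizing quadrant
```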

4.2 Testing the Method Using a Known System

Now that we have described the best method used to build reliable surrogates of our data, we will test it using the van der Pol oscillator as an example.

The van der Pol oscillator evolves in time according to the equation

$$\ddot{x} - \mu(1 - x^2)\dot{x} + x = 0 \qquad (4.14)$$

where $\dot{x} = dx/dt$, $x$ is the position, which is a function of time $t$, and $\mu$ is a scalar parameter indicating the nonlinearity and the strength of the damping. It was first identified in electrical circuits, but it has also been used to model biological systems, such as nerve membranes [29].
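A short sketch of how such trajectories can be integrated numerically with SciPy's `solve_ivp`; the step size and time span are arbitrary choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

def vdp(t, state, mu=0.5):
    x, xdot = state
    return [xdot, mu * (1 - x**2) * xdot - x]   # eq. 4.14 as a first-order system

sol = solve_ivp(vdp, (0, 100), [0.0, 1.0], max_step=0.01)
x, xdot = sol.y                                  # trajectory on the limit cycle
```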


Figure 4.1: Van der Pol oscillator for different initial conditions (all of them starting at $x_0 = 0$ with different values of $\dot{x}_0$), using µ = 0.5.

As you can see in Figure 4.1, for $(x, \dot{x}) = (0, 1)$ we have a limit cycle, for which the joint probability distribution is drawn in Figure 4.2.

Figure 4.2: Joint probability distribution for the limit cycle of the van der Pol oscillator. A kernel density estimate is used.

To make sure that we are generating reliable linear surrogates of our observations, we should check the auto and cross correlation functions of the obtained surrogates and compare them with the ones obtained from the original data. This analysis was made for µ = 0.5 and the result is in Figure 4.3. As you can see, the correlation functions coincide, meaning that we are recovering the linear properties of the original observation, as required.


Figure 4.3: Auto and cross correlation functions of the van der Pol data and the generated surrogates, obtained for a discrepancy level of 1 × 10⁻⁵. The diagonal plots correspond to the autocorrelation functions, while the off-diagonal plots correspond to the cross-correlations.

Now that we know that we can generate reliable linear surrogates of the original data, we want to use a test statistic to check whether the data is linear or not. The idea is that if we measure some statistic of the time series that depends on the nonlinear properties of the data, we can differentiate between our linear surrogates and the real observations. According to previous research [27], time reversibility in combination with surrogate data is an effective method to detect nonlinearity in a time series. A time series is said to be time reversible if for each $t \in \{0, 1, ..., N\}$ and $\tau \in \mathbb{R}$, $(x_0, x_1, ..., x_N)$ and $(x_\tau, x_{\tau-1}, ..., x_{\tau-N})$ have the same probability distribution.

As shown in [30], linear processes present symmetry under time reversal. This can be easily understood, because the power spectrum of a linear process is time invariant, since all time dependence is stored in the phases of the Fourier transform. The time reversal test statistic is defined in [31] as

$$\phi^{rev}(\tau) = \frac{\sum_{n=\tau+1}^{N} (x_n - x_{n-\tau})^3}{\left[ \sum_{n=\tau+1}^{N} (x_n - x_{n-\tau})^2 \right]^{3/2}} \qquad (4.15)$$


This is a measure of time reversibility because, if the process is time reversible, we have

$$\sum_{n=\tau+1}^{N} (x_n - x_{n-\tau})^3 \overset{rev}{=} \sum_{n=\tau+1}^{N} \left( x^r_n - x^r_{n-\tau} \right)^3, \qquad \sum_{n=\tau+1}^{N} (x_n - x_{n-\tau})^3 = \sum_{n=\tau+1}^{N} (x_{n-\tau} - x_n)^3 \;\therefore\; \phi^{rev}(\tau) = 0 \qquad (4.16)$$

where $\overset{rev}{=}$ represents equality under an inversion of the direction of time and $x^r$ is the time-reversed $x$. Therefore, if $\phi^{rev}(\tau)$ is different from 0, it must mean that the time series is not time reversible and is, thus, nonlinear.
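Equation 4.15 is only a few lines of code; a sketch:

```python
import numpy as np

def phi_rev(x, tau):
    """Time-reversal asymmetry statistic of eq. 4.15."""
    d = x[tau:] - x[:-tau]
    return np.sum(d ** 3) / np.sum(d ** 2) ** 1.5
```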

We can now use our van der Pol oscillator to check whether this procedure accurately captures nonlinearity in data. This can be done by changing the parameter µ in equation 4.14, which, as we have seen, is a measure of the nonlinearity of the dynamical system. Using $(x, \dot{x}) = (0, 1)$ as initial conditions, we have obtained the probability distribution of the time reversibility test statistic for N generated surrogates and N subsamples of the original data, for different values of µ. As you can see in Figure 4.4, where we plot the obtained mean values with error bars (which are a function of the variance), looking at the first two moments of these distributions is enough to tell the difference between linearity and nonlinearity.

Figure 4.4: The linear observation was generated using µ = 0 in equation 4.14, while the nonlinear observation was generated using µ = 5. As you can see, for the linear case the mean of the time reversibility test statistic overlaps for the observation and the surrogate. The same does not happen in the nonlinear case, meaning that this test statistic is able to detect nonlinearity.


4.3 Linearity of the Postural Time Series

Using the same procedure as for the van der Pol oscillator, we will examine whether the worm postural time series is nonlinear. The obtained distributions are shown in Figure 4.5.

Figure 4.5: Distribution of the time reversibility test statistic (using τ = 10) obtained for the four modes and the respective linearized surrogates.

As you can easily see, the distributions differ. In fact, we have obtained a Jensen-Shannon divergence (Appendix D.1) of 1 between these 4-dimensional distributions. Therefore, we can conclude that the time series of the worm's postural movement is nonlinear.


Chapter 5

Nonlinear Worm

As we have seen, the time series of the worm postures is nonlinear. Therefore, if we want to model worm behavior, we need nonlinear methods. The first and simplest approach is to simply introduce nonlinear terms, such as polynomials or cross terms, into the AR process of equation 3.4,

$$a_t = c + \sum_{l=1}^{p} A_l a_{t-l} + \sum_{p_1=1}^{p_1^{max}} \sum_{p_2=1}^{p_2^{max}} \sum_{l=1}^{p} B_{p_1 p_2, l} \, a_{t-l}^{p_1} a_{t-l}^{p_2} + \eta_t \qquad (5.1)$$

However, this was harder than it may seem. First of all, we lose the interpretability of linear dynamical systems while allowing for more complex solutions. In the nonlinear case, we no longer have a simple eigenvalue description of the dynamics, and the only way of understanding the model is to make simulations. This proved to be a difficult task, because introducing more complex terms resulted in dynamical instabilities. We tried to get rid of these instabilities by using different penalty functions in the least squares fit, such as Lasso or Ridge regression, which increases the sparseness of the coefficient matrices¹. However, we had to increase the sparseness of the coefficient matrices so much, in order to have stable dynamics, that the quality of the fit, measured by the coefficient of determination, R², was even worse than what we had obtained for the linear dynamics. Therefore, we could not find a balance between using sparse methods and our ability to simulate that would compensate for adding nonlinear terms. In conclusion, using a global nonlinear model of the kind of equation 5.1 would not improve our knowledge of the worm's behavior because, besides losing the interpretability of linear dynamical systems, we could not achieve a better qualitative understanding of how the worm behaves.

¹ In a sparse matrix most of the matrix elements are zero.


Chapter 6

Local Linear Model

Instead of trying more general nonlinear methods, we thought about leveraging the simplicity and interpretability of linear dynamical systems, but now looking locally in time. For short enough windows, we can build local linear models that, sewn together, should reconstruct the full nonlinear dynamics of the worm. Consider that the full dynamics is characterized by an oddly shaped manifold. Building a local linear model is effectively the same as looking locally enough at the manifold surface so that we can use a flat space approximation. Then, after learning how to glue these flat spaces together, we are able to reconstruct the full complex manifold. Therefore, in this section, we will build flat approximations of the postural dynamics, by looking at small enough windows of the time series, so that we can then effectively reconstruct the full nonlinear dynamics of the worm postures. As we have seen, there is not much to learn by going to higher order models in the long time scale dynamics. Therefore, we will keep our analysis as simple as possible by using an AR(1) process to fit the data. In fact, in Appendix C we show that for short windows the fit behaves much better, giving faithful quantitative models of the worm's behavior, even for an order 1 model. On the other hand, in Appendix E we show that, even in cases in which the underlying process generating the observations has higher order, using an AR(1) model for the adaptive windowing procedure gives better forecast accuracy, meaning that we are better able to reproduce the real time series and, thus, the postural dynamics.

Constructing local linear models seems to be a straightforward procedure: we only need to slice the time series into short enough windows and then fit an AR(1) model, as discussed previously. Solovey et al. [32] used such an approach to study the dynamics of ECoG potentials in human brains, finding that the modes lived close to the stability threshold, coming in and out of instability and equilibrating the dynamics in a fast manner. This is yet another example of what was described by Mora et al. in [25]: self-regulated dynamical criticality in biological systems.

However, one fundamental question arises: how short is short? It is clear that we need to pick a window size, and there are two issues that come with this approach. The first is the fact that, by choosing a window size, we add bias to our analysis, which we want to get rid of. On the other hand, it might happen that within the same window the worm shows body movements that can be easily disentangled by the naked eye. Ideally, we would like each window to capture specific dynamics, rather than having windows in which there is a large dynamical change, so that each obtained coefficient matrix corresponds to a specific kind of dynamics and not to an average between two types of movement. The best way to do this is to find a procedure that automatically selects the window sizes, in an adaptive manner, so that within each window we have a specific kind of movement of the worm's body. Chen et al. (2013) [33] developed an adaptive windowing procedure to build local autoregressive models without fixing the window size a priori. Likelihood measures were used to find when abrupt dynamical changes occur, using Monte Carlo methods to find the right threshold over which there should be a break. In this thesis we use some of the main structural ideas of this procedure, but with some important differences, mainly in the computation of the threshold and in the window selection procedure.

6.1 Adaptive Windowing

In our understanding, a new window is needed whenever the dynamics changes significantly, requiring a new set of coefficients to model it. Across a possible boundary, see Figure 6.1, our goal is to understand how likely it is for the larger window, $I_{k+1}$, to be described by the coefficients obtained on the smaller window, $I_k$.


Figure 6.1: Schematic of the adaptive windowing procedure. A possible dynamical break between $I_k$ and $I_{k+1}$ is shown in red. Understanding whether $\theta_k$ is likely to model $I_{k+1}$ will be the key to understanding whether a dynamical change is occurring.

In order to do that, we will fit equation 3.4, with p = 1, for a set of windows $I_k$, $k \in \{1, ..., K\}$, giving $\theta_k = (c, A, \Sigma)$, and then we will look at the log likelihoods, equation 6.1, of $\theta_{k+1}$ given $I_{k+1}$ and of $\theta_k$ given $I_{k+1}$, to understand if the change is above a certain threshold,

$$l(\theta|I_k) = -\frac{m_t}{2} \ln|2\pi\Sigma_t| - \frac{1}{2} \sum_{s=t-m_t+1}^{t} \eta_s^\top \Sigma_t^{-1} \eta_s \qquad (6.1)$$

where $m_t$ is the size of the interval. Therefore, we are effectively computing log likelihood ratios:

$$R = \ln \frac{L(\theta_{k+1}|I_{k+1})}{L(\theta_k|I_{k+1})} = l(\theta_{k+1}|I_{k+1}) - l(\theta_k|I_{k+1}) \qquad (6.2)$$
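A hedged sketch of equations 6.1 and 6.2 in code; the tuple layout of `theta` and all names are assumptions carried over from the earlier sketches.

```python
import numpy as np

def log_likelihood(theta, w):
    """Gaussian log likelihood (eq. 6.1) of theta = (c, A, Sigma) on window w."""
    c, A, Sigma = theta
    eta = w[1:] - c - w[:-1] @ A.T                 # residuals eta_s
    _, logdet = np.linalg.slogdet(2 * np.pi * Sigma)
    quad = np.einsum("ti,ij,tj->", eta, np.linalg.inv(Sigma), eta)
    return -0.5 * len(eta) * logdet - 0.5 * quad

def likelihood_ratio(theta_k, theta_k1, w_k1):
    """Log likelihood ratio R of eq. 6.2 across a candidate boundary."""
    return log_likelihood(theta_k1, w_k1) - log_likelihood(theta_k, w_k1)
```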

The idea is that if R is above a certain threshold, T, we should start a new window, because it means that the dynamics changed significantly. Therefore, we are looking for the intervals of local homogeneity in the dynamics¹. The problem is how to choose the threshold.

¹ Dynamical homogeneity means that we can model all the observations of a given window with the same set of coefficients.

Since our goal is to provide a standardized and unbiased procedure to describe behavior, the threshold should be found while adding the least bias possible. In order to do that, we will compare the likelihood ratio obtained from the real data with a null distribution of likelihood ratios under the assumption of local homogeneity, in order to understand whether the change in likelihoods from the real data can be explained by modeling bias. The idea is that we assume that the new window, $I_{k+1}$, is generated by the same set of coefficients as the previous window, $\theta_k$. After obtaining the distribution of likelihoods given this assumption, we compare it with the likelihood ratio obtained from the real data, to check if it is statistically reasonable that the new observations are generated by the same coefficient set or, in other words, that the new observations correspond to the same dynamics. If not, it means that we have to split the time series and start a new windowing procedure, with different dynamics. The algorithm is the following:

1. Compute the likelihood ratio

(a) Fit an AR(1) model to $I_k$ and $I_{k+1}$, giving $\theta_k$ and $\theta_{k+1}$

(b) Using these values, compute the likelihood ratio, R

2. Create the null distribution

(a) Generate N observations of size $I_{k+1}$ using the coefficients $\theta_k$, $I^\gamma_{k+1}$

(b) Fit an AR(1) model to each newly generated time series, obtaining $\theta^\gamma_k$ and $\theta^\gamma_{k+1}$

(c) Compute the null likelihood ratios, $R^\gamma$, for each observation. In the end, we will have a null distribution of how the likelihood should change under the assumption of no model change, $r_{null}$

3. Check if $R \in r_{null}$

(a) Compute the n-th percentile of the null distribution, which will be our threshold, T. If R < T, increase the window size; otherwise start a new window and store $I_k$. If we reach the maximum window size, $I_K$, store $I_K$ and start a new window. (A code sketch of this loop is given below.)
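The following is a minimal sketch of this loop under stated assumptions: `fit_var1` wraps the least-squares fit from Chapter 3, `simulate_var1` and `likelihood_ratio` are the helpers sketched earlier, `sizes` is an increasing list of candidate window lengths, and the artificial-break gluing stage is omitted.

```python
import numpy as np

def fit_var1(w):
    """Least-squares fit of a_t = c + A a_{t-1} + eta_t on a window w."""
    X = np.hstack([np.ones((len(w) - 1, 1)), w[:-1]])
    W, *_ = np.linalg.lstsq(X, w[1:], rcond=None)
    resid = w[1:] - X @ W
    return W[0], W[1:].T, np.cov(resid, rowvar=False)

def adaptive_windows(a, sizes, n_null=200, pct=95, rng=None):
    """Adaptively chosen window boundaries for the time series a (T x dim)."""
    rng = np.random.default_rng(rng)
    breaks, start = [0], 0
    while start < len(a) - sizes[0]:
        chosen = min(len(a) - start, sizes[-1])
        for m_k, m_k1 in zip(sizes[:-1], sizes[1:]):
            if start + m_k1 > len(a):
                break
            w_k1 = a[start:start + m_k1]
            theta_k, theta_k1 = fit_var1(w_k1[:m_k]), fit_var1(w_k1)
            R = likelihood_ratio(theta_k, theta_k1, w_k1)
            # null: windows of size m_k1 simulated from theta_k (homogeneity)
            null = [likelihood_ratio(fit_var1(g[:m_k]), fit_var1(g), g)
                    for g in (simulate_var1(theta_k, m_k1, rng)
                              for _ in range(n_null))]
            if R > np.percentile(null, pct):
                chosen = m_k   # dynamical break: keep the smaller window
                break
        start += chosen
        breaks.append(start)
    return breaks
```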

Note that there are still some issues we need to discuss regarding this procedure. First of all, we still have to select the percentile of the null distribution. However, we have seen that the choice of this value is quite robust, meaning that we can select a wide range of percentiles and still obtain the same result. This analysis is made in Appendix F. On the other hand, choosing a correct range of window sizes $I_k$ is also fundamental. We want the minimum window size to be sufficiently small so that the dynamics within it are approximately homogeneous but, at the same time, it cannot be arbitrarily small, otherwise we do not have enough observations to make an accurate fit of the data. The maximum window size also cannot be too large, otherwise we would need to add too many observations in order to capture a dynamical change. On the other hand, the step between different window sizes also needs to be adapted. In general terms, it is heuristically understandable that if we have a long time series and we compare it with another one in which we add only a small fraction of new observations, the result of the fit will be approximately the same, because the new observations will have an insignificant effect and we would not capture the dynamical change. Therefore, for longer windows we need a larger step. This also establishes the need for a maximum window size, because if the step is larger than the shortest window we are no longer capable of ensuring homogeneity. The choice of $I_k$ was shown to be quite robust, as you can see in Appendix F.

Note, however, that there is actually no fundamental drawback in imposing a maximum window size, since further analysis allows us to sew together the artificial breaks between windows that have the same dynamics. This is done by examining the dynamics between each pair of consecutive windows, in a procedure similar to the one just described, to understand if the dynamics changes when going from one window to the other. If the dynamical change is below a certain threshold, it means that we can glue the two windows together, because the break was indeed artificial. This is done in an iterative process, until the number of windows converges to a constant value, after all the artificial breaks are found. Note, however, that we have to implement some changes in our algorithm since now, instead of comparing subsequent windows of increasing size, we are comparing the dynamics between large windows. It might happen that we actually get negative likelihood ratios between two subsequent windows, see Figure 6.2, meaning that we could concatenate two windows that are dynamically different.


Figure 6.2: In our windowing procedure, we would compare the likelihoods of the coefficients obtained in the two windows. However, in this case for instance, the likelihood of modeling window 2 using the coefficients obtained in window 1 will probably be larger than using the coefficients obtained in window 2, because these would poorly capture the dynamics in window 1 due to the averaging that comes from the new observations after the dynamical break. This means that we would actually get a negative likelihood ratio and, therefore, according to our model, these windows would be glued together, because the likelihood ratio would not be above the threshold, since it is negative.

In order to get around this issue, we take the last $I_0$ observations of the window before the break and the first $I_K - I_{K-1}$ observations of the window after the break. In this slice of the time series, we can compute likelihood ratios over subsequent windows, similarly to what was done before, to check whether a real break occurs around the point at which the first windowing procedure found a break. If the first adaptive windowing works fine, we should only have real breaks after the ones found by the algorithm, because when we are comparing subsequent windows, if a dynamical break exists when comparing $I_k$ with $I_{k+1}$ we keep $I_k$. This means that a real break can only exist in the interval $I_K - I_{K-1}$ after the break found by the model, because this is the largest step we can take when comparing subsequent windows. On the other hand, we take $I_0$ observations before the break to account for the shortest possible window. If we took a larger $I_k$ we could already be getting observations from a different window. To summarize, the iterative artificial break finder goes as follows:

1. Start with the N breaks found with the adaptive windowing procedure, $w_i$

2. Take $I_0$ observations before $w_i$ and $I_K - I_{K-1}$ observations after $w_i$ and concatenate them, creating a slice of the time series, $x_b$

3. Using windows $i_k$, $k \in \{0, 1, ..., L\}$, where $i_L$ is the largest window before we reach the size of $x_b$, compute likelihood ratios between subsequent windows, $r^b_k$, and compare them with the threshold obtained via the null likelihood ratio distribution, $T^b_k$, similarly to the adaptive windowing procedure

4. If $r^b > T^b$, a real break occurs. If not, we can safely glue the windows before and after $w_i$, reducing the number of breaks

5. Check if an artificial break was found by comparing the new number of breaks, N′, with N. If N = N′, terminate the loop.

Finally, the full dynamics is obtained by fitting an AR(1) model to the newly found windows.

6.2 Toy Model

We will check our adaptive windowing procedure by generating a time series in which we artificially introduce a model change. This is done by using different parameters at specific times, so that we know exactly where the break occurs, Figure 6.3.

Figure 6.3: Example of a time series for which we introduce a dynamical change by using different parameters before and after the break


$$\theta_t = \begin{cases} c = (0.093, -0.314), \; A = \begin{pmatrix} 0.989 & -0.005 \\ 0.062 & 0.853 \end{pmatrix}, \; \Sigma = \begin{pmatrix} 0.09 & -0.0008 \\ -0.0062 & 0.0085 \end{pmatrix}, & t \le 200 \\[2ex] c = (2.8, 3.53), \; A = \begin{pmatrix} 0.493 & 0.154 \\ 0.523 & 0.463 \end{pmatrix}, \; \Sigma = \begin{pmatrix} 0.05 & -0.108 \\ -0.062 & 0.158 \end{pmatrix}, & 200 < t \le 400 \end{cases} \qquad (6.3)$$

In Figure 6.4, we plot the likelihood ratios and the threshold, defined as the likelihood ratio corresponding to the 95th percentile of the null distribution (under the assumption of dynamical homogeneity), starting from t = 180 frames. Right before the dynamical break, the likelihood ratio obtained from the data falls outside the null distribution, meaning that the null hypothesis of dynamical homogeneity can be rejected and that a new window and model, with different parameters, are needed.
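A schematic version of this test, reusing fit_ar1 and log_likelihood from the earlier sketch: the null distribution is built by simulating surrogates from the model fitted on the first window, and the observed ratio is compared with its 95th percentile. The number of surrogates and the seed below are illustrative choices, not values from the thesis.

```python
import numpy as np  # fit_ar1 and log_likelihood are defined in the earlier sketch

def likelihood_ratio(x1, x2):
    """Gain in log-likelihood from modeling window x2 with its own AR(1)
    parameters rather than with the parameters fitted on window x1."""
    return log_likelihood(x2, *fit_ar1(x2)) - log_likelihood(x2, *fit_ar1(x1))

def simulate_ar1(c, A, Sigma, x0, T, rng):
    """Draw a length-T trajectory from the AR(1) model, starting at x0."""
    x = np.zeros((T, len(x0)))
    x[0] = x0
    for t in range(1, T):
        x[t] = c + A @ x[t - 1] + rng.multivariate_normal(np.zeros(len(x0)), Sigma)
    return x

def break_detected(x1, x2, n_null=500, pct=95, seed=1):
    """Reject dynamical homogeneity if the observed likelihood ratio exceeds
    the pct-th percentile of ratios computed on homogeneous surrogates."""
    rng = np.random.default_rng(seed)
    c, A, Sigma = fit_ar1(x1)
    null = [likelihood_ratio(x1, simulate_ar1(c, A, Sigma, x1[-1], len(x2), rng))
            for _ in range(n_null)]
    return likelihood_ratio(x1, x2) > np.percentile(null, pct)
```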

Figure 6.4: Log likelihood ratio of the original time series and threshold defined as the likelihood ratio that corresponds to the 95th percentile of the null distribution. Right before the break, the likelihood ratio obtained from the real data falls outside the null distribution, meaning that a significant dynamical change is occurring.

The final test of this procedure is to generate multiple time series for which we know where the dynamical break occurs, and to look at the raster plot of the breaks obtained for each of the surrogates, which gives a sense of how well the procedure captures the break. The result is shown in Figure 6.5.


Figure 6.5: Raster plot of the breaks obtained by our windowing procedure on the toy model. A dynamical break was introduced at t = 200 frames. Other breaks appear due to statistical noise.

As you can see, every time series breaks at around t = 200 frames, as expected. On the other hand, it is also noticeable that some other artificial breaks still appear in the raster plot, mainly due to statistical fluctuations.

We can use the result of the adaptive windowing across multiple surrogates to get a sense of the characteristic dynamics they have in common. To do so, we cluster the obtained coefficient matrices using t-SNE, [34], and then look for the peaks in the resulting two dimensional map (see Appendix H for further explanation). The result is shown in Figure 6.6.
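A possible implementation of this step, using scikit-learn's t-SNE and a Gaussian kernel density estimate for the peak finding; the perplexity and grid resolution below are illustrative, and this is a sketch of the approach rather than the exact routine of Appendix H.

```python
import numpy as np
from sklearn.manifold import TSNE
from scipy.stats import gaussian_kde

def embed_and_find_peaks(coef_matrices, perplexity=10, grid_n=100):
    """Map flattened coefficient matrices to 2D with t-SNE, then locate the
    local maxima of the joint density of the embedded points."""
    flat = np.array([m.ravel() for m in coef_matrices])
    xy = TSNE(n_components=2, perplexity=perplexity).fit_transform(flat)

    # Kernel density estimate of the mapped points on a regular grid
    kde = gaussian_kde(xy.T)
    gx, gy = (np.linspace(xy[:, i].min(), xy[:, i].max(), grid_n) for i in (0, 1))
    X, Y = np.meshgrid(gx, gy)
    Z = kde(np.vstack([X.ravel(), Y.ravel()])).reshape(grid_n, grid_n)

    # A grid point is a peak if it dominates its 3x3 neighbourhood
    peaks = [(X[i, j], Y[i, j])
             for i in range(1, grid_n - 1) for j in range(1, grid_n - 1)
             if Z[i, j] == Z[i - 1:i + 2, j - 1:j + 2].max()]
    return xy, peaks
```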


Figure 6.6: Resulting peaks of the 2D map of the obtained coefficient matrices. These clusters were obtained using a perplexity level of 10 in our t-SNE routine, and the peaks were found by taking the joint probability distribution of the mapped points and looking at the corresponding local maxima.

Since each point in this 2D map corresponds to the coefficient matrix of a given dynamics over a certain time span, we can trace back the windows in which these dynamics occurred. We find that the two distinct peaks correspond to the two different parts of the simulated time series, and that the estimated dynamics match the ones generating the observations, Figure 6.7.


Figure 6.7: Eigenvalues of the coefficient matrices corresponding to each of the peaks obtained after the windowing procedure. $b$ and $a$ denote the dynamics before and after the break, calculated directly from the input coefficient matrices of equation (6.3), while $\hat{b}$ and $\hat{a}$ represent our estimates of the dynamics within the same windows.

The slight offset between the eigenvalues can be explained by finite sampling.
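The eigenvalue comparison itself only requires the eigenvalues of the coefficient matrices; for instance, with the "before" block of equation (6.3) and an estimate fitted on the pre-break window (fit_ar1 and the surrogate x are from the earlier sketches):

```python
import numpy as np

A_true = np.array([[0.989, -0.005], [0.062, 0.853]])  # "before" block of eq. (6.3)
A_hat = fit_ar1(x[:200])[1]                           # estimate on the pre-break window

print(np.linalg.eigvals(A_true))  # eigenvalues generating the data
print(np.linalg.eigvals(A_hat))   # estimated eigenvalues; offset shrinks with window size
```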

6.3 Worm Behavior: Escape Response

In this section, we apply our windowing procedure to real data for which we know that a dynamical change should occur at a specific time. C. elegans has been described to respond strongly to adverse stimuli, such as heat shocks. The escape response typically consists of three distinct parts: the worm stops its forward movement, then reverses for one to two and a half body lengths and performs an Ω turn, changing direction and escaping, [35].

We build a local linear model across 92 worms that undergo such an escape response at t = 200 frames, Data Set 3 (Appendix G). Figure 6.8 shows an example of a time series corresponding to the escape response.


Figure 6.8: Time series of each of the modes for the escape response. Note that the heat shock is applied at t = 200 frames.

Even though we did not notice this for the toy model, with Data Set 3 fitting in windows that were too short gave ill conditioned results, as can be noticed by looking at the condition number of the obtained covariance matrix of the residuals. (The condition number of a matrix is defined as the product of the norm of the matrix with the norm of its inverse; it measures the ratio between the largest and smallest singular values, and when it is too large, $\gtrsim 10^{\mathrm{dim}}$, we say that the least squares problem is ill defined.) What happens is that if the condition number of the predictor matrix of the least squares fit, $\hat{a}_{t-p}^T \hat{a}_{t-p}$, is too large, we have an almost singular matrix. This means that very small perturbations in this matrix are amplified when we take the inverse and, therefore, our estimation of the parameters is less reliable. This can be explained by the fact that, over such short windows, the dynamics might be too stereotyped over a few dimensions, meaning that we are in the presence of collinearity. Therefore, in order to trust our parameter estimation, we need to introduce a cutoff on the allowed condition number. This cutoff actually represents a data driven way of selecting a minimum window size, because the window has to grow large enough to bring the condition number below the defined cutoff.
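A data-driven minimum window size can then be obtained by growing the window until the predictor matrix is sufficiently well conditioned; a minimal sketch, with an illustrative cutoff and step size:

```python
import numpy as np

def minimum_window(x, start, cutoff=1e4, step=5):
    """Grow a window beginning at `start` until the condition number of the
    predictor matrix of the least squares fit drops below `cutoff`."""
    d = x.shape[1]
    end = start + 2 * (d + 1)            # need more observations than regressors
    while end <= len(x):
        past = np.hstack([np.ones((end - start - 1, 1)), x[start:end - 1]])
        if np.linalg.cond(past.T @ past) < cutoff:
            return end - start           # data-driven minimum window size
        end += step
    return len(x) - start                # fall back to the rest of the series
```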

In Chapter 2, we saw that we can get a sense of the quality of our model by looking at the joint probability of the first two modes. In Figures 6.9 and 6.10, you can see the window size distribution and the joint probability distribution of the first two modes (obtained by making simulations with the fitted windows and parameters) for different values of the condition number.

Figure 6.9: Window size distribution obtained for different condition numbers. An increase in the allowed condition number leads to overall smaller windows.

Figure 6.10: Joint probability distribution of the first two modes obtained from simulations made with different values of the condition number (a, b, c), in comparison with real data (d).

As Figure 6.9 shows, window sizes decrease as the allowed condition number increases. On the other hand, it is interesting to note that even for a condition number of 1000, the joint probability distribution of the first two modes, Figure 6.10, looks much more like the one obtained from real data than what we obtained by fitting an AR(1) model to the global time series in Chapter 2. As this simple example shows, the adaptive windowing procedure gives a much more accurate model of the worm behavior than the global linear model. To quantify the difference in quality between the models, we computed the Jensen-Shannon divergence between the four dimensional joint probability distributions of the four modes obtained from simulations of the different models and from real data, Table 6.1. An increase in the condition number leads to a better match between the real observations and the simulations.


              Cond 100    Cond 1000    Cond 10000
JS$_{1/4}$      0.245        0.076         0.017

Table 6.1: Jensen-Shannon divergence (with weights 1/4) of the four dimensional probability distribution of the modes between each of the simulations and the real data.
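For reference, the divergence in Table 6.1 can be computed from histogram estimates of the two four-dimensional distributions. The sketch below implements a weighted Jensen-Shannon divergence, with the mixture weight and binning left as parameters, since the exact choices are analysis details not spelled out here.

```python
import numpy as np

def js_divergence(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence between two histograms p and q,
    with mixture weights (w, 1 - w)."""
    p, q = p / p.sum(), q / q.sum()
    m = w * p + (1 - w) * q
    kl = lambda a, b: np.sum(a[a > 0] * np.log2(a[a > 0] / b[a > 0]))
    return w * kl(p, m) + (1 - w) * kl(q, m)

# Usage sketch: shared bins for simulated and real mode time series of shape (T, 4)
# bins = [np.linspace(-3, 3, 20)] * 4
# p, _ = np.histogramdd(simulated_modes, bins=bins)
# q, _ = np.histogramdd(real_modes, bins=bins)
# print(js_divergence(p.ravel(), q.ravel()))
```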

Here, we explore in detail one example of the adaptive windowing procedure applied to the escape response. In this case, as opposed to the toy model, the raster plot has many more breaks, meaning that linear dynamics are only capable of modelling short windows of behavior. In Figure 6.11, you can see the raster plot obtained for a condition number of 10000.

Figure 6.11: Raster plot of the dynamical breaks in the escape response data set. An adverse stimulus was administered to the worm's head at t = 200 frames, provoking a dynamical change. The window sizes tend to be shorter after the heat shock, where more dynamical changes occur. This can easily be seen by looking at the density of breaks as a function of time, which increases right after the heat shock.

There is an increase in the density of breaks after the heat shock is applied, corresponding to more frequent dynamical changes. This is in accordance with what we expected from observation: before the heat shock the worm crawls forward with a steadier behavior than after the heat shock, when, as we have described, the escape response takes place. Variability across worms explains the difference in break positions between individuals. In order to get a sense of the dynamics of the worm behavior, and similarly to what was done for the toy model, we clustered the obtained coefficient matrices across worms using a nonlinear dimensionality reduction technique, t-SNE. The clusters are then identified by peak finding in this low dimensional space, Figure 6.12.

Figure 6.12: Low dimensional clustering of the space of coefficient matrices (using a perplexity level of 10 in the t-SNE). Each matrix corresponds to a different kind of behavior and, thus, the peaks in this plot correspond to the most common movements performed by the worms.

Each of these peaks corresponds to a specific behavior that occurs in a specific time span. The dynamics of each peak is represented by the corresponding eigenvalues in Figure 6.13.

Figure 6.13: Eigenvalues of the coefficient matrices of each of the peaks.

As before, we can trace back exactly when each of the dynamics occurred. In fact, the peak represented in black in Figure 6.12 corresponds to a type of behavior that occurs mostly before the heat shock is applied, in which the worm is essentially not crawling but only performing head or tail swings. This is probably why, in Figure 6.13, we have two relaxations (one of them unstable) that should correspond to the first and second modes (which are oscillatory when the worm is crawling), and an unstable oscillation that should correspond to quick head or tail swings. The second peak, represented in red in Figures 6.12 and 6.13, occurs right after the heat shock and corresponds to the reversal mechanism, in which the worm crawls backwards. As you can see in Figure 6.13, we have two stable oscillations that correspond to the two pairs of complex conjugate eigenvalues. The third peak, represented in green in both figures, corresponds to the omega turn itself and is composed of one unstable pair of complex conjugate eigenvalues and a stable oscillation. The fourth peak, represented in blue, corresponds to the body oscillation right after the omega turn. The eigenvalues corresponding to this body movement are slightly unstable, probably because the amplitudes of the modes change too quickly due to a deep body wave.

Figure 6.14: Snapshots of the behaviors that correspond to each of the peaks that we have found for the escape response.


Chapter 7

Conclusion

The main goal of the thesis was achieved: to find a data driven way of extracting behavioral classes directly from observation, with minimal assumptions. It should be noted, though, that along the path towards this goal, some interesting discoveries were made.

First of all, we have shown that a linear dynamical system applied to the postural dynamics fails to capture the behavior of C. elegans. In fact, we were able to obtain the best possible linear model by means of the innovative MIAAFT algorithm, and we have shown that the time series of the worm is, indeed, nonlinear. It should be stressed that we have developed a method to linearize any kind of time series, without any assumption regarding its features. This can be a helpful pre-analysis of any kind of time series, in order to assess whether a linear analysis is useful. If the time series is clearly nonlinear, as was the case for the postural dynamics, we can be sure that linear models will not capture the real dynamics. On the other hand, this method of detecting nonlinearity can still be improved, especially in giving a better measure of how far from linearity a certain time series is. This is especially the case if the differences between the distributions of the test statistic for the linearized surrogates and the real data are small.

Besides this linearization procedure, the major contribution of this thesis is a novel adaptive windowing algorithm for learning local dynamical systems. It can be used on any kind of nonlinear time series, from finance and economics to the geological sciences or even non equilibrium statistical physics. Regarding biological systems, we believe that one of the first steps in further research will be to apply the same procedure to more general data sets, in which the worm is freely moving. It will be interesting to see how many distinct behaviors we can capture using this method. On the other hand, we can apply the same procedure, with minor modifications in the experimental
