
Fisher information forecasting applied

to neutron star pulse-profile modelling

Bob de Witte

11324929

A thesis presented for the BSc degree

Physics & Astronomy

Anton Pannekoek Institute (API)

University of Amsterdam

Netherlands


Fisher information forecasting applied to neutron star

pulse-profile modelling

Bob de Witte

Popular abstract

Neutron stars are extremely dense objects that form when massive stars die. Those high densities make neutron stars interesting targets for dense matter physicists. In particular, scientists want to know the mass and radius in order to make statements about what kind of matter exists in the interior of neutron stars. In this thesis a technique called pulse profile modelling is used to investigate constraints on the mass, radius and other parameters of neutron stars. This technique involves processing the data of the characteristic pulses which a neutron star generates while it rotates. The data are collected by a telescope that observes a fast-rotating neutron star and catches the pulses over time. The data are analyzed with a Python-based model to estimate the neutron star parameters, including mass and radius. In this thesis we investigate what happens to the constraints on the mass, radius and other parameters if the telescope observes the neutron star twice as long. This involves a mathematical tool called Fisher information forecasting. Previous studies predicted that a doubled observing time would improve parameter constraints by a factor of √2 to 2. However, those studies involve simpler models than we are using. After applying the Fisher information forecasting tool, mass and radius constraints scaled by a factor of ∼ 2.5, while the average parameter scaling was ∼ 1.8. Unfortunately, we are not yet convinced that these results are reliable enough to state a definitive scaling factor.


Fisher information forecasting applied to neutron star

pulse-profile modelling

Bob de Witte

Scientific abstract

Neutron stars are extremely dense objects formed in supernovae, the explosions of dying stars. The high densities in the core make them interesting to observe for dense matter physicists, as such densities cannot be reproduced on Earth. Constraints on mass and radius in particular are needed to make statements about the matter composition in the core of neutron stars. A technique called pulse profile modelling is used to infer the properties of the source radiation of neutron stars. It involves Bayesian inference analyses of the characteristic pulse profile coming from a pulsar, from which probabilistic constraints on neutron star parameters are calculated. X-PSI (X-ray Pulse Simulation and Inference) is a Python package that provides a framework to define and execute such calculations. In this thesis X-PSI is used to analyze both observational data from NICER (Neutron star Interior Composition ExploreR) and synthetic data generated from a model of the X-ray data. Parameter constraints are encoded in a likelihood function, which is a weighting over points in parameter space; the relative weights measure the consistency between the data and a point in parameter space. Raaijmakers et al. (2019) were left with the question of what happens to parameter constraints if the NICER telescope observes neutron star J0030+0451 twice as long, since studies indicate this should improve parameter constraints by a factor of √2 to 2 (Raaijmakers et al., 2019; Lo et al., 2013; Psaltis et al., 2014). Doubling the observing time means 1.94 Ms of observing time, which makes it useful to know what scaling factor for the parameter constraints it yields. Fisher information forecasting is applied to address this question. Fisher forecasting is a way of revealing parameter constraints and covariances between parameters, based on a multivariate Gaussian approximation to the likelihood function. Its big advantage is the short computing time needed to calculate posterior constraints. Numerical, second-order partial derivatives were applied to the likelihood function. Two model setups were used: one using synthetic data to evaluate our numerical derivatives and one with real data to analyze parameter constraints for a doubled observing time. For M and Req the constraints improve by a factor of ∼ 2.5 and the average scaling factor over all parameters is ∼ 1.8. Unfortunately, the results were not consistent enough to say what the scaling factor of a doubled observing time will be. A reason for this could be imprecise off-diagonal values in the Fisher matrix, but this has not been validated yet. Second-order cross-derivatives between weakly correlated parameters should be examined further.


Contents

1 Introduction

2 Theoretical framework
   2.1 Likelihood function
   2.2 Fisher information matrix
   2.3 Numerical second order derivatives

3 Methodology
   3.1 Installation of X-PSI on Windows
   3.2 Fisher matrix decisions and checks
      3.2.1 Model setup
      3.2.2 Hessian implementation
      3.2.3 Step size
      3.2.4 1D and 2D comparison
      3.2.5 Multivariate Gaussian
   3.3 Test case: effects of double observing time on J0030+0451

4 Results
   4.1 Fisher information results
      4.1.1 Step size
      4.1.2 1D results
      4.1.3 2D results
      4.1.4 Multivariate Gaussian results
   4.2 Test case results

5 Discussion
   5.1 Fisher information results
   5.2 Test case and problems regarding Fisher matrix inversion

6 Conclusion

A Hessian code


Chapter 1

Introduction

Dense matter physics is a field in which a lot of progress has been made over the last decades. Projects like the Large Hadron Collider (LHC) help us probe small particles at enormous energies. However, experimental laboratories on Earth have their limits. Neutron stars are unique environments for studies in the field of cold, ultra-dense matter (Lattimer & Prakash, 2016; Oertel et al., 2017), matter we cannot yet create on Earth, which makes them interesting objects for exploring dense matter physics. The composition and interior of neutron stars are among the great unsolved questions in physics today. It is believed that matter in neutron stars ranges from ions embedded in a sea of electrons at relatively low densities in the outer crust, to an unknown matter composition in the core, where nuclear matter is squeezed together more tightly than in atomic nuclei (Watts et al., 2016). Neutron stars are therefore interesting objects to observe and are promising stellar objects for addressing open questions in nuclear physics and quantum chromodynamics (QCD), the theory of the interaction between quarks and gluons. At the relevant densities, QCD remains extremely hard to explore because of numerical challenges (Watts, 2019).

Closely related is the question of the state of matter inside neutron stars. Different types of matter, such as strange matter or nucleonic matter, result in different Equations of State (EOS) of the matter composition (Lattimer & Prakash, 2016; Baym et al., 2017). The EOS can be described as a thermodynamic equation which describes the state of a certain form of matter depending on physical conditions, such as pressure, volume and temperature. Learning about the EOS at such densities is, for instance, necessary for understanding core-collapse supernova explosions, the associated gravitational wave physics and neutrino emission (Janka et al., 2007). In summary, Watts et al. (2016) argue: "A reliable analysis of the EOS for zero temperature in asymmetric matter can be only achieved by neutron star measurements." There are several techniques to elucidate the interior workings of neutron stars, and multiple techniques can be combined and cross-checked to obtain better, statistically more meaningful results. An important goal of those techniques is to measure the mass and radius of a neutron star, because constraints on the mass and radius put direct constraints on the EOS inside neutron stars. In other words, if the physical properties (here, mass and radius) are known, the EOS can be mapped out (see Figure 1.1).

Pulsar timing is a relatively accurate technique to determine the mass of a neutron star and has resulted in discoveries of 2 M⊙ neutron stars (Demorest et al., 2010). However, not much other information can be measured from pulsar timing alone. A technique expected to yield the radius of a neutron star is the Square Kilometer Array (Watts et al., 2014), but more is needed to map out the EOS. Gravitational wave observations of binary systems are another technique which in principle can measure mass and tidal deformability, but uncertainties in template models (Lackey & Wade, 2015) and a lack of observations (Agathos et al., 2015) are still issues within this technique.


Figure 1.1: The relation between the core composition of the neutron star, the EOS, the mass-radius relation and the exterior spacetime of a neutron star. Electromagnetic radiation shaped by the distorted spacetime carries information that can be used to constrain the core composition and EOS. Source: Watts (2019).

Neutron stars emit electromagnetic radiation across a broad range of the spectrum, from infrared to gamma rays. Pulsars (fast-rotating neutron stars) emit specific, unique pulses in X-rays, which makes this part of the electromagnetic spectrum interesting to observe and analyze. Those X-rays come from hot spots, hotter than the rest of the surface, which carry interesting information about other properties of the neutron star, including information about the radius. Those X-ray pulses can be used in various techniques. For instance, modelling of quiescent or bursting neutron stars is an option (Shaw et al., 2015; Nättilä et al., 2017), but composition and distance uncertainties affect the results. Another technique which uses X-ray detections is called pulse profile modelling. This technique is promising for determining mass and radius (Watts, 2019) and is the one used in this thesis.

Pulse profile modelling, also called waveform or lightcurve modelling, is a technique which makes use of the fact that X-ray radiation propagates through the neutron star spacetime to an instrument on a telescope, which registers (photon) event data. It is the technique used in Riley et al. (2019) for parameter estimation of neutron stars, and the resulting parameter constraints for neutron star J0030+0451, including mass and radius, look promising. Millisecond X-ray pulses are observed from the neutron star in the form of a pulse profile. It is believed that these pulses are produced by regions hotter than the rest of the surface, which we will refer to as 'hot spots' from now on. These regions are formed due to the impact of matter in so-called 'gaps' in the magnetic field of the neutron star. In this case, this matter comes from particles created in the magnetosphere of J0030+0451. This way of creating hot spots results in much clearer pulse profiles than accreting matter from a companion star (Watts, 2019). In these millisecond pulses, information about the neutron star, most interestingly the mass and radius, is encoded in the energy-dependence and shape of the pulse profile (see Figure 1.2). Approximately 10^6 photons are needed to analyze the pulse profile in order to achieve a < 5% uncertainty in the neutron star radius, which is needed for separating different EOS models (Psaltis et al., 2014).


Figure 1.2: Representation of a pulse profile of neutron star J0030+0451. The top figure displays a summation over subsequent channels per time interval. The bottom figure displays the photon counts per channel along its rotational phase (time). Source: Riley et al. (2019).

Logically, the more photons obtained by a telescope, the more accurate the results will be. Gravitational redshift essentially determines the M/R ratio, while special-relativistic effects, set by the angular velocity, carry information about the radius itself. This is the reason why pulsars are preferred for this technique. If the analyzed data result in a proper fit with tight constraints, the obtained parameters can be very valuable, which makes this technique stand out among methods for determining the mass and radius accurately (Watts et al., 2016).

An open-source Python package called X-PSI (X-ray Pulse Simulation and Inference) provides a framework for constructing and applying pulse profile models. Using Bayesian statistics, a data set from a pulsing neutron star is analyzed to estimate various parameters of the neutron star. A likelihood function is created from neutron star physics together with an (observational) data set. This likelihood function is sensitive to a set of parameters, in the context of the (observational) data set, through the physics in the model. In this thesis, the likelihood function will be explored by applying Fisher information theory in the form of matrices using Gaussian approximations. Using Fisher information, variances of parameters and covariances of pairs of parameters can be revealed. This is effectively the same outcome as the posterior distributions (parameter constraints) of the X-PSI parameters (as done in Riley et al., 2019), but Fisher information is computationally much less intense (think 10-30 minutes compared to 10^3-10^5 core hours for the model runs in Riley et al., 2019).

The possibilities and applications of Fisher information are, in theory, broad. It is an interesting way of forecasting the power of an experiment to constrain model parameters. In this thesis the Fisher matrices are used to estimate the impact on the parameter constraints of a doubled observing time of the NICER telescope, an open question left in Raaijmakers et al. (2019). In the case of the research discussed there, doubling means a total observing time of 1.94 Ms. Previous studies (Lo et al., 2013; Psaltis et al., 2014) suggest an improvement by a factor of √2 to 2 when the observing time is doubled. However, those estimates were made with simpler hot spots on the surface of the neutron star, so those factors are questioned in Raaijmakers et al. (2019), as X-PSI uses more complex hot spot structures. Data from the NICER telescope are used in Riley et al. (2019) for estimating parameters of neutron star J0030+0451. The exploration of Fisher information applied to X-PSI might be relevant for future decisions in X-PSI research and, in the most ambitious case, for bigger decisions in the science requirements and observing strategies of telescopes.


Chapter 2

Theoretical framework

2.1 Likelihood function

To understand the X-PSI package we must first understand the statistics and structure it uses. Bayesian statistics is the foundation for deriving a posterior probability distribution over models, most commonly as a distribution over continuous model parameters. A posterior distribution of model parameters is calculated via

\[
P(\theta \mid D, M) = \frac{P(D \mid \theta, M)\,P(\theta \mid M)}{P(D \mid M)}, \tag{2.1}
\]

with θ representing the model parameters, D the X-ray data and M the model. The part evaluated in this thesis is P(D | θ, M), which we from now on call the likelihood function. This function can also be defined as

\[
\mathcal{L}(\theta, M) := P(D \mid \theta, M). \tag{2.2}
\]

The likelihood function of parameters θ is the probability of the data given θ. It returns a scalar for a given injected parameter set, also called the parameter vector. These scalars act as relative weights which modify the prior probability distribution in the context of a data set. The prior distribution is P(θ | M), which encodes previous knowledge of the parameters. In the case of Riley et al. (2019), the priors are relatively 'flat' compared to the likelihood function, so the posterior distribution depends strongly on the likelihood function. The likelihood function represents, through these relative weights, a degree of belief in whether a certain model parameter set is plausible in the context of the observational or synthetic data set. The denominator of Equation 2.1 is the normalizing factor, which will not be discussed in this thesis.
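
As a minimal illustration of how Equation 2.1 behaves in log-space when the prior is flat, the toy sketch below can be used. This is not X-PSI code; the Gaussian likelihood and the uniform prior bounds are assumptions made purely for demonstration.

```python
import numpy as np

def log_likelihood(theta, data):
    """Toy Gaussian log-likelihood: the data are assumed to scatter around theta with unit variance."""
    return -0.5 * np.sum((data - theta) ** 2)

def log_prior(theta, lower=0.0, upper=3.0):
    """Flat (uninformative) log-prior on a bounded interval."""
    return 0.0 if lower <= theta <= upper else -np.inf

def log_posterior(theta, data):
    """Unnormalised log-posterior: log-likelihood plus log-prior (Equation 2.1 without the evidence term)."""
    return log_likelihood(theta, data) + log_prior(theta)

# With a flat prior, the shape of the posterior over theta is set entirely by the likelihood.
data = np.random.normal(loc=1.4, scale=1.0, size=200)
for theta in (1.2, 1.4, 1.6):
    print(theta, log_posterior(theta, data))
```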

To better understand the likelihood function and how it is built up programmatically in the X-PSI modelling framework, Figure 2.1 is useful to look at. A callable likelihood is built from an object called Star and an object called Signal, where the latter can have multiple instances. The Star is built up from the Spacetime and the Photosphere of the neutron star. General relativistic gravity is adopted to construct a spacetime manifold, which tells you how the neutron star affects its surrounding spacetime; the ambient spacetime is the Schwarzschild solution. Parameters such as mass, radius and spin are needed to calculate the Schwarzschild solution and are thus part of the model parameter vector. The model parameter vector is used to represent parameter values in the parameter space.

The other component which constructs the Star is the set of Photosphere instances. The Photosphere of the neutron star tells us what its electromagnetic spectrum looks like. In pulse profile modelling, relativistic ray tracing is used to analyze the pulse profile. The radiation comes from the surface of the neutron star, also called the surface radiation field. The most intense radiation, in the form of X-rays, comes from hot spots on the surface of the neutron star; these hot spots are hotter than the rest of the star (the Elsewhere region in X-PSI terminology). The X-ray radiation propagates through the spacetime of the neutron star, which gives the resulting pulse profile its characteristic look (see Figure 1.2). Riley et al. (2019) decided to simulate two disjoint hot regions for neutron star J0030+0451: a crescent-like shape and a circular spot. The two disjoint spots are named the primary and secondary hot spot. Figure 3.1 gives a better visual picture of the neutron star photosphere and especially the hot spots; this will be explained in more detail later on.

A data set is necessary to define a useful likelihood function. Photons from the neutron star and from background sources are captured by the NICER telescope, subject to its sensitivity. The data can be represented by phase bins with a count for every bin, called the data space; the data set exists in this data space. The photon events recorded by the telescope are registered imperfectly: this is the effect of the Instrument itself and of Interstellar attenuation. The model compensates for each of these 'disturbances' to calculate a model signal. The technical aspects of those components are beyond the scope of this thesis and will not be discussed further (for further reading, see the articles linked in Wyers 2019 and especially Arzoumanian et al., 2014).

X-PSI generates a likelihood function whose parameters correspond to physical parameters of the neutron star, in the context of an observational or synthetic data set. In this thesis the limits, derivatives and smoothness of the likelihood function will be analyzed by using Fisher information forecasting.

2.2 Fisher information matrix

In general, Fisher information is a measure of the information about parameters encoded in an observational experiment. It can reveal the sensitivity to model parameters, so Fisher information can be useful for future experiment design and forecasting. In short, as mentioned in the introduction, Fisher information is a powerful way to forecast the outcomes of future experiments, as the correlations between the various parameters are encoded in a Fisher matrix. If the section below is still too abstract, I recommend taking a step back and starting with an accessible beginners' manual such as Wittman (2012), which contains simple examples with exercises to practise with. It can also be useful to watch some basic YouTube videos. After getting familiar with the basics of Fisher information, Coe (2009) is useful as a clear and concise reference guide.

Computationally, the Fisher matrix is an N×N, symmetric Hessian matrix of the negative log-likelihood:

\[
F(\theta)_{i,j} = -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\,\log \mathcal{L}(\theta, M), \tag{2.3}
\]

where i and j are integer indices running over the N parameters. The parameters are represented by θ and M is the X-PSI model from which the likelihood function is formed. This amounts to approximating the likelihood function as a multivariate Gaussian. The coefficients of the Fisher information matrix, from now on called the Fisher matrix, are second-order partial derivatives of the log-likelihood with respect to each (pair of) parameter(s). The inverse of the Fisher matrix is the covariance matrix. In 2D, looking at two different parameters in the parameter space of the likelihood function, the inverse Fisher matrix takes the form

\[
[F(\theta)]^{-1} = [C] = \begin{pmatrix} \sigma_i^2 & \sigma_{ij} \\ \sigma_{ij} & \sigma_j^2 \end{pmatrix}, \tag{2.4}
\]

where the diagonal terms in the matrix are the variances of each parameter, and the off-diagonal terms the covariance between the pair of parameters. Values in the Fisher matrix tell us something about the curvature or 'peakiness' of the likelihood around the fiducial point. The fiducial point is a vector in parameter space, often close to the maximum-likelihood vector. Bigger values in the Fisher matrix correspond to larger curvature in the likelihood function; inverting bigger values gives smaller values in the covariance matrix, which corresponds to higher constraining power on a parameter because of a smaller variance. In theory, the covariance matrix tells us both the constraints on each individual parameter and the covariance between pairs of parameters. Confidence ellipses can be made with this information (for examples, see Coe, 2009), but those will not be presented in this thesis.

An important point is the moment at which the Fisher (sub)matrix is inverted. If we want to keep a certain parameter at a fixed value, we can simply remove the row and column in the Fisher matrix belonging to that parameter; alternatively, we can set the on-diagonal value of that parameter to a very large value (Coe, 2009). After that, we take the inverse of the (smaller) Fisher matrix. If we want to marginalize over any parameter, which means that the parameter is held free, we first need to invert the complete Fisher matrix. When comparing a pair of parameters, we then simply pick the 2×2 block belonging to those parameters out of the inverse Fisher matrix, which is the 2×2 covariance matrix. Further manipulations of the Fisher matrix are possible, such as adding priors. In the case of a well-known prior of width σ_i, we can simply add 1/σ_i² to the on-diagonal element corresponding to that parameter. Other manipulations of the Fisher matrix exist, but only the ones used here have been explained.
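
A minimal NumPy sketch of these three manipulations is given below; the 4×4 example matrix and the parameter ordering are assumptions made purely for illustration.

```python
import numpy as np

# Assumed example Fisher matrix for four parameters, ordered [M, R_eq, D, i]
F = np.array([[400.0, -80.0,   5.0,  1.0],
              [-80.0, 100.0,   2.0,  0.0],
              [  5.0,   2.0, 900.0,  3.0],
              [  1.0,   0.0,   3.0, 50.0]])

# Marginalise over all parameters: invert the full matrix first, then pick the
# 2x2 covariance block of the pair of interest (here M and R_eq).
C = np.linalg.inv(F)
cov_M_R = C[:2, :2]

# Fix a parameter (here D, index 2): remove its row and column *before* inverting.
F_fixed = np.delete(np.delete(F, 2, axis=0), 2, axis=1)
C_fixed = np.linalg.inv(F_fixed)

# Add a well-known Gaussian prior of width sigma_D on D: add 1/sigma_D^2 to the
# corresponding on-diagonal element (Coe, 2009).
sigma_D = 0.009
F_with_prior = F.copy()
F_with_prior[2, 2] += 1.0 / sigma_D ** 2

print(np.sqrt(np.diag(cov_M_R)))   # marginalised 1-sigma constraints on M and R_eq
print(np.sqrt(np.diag(C_fixed)))   # constraints with D fixed
```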

A good question to ask is whether the coefficients in these matrices make sense if, in advance, we have little concrete knowledge about the structure of the likelihood function. Fisher information matrices only work perfectly for multivariate Gaussian functions, supposing that the (numerical) derivatives are perfectly accurate. However, the likelihood function is not a multivariate Gaussian function, and the fact that it is numerical, with numerical error detracting from its smoothness, is a further complication. To check whether the numerical derivatives, and thus the coefficients in the Fisher matrix and the inverse Fisher matrix, are reliable, various tests can be done. Those tests will be explained in more detail in the methodology section.

2.3 Numerical second order derivatives

Numerical differentiation is often used for estimating derivatives of a mathematical function using values of the function itself. As the likelihood function cannot be analyzed analytically, numerical derivatives are needed to compute the Fisher matrix. In Chapter 25 of Abramowitz & Stegun (1948), different methods for numerical differentiation are explained. For second-order partial derivatives obtained by applying a Taylor expansion,

\[
\frac{\partial^2 L(\theta)}{\partial\theta_i\,\partial\theta_j} = \frac{L(\theta_i + h_i, \theta_j + h_j) - L(\theta_i + h_i, \theta_j - h_j) - L(\theta_i - h_i, \theta_j + h_j) + L(\theta_i - h_i, \theta_j - h_j)}{4 h_i h_j} + \mathcal{O}(h^2) \tag{2.5}
\]

is used throughout this thesis, where L is the log-likelihood, θ is the parameter vector and h the step size taken for each parameter. This way of taking numerical derivatives is called a central method, which can be understood visually from Figure 2.2. Of the central, forward and backward methods it came out as the most accurate one, judged by the fractional errors obtained when the methods were applied to an analytical function. Note that when i = j, the equation takes the form

\[
\frac{\partial^2 L(\theta)}{\partial\theta_i^2} = \frac{L(\theta_i + 2 h_i) - 2\,L(\theta_i) + L(\theta_i - 2 h_i)}{4 h_i^2} + \mathcal{O}(h^2). \tag{2.6}
\]

It is important to decide what values we pick for h_i and h_j. A central finite difference is used with a variable step size for each parameter, which will be explained in Section 3.2.3. We also explored a Python package for automated numerical differentiation called numdifftools. Potentially, this package is ideal for taking numerical derivatives; however, as we note further on, some problems occur when we use the likelihood function and numdifftools together.
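
A short sketch of Equations 2.5 and 2.6 is given below, assuming only that log_like is some callable returning a scalar log-likelihood for a parameter vector; the quadratic toy function used to test it is an assumption.

```python
import numpy as np

def second_partial(log_like, theta, i, j, h):
    """Central-difference estimate of d^2 log L / (d theta_i d theta_j) (Equations 2.5 and 2.6)."""
    t = np.asarray(theta, dtype=float)
    if i == j:
        tp, tm = t.copy(), t.copy()
        tp[i] += 2.0 * h[i]
        tm[i] -= 2.0 * h[i]
        return (log_like(tp) - 2.0 * log_like(t) + log_like(tm)) / (4.0 * h[i] ** 2)
    total = 0.0
    for si, sj in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        q = t.copy()
        q[i] += si * h[i]
        q[j] += sj * h[j]
        total += si * sj * log_like(q)
    return total / (4.0 * h[i] * h[j])

# Toy quadratic log-likelihood (an assumption) with known second derivatives.
toy = lambda x: -0.5 * (3.0 * x[0] ** 2 + x[1] ** 2 + x[0] * x[1])
print(second_partial(toy, [0.1, 0.2], 0, 0, h=[1e-4, 1e-4]))  # expect approximately -3.0
print(second_partial(toy, [0.1, 0.2], 0, 1, h=[1e-4, 1e-4]))  # expect approximately -0.5
```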

Figure 2.2: A representation of the central method used for second-order partial derivatives. Four points around the fiducial point (a, b) are involved in taking the derivative. Source: https://www.uio.no/studier/emner/matnat/math/MAT-INF1100/h10/kompendiet/kap14.pdf


Chapter 3

Methodology

3.1 Installation of X-PSI on Windows

Before any likelihood could be computed, the X-PSI software package had to be installed. As a Windows 10 operating system is used for this thesis and X-PSI only has installation manuals for Linux and macOS, a different way of installing X-PSI had to be found. A step-by-step manual of the installation is attached in Appendix B. It came down to using the Ubuntu shell (Linux) on the Windows computer and installing X-PSI via that shell; effectively this means creating a Linux environment on a Windows computer. X-PSI uses Python version 2.7 with several (scientific) packages and has some underlying C code, used for the integration over the surface radiation field, to make the software faster. The model construction page of Riley (2019b) was used to check whether X-PSI was working as it should. The last sections (Density and support checking, and Nested sampling) were skipped here, as they were not used for the thesis. An HTML copy of the installation page is attached in the Google Drive.

3.2 Fisher matrix decisions and checks

3.2.1 Model setup

X-PSI can be used as a general package for the pulse profile modelling technique. To make valuable statements using the Fisher matrix, a comparable model setup must be used to evaluate the Fisher results. In this thesis two different model setups were used. The first model setup, explained in this section, comes from the model construction page of Riley (2019b) and was mainly used to evaluate the implemented numerical differentiation. The second model is based on real data from J0030+0451 and will be explained in more detail in Section 3.3 of this thesis. Both models are so-called ST-U (Single Temperature, Unshared) models, which means that the neutron star has two circular hot spots with unshared parameter values; the hot spots can have different temperatures, sizes and positions on the neutron star. The model described in this section is effectively the demonstration model of Section 3.5 of Riley (2019a), the difference being that in Section 3.5 the secondary temperature is a free parameter. This setup was meant as a test case for X-PSI, and a synthetic data set was used.

The Star object in this model was constructed without an Elsewhere temperature. The fictitious model neutron star has two circular hot spots, which are the cause of the pulse profile. Furthermore, the neutron star had a fixed spin rate of 300 Hz. The model involves 12 unfixed parameters, which are listed in Table 3.1. The fiducial point in parameter space is called the parameter vector.

Parameter Unit Injected value

Mass (M) M⊙ 1.4

Radius (Req) km 12.5

Distance (D) kpc 0.2

Inclination (i) radians 1.25

Primary HS phase shift (φp) cycles 0.0

Primary HS colatitude (Θp) radians 1.0

Primary HS radius (ζp) radians 0.075

Log primary HS temperature (log10(Tp)) K 6.2

Secondary HS phase shift (φs) cycles 0.025

Secondary HS colatitude (Θs) radians π - 1.0

Secondary HS radius (ζs) radians 0.2

Log secondary HS temperature (log10(Ts)) K 6.0

Table 3.1: List of parameters in the model setup with synthetic data, with the corresponding parameter vector values. 'HS' is an abbreviation for Hot Spot. The injected values represent the fiducial point in parameter space and are the same as the values in the model construction page of Riley (2019b). The parameters are explained further in Section 3.2.1.

When computing the Hessian, this parameter vector is used while letting two of the 12 parameters be unfixed (except for second-order derivatives with respect to a single parameter, for which only that one parameter is left free). In this model the data were created synthetically (file of customized data in X-PSI: ../../examples/data/synthetic realisation.dat, exposure time = 984307.6661 s). The Instrument used is the NICER telescope. Background and Interstellar are set to None.

Explanation of parameters

The parameter M [1.0, 3.0] is the mass of the neutron star in solar masses (values in brackets are the boundary values for each parameter). Req [3.0, 16.0] is the equatorial radius in kilometers, D [0.1, 1.0] is the distance from Earth to the neutron star in kiloparsec and i [0.001, π/2] is the inclination to the pulsar rotation axis as seen from Earth. Figure 3.1 is useful for describing the hot spot parameters; the subscripts p and s stand for the primary and secondary hot spot. φ [-0.5, 0.5] is the phase shift of each hot spot, which basically indicates where the photon count peaks are in the pulse profile. Θ [0, π] is the colatitude of the hot spot, where zero represents the north pole of the neutron star and π the south pole. ζ [0, π/2] is the angular radius of the circular hot spot in radians, so it determines the size of the hot spot. The temperature of each hot spot is represented in the form log10(T). Boundary values are based on theory and physical limits.

3.2.2

Hessian implementation

After the model setup, a callable likelihood object could be made. For a given parameter vector, a scalar log-likelihood value is returned. This value only acquires meaning when compared with other log-likelihood values close to the fiducial point.

First the numdifftools.Hessian function (https://numdifftools.readthedocs.io/en/latest/reference/numdifftools.html#numdifftools.core.Hessian) was used, because we expected this package to have higher accuracy than our own numerical derivatives. Numdifftools is a Python package with different tools for numerical differentiation in multiple variables. Finite differences are combined with a Richardson extrapolation methodology to aim for the most accurate results. The user of the package has many configuration options when taking numerical derivatives: for instance, error estimates can be obtained with the full output option and the differentiation method (forward, backward or central) can be switched. Unfortunately, if no manual step size for the numerical derivatives was defined, the function went out of bounds of the fiducial parameter values. This was the biggest issue of numdifftools applied to the likelihood function. The numdifftools.Hessian function also did not allow a different step size for every parameter in the parameter vector. It was therefore decided to copy code from the numdifftools.Hessian function and perform the Hessian computation without using the numdifftools package. The adjusted code for numerical derivatives was first tested on two simple analytical functions of four variables; fractional differences between the numerical and analytical derivatives of those functions were compared to check whether the numerical derivatives weren't too far off. It was decided to use our own Hessian function for the likelihood function.

Figure 3.1: Visualization of the ST-U model. Two circular hot spots with different parameter values are visible. Source: Riley et al. (2019).

Picking the same finite difference for every parameter was not preferred; instead a function was made to set a different step size for each parameter in parameter space, from now on called the h-array. The reason for this decision is that we wanted the step size to scale with each parameter value. In Appendix A the code of the h-array function and the Hessian function is given. For the second-order partial derivatives a central finite difference is used, as explained in the theoretical framework by Equations 2.5 and 2.6. Regarding the h-array function, a step of

\[
h_i = s \cdot p_i \tag{3.1}
\]

was used, with h_i the finite difference taken, p_i the corresponding parameter value and s the constant step size factor, which will be explained in Section 3.2.3. The h-array was then callable from the Hessian function.

Figure 3.2: An example of numerical accuracy using different step sizes for different injected values of a variable x. The fractional difference between the analytical and numerical derivatives is displayed on the y-axis. An optimal step size with the highest accuracy is always aimed for when taking numerical derivatives. The rounding error occurs because of finite computer precision and the formula error occurs because the function is not approximated well enough when the steps taken are too big. Source: https://en.wikipedia.org/wiki/Numerical_differentiation
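
The actual implementation is given in Appendix A. A condensed sketch of how such functions might look is shown below; the function names, the guard against zero-valued fiducial parameters (such as the primary phase shift) and the toy check at the end are assumptions, not the thesis code itself.

```python
import numpy as np

def h_array(p, s=1e-6):
    """Per-parameter step h_i = s * p_i (Equation 3.1). The small absolute floor for
    parameters with fiducial value 0 is an assumption of this sketch."""
    p = np.abs(np.asarray(p, dtype=float))
    return np.where(p > 0.0, s * p, s)

def fisher_matrix(log_like, p, s=1e-6):
    """Negative Hessian of the log-likelihood at the fiducial vector p (Equation 2.3),
    built from the central finite differences of Equations 2.5 and 2.6."""
    p = np.asarray(p, dtype=float)
    h = h_array(p, s)
    n = p.size
    F = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            if i == j:
                pp, pm = p.copy(), p.copy()
                pp[i] += 2.0 * h[i]
                pm[i] -= 2.0 * h[i]
                d2 = (log_like(pp) - 2.0 * log_like(p) + log_like(pm)) / (4.0 * h[i] ** 2)
            else:
                d2 = 0.0
                for si, sj in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                    q = p.copy()
                    q[i] += si * h[i]
                    q[j] += sj * h[j]
                    d2 += si * sj * log_like(q)
                d2 /= 4.0 * h[i] * h[j]
            F[i, j] = F[j, i] = -d2
    return F

# Quick check on an analytical 2-parameter Gaussian (assumed): the Fisher matrix
# should recover the inverse covariance, here diag(1/0.06**2, 1/1.1**2).
toy = lambda x: -0.5 * (x[0] ** 2 / 0.06 ** 2 + x[1] ** 2 / 1.1 ** 2)
print(fisher_matrix(toy, [1.4, 12.5]))
```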

3.2.3 Step size

For numerical differentiation, choosing the right value of h_i is important. Rounding errors for small values of h_i and formula errors for bigger values of h_i are the main trade-offs to be evaluated; see Figure 3.2. Every h_i was defined by Equation 3.1, where h_i is the step taken and s is the constant step size factor for every parameter. Numdifftools uses the square or cube root of the machine epsilon, so this option was explored first. After that, different values of s in the range 1e-3 to 1e-7, in steps of powers of 10, were probed to find the optimal value for the likelihood function. The evaluation was done via the 1D comparison (see Section 3.2.4) for mass and radius: different log-likelihood values were calculated by letting mass and radius be free, one at a time, and the results were compared with the standard deviations coming from the Fisher matrix according to Equation 3.2. The step size that resulted in the smallest fractional errors was taken.
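
A self-contained sketch of this probing procedure is given below. The two-parameter Gaussian standing in for the X-PSI likelihood, its widths and the fiducial values are assumptions; the real evaluation used the 1D comparison of Section 3.2.4.

```python
import numpy as np

def toy_loglike(p):
    """Assumed stand-in for the X-PSI log-likelihood, Gaussian in mass and radius."""
    return -0.5 * ((p[0] - 1.4) ** 2 / 0.06 ** 2 + (p[1] - 12.5) ** 2 / 1.1 ** 2)

def diagonal_fisher(log_like, p, s):
    """On-diagonal Fisher elements via Equation 2.6 with step h_i = s * p_i (Equation 3.1)."""
    p = np.asarray(p, dtype=float)
    h = s * np.abs(p)
    out = np.empty(p.size)
    for i in range(p.size):
        pp, pm = p.copy(), p.copy()
        pp[i] += 2.0 * h[i]
        pm[i] -= 2.0 * h[i]
        out[i] = -(log_like(pp) - 2.0 * log_like(p) + log_like(pm)) / (4.0 * h[i] ** 2)
    return out

true_sigma = np.array([0.06, 1.1])             # known widths of the toy likelihood
p0 = np.array([1.4, 12.5])                     # fiducial mass and radius
for s in (1e-3, 1e-4, 1e-5, 1e-6, 1e-7):       # probed constant step-size factors
    sigma = 1.0 / np.sqrt(diagonal_fisher(toy_loglike, p0, s))
    frac_err = np.abs(sigma - true_sigma) / true_sigma
    print(s, frac_err)                         # pick the s with the smallest fractional error
```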

3.2.4 1D and 2D comparison

To get an idea of whether the coefficients in the Fisher matrix made sense at all, some checks were executed. The diagonal values of the Fisher matrix yield the standard deviation of each parameter (Equation 3.2). Since the likelihood function is itself a probability function and is expected to be approximately Gaussian for one free parameter, a Gaussian comparison was a logical check. Note that the likelihood function is not a probability density distribution of the parameters, as the prior and normalizing factors are not involved in the likelihood function (see Equation 2.1). For comparing the likelihood with an analytical Gaussian function this is not relevant, as long as in both cases no prior is used and we do not normalize.

1D

For setting up a 1D Gaussian equation for every parameter in the parameter space, a standard deviation was extracted from the Fisher matrix. The standard deviation of each parameter can be found by

\[
\sigma_i = \frac{1}{\sqrt{F[i][i]}}, \tag{3.2}
\]

where F [i][i] are the on-diagonal terms in the Fisher matrix. What was done here is basically setting every parameter but one to a fixed value. The standard deviation is implemented in the non-normalized 1D Gaussian equation. Then a probability distribution is made of the likelihood function. Again, all but one parameter is fixed at the same time, making it possible to evaluate the likelihood around a chosen parameter. The upper and lower bound to make the results visually most attractive were picked manually. Typically around 50 steps for the log-likelihood are taken for every parameter. Taking more steps will be computationally tougher (as every likelihood calculation will cost around 2-3 seconds) and will not end up in significantly better (visual) results. Every log-likelihood value calculated was added with the number of the highest log-likelihood value from every log-likelihood value-set. What this does, is rescaling the log-likelihood values so that the maximum log-likelihood has a value of 0. After that, the

log-likelihood values were exponentiated to obtain likelihood values7. The shift along the x-axis

was done by picking the x-value of the maximum likelihood and then shift the Gaussian with this value. As log-likelihood values normally look like a parabola, likelihood values looked like a Gaussian, which can now be compared with the analytical Gaussian obtained by the inverse Fisher matrix. This method was done for all 12 parameters for the setup model.
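
A sketch of this rescaling and comparison for a single free parameter is shown below; all numbers, including the Fisher element and the toy likelihood curve, are assumptions for illustration.

```python
import numpy as np

# Assumed on-diagonal Fisher element for the free parameter -> sigma via Equation 3.2
F_ii = 280.0
sigma = 1.0 / np.sqrt(F_ii)

# Log-likelihood values on a grid around the fiducial value (toy parabola standing in for X-PSI)
grid = np.linspace(1.2, 1.6, 50)
loglike = -3.0e4 - 0.5 * (grid - 1.4) ** 2 / 0.06 ** 2

# Rescale so the maximum log-likelihood is 0, then exponentiate to obtain likelihood values
like = np.exp(loglike - loglike.max())

# Non-normalised 1D Gaussian centred on the grid point of maximum likelihood
mu = grid[np.argmax(like)]
gauss = np.exp(-0.5 * (grid - mu) ** 2 / sigma ** 2)

# The two curves would normally be over-plotted; here we just report the largest deviation
print(np.max(np.abs(like - gauss)))
```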

2D

The idea of the 2D slicing through the Gaussian and likelihood surfaces is the same as in the 1D case. This time two parameters were held free and the rest fixed. To allow a 2D visualization (as 3D overlaying of likelihoods and Gaussians was not attractive), parameters with a linear correlation were sought. Figure 3.3 gives an idea of what the log-likelihood surface looks like with two free parameters. A change of variables was made to obtain a linear line through the surface, which was expected to again yield a Gaussian-like shape. On the basis of Figure 3.5 of Riley (2019a), pairs were selected which seemed to have a positive, linear correlation. We chose linearly correlated parameter pairs so that the covariance would be finite and sufficiently large to check the off-diagonal Fisher terms.

It came down to the following 5 pairs of parameters: mass & radius, distance & primary HS (hot spot) radius, distance & secondary HS radius, primary HS radius & secondary HS radius, and primary HS temperature & secondary HS temperature. To slice through the log-likelihood surface, the first parameter of each pair was divided by the second parameter of each pair.

Figure 3.3: Representation of the log-likelihood surface with the variables mass and radius free. The dark blue area is where the log-likelihood peaks. This visually verifies that mass and radius are degenerate.

This division resulted in a constant value which was kept the same for each pair. Effectively this is a change of variables going from two individual parameters to one combined parameter, ending up with a linear equation in two variables. Then, for every pair, approximately 80 steps around the fiducial point were taken to evaluate the likelihood along the changed variable (the linear line). Effectively there are still two free parameters in every evaluation, but the result is visualized as if there were one. The same steps were applied to the log-likelihood values as in the 1D case: subtracting the maximum log-likelihood so that the maximum becomes 0, exponentiating the log-likelihood values, and shifting the Gaussian along the x-axis (done manually in this case).

The exponentiated log-likelihood values were in turn compared with the bivariate Gaussian equation. Standard deviations (σ_i and σ_j) and the covariance (σ_ij) were extracted from the 2×2 covariance matrix; the inverse was taken after picking out the 2×2 Fisher sub-matrix. These were in turn used in

\[
f(\theta_i, \theta_j) = \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\theta_i-\mu_i)^2}{\sigma_i^2} + \frac{(\theta_j-\mu_j)^2}{\sigma_j^2} - \frac{2\rho\,(\theta_i-\mu_i)(\theta_j-\mu_j)}{\sigma_i\sigma_j}\right)\right], \tag{3.3}
\]

which is the bivariate Gaussian equation. Here θ are the model parameters, µ are the fiducial point mean values, the σ values correspond to the values extracted from the 2×2 covariance matrix and ρ is the correlation coefficient, defined as ρ = σ_ij/(σ_i σ_j). To construct the Gaussian, the same intervals as for the log-likelihood evaluation were taken, only with 10 times more steps (as computation time is not an issue here, unlike when computing hundreds of likelihoods).
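
A sketch of Equation 3.3 with its inputs extracted from a 2×2 Fisher sub-matrix is shown below; the numbers in the example matrix are assumptions.

```python
import numpy as np

# Assumed 2x2 Fisher sub-matrix for a correlated pair of parameters (e.g. mass and radius)
F_sub = np.array([[400.0, -150.0],
                  [-150.0, 100.0]])

C = np.linalg.inv(F_sub)                        # 2x2 covariance matrix (Equation 2.4)
sigma_i, sigma_j = np.sqrt(C[0, 0]), np.sqrt(C[1, 1])
rho = C[0, 1] / (sigma_i * sigma_j)             # correlation coefficient

def bivariate_gaussian(theta_i, theta_j, mu_i, mu_j):
    """Non-normalised bivariate Gaussian of Equation 3.3."""
    zi = (theta_i - mu_i) / sigma_i
    zj = (theta_j - mu_j) / sigma_j
    return np.exp(-(zi ** 2 + zj ** 2 - 2.0 * rho * zi * zj) / (2.0 * (1.0 - rho ** 2)))

# Evaluate at one point of the slice, as would be done along the changed variable.
print(bivariate_gaussian(1.45, 12.6, 1.4, 12.5))
```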


3.2.5 Multivariate Gaussian

For a general evaluation of the code used to take the second-order partial derivatives (see Appendix A), an analytical, multivariate Gaussian equation was used to see whether the approximations made in the derivatives are valid. A multivariate Gaussian means a Gaussian in more dimensions (in our case 12 dimensions, as we are still using the model setup described in Section 3.2.1). This was done as a result of the Fisher matrix inversion, because those results were not as expected (more about this in the discussion section). An analytical multivariate Gaussian function was made in the form of

\[
f_\theta(\theta_1, \ldots, \theta_N) = -\frac{1}{2}\,(\theta - \mu)^{\mathsf{T}} \cdot \Sigma^{-1} \cdot (\theta - \mu). \tag{3.4}
\]

The vector θ is the point in parameter space around which the function is evaluated, µ is the mean vector in parameter space and Σ is the covariance matrix; the dots in the equation represent matrix dot products. In this case, the mean values in parameter space were copied from the posteriors of Section 3.5 of Riley (2019a), and the variances were estimated as the mean of the upper and lower boundaries of those posteriors. The exponent was neglected because, as for the normal log-likelihood function, the log of the multivariate Gaussian was taken; the normalizing factor was also neglected to obtain values of the correct order. Thereafter, this multivariate equation was fed to both our own constructed Hessian function and the numdifftools.Hessian function, and fractional differences were taken per parameter for each Hessian function. The results were compared to see which code is the more accurate one.
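
A sketch of such a validation function (Equation 3.4, i.e. the log of the multivariate Gaussian with the normalisation dropped) is given below; the means and widths are illustrative assumptions, not the values copied from Riley (2019a).

```python
import numpy as np

mu = np.array([1.4, 12.5, 0.2, 1.25])          # assumed mean vector (a 4-parameter subset)
sigma = np.array([0.06, 1.1, 0.03, 0.26])      # assumed 1-sigma widths
Sigma_inv = np.diag(1.0 / sigma ** 2)          # inverse covariance (diagonal for simplicity)

def log_multivariate_gaussian(theta):
    """Log of the multivariate Gaussian (Equation 3.4), normalising factor neglected."""
    d = np.asarray(theta, dtype=float) - mu
    return -0.5 * d.dot(Sigma_inv).dot(d)

# The analytical Hessian of this function is -Sigma_inv, so any numerical Hessian
# routine can be checked against it element by element via fractional differences.
print(log_multivariate_gaussian([1.45, 12.0, 0.21, 1.30]))
```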

3.3 Test case: effects of double observing time on J0030+0451

For the test case of the Hessian implementation and evaluation, another model setup in X-PSI was used. This is again the so-called ST-U model (see Figure 3.1), with two circular hot spots with unshared parameter values; the model is described in Chapter 4 of Riley (2019a). It has similarities with the model setup used for the Hessian code tests, but the reason this model is used for the Fisher evaluation is that posterior Monte Carlo sampling values and standard deviations can be compared with the standard deviations resulting from the Fisher method. The biggest difference between the model setup of Section 3.2.1 and this one is that this model uses real data from the NICER telescope. As a result, four additional parameters are added: one parameter N_H [0, 5] for the interstellar column density, and three parameters for the calibration of the NICER telescope data, α [0.5, 1.5], β [0.0, 1.0] and γ [0.5, 1.5]. The latter are technical calibration parameters of the telescope; they will not be explained further, but they are mentioned because they are involved in this model.

This model therefore contains 16 parameters. The model setup was done via a BitBucket repository, so as to have the most recent updates of X-PSI and to get external help more easily when solving small errors while installing the model. As mentioned in the introduction, a question raised in Raaijmakers et al. (2019) is what the effect of doubling the observing time, i.e. 1.94 Ms of observing time, is on the parameter constraints in an X-PSI evaluation. An improvement in the order of a factor √2 to a factor of 2 is expected, but not verified (Raaijmakers et al., 2019; Lo et al., 2013; Psaltis et al., 2014). Fisher information is, in theory, an elegant and computationally inexpensive way to analyze this. We doubled our data set, which effectively means doubling the photon data counts, and ran the ST-U model using this data set. The data sets for both the original and the doubled data are attached in the Google Drive. We then analyzed the differences between the Fisher matrices of the original data set and of the doubled data set, using our own implemented Hessian code. It was expected that the parameter constraints with a doubled data set would improve by a certain scaling factor.

For the Fisher evaluation and the parameter constraint values we needed to invert the whole Fisher matrix, which was done with the numpy.linalg.inv function. At first the results were not as expected, as inversion of the Fisher matrix resulted in negative squared standard deviations, which did not make sense at all. So we tried tricks to improve the results. First, priors were added: for distance (D), NICER alpha (α) and NICER gamma (γ) we have priors of 0.009, 0.1 and 0.1 respectively, and each of these values is added inverse-squared (1/σ²) to the corresponding diagonal coefficient in the Fisher matrix (Coe, 2009), as explained in Section 2.2.

We furthermore fixed the primary and secondary hot spot radii (ζ_p and ζ_s) by adding a value of 1e7 to the corresponding diagonal entries of the Fisher matrix. The reason for this was that T. Riley suggested that those parameters were troublesome for the inversion of the Fisher matrix, as their derivatives were not stable. No conclusion regarding improvement of the constraints on the hot spot radius parameters could be made, but it ensured that the Fisher matrix was invertible and gave fewer NaN values. For the step size factor s, powers of 10 ranging from 1e-2 to 1e-7 were probed.
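
A sketch of these manipulations prior to inversion is given below. The prior widths (0.009, 0.1, 0.1) and the 1e7 value follow the text, while the parameter index positions, the function name and the random stand-in matrix are assumptions for illustration.

```python
import numpy as np

def constrain_and_invert(F, priors, fixed, stiffness=1e7):
    """Add Gaussian priors (1/sigma^2 on the diagonal), effectively fix parameters by
    adding a large value to their diagonal entries, then invert to obtain 1-sigma values."""
    F = F.copy()
    for idx, sig in priors.items():            # priors (assumed indices for D, alpha, gamma)
        F[idx, idx] += 1.0 / sig ** 2
    for idx in fixed:                          # effectively fixes zeta_p and zeta_s (assumed indices)
        F[idx, idx] += stiffness
    return np.sqrt(np.diag(np.linalg.inv(F)))  # 1-sigma constraint per parameter

# Runnable stand-in: a random symmetric positive-definite 16x16 matrix replaces the real
# Fisher matrices read from file; doubling it mimics an idealised doubled data set.
np.random.seed(0)
A = np.random.randn(16, 16)
F_demo = A.dot(A.T) + 16.0 * np.eye(16)
sig_orig = constrain_and_invert(F_demo, priors={2: 0.009, 12: 0.1, 14: 0.1}, fixed=[6, 10])
sig_double = constrain_and_invert(2.0 * F_demo, priors={2: 0.009, 12: 0.1, 14: 0.1}, fixed=[6, 10])
print(sig_orig / sig_double)                   # per-parameter scaling factors (cf. Table 4.2)
```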


Chapter 4

Results

4.1 Fisher information results

To be clear, all results in Section 4.1 come from the model setup of Section 3.2.1 with synthetic data. Results for the real data and for the synthetically doubled data imitating a doubled observing time (Section 3.3) are presented in Section 4.2.

4.1.1 Step size

The steps to find the value of s from Equation 3.1 were taken; s values in the range 1e-5 to 1e-7 are favoured. In fact, Fisher matrix coefficients obtained with this range of step sizes resulted in Gaussian approximations which overlapped the likelihood probability density for both mass and radius (see Figure 4.1). This is due to the fact that the likelihood function slices are highly Gaussian. Evaluating this range of step sizes with fractional differences, a value of 1e-6 was taken because it was slightly more accurate when the fractional differences were calculated (see Figure 4.2). Step sizes in the range 1e-3 to 1e-7 do not seem to have a radical effect on the diagonal values of the Fisher matrix, which is what we look at in the first instance; no more than a couple of percent difference in the curvature values is measured. The reason for the bigger fractional difference in Figure 4.2 for values going to 0 is that dividing by a small number gives high(er) fractional errors.

Figure 4.1: Different step sizes used for mass and radius. Step sizes ranging from 1e-3 to 1e-7 all overlap the Gaussian from the Fisher results, for both mass and radius.

4.1.2 1D results

Evaluating each parameter around the chosen fiducial point (see Table 3.1) resulted in the Gaussian overlap shown in Figure 4.3. Standard deviations derived from the Fisher matrix according to Equation 3.2 are evidently of the correct order of magnitude. Some Gaussians are shifted by a certain value along the x-axis, for instance the middle panel of the second row of Figure 4.3. This is the effect of choosing the mean value to be the x-value where the maximum (log-)likelihood of the likelihood function lies: as the intervals of the likelihood evaluation are bigger than those of the analytical Gaussian, the shift can be of the order of the interval taken between log-likelihood calculations. Looking only at the width (standard deviation) of the Gaussian, this is not a problem.

Figure 4.2: Fractional difference between the different step sizes taken and the Gaussian Fisher equation, for mass and radius.

Figure 4.3: 1D slicing results for all 12 parameters of the model setup with synthetic data. Likelihood slicing versus on-diagonal standard deviations from the Fisher matrix are compared here.


4.1.3 2D results

Five pairs of parameters were analyzed with 2D slicing. The pairs were selected by looking at the posterior values from Chapter 3 of Riley (2019a), taking the covariance of the two parameters into account. We chose linearly correlated parameter pairs so that the covariance would be finite and sufficiently large to check the off-diagonal Fisher terms; another reason is that a slice through the likelihood surface is then expected to be Gaussian. The results are displayed in Figure 4.4.

Figure 4.4: 2D slicing results for the 5 pairs of parameters chosen.

4.1.4 Multivariate Gaussian results

In the evaluation with the multivariate Gaussian equation, the Hessian function of numdifftools appears to be more accurate than our own Hessian function (see Table 4.1). Numdifftools typically has fractional errors of ∼ 1e-16, whereas our own Hessian code has fractional errors of ∼ 1e-8. This can mostly be attributed to the fact that numdifftools uses optimal step sizing methods and more iterations per derivative. Unfortunately, numdifftools could not be used for our bounded, numerical likelihood function for now.


Parameter     Injected std. deviation (σ)   Fractional difference,   Fractional difference,
                                            own Hessian              numdifftools.Hessian
M             0.06                          6.05e-08                 2.31e-16
Req           1.1                           1.12e-07                 8.07e-16
D             0.03                          4.13e-08                 8.09e-16
i             0.26                          1.32e-06                 0.0
φp            0.0019                        1.27e-10                 1.14e-16
Θp            0.22                          3.11e-06                 2.52e-16
ζp            0.013                         2.78e-07                 4.00e-16
log10(Tp)     0.009                         7.90e-11                 3.85e-16
φs            0.0016                        1.69e-07                 2.71e-16
Θs            0.29                          3.58e-07                 3.82e-16
ζs            0.03                          4.13e-08                 5.78e-16
log10(Ts)     0.009                         9.66e-11                 3.85e-16

Table 4.1: Multivariate Gaussian comparison of our own implemented Hessian code and the Hessian code from numdifftools. The injected standard deviations are estimates of the 68% credible intervals of the posterior values from Chapter 3 of Riley (2019a).

4.2 Test case results

To be clear, all results from this point on come from the ST-U model applied to neutron star J0030+0451. Runs were done on both the original data and the synthetically doubled data, and the matrices for both data sets can be downloaded from the Google Drive. Standard deviations of the posteriors of the parameters are displayed in Table 4.2. First we looked at the standard deviations with all parameters fixed but one. The result is a clear improvement by a factor close to √2 (≈ 1.41). Those numbers are extracted from the diagonal of the Fisher information matrix via Equation 3.2, so the diagonal values in the Fisher matrix seem to act as expected. These values cannot be compared with the posterior values from Table C.3 of Riley (2019a), but they give an indication of the behaviour of the diagonal coefficients of the Fisher matrix.

Looking at the third column, where we marginalize over the parameters with the original data, all standard deviations could be computed. Marginalizing over the parameters means that we let every parameter be free, or in other words that we do not fix any of the parameters. Adding the priors and fixing both hot spot radii made this possible. Most standard deviations are within a factor of 1.0-1.5 of the posteriors in the second column. Looking at how the Fisher information handles the doubled data set, the standard deviations mostly shrink, which is good. However, the factor varies between parameters, which is not as expected, and for the β and γ parameters the constraints even get less tight. Along the way we noticed that the instrument parameters (α, β and γ), and especially β, cause trouble for the derivatives and inversions. The doubled data set has one NaN value which could not be resolved even with different step sizes, added priors or fixing the primary and secondary hot spot radii. In the discussion section we will talk about the implications of Table 4.2.


Parameter    Posteriors (a)        Full marginalization (1-σ values)        All parameters fixed (1-σ values)
                                   Original    Doubled     Scaling          Original    Doubled     Scaling
M            1.09 +0.11 −0.07      1.55e-1     6.11e-2     2.54             4.22e-3     3.01e-3     1.40
Req          10.44 +1.10 −0.86     1.04        4.22e-1     2.46             2.70e-2     1.91e-2     1.41
D            0.33 +0.01 −0.01      9.05e-3     8.48e-3     1.07             9.77e-4     6.96e-4     1.40
i            1.04 +0.07 −0.08      5.63e-2     NaN         NaN              1.46e-3     1.04e-3     1.40
φp           0.46 +0.00 −0.00      1.46e-3     9.97e-4     1.46             6.77e-4     4.78e-4     1.42
Θp           2.48 +0.06 −0.06      6.41e-2     3.65e-2     1.76             2.05e-3     1.45e-3     1.41
ζp           0.14 +0.02 −0.02      FIXED       FIXED       -                FIXED       FIXED       -
log10(Tp)    6.11 +0.01 −0.01      1.34e-2     4.00e-3     3.35             5.93e-4     4.19e-4     1.42
φs           -0.50 +0.00 −0.00     1.16e-3     7.01e-4     1.65             5.61e-4     3.97e-4     1.41
Θs           2.78 +0.02 −0.02      2.35e-2     1.38e-2     1.70             1.06e-3     7.49e-4     1.42
ζs           0.29 +0.04 −0.03      FIXED       FIXED       -                FIXED       FIXED       -
log10(Ts)    6.10 +0.01 −0.01      8.36e-3     2.07e-3     4.04             4.46e-4     3.16e-4     1.41
α            0.96 +0.10 −0.10      1.08e-1     1.03e-1     1.05             7.59e-2     6.36e-2     1.19
β            0.23 +0.23 −0.16      7.73e-2     1.14e-1     0.68             1.79e-2     1.27e-2     1.41
γ            0.91 +0.10 −0.09      1.91e-2     5.05e-2     0.38             3.97e-3     2.81e-3     1.41
NH           1.23 +0.17 −0.17      1.85e-1     1.39e-1     1.33             6.25e-2     4.37e-2     1.43

Table 4.2: Standard deviation values of the original data set and the doubled data set, for full marginalization and for non-marginalization (all parameters fixed). The posteriors of every parameter are displayed in the second column and come from Table C.3 of Riley (2019a). The 5th and 8th columns ('Scaling') give the ratio between the standard deviations of the original and doubled data, also called the scaling factor. A step size of 1e-3 was used for both the original and the doubled data, because we noticed along the way that a bigger step size was preferred for better results, meaning fewer NaN values and standard deviation values more consistent with the posteriors from the Monte Carlo sampling. The average scaling factor over all parameters (5th column) is ∼ 1.8.

(a) Posterior 68% credible intervals from the ST-U Monte Carlo model sampling recorded by Riley (2019a), Table C.3.


Chapter 5

Discussion

5.1 Fisher information results

The chosen step size s of 1e-6 for the model setup of Section 3.2.1 was based on 1D slicing of mass and radius. Fractional differences between step sizes ranging from 1e-5 to 1e-7 were hardly distinguishable (see Figure 4.2). The question arises how important the exact choice of step size is, especially when the Fisher information is applied to the test case from Chapter 4 of Riley (2019a). We noticed that going towards smaller step sizes results in significant smoothness errors on the log-likelihood surface (in the range of hundreds of percent difference). Going towards bigger step sizes presumably introduces formula errors in approximating the likelihood function, but the effect seems less radical than for smaller step sizes (in the range of a few percent difference).

A firm conclusion that can be drawn from the 1D slicing results is that the on-diagonal values of the Fisher matrix are reliable, looking at Figure 4.3. The standard deviations do not differ by more than a couple of percent, which means that the second-order derivatives with respect to the same parameter are relatively accurate. It would, however, be useful to have the slicing automated, and this is even more relevant for 2D slicing. We have now looked at 5 pairs of parameters which were expected beforehand to have a linear correlation. Those parameters are less likely to be the source of trouble when inverting the matrix, precisely because the pairs were correlated, so that the covariance would be finite and sufficiently large to check the off-diagonal Fisher terms. Looking at second-order partial derivatives of parameters which do not have such a clear linear correlation could reveal errors that hamper the inversion. It is therefore important that an automated tool is made for 2D slicing, so that covariances can be checked more easily. This may reveal which parameters cause problems when taking numerical derivatives and may point towards solutions for the inversion of the Fisher matrix.

Regarding the check on the Hessian code we used, no relevant errors seem to occur. Numdifftools obviously has better precision (of the order of 1e-16) in the multivariate Gaussian comparison, but the difference between the fractional errors of numdifftools and of our own code is probably irrelevant for now compared with the errors in the derivatives themselves, which are bigger. Fractional errors with our Hessian implementation came out around 1e-8 (see Table 4.1). A note to make here is that the codes were compared using an analytical, smooth (multivariate) function; numdifftools is expected to have an advantage in analyzing such a function, as it is an automated package for taking derivatives.

That does not mean we should neglect numdifftools in further implementation and evaluation of the likelihood function; it would be useful to find a way in which the package can be used. Numdifftools uses multiple steps to calculate derivatives, whereas our current code uses only one. Numdifftools also uses optimal step sizing, which we do not use because of boundary problems. Since step sizing can be problematic for the evaluation of the likelihood surface, this could be explored in more detail.
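One possible way to use the package while sidestepping the boundary problem is to fix the base step explicitly, so that the automatic step-size search does not wander outside the prior bounds. The sketch below assumes the keyword names of the numdifftools documentation cited in the bibliography; the step value and the toy likelihood are placeholders.

```python
import numpy as np
import numdifftools as nd

# Toy stand-in for the X-PSI log-likelihood (two parameters, e.g. mass and radius).
def loglike(x):
    return -0.5 * np.sum(((x - np.array([1.4, 12.0])) / np.array([0.1, 1.0]))**2)

x0 = np.array([1.4, 12.0])    # hypothetical fiducial point

# Fix the base step instead of letting numdifftools choose it, to avoid
# stepping across the hard prior boundaries of the parameter space.
hessian = nd.Hessian(loglike, step=1e-3, method='central')
print(hessian(x0))
```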


A last remark is that we expect the same fractional error for each code if we inject the same value for the standard deviation. For the parameters D and ζs, the injected standard deviation in the multivariate Gaussian equation is 0.03. Our code gives the same fractional error for both parameters (see the third column of Table 4.1), but the numdifftools code gives different values (see the fourth column of Table 4.1). For the parameters Tp and Ts, the opposite happens. It is not yet clear why this is the case, but for now we attribute it to computer precision errors.

5.2

Test case and problems regarding Fisher matrix inversion

In the case of the evaluated ST-U model with real data from Chapter 4 of Riley (2019a), the diagonal values behave as expected. A step size of 1e-3 was used to compute the Hessian matrices for the original and doubled data. This differs from the model described in Section 3.2.1, but was more accurate than a step size of 1e-6 judging from 1D slicing; we do not yet have a clear explanation for this. With the doubled data set, the diagonal values of the Fisher matrix were approximately twice as large as for the original data set (see the on-diagonal values in the Fisher matrices attached in the Google Drive). The 'peakiness', or curvature, of the likelihood function with one parameter free is sharper. According to Equation 3.2, this results in constraints that are tighter by a factor of 1/√2. This improvement can be seen in Table 4.2: the values in the all-parameters-fixed Scaling column are all close to the factor √2.
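Written out compactly (a sketch assuming the Gaussian/Fisher approximation of Equation 3.2, with F_ii an on-diagonal Fisher element):

```latex
% Conditional (all-other-parameters-fixed) constraint from an on-diagonal
% Fisher element, and its response to a doubled exposure:
\sigma_{i,\mathrm{cond}} = \frac{1}{\sqrt{F_{ii}}},
\qquad
F_{ii} \to 2\,F_{ii}
\;\Longrightarrow\;
\sigma_{i,\mathrm{cond}} \to \frac{\sigma_{i,\mathrm{cond}}}{\sqrt{2}}
\approx 0.71\,\sigma_{i,\mathrm{cond}}.
```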

However, this information alone does not tell us anything about constraints on the posterior values. For that we need to look at the standard deviations while marginalizing over all parameters¹, i.e. letting the parameters vary freely. For the original data set, the marginalized standard deviations are off by a factor of 1.0 - 1.5 compared with the posterior Monte Carlo sampling constraints of Riley (2019a). However, the standard deviations from the doubled data set do not differ from those of the original data set by a single consistent factor, as was the case for non-marginalization (fixing every parameter but one).

¹ Except for the radii of the hot spots of the neutron star; those parameters were kept fixed.
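The distinction between the two halves of Table 4.2 can be illustrated with a small numerical sketch: the conditional uncertainties come from the diagonal of the Fisher matrix itself, while the fully marginalized ones come from the diagonal of its inverse. The matrix below is invented for illustration and is not one of the actual ST-U Fisher matrices.

```python
import numpy as np

# Invented 3x3 Fisher matrix with some correlation between parameters 0 and 1.
F = np.array([[ 80.0, -30.0,   0.0],
              [-30.0,  50.0,   5.0],
              [  0.0,   5.0, 200.0]])

# Conditional 1-sigma: all other parameters fixed at their fiducial values.
sigma_conditional = 1.0 / np.sqrt(np.diag(F))

# Marginalized 1-sigma: invert the full Fisher matrix (the covariance matrix)
# and take the square root of its diagonal.
cov = np.linalg.inv(F)
sigma_marginal = np.sqrt(np.diag(cov))

print("conditional :", sigma_conditional)
print("marginalized:", sigma_marginal)     # always >= the conditional values
```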

This brings us back to our initial research question: what is the effect on the constraints of the X-PSI ST-U model parameters if the observing time is doubled? The educated guess of Raaijmakers et al. (2019) is that the results will improve by a factor between √2 and 2. From the original-data and doubled-data columns under full marginalization in Table 4.2 it can be concluded that the parameter constraints do improve when the data are doubled. The only parameters that behave completely unexpectedly are β and γ; we do not have a specific explanation or solution for this. The scaling factor for the doubled data ranges from a few percent (e.g. α, D) to an improvement of 300 - 400 percent for Tp and Ts. For M and Req in particular, the constraints improve by a factor of ∼ 2.5. The average scaling factor is ∼ 1.8, but no firm conclusions can be drawn from this value for now. Somehow M, Req, Tp and Ts benefit more from the doubled data set than the other parameters, and the answer lies somewhere in the Fisher matrix. This suggests that the inverse of the Fisher matrix is not stable and needs to be improved before firm conclusions can be drawn about a doubled observing time.

We first present some explanations and comments on why this might happen. We have not analyzed the likelihood surface thoroughly, so it is hard to predict whether a given finite-difference step will work equally well for every parameter. At the moment, optimizing the step for the off-diagonal coefficients looks like a difficult problem that cannot be solved with the code we currently have. It would be useful to have numdifftools running within the Fisher calculation, as this module can use optimal step sizing. We tried to understand numdifftools, experiment with its base step and analyze the full output of the numerical derivatives, but more time needs to be spent exploring its documentation, which is not very clear.

It seems important to compute the off-diagonal values with more precision, as we are coming around to the idea that changes in the off-diagonal elements of the Fisher matrix have drastic effects on the variances and covariances in the covariance matrix. We got this impression because the on-diagonal elements of the Fisher matrix appear relatively stable when, for instance, different step sizes are used (based on 1D slicing). Another possible issue is that we were not using the maximum-likelihood vector for the ST-U model. However, since we are taking second-order partial derivatives and the log-likelihood surface looks parabolic from the slicing evidence, we expect a constant curvature value², which implies that we are fairly insensitive to the fiducial point we use.

The doubled data set yields a higher SNR, and it is not yet clear how this affects the likelihood function. An obvious consequence of doubling the data set is that the curvature of the likelihood will get steeper, which probably also calls for a different step size.

We noticed during our evaluation that the constraints (σ) depend on the step size s that was used, which indicates that the off-diagonal values are sensitive to the step size. Below are some checks and techniques that could expose problems with the Fisher matrix coefficients, particularly the off-diagonal values, as those appear to be the source of the trouble; they may also reveal more problematic parameters. First, both 1D and 2D slicing can give a first indication of whether a parameter is troubling the Fisher matrix. 2D slicing in particular becomes valuable once it is automatically implemented in the Fisher module, since manually checking every off-diagonal element of the N×N Fisher matrix is not feasible. More linear-algebra techniques might also improve the Fisher matrix results. Adding priors for D, α and γ contributed significantly to changes in the covariance matrix, and we ended up with fewer NaN values after adding them. Even though the priors do not contribute much to the elements of the Fisher matrix, they are important for the inversion if some parameters are highly degenerate. Fixing the hot spot radii likewise resulted in a better invertible Fisher matrix with fewer NaN values.
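A common way to fold such priors into a Fisher forecast, sketched here under the assumption of independent Gaussian priors, is to add 1/σ_prior² to the corresponding diagonal elements before inverting. The prior widths and the parameter indices below (standing in for D, α and γ) are placeholders, not the values actually used.

```python
import numpy as np

def add_gaussian_priors(F, prior_sigmas):
    """Return a copy of the Fisher matrix with independent Gaussian priors added.

    prior_sigmas maps parameter index -> prior standard deviation.
    """
    F_post = F.copy()
    for idx, sigma in prior_sigmas.items():
        F_post[idx, idx] += 1.0 / sigma**2
    return F_post

# Invented Fisher matrix and hypothetical prior widths.
F = np.array([[4.0, 1.0, 0.5],
              [1.0, 0.8, 0.2],
              [0.5, 0.2, 0.3]])
priors = {0: 0.05, 1: 0.1, 2: 0.1}

F_with_priors = add_gaussian_priors(F, priors)
print(np.sqrt(np.diag(np.linalg.inv(F_with_priors))))   # tighter, better-conditioned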

Finally, we can take a closer look at parameters that are not correlated with each other, using the Monte Carlo posteriors from Riley (2019a). In this way we can take smaller sub-matrices of the Fisher matrix and invert those, which may cause fewer problems. For instance, if the mass is not correlated with a certain parameter, the marginal variance of the mass will not be affected when we leave that parameter out.
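A sketch of this sub-matrix approach is given below; which parameters count as uncorrelated would have to be read off from the Monte Carlo posteriors, so the Fisher matrix and the index list here are purely illustrative.

```python
import numpy as np

def marginal_sigmas_from_block(F, idx):
    """Invert only the sub-block of the Fisher matrix for the parameter indices in idx."""
    idx = np.asarray(idx)
    F_block = F[np.ix_(idx, idx)]
    return np.sqrt(np.diag(np.linalg.inv(F_block)))

# Invented 4x4 Fisher matrix; suppose parameter 3 is uncorrelated with 0 and 1.
F = np.array([[ 90.0, -20.0,  3.0,  0.0],
              [-20.0,  60.0,  1.0,  0.0],
              [  3.0,   1.0, 40.0,  2.0],
              [  0.0,   0.0,  2.0, 25.0]])

# Marginal constraints from the 2x2 block of the first two parameters only
# (hypothetical indices standing in for, e.g., M and Req).
print(marginal_sigmas_from_block(F, [0, 1]))
```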

In the end, Fisher forecasting will always be a Gaussian approximation. Getting that approximation right for the X-PSI likelihood function seems difficult, but not hopeless. It is useful to know that the inversion is causing trouble. From this point onwards the work becomes quite technical, and more knowledge of linear algebra and numerical derivatives is needed. More research will be required before a potential Fisher module can be implemented in X-PSI. Two papers that go into the more technical aspects, Gill & King (2004) and Vallisneri (2007), might be relevant when diving into this.


Chapter 6

Conclusion

Fisher information forecasting was applied to the X-PSI software package. First, a Hessian function was implemented to evaluate the likelihood function. We set up a model from the model construction page of Riley (2019b) to check our Hessian function. Those results looked good, but more checks based on 2D slicing should have been done, because we ran into inversion problems with our second model from Chapter 4 of Riley (2019a). The off-diagonal values might cause precision problems in the covariance matrix once the inversion is done, but this is not yet a firm statement. Step sizing of the numerical derivatives was a major issue throughout the process, so if numdifftools can be used in a way that optimizes the step size without raising errors on the likelihood function, that would be extremely useful. Adding priors and fixing certain problematic parameters turned out to be good ways to mitigate some of the inversion issues, but deeper knowledge of numerical derivatives, Fisher information forecasting and X-PSI is needed to make further progress.

In general, the numerical derivatives and their verification on the likelihood function are approximations. Throughout this thesis, the decisions with seemingly the highest accuracy were taken, but questions and uncertainties remain regarding Fisher information forecasting applied to the X-PSI model. For now, we therefore cannot draw firm conclusions from the test case that was meant to answer what a doubled observing time of neutron star J0030+0451 would yield. Better checks and confirmation that the off-diagonal values in the Fisher matrix are correct are needed. Results better than a factor of 2 are certainly not excluded, as the mass and radius in particular improve by a factor of ∼ 2.5. The average scaling factor over all parameters is ∼ 1.8, but, as said before, no firm conclusion can be drawn because the scaling factors differ considerably from parameter to parameter. It is therefore not certain what the improvement would be for constraints on the matter composition and the EOS of neutron star matter if the observing time were doubled, although further research with Fisher information would certainly be interesting.

To conclude, open questions and uncertainties remain regarding Fisher information forecasting for X-PSI. One question is what precision we want the Fisher tool to have. Since the Fisher tool takes roughly 15 minutes of computing time, while the full Monte Carlo sampling costs hundreds of thousands of computer hours, one can argue that a lower precision is acceptable for the Fisher-forecasting tool. However, the tool is not yet reliable enough to be implemented as a module in X-PSI. If the inversion eventually yields correct posterior standard deviation values, a Fisher-forecasting tool could be valuable for preliminary evaluation of future experiments in X-PSI research.


Bibliography

[1] Abramowitz, M., Stegun, I. A. (Eds.). (1948). Handbook of mathematical functions with formulas, graphs, and mathematical tables (Vol. 55). US Government Printing Office.
[2] Agathos, M., Meidam, J., Del Pozzo, W., Li, T. G., Tompitak, M., Veitch, J., ... Van Den Broeck, C. (2015). Constraining the neutron star equation of state with gravitational wave signals from coalescing binary neutron stars. Physical Review D, 92(2), 023012.
[3] Arzoumanian, Z., Gendreau, K. C., Baker, C. L., Cazeau, T., Hestnes, P., Kellogg, J. W., ... Markwardt, C. B. (2014, July). The neutron star interior composition explorer (NICER): mission definition. In Space Telescopes and Instrumentation 2014: Ultraviolet to Gamma Ray (Vol. 9144, p. 914420). International Society for Optics and Photonics.
[4] Baym, G., Hatsuda, T., Kojo, T., Powell, P. D., Song, Y., Takatsuka, T. (2018). From hadrons to quarks in neutron stars: a review. Reports on Progress in Physics, 81(5), 056902.
[5] Brodtkorb, P. A. (2008, August 1). Source code for numdifftools.core. Retrieved from https://numdifftools.readthedocs.io/en/latest/_modules/numdifftools/core.html#Hessian
[6] Coe, D. (2009). Fisher matrices and confidence ellipses: a quick-start guide and software. arXiv preprint arXiv:0906.4123.
[7] Demorest, P. B., Pennucci, T., Ransom, S. M., Roberts, M. S. E., Hessels, J. W. T. (2010). A two-solar-mass neutron star measured using Shapiro delay. Nature, 467(7319), 1081-1083.
[8] Gill, J., King, G. (2004). Numerical issues involved in inverting Hessian matrices. In Numerical issues in statistical computing for the social scientist. Wiley, Hoboken, 143-176.
[9] Janka, H. T., Langanke, K., Marek, A., Martínez-Pinedo, G., Müller, B. (2007). Theory of core-collapse supernovae. Physics Reports, 442(1-6), 38-74.
[10] Lackey, B. D., Wade, L. (2015). Reconstructing the neutron-star equation of state with gravitational-wave detectors from a realistic population of inspiralling binary neutron stars. Physical Review D, 91(4), 043002.
[11] Lattimer, J. M., Prakash, M. (2016). The equation of state of hot, dense matter and neutron stars. Physics Reports, 621, 127-164.
[12] Lo, K. H., Miller, M. C., Bhattacharyya, S., Lamb, F. K. (2013). Determining neutron star masses and radii using energy-resolved waveforms of X-ray burst oscillations. The Astrophysical Journal, 776(1), 19.
[13] Nättilä, J., Miller, M. C., Steiner, A. W., Kajava, J. J. E., Suleimanov, V. F., Poutanen, J. (2017). Neutron star mass and radius measurements from atmospheric model fits to X-ray burst cooling tail spectra. Astronomy & Astrophysics, 608, A31.
[14] Myers, J. D. (2019, October 10). NICER documentation, NASA. Retrieved from https://heasarc.gsfc.nasa.gov/docs/nicer/nicer_docs.html
[15] Oertel, M., Hempel, M., Klähn, T., Typel, S. (2017). Equations of state for supernovae and compact stars. Reviews of Modern Physics, 89(1), 015007.
[16] Psaltis, D., Özel, F., Chakrabarty, D. (2014). Prospects for measuring neutron-star masses and radii with X-ray pulse profile modeling. The Astrophysical Journal, 787(2), 136.
[17] Raaijmakers, G., Riley, T. E., Watts, A. L., Greif, S. K., Morsink, S. M., Hebeler, K., ... Arzoumanian, Z. (2019). A NICER view of PSR J0030+0451: Implications for the dense matter equation of state. The Astrophysical Journal Letters, 887(1), L22.
[18] Riley, T. E. (2019). Neutron star parameter estimation from a NICER perspective (Doctoral dissertation, Universiteit van Amsterdam). https://dare.uva.nl/search?identifier=aa86fcf3-2437-4bc2-810e-cf9f30a98f7a
[19] Riley, T. E. (2019). X-PSI: A prototype open-source package for neutron star X-ray Pulsation Simulation and Inference. https://thomasedwardriley.github.io/xpsi/
[20] Riley, T. E., Watts, A. L., Bogdanov, S., Ray, P. S., Ludlam, R. M., Guillot, S., ... Gendreau, K. C. (2019). A NICER view of PSR J0030+0451: Millisecond pulsar parameter estimation. The Astrophysical Journal Letters, 887(1), L21.
[21] Shaw, A. W., Heinke, C. O., Steiner, A. W., Campana, S., Cohn, H. N., Ho, W. C., ... Servillat, M. (2018). The radius of the quiescent neutron star in the globular cluster M13. Monthly Notices of the Royal Astronomical Society, 476(4), 4713-4718.
[22] Vallisneri, M. (2007). A user manual for the Fisher information matrix. Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, 91109.
[23] Watts, A. L. (2019, July). Constraining the neutron star equation of state using Pulse Profile Modeling. In AIP Conference Proceedings (Vol. 2127, No. 1, p. 020008). AIP Publishing LLC.
[24] Watts, A. L., Andersson, N., Chakrabarty, D., Feroci, M., Hebeler, K., Israel, G., ... Patruno, A. (2016). Colloquium: Measuring the neutron star equation of state using x-ray timing. Reviews of Modern Physics, 88(2), 021001.
[25] Watts, A., Xu, R., Espinoza, C., Andersson, N., Antoniadis, J., Antonopoulou, D., ... Hessels, J. (2014). Probing the neutron star interior and the Equation of State of cold dense matter with the SKA. arXiv preprint arXiv:1501.00042.
[26] Wittman, D. (2012). Fisher Matrix for Beginners. Physics Department, University of California, Davis, CA, available at: www.physics.ucdavis.edu/~dwittman/
