GRAVITATION AND ASTROPARTICLE PHYSICS

MASTER THESIS

Uses and Limitations of Fisher Forecasting in Setting Upper Limits on the Interaction Strength of Dark Matter

by
Maikel Doornhein
10104453

May 2018, 60 ECTS, February 2016 - May 2018

Supervisor/Examiner: Dr. Christoph Weniger
Examiner: Dr. Shin'ichiro Ando

Research Group for Gravitation and Astroparticle Physics (GRAPPA), Institute of Physics

Abstract

In determining the true nature of dark matter it can be useful to set confidence intervals, or limits, on the model parameters that control its interaction strength. Computer generated 'mock' data can be created and analyzed by means of Monte Carlo simulations, from which we can construct expected upper limits, possibly serving as detection thresholds for a new, yet to be built, measuring apparatus. These simulations are highly computationally expensive, which is why an alternative, faster approach to calculating limits would be of great interest. Such a method is available in the form of a Fisher forecast: using the Fisher information matrix it is possible to forecast the expected limit one would obtain from a full Monte Carlo simulation. We explore this technique, its accuracy and its limitations, applied to an indirect dark matter detection experiment subject to a dark matter annihilation model with one free parameter. Sky maps of dark matter halos are created and subsequently, using both techniques, upper limits are calculated for the thermally averaged cross section $\langle\sigma v\rangle$ so that their performance can be compared. We show that the Fisher forecast performs equally well in those situations where the distribution of maximum likelihood estimators is described by a normal distribution, but that significant deviations arise when this prerequisite is violated due to small sample size or, equivalently, due to non-Gaussian likelihood functions.

Contents

Abstract

1 Introduction

2 Astrophysics of Dark Matter Halos
  2.1 Distribution of Dark Matter
    2.1.1 NFW-Profile
      Virial radius
      Determining Rs
      Determining ρs
    2.1.2 Astrophysical Targets
    2.1.3 Visualization
      Einasto-profile
  2.2 Indirect detection
    Expected flux
    The J-factor
    Annihilation channels
    Fermi LAT
    2.2.1 Creating a sky map of the SMC
      HEALPix coordinates
      Euclidean coordinates

3 Statistical Framework and Data Simulation
  3.1 Poisson Statistics
  3.2 The likelihood function
    Frequentist inference
    Performance
  3.3 Confidence Intervals
    Test statistics, Significance and Power
    Maximum Likelihood Ratio
    Neyman-Pearson lemma
    Wilks' theorem
    3.3.1 Monte Carlo simulation
      Containment regions

4 Fisher forecasting
  4.1 Fisher information
    4.1.1 Explicit calculation
    4.1.3 Forecast of confidence interval
      Multiple parameters
      Equivalent Counts Method
    4.1.4 Upper limits from Fisher forecast
  4.2 Additional features
    4.2.1 Energy binning
    4.2.2 Smoking gun signal
    4.2.3 Angular resolution
  4.3 Limitations of Fisher approach
    4.3.1 Skewness of Poisson distribution

5 Conclusion


Chapter 1

Introduction

The history of dark matter goes back at least as far as 1884, when Lord Kelvin made an attempt to answer a compelling question raised by scientists in previous years.[1] On the night sky they observed vast regions of darkness, which raised the question of whether there existed 'dark' bodies that did not emit any light themselves and were blocking the light from stars that lay behind them. Lord Kelvin derived an estimate for the mass of our galaxy by measuring the velocity dispersion of stars orbiting the center of the Milky Way, and subsequently compared the result to the best available estimates of all observable mass it contained. The former exceeded the latter significantly, which led him to propose that there did indeed exist dark bodies in the galaxy, perhaps even vastly outnumbering the visible stellar bodies. These dark bodies were postulated not to emit any light, but in principle they could still reflect it, although the intensity would be far too low for us to observe them. In this the concept of dark bodies differs from the contemporary notion of dark matter, which is believed to be a substance that does not interact with light in any way.

Half a century later, in 1933, the Swiss physicist Fritz Zwicky built upon the ideas of Lord Kelvin, this time considering an astrophysical object much more massive and distant than the center of the Milky Way.[2] Zwicky targeted the Coma Cluster, a cluster of more than a thousand galaxies some 300 million light years from Earth. From the redshift of their light spectra he could measure the velocity dispersion of the individual galaxies within the cluster, which led him to discover, much like Lord Kelvin did, that some of them possessed velocities far too high for them to be gravitationally bound. Based on the total gravity of all visible matter in the cluster these galaxies should simply fly off into space. The empirical fact that they do not, combined with the virial theorem, seems to indicate that there is at least six times more mass contained within the Coma Cluster than can be observed.

It was in the 1970s that Vera Rubin rekindled the concept of dark matter when she discovered the flat rotation curves of stars in spiral galaxies.[3] The persistence of the discrepancy, and the promise of new physics, made physicists undertake great efforts in an attempt to identify the true nature of the substance responsible for this additional, yet unobservable, mass.


A great number of ideas were brought forward, many of which still exist in some form today, but in general three main categories can be distinguished. Some physicists believed that the additional mass fell into the category of the dark bodies proposed by Lord Kelvin, which came to be known as MAssive Compact Halo Objects (MACHOs). Others followed the promising lead that dark matter might be an entirely new, as yet undiscovered, type of particle that interacts only through the weak and gravitational forces. These types of particles are collectively referred to as Weakly Interacting Massive Particles (WIMPs). Still others believed that it had to be our theory of gravity that was incomplete, as it had been shown to be before when Einstein developed his theory of relativity. Disappointingly, little progress was made on determining which of these possibilities was correct.

With the study of the gravitational interactions inside the Bullet Cluster even more evidence for the existence of dark matter became available,[4] making the necessity for a solution to the problem ever more pressing. Slowly but surely the ideas involving MACHOs were discredited: there simply did not seem to be enough dark bodies roaming the galaxy. A rough estimate could be given since dark bodies can traverse in front of an illuminating object so that they can occasionally be observed, not unlike one of the methods used to detect exoplanets nowadays. In addition, gravitational lensing effects should give away their presence as well. Attempts to alter the theory of gravity in such a way that it would account for an additional gravitational effect on the very large scales of the universe were equally unfruitful. More importantly, the degree of gravitational lensing the Bullet Cluster provided seemed to contradict this possibility altogether. The theory of dark matter being some kind of exotic particle, especially of the WIMP variety, seemed the most promising direction to proceed in. It needs to be stressed however that WIMPs are in no way the only particles postulated as dark matter candidates; other examples include the axion[5] and the gravitino. Even though no convincing detection of WIMPs has ever been made they continue to be a promising dark matter candidate and therefore we shall focus on them exclusively in this thesis.

Besides the arguments for the existence of dark matter discussed above, a wealth of complementary evidence is supplied by the field of cosmology. For instance, the total amount of ordinary baryonic matter can be derived from the observed abundance ratio between deuterium and hydrogen that formed in the early universe. According to the current Big Bang model most of the baryonic matter was created during Big Bang Nucleosynthesis (BBN). Calculations yield that baryonic matter makes up only roughly 20% of the total matter density of the universe, corroborating that dark matter is not made up of baryonic matter.[6] Independent evidence originates from the anisotropies in the Cosmic Microwave Background (CMB), detailed images of which are available thanks to the WMAP and PLANCK satellites. The anisotropies correspond to a still image of the acoustic oscillations in the baryon-photon plasma making up the early universe, resulting from the tug-of-war between radiation pressure and gravity, at the exact time recombination took place and the universe became transparent. It is thought that the fluctuations in the CMB could not have grown as large as they did without the help of dark matter. Whereas the baryonic matter is prevented from clumping together by the radiation pressure, dark matter does not interact with the electromagnetic field and is not prohibited from doing so. The idea is that the early clumping of dark matter increased the magnitude of the acoustic oscillations because of its additional gravitational pull on baryonic matter. The anisotropies suggest that 84% of all matter is dark matter,[7] in good agreement with the calculations derived from the deuterium-hydrogen abundance ratio.

Although we know very little about the nature of dark matter, it is possible to get an indication of the magnitude of its thermally averaged annihilation cross section $\langle\sigma v\rangle$. Imagine the early universe, where dark matter particles exist in thermal equilibrium, being continuously created and annihilated. While the universe expands, both the temperature and the number density drop, until the point where the annihilation rate becomes so low that thermal equilibrium can no longer be maintained. This point in time is also referred to as dark matter "freeze-out", and from that moment onward the total amount of dark matter in the universe remains more or less constant. The annihilation rate at the moment of freeze-out is equal to the annihilation cross section times the number density. The latter can be derived from the estimated contemporary density, so that we can calculate a very rough estimate for $\langle\sigma v\rangle$ as long as we specify a particle mass $M_\chi$. In the case of cold dark matter, e.g. the ΛCDM model, $M_\chi$ is believed to lie in the GeV - TeV range, so that it can be estimated that $\langle\sigma v\rangle \sim 3 \times 10^{-26}\,\mathrm{cm^3\,s^{-1}}$.[8] This happens to be close to the electroweak scale, providing motivation for postulating WIMPs as dark matter candidates.

In principle there exist three methods for detecting the existence of WIMPs by means other than their gravitational pull. First, we can attempt to observe them directly through their interaction with the weak force; examples include experimental setups deep underground that try to capture the interaction of a dark matter particle with a large body of liquid xenon. No evidence for WIMPs has been found to this date using direct detection, but limits keep improving.[9] The second available detection method is to look for dark matter in particle colliders like the Large Hadron Collider in Geneva. If the conservation laws, like those of energy and momentum, appear to be violated during particle collisions, the only possible explanation may turn out to be that undetected WIMPs carried the difference away.[10] As with direct detection, no discovery has been made as of yet.[11]

The final method of detection is that of indirect observation, often with the goal of constraining dark matter candidates, a method with a lengthy history.[12] It relies on the theory that dark matter particles may spontaneously decay by themselves or, alternatively, occasionally annihilate each other when collisions occur between them. Both would result in a cascade of particles we do know from the Standard Model, finally resulting in γ-ray photons that we can easily observe with well known methods. The challenge lies in distinguishing the γ-rays originating from dark matter from those already present due to other astrophysical sources. This entails mapping all possible other sources of photons in the relevant energy range. What remains is the statistical challenge to determine whether a significant photon excess correlates with known locations of high concentrations of dark matter, for example a cluster of galaxies. A promising result was obtained quite recently when a potential γ-ray excess was detected from the center of the Milky Way, resembling a signal we would expect from WIMP annihilation.[13][14] However, others found that the measurements could be entirely explained by the emission from galactic millisecond pulsars.[15]

When no clear discovery can be made, a next step often entails calculating confidence intervals or upper limits (one-sided confidence intervals) on one or more parameters that appear in the theory. This allows for the possibility of excluding certain dark matter candidates, or of more tightly constraining the interaction strength or decay width of the dark matter candidate of interest. Alternatively, without any real measurements at all it is also possible to put expected limits on parameters. One would then generate random 'mock' data for the background only, thereby simulating a real measurement from some instrument if there were no signal, and analyze this by means of a Monte Carlo simulation. A conceivable reason to be interested in such an analysis is that one can investigate the necessary sensitivity of the measuring apparatus to probe certain parameter regimes associated with the model. An upper limit found in this manner corresponds to the parameter regime where a statistically significant detection is impossible; hence if the true value of the parameter of interest lies below the upper limit, the experiment's sensitivity would be insufficient to discover it.

In this thesis I will take the approach of indirect dark matter detection. However, the main goal of the thesis is not to improve existing limits, or even to match them. Instead we would like to investigate a recently presented new technique that utilizes Fisher information and allows for calculating expected upper limits without the need for computationally expensive Monte Carlo simulations.[16][17] The Fisher approach is much quicker and can be equally accurate, thus possibly making it a superior alternative in many situations. The thesis will focus on the accuracy of the Fisher forecast; of specific interest is whether its results agree with those from the tried and tested Monte Carlo simulations, as well as when significant deviation can be expected. In Chapter 2 I will discuss the astrophysics underlying indirect dark matter detection and will show how an accumulation of dark matter in and around a satellite galaxy can be represented in a computer generated sky-map. Chapter 3 focuses on the statistical tools necessary for setting confidence intervals and utilizes these in a Monte Carlo simulation that calculates upper limits on the interaction strength of WIMPs as a function of their mass. In Chapter 4 we will explore the Fisher method of setting upper limits, showing that running computationally expensive Monte Carlo simulations is in some cases obsolete. At the end of that chapter we will introduce some additional features, like energy binning, as well. Some final remarks are given in Chapter 5, containing a short conclusion.


Chapter 2

Astrophysics of Dark Matter Halos

In this chapter we will investigate how we go about the physics of describing a γ-ray signal originating from a dark matter halo due to WIMP decay or annihilation. Aside from the astrophysics and particle physics involved, we will take a look at the properties of the Fermi Large Area Telescope, which we adopt as our means of observation. At the end of the chapter we present the expected signal from two astrophysical targets, the Milky Way and the Small Magellanic Cloud, in computer generated sky-maps using two different types of coordinates.

2.1 Distribution of Dark Matter

In order for us to estimate the magnitude of the signal we expect to receive from annihilating or decaying dark matter and, ultimately, do statistical analysis with this expected signal, we first need to know how dark matter tends to distribute itself in and around a massive source, e.g. a galaxy. The entire spherically symmetric mass distribution is referred to as a dark matter halo. There exist a multitude of density profiles that attempt to describe the mass distribution within the halo, one of the most commonly used being the Navarro-Frenk-White (NFW) profile.

2.1.1 NFW-Profile

Named after the initial discoverers, the NFW-profile was found by fitting to dark matter halos identified in N-body simulations.[18] The profile as a function of distance to the center r has the following mathematical shape:

\[
\rho(r) = \frac{\rho_s R_s}{r\left(1 + \frac{r}{R_s}\right)^2} . \tag{2.1}
\]

Here, $\rho_s$ and $R_s$ are scaling parameters which vary from one dark matter halo to another. We will shortly determine their values, but first let us find the total mass contained within a sphere with radius $R_{\max}$ by means of integration,

\[
\begin{aligned}
M &= \int_0^{R_{\max}} 4\pi r^2 \rho(r)\, dr \\
  &= 4\pi \rho_s R_s \int_0^{R_{\max}} \frac{r}{\left(1 + \frac{r}{R_s}\right)^2}\, dr \\
  &= 4\pi \rho_s R_s \left[ R_s^2 \ln(R_s + r) + \frac{R_s^3}{R_s + r} \right]_0^{R_{\max}} \\
  &= 4\pi \rho_s R_s^3 \left[ \ln\!\left(\frac{R_s + R_{\max}}{R_s}\right) - \frac{R_{\max}}{R_s + R_{\max}} \right] .
\end{aligned} \tag{2.2}
\]

From the last line we can see that the total mass is divergent, since if $R_{\max}$ increases without limit, so does $M$. Therefore it is useful to define a cut-off edge at some radius, outside of which we set the dark matter density equal to zero. Although arbitrary in principle, a natural choice for the cut-off turns out to be the virial radius, as we will be able to show later.
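To make this concrete, the following minimal Python sketch evaluates the NFW profile of equation (2.1) and the closed-form enclosed mass of equation (2.2); the SMC-like values of $\rho_s$ and $R_s$ are taken from Table 2.2, and the snippet is only an illustration, not the analysis code used in this thesis.

import numpy as np

def rho_nfw(r, rho_s, R_s):
    # NFW density profile of equation (2.1), in M_sun / kpc^3 for the units used here
    x = r / R_s
    return rho_s / (x * (1.0 + x)**2)

def mass_nfw(R_max, rho_s, R_s):
    # Mass enclosed within R_max, the closed form of equation (2.2)
    return 4.0 * np.pi * rho_s * R_s**3 * (np.log((R_s + R_max) / R_s) - R_max / (R_s + R_max))

# The enclosed mass keeps growing (logarithmically) as R_max increases, which is
# why the text introduces a cut-off at the virial radius.
for R_max in [10.0, 100.0, 1000.0, 10000.0]:                 # kpc
    print(R_max, mass_nfw(R_max, rho_s=9.3e6, R_s=3.2))      # SMC-like values of Table 2.2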

Virial radius

The virial radius is that radius where the velocity dispersion σ of a gravitationally bound system is at a maximum. The velocity dispersion can be written as

\[
\sigma^2 = k \langle v^2 \rangle , \tag{2.3}
\]

where k is a parameter that reflects the kinematic structure within a galaxy. To obtain an expression for the virial radius we can use the virial theorem to relate the velocity dispersion to the mass and radius of an extended medium. The virial theorem states

\[
2K + U = 0 , \tag{2.4}
\]

where the gravitational binding energy U of an extended mass can be calculated to be

\[
U = -\frac{3}{5}\frac{GM^2}{R} , \tag{2.5}
\]

and the kinetic energy contained within the halo is

\[
K = \frac{1}{2} M \langle v^2 \rangle . \tag{2.6}
\]

Now, according to (2.3) and (2.4):
\[
M \langle v^2 \rangle = \frac{3}{5}\frac{GM^2}{R} , \tag{2.7}
\]
\[
\sigma^2 = \frac{3GM}{5kR} , \tag{2.8}
\]

the latter sometimes approximated by,

\[
\sigma^2 \approx \frac{GM}{R} . \tag{2.9}
\]

As mentioned before, the virial radius (and mass) are those that maximize the velocity dispersion σ:

\[
\sigma^2_{\max} = \frac{G M_{\rm vir}}{R_{\rm vir}} , \tag{2.10}
\]

where $M_{\rm vir}$ is the mass contained within the virial radius. From equation (2.10) we see that it is in principle possible to acquire the virial radius of a galaxy by measuring the velocity dispersion and determining where it is maximal. Unfortunately these measurements can be difficult and labor intensive.

Determining Rs

Since locating the virial radius requires knowledge about the radius where the velocity dispersion is maximal, and this is difficult to obtain observationally, it is often useful to approximate it by different means. This is done by setting it equal to the radius where the average density of dark matter within that radius equals a specific number of times the critical density of the universe $\rho_{\rm crit}$. Frequently a multiple of 200 is chosen, and we shall do the same:

\[
200\,\rho_{\rm crit} \sim \frac{M}{\frac{4}{3}\pi R_{\rm vir}^3} . \tag{2.11}
\]

The critical density is defined as

\[
\rho_{\rm crit} = \frac{3H^2}{8\pi G} , \tag{2.12}
\]

where H is the Hubble constant and G is the gravitational constant. We then find our approximate value for $R_{\rm vir}$ to be

\[
R_{\rm vir} \approx \sqrt[3]{\frac{3M}{800\pi\rho_{\rm crit}}} . \tag{2.13}
\]

Now that we have obtained the virial radius $R_{\rm vir}$ we can relate it to the scale radius $R_s$ by introducing a dimensionless concentration parameter $c$,

\[
R_s = \frac{R_{\rm vir}}{c} . \tag{2.14}
\]

The concentration parameter depends on halo mass only. Typical values lie between 4 and 40; for example, the Milky Way has approximately 10 < c < 15. Since our main concern will be with dwarf and satellite galaxies we will use a function for the concentration parameter $c(M)$ that is especially well suited for a low mass range.[19] It is given by

\[
c = \sum_{i=0}^{5} a_i \left[ \ln\!\left( \frac{0.7\,M}{M_\odot} \right) \right]^i , \tag{2.15}
\]

where the constants $a_i$ are given by those quoted in Table 2.1. We have now expressed $R_s$ completely in terms of constants and the halo mass.

TABLE 2.1: Constants $a_i$ used in determining the concentration parameter $c$.

  Constant   Value
  a_0        37.52
  a_1        -1.51
  a_2        1.64 × 10^-2
  a_3        3.66 × 10^-4
  a_4        -2.89 × 10^-5
  a_5        5.32 × 10^-7

Determining ρs

Next, we want to find the scale density $\rho_s$ in terms of halo mass as well. It can be found quite easily by using equation (2.2) for the total mass within the halo. For the maximum radius $R_{\max}$ we use $R_{\rm vir}$, and the value for $R_s$ was defined in the previous section as $R_{\rm vir}/c$:
\[
\begin{aligned}
M &= 4\pi\rho_s \left( \frac{R_{\rm vir}}{c} \right)^3 \left[ \ln\!\left( \frac{\frac{R_{\rm vir}}{c} + R_{\rm vir}}{\frac{R_{\rm vir}}{c}} \right) - \frac{R_{\rm vir}}{\frac{R_{\rm vir}}{c} + R_{\rm vir}} \right] , \\
M &= 4\pi\rho_s \left( \frac{R_{\rm vir}}{c} \right)^3 \left[ \ln(1+c) - \frac{c}{1+c} \right] , \\
\rho_s &= \frac{M c^3}{4\pi R_{\rm vir}^3} \left[ \ln(1+c) - \frac{c}{1+c} \right]^{-1} ,
\end{aligned} \tag{2.16}
\]

where $R_{\rm vir}$ and $c$ are given by equations (2.13) and (2.15) respectively. Hence we have expressed $\rho_s$ solely in terms of $M$ as well.
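As a sketch of how these relations chain together, the snippet below computes $R_{\rm vir}$, $c$, $R_s$ and $\rho_s$ from a halo mass alone, following equations (2.13)-(2.16) and the constants of Table 2.1. The numerical value adopted for $\rho_{\rm crit}$ (in $M_\odot\,\mathrm{kpc^{-3}}$, for $h \approx 0.7$) is an assumption of the sketch rather than a value quoted in the text.

import numpy as np

# Constants a_i of Table 2.1 for the concentration-mass relation (2.15).
A_COEFF = [37.52, -1.51, 1.64e-2, 3.66e-4, -2.89e-5, 5.32e-7]

# Critical density in M_sun / kpc^3; this particular number (for h of about 0.7)
# is an assumption of this sketch, not a value quoted in the text.
RHO_CRIT = 1.4e2

def halo_parameters(M):
    # Return (R_vir, c, R_s, rho_s) for a halo of mass M in solar masses,
    # following equations (2.13)-(2.16). Distances come out in kpc.
    R_vir = (3.0 * M / (800.0 * np.pi * RHO_CRIT))**(1.0 / 3.0)     # eq. (2.13)
    x = np.log(0.7 * M)                                             # ln(0.7 M / M_sun)
    c = sum(a * x**i for i, a in enumerate(A_COEFF))                # eq. (2.15)
    R_s = R_vir / c                                                 # eq. (2.14)
    rho_s = M * c**3 / (4.0 * np.pi * R_vir**3) / (np.log(1.0 + c) - c / (1.0 + c))  # eq. (2.16)
    return R_vir, c, R_s, rho_s

# For the SMC-like mass of Table 2.2 the output should be close to the values quoted there.
print(halo_parameters(6.5e9))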

2.1.2 Astrophysical Targets

Many different astrophysical objects have been picked as targets in indirect dark matter detection experiments, e.g. galaxies, dwarf galaxies and entire clusters of galaxies.[20] One might imagine that for higher mass targets more dark matter resides in the halo and therefore a stronger signal can be expected. Also, the closer the target the higher the flux we will receive. This line of reasoning makes the center of the Milky Way the most attractive target, and it has indeed been used in many indirect detection experiments. Yet, more important than high mass is a high mass-to-luminosity ratio, because this implies a high dark to baryonic matter ratio. If a large fraction of the total mass is made up of dark matter this allows for a better signal to noise ratio. A generally accepted view is that dwarf spheroidal galaxies (dSph) have the highest M-L ratios and are dominated by dark matter at radii larger than 3-4 times the scale radius of the stellar disk.[21] A clear correlation between the M-L ratio and the luminosity of dwarf galaxies was found by Penny and Conselice (2011),[22] suggesting that the faintest dwarf galaxies have the highest dark matter content in order to stay gravitationally bound. An additional benefit of dSph over regular galaxies is that localized astrophysical emission at X-ray and γ-ray energies is almost absent as well. Thereby measurements are not likely to be contaminated by, for example, millisecond pulsars, which could be the case in the measured GeV-band photon excess from the center of the Milky Way that we discussed in the introduction. For the reasons above the 8 dwarf galaxies of the Milky Way are excellent targets for the indirect detection of dark matter.[23] They may not have the highest possible M-L ratio, but they are certainly the closest.

2.1.3 Visualization

We now have all the information we need to calculate the distribution of dark matter, purely in terms of the mass contained within the halo up to $R_{\rm vir}$. Let us pick an astrophysical target and see what the dark matter density looks like as a function of radius. Because of the reasons just mentioned, satellite and dwarf galaxies are of particular interest. We choose the Small Magellanic Cloud (SMC), although we could have taken any of the seven other satellite galaxies. Figure 2.1 shows a plot of the NFW density profile as a function of radius for the SMC. The values of relevant parameters are shown in Table 2.2.

When we look at Figure 2.1 we can clearly see a break in the density curve at $r \sim R_s$. Up until one scale radius the density decreases with roughly the first power of the radius, while at larger radii it decreases with the third power, as can be expected by eyeballing equation (2.1). The rate at which the NFW-profile contributes to the total mass of the halo with increasing radius diminishes considerably at radii larger than $R_{\rm vir} = c R_s$. This is thus the motivation we mentioned earlier for putting the cut-off radius of the halo at $R_{\rm vir}$.

Einasto-profile

The Einasto density profile[24] is sometimes used as an alternative for the NFW-profile,[25] because it is in slightly better agreement with higher resolution numerical simulations. This is unsurprising since the Einasto profile features an extra free parameter α:

\[
\rho(r) = \rho_s \exp\left\{ -\frac{2}{\alpha}\left[ \left(\frac{r}{R_s}\right)^{\alpha} - 1 \right] \right\} , \tag{2.17}
\]

where α controls the degree of curvature. The higher α, the faster the density changes with radius. This becomes apparent when we calculate the derivative on a log-log scale:

\[
\frac{d\log\rho}{d\log r} = -2\left(\frac{r}{R_s}\right)^{\alpha} . \tag{2.18}
\]

The Einasto profile thus resembles a generalized version of a power law $\rho \propto r^{-\alpha}$. But whereas a power law has constant slope on a log-log scale, the Einasto profile has a slope that changes faster with radius as α gets larger. Two examples, with α = 0.10 and α = 0.15, are plotted in Figure 2.1.

2.2 Indirect detection

Although WIMPs do not interact with light, and can thus not be observed directly, it is still possible to infer their existence indirectly if dark matter particles get converted into photons themselves. We can conceive of two main mechanisms that can cause this to occur. The first is spontaneous decay of the particles into photons, which may occur for sterile neutrinos, a hypothetical dark matter candidate that only interacts with the gravitational force. Photons produced in this way are expected to possess energies somewhere in the X-ray regime.[27]

TABLE 2.2: Relevant properties of the SMC for calculating its density profile.

  Parameter   Value
  Mass        6.5 × 10^9 M_⊙ [26]
  R_vir       38.5 kpc
  c           11.9
  R_s         3.2 kpc
  ρ_s         9.3 × 10^6 M_⊙ / kpc^3

Secondly, if the dark matter particle happens to be its own anti-particle, two particles can annihilate when they collide with each other. For instance Weakly Interacting Massive Particles (WIMPs), one of the more promising dark matter candidates and hence extensively researched over the last decades, satisfy the condition of being their own anti-particle. The typical predicted mass of a WIMP lies in the range 100 MeV - 1 TeV, making photons produced by their mutual annihilation energetic enough to be in the γ-ray regime.

It can thus be expected that, compared to all other known sources, an excess of γ-rays is observable in the direction of a large concentration of mass, like the SMC, if WIMPs indeed make up the dark matter in the universe. A statistically significant detection of such an excess, provided no astrophysical contribution to the γ-ray background spectrum has been overlooked, would be strong indirect evidence for the existence of annihilating dark matter particles in the given mass range. How large a flux we can expect is the topic of the next few sections.

Expected flux

Imagine a dark matter halo as viewed from Earth with an apparent angular size of a few degrees in the night sky. The expected intensity of the photon flux from annihilating dark matter depends on the direction in which we are observing. If we look directly towards the center of the halo we can expect a higher intensity, while we expect a lower intensity when we look at the edges of the halo. This is not only because the dark matter density is higher at the center, but also due to the halo's spherical shape: the line of sight (l.o.s.) in the direction of the center traverses a larger distance through the halo, thereby intercepting more dark matter in total. See also Figure 2.2 for a schematic side-view.

FIGURE 2.1: The shapes of the NFW and Einasto mass density profiles as a function of radius. The NFW-profile displays a characteristic break at $r = R_s$ where the slope increases from a power law of first order to a power law of third order. Two curves represent the Einasto profile to emphasize that an additional parameter α has to be chosen.

In this section we will derive a general formula for the expected flux from such a halo due to annihilating dark matter, while simply stating the equivalent formula for decaying dark matter, since it is derived in a similar manner. For decaying dark matter we expect to find an expression containing the integral over density, since the rate of decay, and thus the rate of photon production, depends on the concentration of particles in a linear fashion. For annihilating dark matter we expect the integral over the density squared, since two dark matter particles have to collide for them to annihilate, making the process dependent on the densities of both particles participating in the collision.

Let us start by considering a region of space $dV$ containing $N_\chi$ dark matter particles of mass $m_\chi$. The frequency with which collisions occur depends both on the average velocity of the particles as well as their cross section, which is why we introduce the velocity averaged cross section $\langle\sigma v\rangle$ to quantify the probability of collision. The annihilation rate per particle is

\[
\xi = \frac{\rho(r)}{m_\chi}\, \langle\sigma v\rangle , \tag{2.19}
\]

while the number of dark matter particle pairs inside $dV$ is
\[
\frac{N_\chi}{2} = \frac{n_\chi}{2}\, dV = \frac{\rho(r)}{2 m_\chi}\, dV , \tag{2.20}
\]

where $n_\chi = N_\chi/dV = \rho(r)/m_\chi$ and the factor of 1/2 ensures we do not count the same interaction twice. Consequently the total rate of annihilation can be seen to be[28]

\[
R = \frac{\rho(r)}{m_\chi}\langle\sigma v\rangle \left( \frac{\rho(r)}{2 m_\chi}\, dV \right) = \langle\sigma v\rangle\, \frac{\rho(r)^2}{2 m_\chi^2}\, dV . \tag{2.21}
\]

When a pair of dark matter particles annihilate they can do so through many different annihilation channels, and the photons produced through each channel have their own unique energy spectrum. To account for this we write the total produced energy spectrum as the rate multiplied by the sum over all channel spectra $f$,
\[
\frac{dN}{dE_\gamma} = \frac{\rho(r)^2}{2 m_\chi^2} \sum_f \langle\sigma v\rangle_f \frac{dN_f}{dE_\gamma}\, dV . \tag{2.22}
\]

Next, we choose a spherical coordinate system with the observer at the origin and the volume element dV at a distance s, covering a solid angle dΩ, such that

\[
dV = s^2\, ds\, d\Omega . \tag{2.23}
\]

The flux produced by volume element dV is uniformly distributed over a sphere so that we can write

\[
\frac{d\phi}{dE_\gamma} = \frac{dN/dE_\gamma}{4\pi s^2} = \frac{\rho(r)^2}{8\pi m_\chi^2} \sum_f \langle\sigma v\rangle_f \frac{dN_f}{dE_\gamma}\, ds\, d\Omega . \tag{2.24}
\]

Subsequently, the total differential flux is obtained by integrating along the line of sight and over solid angle, so that in the case of annihilating dark matter it is given by
\[
\frac{d\Phi}{dE_\gamma} = \frac{1}{4\pi} \int_{\Delta\Omega} d\Omega \int_{\rm l.o.s.} ds\, \frac{\rho(r)^2}{2 m_\chi^2} \sum_f \langle\sigma v\rangle_f \frac{dN_f}{dE_\gamma} . \tag{2.25}
\]

This equation is essentially made up of two parts: an astrophysical part, represented by the l.o.s. integral over density and solid angle, while the other factors have their origin in particle physics. Because of this, it is convenient to give the integral its own symbol J, referred to as the J-factor,

\[
J_{\rm ann.} = \int_{\Delta\Omega} d\Omega \int_{\rm l.o.s.} ds\, \rho(r)^2 . \tag{2.26}
\]

We will show how to calculate it in the next section, where we express the density ρ(r) as a function of distance s and opening angle θ. We thus end up with the following equation:[28]

\[
\frac{d\Phi}{dE_\gamma} = \frac{J_{\rm ann.}}{8\pi m_\chi^2} \sum_f \langle\sigma v\rangle_f \frac{dN_f}{dE_\gamma} , \tag{2.27}
\]

where $m_\chi$ is the mass of the WIMP, $\langle\sigma v\rangle_f$ is the velocity averaged cross section of annihilation channel $f$ and $dN_f/dE_\gamma$ is the energy spectrum of the photons produced in that channel. Analogous equations hold for decaying dark matter:[29]
\[
J_{\rm dec.} = \int_{\Delta\Omega} d\Omega \int_{\rm l.o.s.} ds\, \rho(r) , \tag{2.28}
\]
\[
\frac{d\Phi}{dE_\gamma} = \frac{J_{\rm dec.}}{4\pi m_\chi} \sum_f \Gamma_f \frac{dN_f}{dE_\gamma} , \tag{2.29}
\]

where $\Gamma_f$ represents the decay rate of decay channel $f$. To calculate the number of expected photons we can integrate (2.27) over energy and multiply by the exposure of the observational instrument,

\[
N_{\gamma,s} = t_{\rm obs}\, A_{\rm eff} \int \frac{J_{\rm ann.}}{8\pi m_\chi^2} \sum_f \langle\sigma v\rangle_f \frac{dN_f}{dE_\gamma}\, dE , \tag{2.30}
\]

where the exposure is the observation time $t_{\rm obs}$ multiplied by the effective area $A_{\rm eff}$. In the next section we will show how to calculate the J-factor. Afterwards we take a look at the many possible annihilation channels and examine some of their energy spectra.

The J-factor

From the previous section it is clear that, for annihilating dark matter, the J-factor of a halo equals the density squared integrated over l.o.s. and solid angle,
\[
J_{\rm ann} = \int_{\Delta\Omega} \int_{\rm l.o.s.} \rho(r)^2\, ds\, d\Omega . \tag{2.31}
\]

However, the NFW-profile describes the density ρ in terms of the distance from the halo center r, while integral (2.31) is over the distance along the l.o.s. s and solid angle ∆Ω. Fortunately we can relate r to s by using simple geometry; the situation is schematically drawn in Figure 2.2. Because of spherical symmetry we can find a function $r(s, \theta)$, where θ is the opening angle between the line of sight through the center and an arbitrary line of sight.

FIGURE 2.2: Schematic side-view of the geometrical situation of halo observation. The observer is positioned at the leftmost side of the figure and looks in the direction of the line of sight with opening angle θ and a distance variable s, while the distance to the halo center is d and its cut-off edge $R_{\rm vir}$. The distance to an arbitrary point (s, θ) in the halo is r(s, θ), so that ρ[r(s, θ)] determines the dark matter density at that location.

The rule of cosines
\[
c^2 = a^2 + b^2 - 2ab\cos(C) , \tag{2.32}
\]
can be applied to find
\[
r(s,\theta) = \sqrt{d^2 + s^2 - 2ds\cos\theta} . \tag{2.33}
\]

Here, d represents the distance from the observer to the halo center. Plugging equation (2.33) into (2.1) and the resulting density profile in equation (2.31) we find:

\[
J_{\rm ann} = \int_{\Delta\Omega} \int_{\rm l.o.s.} \left[ \frac{\rho_s R_s}{r(s,\theta)\left(1 + \frac{r(s,\theta)}{R_s}\right)^2} \right]^2 ds\, d\Omega . \tag{2.34}
\]

In order for us to carry out the integration over solid angle it is necessary to obtain the function that relates opening angle to solid angle. This expression is given by
\[
\Omega = 2\pi\left(1 - \cos\frac{\theta}{2}\right) , \tag{2.35}
\]
such that
\[
\frac{d\Omega}{d\theta} = \pi\sin\frac{\theta}{2} , \tag{2.36}
\]
\[
d\Omega = \pi\sin\frac{\theta}{2}\, d\theta , \tag{2.37}
\]

and equation (2.34) becomes:

\[
J_{\rm ann} = \pi \int_0^{\theta_{\max}} \sin\frac{\theta}{2} \int_{\rm l.o.s.} \left[ \frac{\rho_s R_s}{r(s,\theta)\left(1 + \frac{r(s,\theta)}{R_s}\right)^2} \right]^2 ds\, d\theta , \tag{2.38}
\]

where $\theta_{\max}$ is now the opening angle between the center of the halo and $R_{\rm vir}$. This completes the calculation of the total J-factor of a dark matter halo, which we have now completely expressed in terms of halo distance $d$, mass $M$ and maximum opening angle $\theta_{\max}$.
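A direct numerical implementation of equation (2.38) can be sketched as follows; the halo parameters are the SMC-like values of Table 2.2, while the distance of roughly 60 kpc and the integration settings are assumptions of this illustration.

import numpy as np
from scipy import integrate

def r_of(s, theta, d):
    # Distance to the halo centre for l.o.s. distance s and opening angle theta, eq. (2.33)
    return np.sqrt(d**2 + s**2 - 2.0 * d * s * np.cos(theta))

def rho_nfw(r, rho_s, R_s):
    # NFW profile, eq. (2.1)
    return rho_s * R_s / (r * (1.0 + r / R_s)**2)

def j_factor_ann(d, R_vir, rho_s, R_s, n_theta=200):
    # Total annihilation J-factor of eq. (2.38): integrate rho^2 along each line of
    # sight that intersects the halo, then over the opening angle up to theta_max.
    theta_max = np.arcsin(R_vir / d)
    thetas = np.linspace(1e-6, theta_max, n_theta)
    j_theta = np.zeros_like(thetas)
    for i, th in enumerate(thetas):
        half = np.sqrt(max(R_vir**2 - (d * np.sin(th))**2, 0.0))
        s_lo, s_hi = d * np.cos(th) - half, d * np.cos(th) + half
        j_theta[i], _ = integrate.quad(lambda s: rho_nfw(r_of(s, th, d), rho_s, R_s)**2, s_lo, s_hi)
    return np.pi * integrate.simpson(np.sin(thetas / 2.0) * j_theta, x=thetas)

# SMC-like halo parameters from Table 2.2; the distance of ~60 kpc is an assumed value.
# The result comes out in M_sun^2 kpc^-5, the units used in Table 2.3.
print(j_factor_ann(d=60.0, R_vir=38.5, rho_s=9.3e6, R_s=3.2))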

Annihilation channels

When WIMP-like dark matter particles annihilate they do not in general annihilate directly into two photons, although this is not forbidden. Multiple different annihilation channels can be followed, resulting in intermediate reaction products, before eventually decaying to photons. Examples include annihilation to a variety of quarks, leptons, gluons and even Higgs bosons. See Figure 2.3[30] for a visual summary of some of these possibilities. For our purposes it will not be necessary to consider all of them; instead we will pick two representatives, namely the annihilation channels to a bottom, anti-bottom quark pair and to a charm, anti-charm quark pair. The photons produced through different channels have different energy spectra, as denoted in (2.27) by $dN_f/dE_\gamma$, where $f$ is the channel under consideration. The energy spectra, shown for a dark matter mass of 100 GeV in Figures 2.4 and 2.5, are obtained by interpolating the values provided by the tables in PPPC4DMID,[29] using a short program written by reference [31]. For the main results of the thesis we will exclusively adopt the $\chi\chi \to b\bar{b}$ annihilation channel.
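The interpolation step itself is straightforward; the sketch below shows one way it might look, but the tabulated energies and spectrum values are placeholders, not the actual PPPC4DMID numbers.

import numpy as np
from scipy.interpolate import interp1d

# Placeholder two-column table (energy in GeV, dN/dE in 1/GeV); the real numbers
# would be read from the PPPC4DMID tables for the chosen channel and WIMP mass.
table_E = np.array([0.2, 1.0, 5.0, 20.0, 50.0, 100.0])
table_dN = np.array([60.0, 20.0, 3.0, 0.3, 0.02, 0.0])

# Interpolate in log-energy, which suits spectra that span several decades.
_interp = interp1d(np.log(table_E), table_dN, kind="cubic", bounds_error=False, fill_value=0.0)

def dN_dE(E_GeV):
    # Interpolated photon spectrum dN/dE; clipped at zero to avoid unphysical values
    return np.clip(_interp(np.log(E_GeV)), 0.0, None)

print(dN_dE(10.0))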

FIGURE 2.3: Examples of the many possible annihilation channels that can result from the annihilation of two dark matter particles χ.

Fermi LAT

So far we have not thought about the practicality of how we would actually detect excess γ-rays from a source. Therefore, a short intermezzo on an instrument that is capable of doing so, the Large Area Telescope (LAT). The LAT is an instrument on board the Fermi Gamma-ray Space Telescope capable of detecting individual γ-rays. Launched in 2008, one of its missions was to detect a γ-ray excess originating from the Milky Way center. We can thus assume that if such an excess exists it will be detectable by the LAT, or by a similar, more sensitive device to be built in the future. Although we will not be considering real datasets from Fermi-LAT, we can investigate its sensitivity to discovery by generating mock data assuming a background signal only. Figure 2.6 shows the effective area of the LAT as a function of photon energy if the angle of incidence is 0°.[32] We will be interested in the energy range from 10 GeV to 1000 GeV, where the effective area is relatively constant at 0.9 m². This value will be used when we wish to generate mock data by means of Monte Carlo simulation. We will further assume an observation time of 6 years. The effective area of the instrument multiplied by the observation time is referred to as the spectral exposure of the experiment.

FIGURE 2.4: The energy spectrum of the $b\bar{b}$ annihilation channel for $M_\chi = 100$ GeV.

In addition to choosing a spectral exposure, we need to have measurements of the γ-ray background that are as accurate as possible if we want to distinguish a small signal on top of that background later on. Since we picked a satellite galaxy as a target we can neglect contributions from localized X-ray and γ-ray sources like millisecond pulsars. For simplicity we assume a simple background model consisting only of the uniform extragalactic γ-ray background. Extensive measurements hereof have been performed by Fermi-LAT; the results are shown in Figure 2.7.[33]

2.2.1 Creating a sky map of the SMC

HEALPix coordinates

We want to visualize the γ-ray flux that supposedly gets created in the Small Magellanic Cloud due to dark matter annihilation. To this end we are going to create a full map of the sky with sufficient resolution to distinguish any features in the received flux. A convenient way of making a map of the sky is to use HEALPix coordinates.[34] This is a particular way of projecting a 2-sphere, like the surface of the Earth, or in our case the night sky, onto a Euclidean plane. The mapping results in the pixels having a non-square shape. We choose a resolution of $\mathrm{NSIDE} = 2^{10}$.

FIGURE 2.5: The energy spectrum of the $c\bar{c}$ annihilation channel for $M_\chi = 100$ GeV.

For every pixel in the map we will calculate the J-factor by integrating over the line of sight that passes directly through the center of that pixel. We then assume that the pixels are sufficiently small to treat the l.o.s. integral as constant within a pixel, so that
\[
J_{\rm pixel} = \int_{\Delta\Omega} \int_{\rm l.o.s.} \rho\!\left[r(s,\theta)\right]^2 ds\, d\Omega = \Delta\Omega \int_{\rm l.o.s.} \rho\!\left[r(s,\theta)\right]^2 ds , \tag{2.40}
\]
where $\Delta\Omega$ is the solid angle covered by one pixel. This obviously saves an enormous amount of computational work with negligible information loss as long as we keep the resolution high enough. Also, for the total J-factor the integral over solid angle now turns into a sum over pixels,
\[
J_{\rm dec} = \sum_{\rm pixels} \Delta\Omega \int_{\rm l.o.s.} \rho\!\left[r(s,\theta)\right] ds , \tag{2.41}
\]
\[
J_{\rm ann} = \sum_{\rm pixels} \Delta\Omega \int_{\rm l.o.s.} \rho\!\left[r(s,\theta)\right]^2 ds . \tag{2.42}
\]
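A minimal sketch of how such a per-pixel J-factor map might be built with the healpy package is given below. The NSIDE used here is much coarser than the $2^{10}$ adopted in the text (to keep the example fast), and the Milky-Way-like halo parameters are assumptions of the sketch.

import numpy as np
import healpy as hp
from scipy import integrate

NSIDE = 2**5                       # much coarser than the 2^10 of the text, to keep this fast
NPIX = hp.nside2npix(NSIDE)
OMEGA_PIX = 4.0 * np.pi / NPIX     # all HEALPix pixels cover the same solid angle

def los_rho2(theta, d, R_vir, rho_s, R_s):
    # Line-of-sight integral of rho(r)^2 at opening angle theta, cf. eq. (2.40)
    if d > R_vir and d * np.sin(theta) >= R_vir:
        return 0.0                                   # this line of sight misses the halo
    half = np.sqrt(max(R_vir**2 - (d * np.sin(theta))**2, 0.0))
    s_lo = max(d * np.cos(theta) - half, 0.0)
    s_hi = d * np.cos(theta) + half
    if s_hi <= s_lo:
        return 0.0
    def integrand(s):
        r = np.sqrt(d**2 + s**2 - 2.0 * d * s * np.cos(theta))     # eq. (2.33)
        return (rho_s * R_s / (r * (1.0 + r / R_s)**2))**2         # NFW density squared
    value, _ = integrate.quad(integrand, s_lo, s_hi)
    return value

# Opening angle of every pixel centre with respect to the halo centre direction.
centre_vec = hp.ang2vec(np.pi / 2.0, 0.0)
pix_vecs = np.array(hp.pix2vec(NSIDE, np.arange(NPIX)))
openings = np.arccos(np.clip(centre_vec @ pix_vecs, -1.0, 1.0))

# Per-pixel J-factors of eq. (2.40); the Milky-Way-like halo parameters are assumptions here.
j_map = np.array([OMEGA_PIX * los_rho2(t, d=8.33, R_vir=250.0, rho_s=1.0e7, R_s=20.0)
                  for t in openings])
print("total J_ann =", j_map.sum())                  # the pixel sum of eq. (2.42)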

FIGURE 2.6: The effective area of Fermi-LAT. In the energy range from 10 GeV to 1000 GeV it is more or less constant at 0.9 m².

Now, because the SMC is rather small on a full map of the sky, it is more insightful to show a map of a larger target, the Milky Way. This makes it visually much clearer what the difference is between the received flux from a decaying dark matter type as opposed to an annihilating one. To be clear, after this section we will be working with the SMC only, so the coming plots are just to show how equation (2.38) translates into a HEALPix sky-map. A graph of the J-factor of the Milky Way as a function of opening angle is shown in Figure 2.8, where we estimated its mass to be $1.2 \times 10^{12}\, M_\odot$ and the distance of its center to Earth 8.33 kpc. Figures 2.9 and 2.10 show the resulting sky maps of the Milky Way for decaying and annihilating dark matter respectively. The total J-factor and the number of observed signal and background photons are quoted in Table 2.3. To calculate the expected number of signal photons $N_{\gamma,s}$ we use equation (2.30) and assume that $m_\chi = 100$ GeV and $\langle\sigma v\rangle = 3 \times 10^{-26}\,\mathrm{cm^3\,s^{-1}}$, in addition to the spectral exposure mentioned earlier. For the decay rate of dark matter we take the typical value of $\Gamma = 10^{-26}\,\mathrm{s^{-1}}$.
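To illustrate how a total J-factor translates into an expected photon count via equation (2.30), the sketch below evaluates $N_{\gamma,s}$ for the SMC annihilation J-factor of Table 2.3. The photon spectrum used is a toy stand-in for the tabulated $b\bar{b}$ spectrum, so only the order of magnitude should be compared with the table.

import numpy as np
from scipy import integrate

KPC_IN_CM = 3.086e21               # centimetres per kiloparsec
MSUN_IN_GEV = 1.115e57             # solar mass expressed in GeV/c^2

def expected_signal_counts(J_ann, sigma_v, m_chi, dn_de, t_obs_s, a_eff_cm2, e_min=0.2, e_max=600.0):
    # Expected number of signal photons, eq. (2.30), for a single annihilation channel.
    # J_ann is given in M_sun^2 kpc^-5 (the units of Table 2.3) and converted to GeV^2 cm^-5;
    # sigma_v in cm^3 s^-1, m_chi in GeV, dn_de a callable spectrum in 1/GeV.
    J_cgs = J_ann * MSUN_IN_GEV**2 / KPC_IN_CM**5
    spec_integral, _ = integrate.quad(dn_de, e_min, e_max)
    return t_obs_s * a_eff_cm2 * J_cgs * sigma_v * spec_integral / (8.0 * np.pi * m_chi**2)

# Toy spectrum standing in for the tabulated b-bbar spectrum at m_chi = 100 GeV.
toy_spectrum = lambda E: 30.0 / E * np.exp(-E / 30.0)

print(expected_signal_counts(J_ann=2.59e12,            # SMC annihilation J-factor, Table 2.3
                             sigma_v=3e-26, m_chi=100.0, dn_de=toy_spectrum,
                             t_obs_s=6.0 * 3.15e7,     # six years of observation
                             a_eff_cm2=0.9e4))         # 0.9 m^2 effective area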

Euclidean coordinates

Using HEALPix coordinates makes it possible to nicely visualize the entire sky at once. Unfortunately it turns out to be a bit too slow for some of the applications that are yet to come in chapters 3 & 4. It will be much faster to use a regular flat Euclidean map, and that is what we will, in fact, be using most of the time. The problem is of course that Euclidean mapping ignores the curvature of the sky, and we need to approximate the solid angle of every pixel by

\[
\Delta\Omega = \frac{4}{\pi}\, 2\pi\left(1 - \cos\frac{\theta_p}{2}\right) , \tag{2.43}
\]

FIGURE 2.7: The intensity of extragalactic γ-rays as a function of energy, separated into categories by type of source. The total extragalactic energy spectrum as measured by Fermi-LAT is given by the red dots. We adopt these measurements as the total extragalactic γ-ray background intensity.[33]

where $\theta_p$ is the opening angle of the center of a pixel directly adjacent to the center of the halo. As a consequence the Euclidean map can only be reliably used when we consider a small part (θ < 10°) of the full sky. Luckily, dark matter annihilation is, due to its squared dependency on density, by far the largest in the very center of the halo. We can thus make a map of the center part of the halo using Euclidean coordinates while still capturing the major part of the total signal, and without getting unreasonably large deformations of the picture. Figures 2.11 and 2.12 show the resulting Euclidean sky-maps for decaying and annihilating dark matter respectively, taking the SMC as a target.
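A minimal sketch of how such a flat map grid and its per-pixel solid angle, equation (2.43), might be set up is given below; the 10° × 10° extent and the number of pixels are assumptions of the illustration.

import numpy as np

def pixel_solid_angle(theta_p):
    # Solid angle of a square pixel of angular size theta_p (radians), eq. (2.43)
    return (4.0 / np.pi) * 2.0 * np.pi * (1.0 - np.cos(theta_p / 2.0))

# A flat 200 x 200 grid covering 10 x 10 degrees around the halo centre.
n_pix, half_width = 200, np.deg2rad(5.0)
xs = np.linspace(-half_width, half_width, n_pix)
X, Y = np.meshgrid(xs, xs)
openings = np.sqrt(X**2 + Y**2)                        # flat-sky approximation of theta
d_omega = pixel_solid_angle(2.0 * half_width / n_pix)  # identical for every pixel

# A per-pixel J-factor map would follow as d_omega times the same line-of-sight
# integral of rho^2 used for the HEALPix map, evaluated at each opening angle.
print(d_omega, np.degrees(openings.max()))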

TABLE 2.3: Total J-factor and photon count for Figures 2.9 - 2.12. $N_{\gamma,s}$ and $N_{\gamma,b}$ are the total number of observed signal and background photons respectively in the energy range 0.2 - 600 GeV, assuming a pure observation time on target $t_{\rm obs} = 6$ years and an effective area of Fermi-LAT $A_{\rm eff} = 0.9\ \mathrm{m^2}$.

  Target               Total J-factor                N_γ,s       N_γ,b
  Milky Way (Decay)    1.57 × 10^9  M_⊙ kpc^-2       2.7 × 10^8  1.0 × 10^8
  Milky Way (Ann.)     1.56 × 10^16 M_⊙^2 kpc^-5     1.5 × 10^6  1.0 × 10^8
  SMC (Decay)          2.05 × 10^5  M_⊙ kpc^-2       3.5 × 10^4  3.2 × 10^4
  SMC (Ann.)           2.59 × 10^12 M_⊙^2 kpc^-5     2.5 × 10^2  3.2 × 10^4

FIGURE 2.8: The J-factors of a pixel with a solid angle ∆Ω = 10^-6 sr and an opening angle θ as used in the HEALPix maps of the Milky Way. The top graph shows the result for decaying dark matter, while the bottom graph shows the result for annihilating dark matter.

FIGURE 2.9: HEALPix sky-map of the Milky Way for decaying dark matter.

FIGURE 2.10: HEALPix sky-map of the Milky Way for annihilating dark matter.

FIGURE 2.11: Euclidean sky-map of the Small Magellanic Cloud for decaying dark matter.

FIGURE 2.12: Euclidean sky-map of the Small Magellanic Cloud for annihilating dark matter.

Chapter 3

Statistical Framework and Data Simulation

In the previous chapter we saw how we can describe the photon flux we would observe from an accumulation of dark matter if it annihilates at some rate. As we saw, this rate depends on the velocity averaged cross section $\langle\sigma v\rangle$ of the particles, the true value of which is unknown since we have yet to figure out the physical nature of dark matter. We could, however, get an estimate of $\langle\sigma v\rangle$ in terms of the dark matter mass if we were to measure the photon count originating from the dark matter halo under consideration. An excess amount of γ-rays would be an indication of the presence of dark matter.

In analyzing such an experiment it would be critical to make a distinction between the actual signal photons and the background γ-rays coming from a variety of other sources. That is, we measure a variable λ that we hypothesize consists of two parts:

\[
\lambda(\theta) = N_{\rm bkg} + N_{\rm sig}(\theta) . \tag{3.1}
\]

$N_{\rm bkg}$ and $N_{\rm sig}$ are the number of background and signal photons respectively. In our case the parameter $\theta = \langle\sigma v\rangle$, controlling the interaction cross section of the particles and thereby the strength of the signal. Also, we know from equation (2.30) that we expect zero signal photons if $\langle\sigma v\rangle = 0$. We would like to distinguish between two hypotheses:

Null hypothesis: There is no dark matter signal, i.e. $N_{\rm sig}(\theta) = 0$, implying that $\langle\sigma v\rangle = 0$.

Alternative hypothesis: There is a dark matter signal, i.e. $N_{\rm sig}(\theta) > 0$, implying that $\langle\sigma v\rangle > 0$.

As such, if we have a data set $x = (x_1, x_2, x_3, ..., x_n)$ we want to know if it is possible to reject the null hypothesis at some pre-defined confidence level α, suggesting that $N_{\rm sig}(\theta) > 0$ and thereby corroborating the existence of dark matter. This procedure can be relatively straightforward, but as we shall see some complications arise when $N_{\rm sig}(\theta) \ll N_{\rm bkg}$, which turns out to be the case in the situation we are interested in, as can be seen in Table 2.3. The extragalactic γ-ray background flux is two orders of magnitude larger than the expected annihilation signal from the SMC. When dealing with such a scenario it may prove impossible to refute the null hypothesis, while there is still significant suspicion that the alternative hypothesis is the one that is correct. Fortunately, it would then still be possible to calculate a confidence interval for the parameter θ, even though no statistically significant detection was made.

We can basically assume the null hypothesis cannot be rejected at present, since considerable efforts have been made by others, but to no avail. The rare cases reporting an excess of γ-rays[13][14] could in hindsight be explained by other sources.[15] Nevertheless, we can still generate some mock data from the extragalactic background only, using it to constrain the expected values of $\langle\sigma v\rangle$. This assumes that the true value of $\langle\sigma v\rangle$ is hiding in the statistical noise from the background, and indeed if this were not the case a discovery would have already been made. This allows for the investigation of the sensitivity future observational instruments need in order to make a discovery.

Keep in mind however that this is not our main goal. We calculate these upper limits by traditional means so that we have a benchmark we can use in chapter 4. In this chapter we shall investigate how we should go about setting confidence intervals by means of an example. Afterwards we show how we set up a Monte Carlo simulation to calculate one-sided confidence intervals on $\langle\sigma v\rangle$ using generated mock data.

The experiment described above is in principle a counting experiment: we measure the number of photons hitting our detector in a time interval t. Therefore the general approach is to utilize Poisson statistics to define a likelihood function that will in turn quantify the probability of obtaining a given data set as a function of θ. Subsequently we obtain the maximum likelihood estimator (MLE) and construct from this the maximum likelihood ratio (MLR), a test statistic we shall use for constructing confidence intervals.

3.1 Poisson Statistics

The Poisson distribution is used whenever we wish to analyze counting experiments. Both the mathematical and graphical shape of the Poisson distribution are most likely familiar to you, but it may prove useful to take a closer look at how it is derived before we dive deeper into the subject. We follow the approach by Glen Cowan.[35]

Suppose we set up a detector that collects photons from a particular region of the sky, and that on average we observe λ photons in a time t. Furthermore, suppose that we break up this time interval into smaller non-overlapping intervals δt. If we choose δt small enough, the chance of more than one photon being observed in that interval is completely negligible. In one such sub-interval we will observe a photon with a probability

\[
P(1;\delta t) = \nu\, \delta t , \tag{3.2}
\]

where ν is a constant given by

\[
\nu = \frac{\lambda}{t} , \tag{3.3}
\]

while the probability to observe no photons is

\[
P(0;\delta t) = 1 - \nu\, \delta t . \tag{3.4}
\]

In the equations above, and henceforth, we use the convention that a semi-colon in the function definition separates arguments that are variables (on the left) from arguments that are parameters (on the right).

Before going any further with the derivation of the Poisson probability distribution function (PDF) it is important that we state the assumptions that we make in order to do so:

1. When we make δt very small the probability of observing a photon in that time interval is proportional to δt.

2. The probability of observing more than one photon in a time interval δt is negligible when we make δt sufficiently small.

3. The number of photons observed in one time interval δt is independent from the observation of photons in any other time interval, where we choose the time intervals such that they don’t overlap.

What we wish to know is what the probability is that we observe n photons in a time t when the long term average is λ. The approach in deriving this is to first find the probability that we find zero photons in this time interval, P (0; t), and then generalize by induction.

Suppose that we did know what $P(0;t)$ was, then what is the probability of finding zero photons in a time $t + \delta t$? Because the time intervals are independent this is just the product:

\[
P(0;t+\delta t) = P(0;t)\, P(0;\delta t) , \tag{3.5}
\]
\[
P(0;t+\delta t) = P(0;t)\, (1 - \nu\,\delta t) . \tag{3.6}
\]

\[
\frac{P(0;t+\delta t) - P(0;t)}{\delta t} = -\nu\, P(0;t) , \tag{3.7}
\]

such that in the limit $\delta t \to 0$ we obtain a differential equation
\[
\frac{dP(0;t)}{dt} = -\nu\, P(0;t) , \tag{3.8}
\]

with solution,

\[
P(0;t) = C_1 e^{-\nu t} . \tag{3.9}
\]

The boundary condition is given by $P(0;0) = 1$, i.e. the probability of observing zero events in zero time is equal to unity. This immediately shows that the constant $C_1$ has to be equal to one as well.

Now that we know the probability of observing zero events, we can consider the probability of finding k events in a time $t + \delta t$,

\[
\begin{aligned}
P(k;t+\delta t) &= P(k;t)\, P(0;\delta t) + P(k-1;t)\, P(1;\delta t) \\
                &= P(k;t)\, (1 - \nu\,\delta t) + P(k-1;t)\, \nu\,\delta t .
\end{aligned} \tag{3.10}
\]
No more than two terms have to be considered since, according to our assumptions, there can be no more than one event in a time interval δt. Analogous to the situation where $k = 0$ this defines a differential equation in the limit where $\delta t \to 0$:

\[
\frac{dP(k;t)}{dt} + \nu\, P(k;t) = \nu\, P(k-1;t) . \tag{3.11}
\]

A method to solve (3.11) is to define an integrating factor $f(t) = e^{\nu t}$ with which we multiply the differential equation before integrating:

\[
f(t)\left[ \frac{dP(k;t)}{dt} + \nu\, P(k;t) \right] = f(t)\,\nu\, P(k-1;t) , \tag{3.12}
\]
\[
\int e^{\nu t}\left[ \frac{dP(k;t)}{dt} + \nu\, P(k;t) \right] dt = \int e^{\nu t}\,\nu\, P(k-1;t)\, dt . \tag{3.13}
\]
By the product rule we recognize

\[
\frac{d}{dt}\left[ e^{\nu t} P(k;t) \right] = e^{\nu t}\frac{dP(k;t)}{dt} + e^{\nu t}\,\nu\, P(k;t) , \tag{3.14}
\]

so that equation (3.13) becomes
\[
e^{\nu t} P(k;t) = \int e^{\nu t}\,\nu\, P(k-1;t)\, dt , \tag{3.15}
\]
or
\[
\frac{d}{dt}\left[ e^{\nu t} P(k;t) \right] = e^{\nu t}\,\nu\, P(k-1;t) . \tag{3.16}
\]

This can be used with for example k = 1 to obtain

\[
\begin{aligned}
\frac{d}{dt}\left[ e^{\nu t} P(1;t) \right] &= e^{\nu t}\,\nu\, P(0;t) \\
&= C_1 \nu\, e^{\nu t} e^{-\nu t} \\
&= \nu ,
\end{aligned} \tag{3.17}
\]

where we substituted equation (3.9) for $P(0;t)$ and remember that $C_1 = 1$.

Integration gives:

\[
e^{\nu t} P(1;t) = \int \nu\, dt = \nu t + C_2 . \tag{3.18}
\]

The probability of finding one event in zero time is zero. Thus the boundary condition must be $P(1;0) = 0$, and we see that $C_2 = 0$. Therefore we find that the probability of observing one photon in a time t, when on average we expect $\lambda = \nu t$ photons, is

\[
P(1;t) = \nu t\, e^{-\nu t} . \tag{3.19}
\]

We will prove by induction that this result generalizes to arbitrary k as

\[
P(k;t) = \frac{(\nu t)^k}{k!}\, e^{-\nu t} , \tag{3.20}
\]

for which we have already proven the cases k = 0 and k = 1. Take the differential equation (3.16) with k + 1 on the left hand side and substitute (3.20) on the right hand side:

\[
\frac{d}{dt}\left[ e^{\nu t} P(k+1;t) \right] = \nu\, e^{\nu t}\, \frac{(\nu t)^k}{k!}\, e^{-\nu t} = \frac{\nu\, (\nu t)^k}{k!} . \tag{3.21}
\]

Integrating both sides leads to
\[
e^{\nu t} P(k+1;t) = \frac{(\nu t)^{k+1}}{(k+1)!} + C_3 , \tag{3.22}
\]

where once again the boundary condition $P(k+1;0) = 0$ tells us that $C_3 = 0$. This proves that (3.20) is true for any $k \geq 0$. Recall that $\nu t = \lambda$:

\[
P(k;\lambda) = \frac{\lambda^k}{k!}\, e^{-\lambda} , \tag{3.23}
\]

which is the familiar Poisson PDF. This distribution is normalized and has a mean and standard deviation

\[
\langle k \rangle = \lambda , \tag{3.24}
\]
\[
\sigma = \sqrt{\lambda} , \tag{3.25}
\]

although we will not prove this here.
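Both properties are easy to check empirically; the short sketch below draws a large number of Poisson samples and compares their sample mean and standard deviation with equations (3.24) and (3.25).

import numpy as np

rng = np.random.default_rng(0)
lam = 32.0                                  # expected number of counts, lambda = nu * t

samples = rng.poisson(lam, size=100_000)

# The sample mean and standard deviation should approach lambda and sqrt(lambda),
# equations (3.24) and (3.25).
print(samples.mean(), samples.std(), np.sqrt(lam))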

3.2 The likelihood function

Returning to our counting experiment, suppose that the parameter λ is actually the frequency with which some process occurs on average in a given time frame, e.g. photons observed within a year's time. Then (3.23) represents the chance that we actually measure k events in a random year. Now suppose we have n such variables, $X = (X_1, X_2, X_3, ..., X_n)$, and those variables are independent and distributed identically according to (3.23) subject to a known value of θ; then their joint pdf is given by the product[36]

\[
P(X \mid \lambda(\theta)) = \prod_{i=1}^{n} \frac{\lambda(\theta)^{X_i}\, e^{-\lambda(\theta)}}{X_i!} . \tag{3.26}
\]

But in an experiment we often don't know the value of θ determining the underlying probability distribution, leaving us to estimate θ from the data itself. For this purpose we define the likelihood function $L(\lambda(\theta) \mid x)$. If we have a specific realization of the n variables, i.e. a data set $x = (x_1, x_2, x_3, ..., x_n)$,

then the likelihood of obtaining precisely these measurements is given by[36]

\[
L(\lambda(\theta) \mid x) = \prod_{i=1}^{n} \frac{\lambda(\theta)^{x_i}\, e^{-\lambda(\theta)}}{x_i!} , \tag{3.27}
\]

\[
L(\lambda(\theta) \mid x) = \frac{\lambda(\theta)^{\sum_{i=1}^{n} x_i}\, e^{-n\lambda(\theta)}}{x_1!\, x_2!\, x_3!\, ...\, x_n!} , \tag{3.28}
\]

where λ(θ) is determined by the physical model, represented by equation (2.30), while $x_i$ corresponds to the i-th measurement. The model is dependent on a single parameter θ, which happens to be $\langle\sigma v\rangle$ in our case, so that it can be denoted by λ(θ). Notice that, although the joint pdf and the likelihood function appear similar, their interpretations are completely different. Indeed, the joint pdf is a function of the data and as such it is normalized, while the likelihood is a function of θ, which means that the integral over all realizations of θ does not in general equal unity.

It is customary to take the natural logarithm of the likelihood function, which is convenient for multiple reasons. First of all it offers computational ease when calculating derivatives, since the product in (3.27) is converted into a sum. Secondly, a second-order Taylor expansion of the likelihood itself yields a parabola, whereas the same expansion of the log-likelihood corresponds to approximating the likelihood by a Gaussian, the latter being a better approximation of a general peak shape. Taking the logarithm of (3.28):

\[
l(\lambda(\theta) \mid x) = \ln\left[ L(\lambda(\theta) \mid x) \right] = \sum_{i=1}^{n} \left[ x_i \ln(\lambda(\theta)) - \lambda(\theta) - \ln x_i! \right] . \tag{3.29}
\]

Notice that the third term in (3.29) is simply a constant. As such it is not of any particular significance, for we are usually just interested in the extrema of the log-likelihood function, which are left unchanged by any added constants. Typically the shape of the log-likelihood function resembles an inverted bowl. To be more precise, when we consider only a single parameter then, in the limit of large sample size, the log-likelihood approximates a quadratic function that sharply peaks around its maximum.

Now that we have established the likelihood function associated with Poisson statistics, we have two ways of using it to calculate the most probable value of the model parameter θ. These correspond to the two main schools of statistics, Bayesian and frequentist inference, of which we will adopt the latter. Taking basic conditional probability as a starting point:

\[
P(A \,\&\, B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A) , \tag{3.30}
\]
so that for a model λ(θ) and dataset x

\[
P(\lambda(\theta) \mid x) = \frac{P(x \mid \lambda(\theta))\, P(\lambda(\theta))}{P(x)} , \tag{3.31}
\]

which is known as Bayes' theorem. It relates the probability of the model being true, given the data, to the probability of obtaining the data when given that model. Here, $P(x \mid \lambda(\theta))$ is a specific realization of the joint pdf in equation (3.26) and $P(\lambda(\theta) \mid x)$ is given by the likelihood function that we derived earlier to be (3.27).

Frequentist inference

The frequentist approach to statistical inference is to hypothesize that in reality there exists an unknown, but fixed, parameter θ. In other words, there is a single true value. As such, for a frequentist, although $P(\lambda(\theta))$ and $P(x)$ in (3.31) are unknown, they are constant and can be treated as such. Therefore we can simply state that

\[
P(\lambda(\theta) \mid x) = L(\lambda(\theta) \mid x) \propto P(x \mid \lambda(\theta)) . \tag{3.32}
\]

Hence, we will be able to find the most likely value of θ associated with a measured dataset by maximizing the likelihood function, since this also maximizes the probability to obtain that dataset for a given value of θ. In a practical sense this is a restatement of the principle of maximum likelihood, which states exactly that. Since the logarithm is a monotonically increasing function, the natural logarithm of the likelihood has the same extrema, making it equivalent to maximize the log-likelihood function. We thus define the maximum likelihood estimator (MLE) $\hat{\theta}_{\rm MLE}$ to be[36]

$$
\max_{\theta}\, \ln \mathcal{L}(\lambda(\theta)|\mathbf{x}) = \ln \mathcal{L}(\lambda(\hat{\theta}_{MLE})|\mathbf{x}) \, . \quad (3.33)
$$

Let us explicitly calculate the MLE for the Poisson distribution,
$$
\frac{\partial}{\partial\theta}\, l(\lambda(\theta)|\mathbf{x})
= \frac{\partial l(\lambda(\theta)|\mathbf{x})}{\partial\lambda}\,\frac{\partial\lambda(\theta)}{\partial\theta}
= \frac{\partial}{\partial\lambda}\left[\sum_{i=1}^{n}\left(x_i \ln(\lambda) - \lambda - \ln x_i!\right)\right]\frac{\partial\lambda(\theta)}{\partial\theta}
= \left[\sum_{i=1}^{n}\frac{x_i}{\lambda(\theta)} - n\right]\frac{\partial\lambda(\theta)}{\partial\theta} \, . \quad (3.34)
$$

Setting this to zero in order to find the maximum, and assuming $\partial\lambda(\hat{\theta}_{MLE})/\partial\hat{\theta}_{MLE} \neq 0$,
$$
0 = \left[\sum_{i=1}^{n}\frac{x_i}{\lambda(\hat{\theta}_{MLE})} - n\right]\frac{\partial\lambda(\hat{\theta}_{MLE})}{\partial\hat{\theta}_{MLE}} \, ,
\qquad
\lambda(\hat{\theta}_{MLE}) = \frac{1}{n}\sum_{i=1}^{n} x_i \, . \quad (3.35)
$$


However, we have to keep in mind that, when analyzing sky maps, every pixel has a different contribution from $N_{sig}$, so we will have to adapt the log-likelihood such that every measurement comes from a different Poisson distribution. In other words, the distribution parameter $\lambda_i$ is different for every measurement $x_i$, and the sum runs over all the $N$ pixels instead of the $n$ measurements. This results in the log-likelihood

$$
l(\theta|\mathbf{x}) = \sum_{i=1}^{N}\left(x_i \ln(\lambda_i(\theta)) - \lambda_i(\theta) - \ln x_i!\right) \, , \quad (3.36)
$$
with an MLE satisfying
$$
\sum_{i=1}^{N}\left[\left(\frac{x_i}{\lambda_i(\hat{\theta}_{MLE})} - 1\right)\frac{\partial\lambda_i(\hat{\theta}_{MLE})}{\partial\hat{\theta}_{MLE}}\right] = 0 \, , \quad (3.37)
$$
$$
\sum_{i=1}^{N}\frac{x_i}{\lambda_i(\hat{\theta}_{MLE})}\,\frac{\partial\lambda_i(\hat{\theta}_{MLE})}{\partial\hat{\theta}_{MLE}} = \sum_{i=1}^{N}\frac{\partial\lambda_i(\hat{\theta}_{MLE})}{\partial\hat{\theta}_{MLE}} \, , \quad (3.38)
$$
where
$$
\lambda_i(\hat{\theta}_{MLE}) = N_{bkg} + N_{sig,i}(\hat{\theta}_{MLE})
= N_{bkg} + \frac{t_{obs}\, A_{eff}\, J_{ann,i}\, \hat{\theta}_{MLE}}{8\pi m_\chi^2} \int dE\, \frac{dN}{dE_\gamma}
= N_{bkg} + J_{ann,i}\, \hat{\theta}_{MLE}\, C \, , \quad (3.39)
$$
and
$$
\frac{\partial\lambda_i(\hat{\theta}_{MLE})}{\partial\hat{\theta}_{MLE}} = J_{ann,i}\, C \, . \quad (3.40)
$$

Here we collected some of the factors in a single constant $C$:
$$
C = \frac{t_{obs}\, A_{eff}}{8\pi m_\chi^2} \int dE\, \frac{dN}{dE_\gamma} \, . \quad (3.41)
$$

For the maximum likelihood estimator $\hat{\theta}_{MLE}$ we then find
$$
\sum_{i=1}^{N}\frac{x_i\, J_{ann,i}\, C}{N_{bkg} + J_{ann,i}\, C\, \hat{\theta}_{MLE}} = \sum_{i=1}^{N} J_{ann,i}\, C \, , \quad (3.42)
$$
which can be solved for $\hat{\theta}_{MLE}$ numerically.
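Equation (3.42) is an implicit equation for $\hat{\theta}_{MLE}$, which in practice is solved numerically. The sketch below (assuming NumPy and SciPy; the number of pixels, the background level, the J-factors and the constant $C$ are placeholder values, not those of the actual analysis) maximizes the pixel-wise log-likelihood of eq. (3.36), whose maximum satisfies condition (3.42).

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Mock sky map with placeholder values
n_pix      = 500
N_bkg      = 10.0                                           # expected background counts per pixel
J_ann      = rng.lognormal(mean=0.0, sigma=0.5, size=n_pix) # relative J-factors per pixel
C          = 1.0                                            # collects t_obs, A_eff, m_chi, spectrum (eq. 3.41)
theta_true = 2.0                                            # injected signal normalization

lam_true = N_bkg + J_ann * theta_true * C
counts   = rng.poisson(lam_true)                            # one mock observation per pixel

def neg_loglike(theta):
    lam = N_bkg + J_ann * theta * C                         # eq. (3.39), one rate per pixel
    return -np.sum(counts * np.log(lam) - lam)              # eq. (3.36), constant term dropped

res = minimize_scalar(neg_loglike, bounds=(0.0, 100.0), method="bounded")
print("injected theta:", theta_true)
print("MLE theta     :", res.x)                             # the maximum satisfies condition (3.42)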


Performance

Since the maximum likelihood estimator is exactly that, an estimator, it is important to consider how accurate it is. Like all statistical estimators, the MLE has two important properties that determine this. First, we can ask whether or not the estimator is unbiased, meaning that its expectation value is precisely equal to the true value of the parameter it is attempting to estimate. Second, we are interested in the efficiency of the estimator, i.e. how large its spread is about the mean. An unbiased estimator can still have a very large spread, which is less desirable than one with a small spread.

The best conceivable estimator is thus the one that is both unbiased and has the smallest possible variance. Such an estimator is referred to as the minimum-variance unbiased (MVU) estimator. Is the MLE an MVU estimator for the Poisson distribution? As it turns out, the method of maximum likelihood does not guarantee that the resulting estimator is unbiased. Under certain regularity conditions, however, it does guarantee that, if the MLE is shown to be unbiased, it is also of minimum variance. [37]

Theorem 1 If a statistic $\Lambda(\mathbf{x})$ is an unbiased estimator of a parameter $\theta$ and it is a function of the MLE $\hat{\theta}_{MLE}$ of $\theta$, then $\Lambda(\mathbf{x})$ is a minimum-variance unbiased (MVU) estimator of that parameter.

Corollary 1 If the MLE $\hat{\theta}_{MLE}$ is an unbiased estimator of the parameter $\theta$, then it is an MVU estimator of that parameter.

We will be able to show its validity in chapter 4, but for now we simply assume it is correct. We need to investigate the expectation value of the MLE and show that it is equal to the true mean:

$$
E\!\left[\frac{\partial l(\lambda|\mathbf{x})}{\partial\lambda}\right] = 0 \, , \quad (3.43)
$$
$$
E\!\left[\sum_{i=1}^{n}\left(\frac{x_i}{\lambda} - 1\right)\right] = 0 \, ,
\qquad
\frac{1}{\lambda}\, E\!\left[\sum_{i=1}^{n} x_i\right] = n \, ,
$$
$$
\frac{1}{\lambda}\, E[x_1 + x_2 + x_3 + \dots + x_n] = n \, ,
\qquad
\frac{1}{\lambda}\,(n\langle k\rangle) = n \, ,
\qquad
\langle k\rangle = \lambda \, ,
$$

showing that the MLE is unbiased. Hence according to Theorem 1 it is also of minimum variance, making it an MVU estimator.
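The unbiasedness can also be verified empirically. The following sketch (assuming NumPy; the true rate, the sample size and the number of pseudo-experiments are arbitrary) averages the MLE of eq. (3.35) over many pseudo-experiments and recovers the true rate.

import numpy as np

rng = np.random.default_rng(3)
lam_true  = 4.0          # arbitrary true Poisson rate
n_per_exp = 25           # measurements per pseudo-experiment
n_exp     = 20000        # number of pseudo-experiments

samples = rng.poisson(lam=lam_true, size=(n_exp, n_per_exp))
mle = samples.mean(axis=1)                   # per-experiment MLE, eq. (3.35)

print("true rate        :", lam_true)
print("mean of the MLEs :", mle.mean())      # ~ lam_true: the estimator is unbiased
print("spread of the MLEs:", mle.std())      # ~ sqrt(lam_true / n_per_exp)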

3.3 Confidence Intervals

Imagine that we have a dataset $\mathbf{x}$ and that from this dataset we constructed the log-likelihood function as described above. By maximizing this likelihood function we find its peak, which according to the principle of maximum likelihood corresponds to the most probable value for the model parameter $\theta$. But, if we are testing one hypothesis against another, this is not necessarily sufficient to draw a conclusion. Consider for instance the (fabricated) data in Figure 3.1, represented in a histogram. We see a more or less uniform background of photon counts over the full energy range, in addition to what may or may not be a small signal at 40 GeV.

FIGURE 3.1: Randomly generated data drawn from a uniform distribution. The peak at 40 GeV may be the result of random fluctuations; on the other hand, it could be caused by a legitimate signal.

The data represents a sample from the total population of $\gamma$-rays that, in a statistical sense, contains the true value of $\theta$. The associated log-likelihood function may look something like the parabola shown in Figure 3.2. Although it would be straightforward to obtain from this data the MLE $\hat{\theta}_{MLE}$, it is far from clear if we can claim a signal exists or if we have to pass it off as the result of an unusually large statistical fluctuation of the background. Assessing the likelihood function, it seems that the variance is too large to draw a solid conclusion: the likelihood that the true value of $\theta$ is zero is still quite large, even though it is smaller than the likelihood of $\hat{\theta}_{MLE}$. This inconclusiveness is due to the fact that the signal component is small compared to the background. We need additional statistical concepts to quantify the significance of the MLE compared to the null hypothesis, mainly that of a test statistic.

FIGURE 3.2: An example of the log-likelihood function one might obtain from the data in Figure 3.1. The MLE corresponds to the top of this function.

Test statistics, Significance and Power

A test statistic (TS) is some function of the data (or 'sample') that returns the value of a property of that data. For example, in the case of a Z-test, which is applicable whenever the TS is normally distributed, this property is how far the sample mean is located from the population mean, measured in units of the standard deviation of the sample. If the test statistic is properly normalized, we can calculate the probability of obtaining said value of the TS under the assumption that the null hypothesis is true. Consequently we can decide whether we find this probability unreasonably small, so that the null hypothesis is discredited and can be considered false.

As was mentioned before, the problem of indecisive data is likely to occur when the signal component is small compared to the background, while we were unable to acquire a sufficiently large dataset due to a lack of time or resources. In this scenario we obtain test statistics for $H_0$ and $H_1$ that largely overlap, due to their predicted probabilities for the parameter $\theta$ lying so close together. That makes the probability of a type II statistical error, incorrectly retaining the null hypothesis, very large. To make this a little more intuitive, a visual representation of this problem is shown in Figure 3.3.

FIGURE 3.3: Two normal distributions representing the test statistics one might obtain from the null hypothesis and the alternative hypothesis. The observed parameter value from the data in Figure 3.1 is given by $\theta_{obs}$, while the significance level of $\alpha = 0.05$ corresponds to $\theta = \theta_{cut}$.

Let us imagine we constructed two arbitrary test statistics, one assuming that there is only background ($H_0$) and one assuming there is a signal such that the true value of $\theta$ has some non-zero value ($H_1$). Now suppose that we infer from the data that the maximum likelihood estimate for $\theta$, corresponding to the 40 GeV bump, is some value $\hat{\theta}_{MLE} = \theta_{obs}$. Let the resulting situation be as in Figure 3.3. In this case, the observed value of the model parameter appears to be consistent with both hypotheses, even without explicit calculation. The blue shaded region represents the probability of finding a value larger than or equal to $\theta_{obs}$ assuming the null hypothesis is correct. Judged purely on a qualitative level, this probability seems to be too large to convincingly reject the null hypothesis of background only.

To make this quantitative we need to decide on a significance level, so that we can decide whether or not we can reject the null hypothesis. The significance level $\alpha$ defines a decision boundary $\theta_{cut}$ in the parameter range, below which the null hypothesis cannot be rejected:
$$
\alpha = \int_{\theta_{cut}}^{\infty} g(\theta|H_0)\, d\theta \, . \quad (3.44)
$$

For now we will choose $\alpha = 0.05$, or roughly 2 standard deviations, as indicated in Figure 3.3 by $\theta_{cut}$. In the given scenario the probability of finding $\theta_{obs}$ is larger than the chosen significance level, and $H_0$ cannot be rejected. Meanwhile there is still a good chance to find a value of $\theta_{obs}$ or smaller assuming $H_1$ is true, which is represented by the orange shaded region. This is the probability of a type II statistical error, incorrectly retaining the null hypothesis, and it is given by [36]

$$
\beta = \int_{-\infty}^{\theta_{cut}} g(\theta|H_1)\, d\theta \, . \quad (3.45)
$$

An associated quantity is the power of the statistical test, $1 - \beta$, which measures the ability of the test to reject the null hypothesis when the alternative hypothesis is true. Notice that even if $P(\theta_{obs}|H_0) \leq 0.05$ then, in our case, $\beta$ is very large and hence the statistical power is low. Therefore it is insufficient to merely consider whether $P(\theta_{obs}|H_0) \leq 0.05$; we also need to keep in mind the power of the statistical test.
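For the Gaussian test statistics sketched in Figure 3.3, the decision boundary, $\beta$ and the power follow directly from the normal cumulative distribution function. The sketch below (assuming SciPy; the means and the width of the two distributions are invented to mimic the figure) evaluates eqs. (3.44) and (3.45).

from scipy.stats import norm

# Invented numbers mimicking Figure 3.3: both test statistics are unit-width Gaussians
mu_H0, mu_H1, sigma = 0.0, 2.0, 1.0
alpha = 0.05

# Decision boundary from eq. (3.44): P(theta > theta_cut | H0) = alpha
theta_cut = norm.ppf(1.0 - alpha, loc=mu_H0, scale=sigma)

# Type II error probability from eq. (3.45): P(theta < theta_cut | H1) = beta
beta  = norm.cdf(theta_cut, loc=mu_H1, scale=sigma)
power = 1.0 - beta

print("theta_cut:", theta_cut)   # ~1.645 sigma for alpha = 0.05
print("beta     :", beta)
print("power    :", power)       # low when the H0 and H1 distributions overlap strongly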

This brings us to the usefulness of setting upper limits over attempting to reject the null hypothesis, which may fail or may succeed only with low statistical power. Instead, we calculate a region of parameter space within which the true value is likely to lie with a specified probability, e.g. a confidence level of 95%.

Maximum Likelihood Ratio

In the previous example we were forced to retain the null hypothesis, while there might very well be an actual signal hiding in the statistical noise. As it turns out, this is exactly what we are dealing with in our real-life situation, a potential dark matter signal on top of a $\gamma$-ray background, since the signal component is expected to be a great deal smaller than the background component. On top of that, the number of background photons observed in a typical time frame ($\sim$ years) is already quite small to begin with ($\sim$ 10 - 100). For this reason, neither the detection of a dark matter signal nor its rigorous falsification has been achieved so far. But such a scenario does not imply we have nothing of interest to say at all. From the fact that we did not reject $H_0$ we can conclude that either there is no signal component or, if there is one, the relevant parameter does not exceed a given value. The latter option is what we refer to as setting an upper limit on the parameter. We can thus set an upper limit on $\langle\sigma v\rangle$ even in the absence of a detection.

In order to set an upper limit on a model parameter we need to be able to test how well the alternative hypothesis is supported by the data, as opposed to the null hypothesis. Hence we need to define a statistic that can be used for hypothesis testing, that is, a test statistic. For simple hypotheses a possible statistic can be constructed from the likelihood ratio (LR) [38]


$$
\Lambda_{LR}(\mathbf{x}) = \ln\left[\frac{\mathcal{L}(\lambda(\theta_{H_1})|\mathbf{x})}{\mathcal{L}(\lambda(\theta_{H_0})|\mathbf{x})}\right] \, . \quad (3.46)
$$

Larger values of the LR provide stronger evidence against $H_0$. If the ratio is equal to $x$, this means the data favor $H_1$ by a factor of $e^x$. However, our alternative hypothesis $\langle\sigma v\rangle > 0$ is not simple but composite, and we need to adjust the likelihood ratio accordingly. Our alternative hypothesis specifies that $\theta$ lives in a set $\Theta_1 = (0, \infty)$, while the null hypothesis specifies that $\theta$ lives in a set $\Theta_2 = \{0\}$. We thus define the maximum likelihood ratio (MLR) as [38]
$$
\Lambda_{MLR}(\mathbf{x}) = \ln\left[\frac{\max_{\theta\in\Theta_1}\mathcal{L}(\lambda(\theta_{H_1})|\mathbf{x})}{\max_{\theta\in\Theta_2}\mathcal{L}(\lambda(\theta_{H_0})|\mathbf{x})}\right] \, . \quad (3.47)
$$

The denominator represents the likelihood of a particular value of the parameter we are testing, while the numerator denotes the likelihood of the MLE. Observe that when we test precisely the value given by the MLE, the maximum likelihood ratio will be equal to zero. When we test a value other than the MLE, the ratio within the natural logarithm will always be larger than one, making the maximum likelihood ratio always larger than zero. The MLR thus has a minimum at the MLE, where it is equal to zero.
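In code, the maximum likelihood ratio, in the form where a particular test value $\theta_{test}$ is compared against the best fit, can be sketched as follows (assuming NumPy and SciPy; the mock sky map uses placeholder values analogous to the earlier sketch, and function names such as loglike and mlr are illustrative).

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)

# Minimal mock sky map with placeholder values
n_pix, N_bkg, C, theta_true = 500, 10.0, 1.0, 2.0
J_ann  = rng.lognormal(0.0, 0.5, n_pix)
counts = rng.poisson(N_bkg + J_ann * theta_true * C)

def loglike(theta):
    lam = N_bkg + J_ann * theta * C
    return np.sum(counts * np.log(lam) - lam)   # eq. (3.36), constant term dropped

# Maximum of the likelihood over the full parameter range
theta_hat = minimize_scalar(lambda t: -loglike(t), bounds=(0.0, 100.0),
                            method="bounded").x

def mlr(theta_test):
    # ln[ L(theta_hat) / L(theta_test) ]: zero at the MLE, positive elsewhere
    return loglike(theta_hat) - loglike(theta_test)

for t in (0.0, theta_hat, 2.0 * theta_hat):
    print(f"theta_test = {t:7.3f}   MLR = {mlr(t):9.4f}")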

Neyman-Pearson lemma

What justifies the adoption of the likelihood ratio as a test statistic, when we could have chosen any test statistic? This is motivated by the Neyman-Pearson lemma for simple hypotheses, which we state without proof. [39] Consider a random sample $\mathbf{x} = (x_1, x_2, x_3, ..., x_n)$ drawn from a distribution with parameter $\theta$, where $\theta \in \{\Theta_1, \Theta_2\}$, and let the likelihood function be $\mathcal{L}(\lambda(\theta)|\mathbf{x})$. Additionally, let $C_1$ be the critical region of the test statistic such that if $\mathbf{x} \in C_1$ then $H_0$ can be rejected at a significance level $\alpha$, and let $C_0$ be the region complementary to $C_1$. Then the following is true:

Lemma 1 (Neyman-Pearson lemma) If at a significance level $\alpha$ there exists a constant $k$ such that:
$$
\Lambda(\mathbf{x}) = \frac{\mathcal{L}(\theta_0|\mathbf{x})}{\mathcal{L}(\theta_1|\mathbf{x})} \leq k \quad \text{for each } \mathbf{x} \in C_1 \, ,
$$
$$
P(\Lambda(\mathbf{x}) \leq k \,|\, H_0) = \alpha \, ,
$$
$$
\Lambda(\mathbf{x}) = \frac{\mathcal{L}(\theta_0|\mathbf{x})}{\mathcal{L}(\theta_1|\mathbf{x})} > k \quad \text{for each } \mathbf{x} \in C_0 \, ,
$$
then the likelihood-ratio test with critical region $C_1$ is the most powerful test of size $\alpha$ for testing $H_0$ against $H_1$.
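As a small illustration of how the constant $k$ can be obtained in practice, the following sketch (assuming NumPy; the two Poisson rates and the sample size are invented) estimates the critical value of the likelihood ratio from pseudo-experiments generated under $H_0$, and then evaluates the power of the resulting test under $H_1$.

import numpy as np

rng = np.random.default_rng(5)

lam0, lam1 = 5.0, 7.0    # simple hypotheses: invented Poisson rates for H0 and H1
n, n_exp   = 20, 50000   # sample size per experiment, number of pseudo-experiments
alpha      = 0.05

def log_lr(x):
    # ln Lambda(x) = ln L(lam0|x) - ln L(lam1|x) for i.i.d. Poisson data
    return np.sum(x * np.log(lam0) - lam0 - (x * np.log(lam1) - lam1), axis=-1)

# Distribution of the (log) likelihood ratio under H0
x_H0  = rng.poisson(lam0, size=(n_exp, n))
lr_H0 = log_lr(x_H0)

# Critical value: P(Lambda <= k | H0) = alpha, computed as ln k for numerical stability
ln_k = np.quantile(lr_H0, alpha)

# Power of the resulting test: P(Lambda <= k | H1)
x_H1  = rng.poisson(lam1, size=(n_exp, n))
power = np.mean(log_lr(x_H1) <= ln_k)

print("ln k :", ln_k)
print("power:", power)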
