
Master thesis Theoretical Physics

Inference of GC point-source luminosity function under entropy loss

Abel Stam

July 26, 2020


Abstract

Astrophysical images are generally composed of a diffuse background signal and point-like signals. These physical signals are subject to distortions by photon shot noise and the instrument's point spread function (IPSF). The latter poses complications when inferring the luminosity function of a source population. Due to the distortions and the diffuse background, faint sources are underestimated, which results in a bias in the luminosity function. To compensate for the underestimation of faint point sources, this thesis introduces a weighted kth nearest neighbour entropy loss term under which the luminosity function is inferred. This entropy loss term is applied to models with increasingly complex distortions. The approach can provide better confidence intervals for simplistic models compared to models without the entropy loss. While simple models seem to benefit from the entropy loss, obtaining a strong two-sided confidence region for the luminosity parameter remains non-trivial and depends strongly on the chosen entropy loss weights and the model complexity.


Contents

1 Introduction 5

2 Galactic center 6

2.1 Morphology of CMZ . . . 6

2.2 Diffuse γ-ray components in the GC . . . 7

2.2.1 Diffuse HE emission . . . 7

2.2.2 Very high energy components . . . 10

2.3 Point-like components in the GC . . . 11

2.3.1 Pulsars . . . 11

2.3.2 The galactic nucleus . . . 12

2.3.3 EGRET catalogue . . . 13

2.3.4 Fermi-LAT catalogue . . . 13

2.4 Hints of something more . . . 13

3 Statistics and entropy 16

3.1 The two philosophies of inferential statistics . . . 16

3.2 Bayes’ theorem . . . 16

3.3 Bayesian inference . . . 17

3.3.1 Maximum a posteriori estimation . . . 17

3.3.2 A physics example . . . 17

3.4 Likelihood principle . . . 18

3.4.1 Binomial likelihood function . . . 18

3.4.2 Maximum likelihood estimator . . . 18

3.5 Wilks’ theorem . . . 19

3.6 Entropy . . . 20

3.6.1 Shannon Entropy . . . 20

3.6.2 Information of heads and tails . . . 21

3.6.3 Kozachenko-Leonenko entropy estimation . . . 21

3.6.4 Weighted KL entropy estimation . . . 23

4 Research method 24

4.1 Probabilistic programming models . . . 24

4.1.1 A simple example . . . 25

4.2 Model components . . . 25

4.2.1 Mock data . . . 25

4.2.2 The model . . . 25

4.3 Obtaining statistical results . . . 26

5 Results 28

5.1 No entropy optimization . . . 28

5.2 Tuning entropy weights . . . 29

5.3 Applying IPSF . . . 31

5.4 Adding background . . . 31


5.5 Discussion . . . 32

6 Conclusion 34

6.1 Future research . . . 34

7 Appendix 35

7.1 Synchrotron . . . 35

7.2 Comparing entropy weights . . . 35


CHAPTER 1

Introduction

Astrophysical images are generally composed of a diffuse background signal overlain by point-like signals. These physical signals are subject to distortions by photon shot noise and the instrument's point spread function (IPSF). These distortions pose complications for the detection of astrophysical objects as well as for inferring physical properties from the image. The detection of faint point sources is especially hard due to the Poisson nature of photon shot noise. The convolution due to the IPSF smooths out these signals, resulting in less discriminating power between point-source and diffuse background signals. One research area where the latter poses complications is the modeling of the Galactic center (GC) gamma-ray (GR) signal. GR radiation in the direction of the GC is composed of diffuse and point-like signals. Diffuse components include cosmic-ray interactions with interstellar gas and radiation fields. Contributing point-like signals include pulsars, binary systems and the galactic nucleus [van Eldik, 2015, Hooper et al., 2018], as well as signals from outside the galaxy. Several research papers report an excess on top of the commonly adopted GR models in the GeV band near the GC ([Goodenough and Hooper, 2009, Vitale et al., 2009, Gordon et al., 2013]), known as the Galactic center gamma-ray excess (GCGRE). Currently there are multiple interpretations of the GCGRE. These interpretations include an unresolved population of millisecond pulsars (MSPs) ([Bartels et al., 2016], [Petrovic et al., 2015]) as well as weakly interacting massive particles (WIMPs) annihilating or decaying ([Goodenough and Hooper, 2009, Vitale et al., 2009]). In the case of WIMP annihilation the GCGRE is expected to be a completely diffuse signal, whereas an underlying astrophysical source population would have substructure. Effort has been made to reconstruct the population of point sources ([Selig et al., 2015]), but the current catalogue cannot account for the GCGRE. In [Petrovic et al., 2015] the importance of the luminosity function of the hypothetical source population is emphasized.

The luminosity function of a source population is hard to infer in the presence of a diffuse background component of order $O(\hat F)$, where $\hat F$ is the expected source flux, because of the distortions mentioned earlier. When inferring the luminosity function using probabilistic models without compensating methods in place, the result may become biased due to the underestimation of sources, because faint sources can easily be masked by statistical fluctuations. One compelling compensating method is introducing an entropy loss when inferring the luminosity function. This entropy loss acts as a repulsive force on degenerate fluxes, effectively increasing the flux differences of faint point sources. This thesis investigates whether the added entropy loss can produce better constrained confidence intervals for the luminosity function, and specifically whether the entropy loss allows the inference of an unbiased luminosity function in the presence of a diffuse background. Before presenting the results in chapter 5, the theoretical groundwork is discussed in chapters 2 and 3. In chapter 4 the theoretical framework is brought into context by presenting the research methods.


CHAPTER 2

Galactic center

Since the discovery of the galactic nucleus it has been a region of interest because of its extreme nature and relative proximity. At the very center, [Balick and Brown, 1974] found a large unresolved radio emission region, Sagittarius A*. This is now widely accepted to be the product of a supermassive black hole (SMBH), dictating the dynamics of the inner parsecs of the GC. The mass of the SMBH is established at $M = (4.31 \pm 0.06|_{\mathrm{stat}} \pm 0.36|_{R_0}) \cdot 10^6 \, M_\odot$ by [Gillessen et al., 2009], by means of analysing the movements of 28 stars in orbit around Sagittarius A*. The distance to Sagittarius A* is set at $R_0 = 8.33 \pm 0.35$ kpc, which relates an angular distance of 1° to approximately 140 pc. This implies that the most precise VHE γ-ray telescopes, with an angular resolution up to ∼0.05°, are able to resolve point sources that are O(10) pc apart. The density of the interstellar medium (ISM) in the galactic plane in the proximity of the GC is an order of magnitude larger than the average density of the ISM. This obscures the region at visible wavelengths. The ISM is however translucent to high (X-ray and up) and low (low IR) frequencies. This allows for direct observation of high energy (HE) and very high energy (VHE) γ-ray emission coming from the GC, where

10 MeV < HE < 100 GeV < VHE. (2.1)

A caveat of directly observing HE and VHE radiation is that the angular resolution is inferior to that of lower energy X-ray and IR telescopes, which can resolve point sources at distances of 20 mpc and 3 mpc respectively [van Eldik, 2015]. To generate an understanding of the γ-ray emission regions and sources in the GC, the next section presents a bird's-eye view of the morphology of the GC. In the subsequent sections contributors to the diffuse and point-like components of the observed γ-ray signal from the GC are discussed. In the last section observations pointing toward an excess in the GR spectrum are presented.

2.1

Morphology of CMZ

In figure 2.1 a wide field image of the GC is displayed at λ = 90 cm, exhibiting a complex ensemble of emission regions. The overall emission is dominated by non-thermal synchrotron radiation (appendix 7.1), which suggests the presence of strong magnetic fields. Several supernova remnants (SNR) are identified in figure 2.1, accounting for some of the synchrotron radiation from the GC. At the center lies the complex electron synchrotron radiation emitter Sagittarius A [Beckert et al., ], which dominates the inner 50 pc region. The complex is composed of the compact source Sgr A*, the non-thermal emission region Sgr A East and the ionised hydrogen region Sgr A West, which emits a thermal radio spectrum. Chandra X-ray observations were used by [Maeda et al., ] to propose the SNR nature of Sgr A East, making Sgr A East a substantial contributor to the HE and VHE emission surrounding Sgr A. Later, [Sangwook et al., ] also identified the compact core born from the SN that created Sgr A East, further strengthening the case for its SNR nature. Sgr A East has a relatively small shell radius compared to other SNRs. One likely explanation is the relatively large density of the ISM around Sgr A East slowing down the expansion.


Figure 2.1: Wide field radio image taken from [LaRosa et al., ].

2.2

Diffuse γ-ray components in the GC

2.2.1

Diffuse HE emission

The EGRET mission contributed to mapping out the HE components of the GC sky around 100 MeV. The resulting photon map of 9 years (1990-1999) of observations can be seen in figure 2.3. The theoretical components contributing to the flux, as well as the observed flux, are illustrated in figure 2.4. The components include π0 decay, inverse Compton (IC) scattering and electron bremsstrahlung, which are labelled in the figure as NN, IC and EB.

Electron bremsstrahlung

From figure 2.4 it can be seen that for energies < 100 MeV electron bremsstrahlung is the expected dominant component, as postulated by [Fichtel et al., 1976]. Electron bremsstrahlung is radiation emitted as a result of the deceleration of electrons by other charged particles. In the process of deceleration the energy lost by the electron is converted into photons. [Yusef-Zadeh et al., 2012] argues that cosmic ray electrons interacting with the charged particles in the molecular gas around the GC can explain the observed spectrum.


Figure 2.2: An illustration of the regions contained in the inner 50 pc of the GC from [LaRosa et al., ].

Figure 2.3: All-sky raw photon map as observed by EGRET (E > 100 MeV), taken from [Thompson, 2008].

Pion decay

At energies > 100 MeV more than 90% of the diffuse γ-ray emission is generated by neutral pion (π0) decay ([Drury, 2012], [Aharonian et al., 2006]). The decaying pions are themselves products of cosmic rays, relativistic protons or nuclei, colliding with ambient material. The leading order GR production due to pion decay comes from the collision of two protons,

$$p + p \to \pi^0 \to 2\gamma, \qquad (2.2)$$

a conclusion which has recently been reinforced by comparing the Fermi-LAT measurements to the underlying theoretical models in [?]. The results are displayed in figure 2.5; the solid black line in that figure corresponds to the hybrid model proposed by [Dermer, ].


Figure 2.4: Theoretically constructed fluxes for IC, NN, EB and ER compared to the flux observed by EGRET at different energies.

Inverse Compton scattering

A small contribution to the total diffuse γ-ray signal in figure 2.4 can be accounted for by IC scattering. IC scattering is the scattering of relativistic electrons off photons, which can boost the photon


Figure 2.5: Pion flux for different models and as observed by Fermi-LAT from [?]. The solid black line corresponds to the hybrid model proposed in [Dermer, ].

energy into the γ-ray regime. The spectrum of the γ radiation is then determined by the energy of the electron and of the scattering photon. It was suggested by [Melia et al., 1998] that the scattering between the electrons of Sgr A East and its halo and the UV and IR photons from the nucleus can account for the diffuse γ-ray spectrum observed near the GC.

Cosmic rays

Cosmic rays are relativistic particles permeating the universe. Their origin is still under active research, but the energy source for cosmic ray acceleration in our galaxy is thought to be supernovae. The argument is that the integrated CR power in our galaxy, calculated to be (0.7 ± 0.1) · 10^34 W by [Strong et al., ], can only be efficiently supplied by supernova processes transforming mechanical energy into accelerated particles [Drury, 2012]. This indeed seems plausible if a supernova happens every O(10) years and the released mechanical energy of a supernova is O(10^44) J. This puts the available power at around 10^35 W, so as long as there exists a mechanism that can convert ∼10% of this power into accelerated particles, the energy argument cannot be rejected. Indeed, several studies have proposed mechanisms that seem to meet this constraint [Axford, 1981, Urošević et al., 2019].
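As a rough check of this energy budget, using only the order-of-magnitude numbers quoted above,

$$\frac{E_{\mathrm{SN}}}{\tau_{\mathrm{SN}}} \approx \frac{10^{44}\,\mathrm{J}}{10\,\mathrm{yr} \times 3.2\times 10^{7}\,\mathrm{s\,yr^{-1}}} \approx 3\times 10^{35}\,\mathrm{W}, \qquad 0.1 \times 3\times 10^{35}\,\mathrm{W} \approx 3\times 10^{34}\,\mathrm{W},$$

which is indeed within a factor of a few of the integrated CR power of (0.7 ± 0.1) · 10^34 W quoted above.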

2.2.2

Very high energy components

In the VHE regime photons reaching earth become extremely scarce. The expected number of photons reaching earth from a relatively strong VHE source at the GC is less than 1 m⁻² yr⁻¹. It is because of this VHE photon scarcity that ground-based Imaging Atmospheric Cherenkov Telescopes, like [H.E.S.S., ], [MAGIC, ] and [VERITAS, ], are used as detectors. These telescopes utilize the atmosphere as a detection medium by observing the Cherenkov radiation produced by the particle shower that a VHE photon generates, and consist of multiple spatially separated detector areas. This detection method drastically increases the detection surface to around 10⁴–10⁵ m². This detection area will on average result in a detection rate of O(10⁴–10⁵) yr⁻¹ for a dedicated detector, yielding a reasonable photon statistics sample. For VHE photon detection of sources within the GC the H.E.S.S. telescope is the most efficient because of the small zenith angle. The energy detection threshold of H.E.S.S. is around 10 GeV and reaches up to ∼10 TeV, making it a suitable observatory for VHE γ-rays from the GC. A survey by [Aharonian et al., 2006] of the GC VHE photons reveals two dominant point-like emission regions


in figure 2.6. When subtracting the point-like sources from the photon map, the morphology of the diffuse γ-ray distribution in the GC becomes more pronounced. The emission regions appear to trace the dense molecular cloud illustrated in white in figure 2.7. This could hint that the VHE emission has the same pion decay mechanism as discussed in the previous section. However, this requires cosmic rays with energies up to the PeV scale, and currently no supernova particle acceleration model exists for such high energy cosmic rays. An alternative hypothesis by [Abramowski et al., 2016] states that mechanisms around Sgr A* can be responsible for the cosmic rays generating the VHE emission.

Figure 2.6: VHE photon map of the GC [Aharonian et al., 2006] where the numbers on the axes represent the degrees from the GC.

Figure 2.7: VHE photon map of the GC [Aharonian et al., 2006] after subtracting known point-like structures, where the numbers on the axes represent the degrees from the GC. The white lines illustrate the molecular cloud and the black star marks the position of Sgr A*.

2.3

Point-like components in the GC

This section will discuss the possible classes of γ-ray point sources within our galaxy.

2.3.1

Pulsars

Pulsars are neutron stars that spin with very small rotational periods. This high-frequency rotation, together with the neutron star's strong magnetic field, causes beams of radiation to be emitted from the magnetic poles as charged particles move through the field. During the core collapse after a SN, conservation of angular momentum causes the rotation frequency of the core to increase. Because of the extremely small radius of O(10) km and the typical mass of O(M_⊙) of


a neutron star, rotation frequencies can reach up to 716 Hz [Hessels, 2006]. A special kind of pulsar relevant in the context of the GCGRE is the millisecond pulsar (MSP), due to its spectral similarity [Bartels et al., 2016]. MSPs are pulsars with a rotation period smaller than around 40 milliseconds. The origin of MSPs has been studied extensively over the years (see [Bhattacharya and van den Heuvel, 1991] for example) and is thought to be a product of a binary system: the original pulsar coexisted with a neighbouring star, which was used as a donor by the pulsar. In the process of accretion the mass of the pulsar increases, and hence so does its rotation frequency.

2.3.2

The galactic nucleus

As mentioned in the introduction of this chapter, the galactic nucleus has been the subject of much research since its discovery as a radio emitter. With the emergence of Imaging Atmospheric Cherenkov Telescopes the region was first actively researched in the VHE spectrum, but it took until 2004 for a significant detection of VHE γ-ray signals from the direction of Sgr A* (HESS J1745-290) by [Kosack et al., 2004], [Tsuchiya et al., 2004] and [Aharonian et al., 2004]. Figure 2.6 clearly illustrates a bright VHE point source in the direction of Sgr A* as observed by the H.E.S.S. telescope. After two years of observations a clear energy spectrum of the source was constructed; the results from different detectors are collected in figure 2.8. Observations of the region in HE by Fermi-LAT have resulted in the detection of a HE radiator in the direction of Sgr A* as well. This source, appearing in the Fermi-LAT catalogue as 2FGL J1745.6-2858, matches the location and spectrum of HESS J1745-290, as can be seen by comparing figures 2.8 and 2.9. Note the conversion factor of 1 erg ≈ 0.62 TeV. Although the two sources seem to have partly overlapping spectra [?], models that explain both spectra as one are not widely available. Recent studies ([?], [Linden et al., 2012]) suggest that PeV hadronic interactions may produce the observed spectrum, but are still limited by the angular resolution of the observations.

Figure 2.8: Compilation of the energy spectrum of HESS J1745-290 as measured by different detectors.

Since it is not yet clear whether there is a common source for both the HE and the VHE spectrum, most models have been based on the VHE spectrum. As with the process underlying the emission, the source responsible for accelerating the particles has not been resolved either. As discussed in section 2.1, the Sgr A complex contains three objects that may well be associated with, at least part of, the observed VHE flux. This, in combination with the modest angular resolution of γ-ray observations, complicates confirming a single theory. Although most observations seem to favour Sgr A* as the main contributor to the observed γ-ray signal, models exist where SNR Sgr A East and PWN G359.95-0.04 contribute to the signal as well. SNR Sgr A East has been disfavoured as the main contributor at a significance of 3.9 in [Acero, 2010] by


Figure 2.9: Energy spectrum distribution of point source 2FGL J1745.6-2858 at the GC as detected by Fermi-LAT.

comparison of the best-fit positions for the three possible sources, leaving PWN G359.95-0.04 and Sgr A*. The contribution of PWN G359.95-0.04 remains plausible, but arguments from ?? state that if both the HE and the VHE spectra come from the same source, a PWN scenario can likely be excluded. As it stands, Sgr A* is the most likely of the alternatives discussed. However, another option is proposed by [Bednarek and Sobczak, 2013], which states that the observed signal could be the cumulative effect of a collection of unresolved millisecond pulsars (MSPs).

2.3.3

EGRET catalogue

Point-like sources appear on the modeled background as a statistically significant excess. EGRET data was analysed using the likelihood ratio test, see ?? for a detailed overview, to determine the significance of a point source over no source ([Mattox et al., 1996]). For different parts of the sky different detection thresholds were used: within 10 degrees of the galactic plane a threshold of 5 was used, whereas outside this region a threshold of 4 was used. The third EGRET catalogue, illustrated in figure 2.10, included 5 pulsars, the Large Magellanic Cloud and 94 sources that may be associated with active galactic nuclei. The catalogue also contained 170 sources which could not be associated with an astrophysical source at the time. An alternative catalogue has been proposed in [Casandjian, J.-M. and Grenier, I. A., 2008], drastically reducing the number of unassociated point sources, especially near the galactic plane; see figure 2.11. The alternative diffuse background model that was used did however remove some of the previously confirmed point sources.

2.3.4

Fermi-LAT catalogue

During the last 12 years the Fermi Large Area Telescope (Fermi-LAT) has been observing the GC in the HE γ-ray spectrum. This satellite telescope improves on the angular and energy resolution of EGRET and can differentiate between the CR background and point sources using dedicated veto detectors REF. These improvements contributed to the now 1873 detected HE point sources in the Fermi-LAT catalogue.

2.4

Hints of something more

Since the first publication of the Fermi-LAT results an unaccounted-for amount of flux seemed to be measured around a few GeV. One of the first papers to point out that the excess could be modeled by self-annihilating dark matter was [Goodenough and Hooper, 2009]. The collected data was compared to a model including the diffuse background and the expected flux of the source in the galactic nucleus


Figure 2.10: EGRET point source catalogue

Figure 2.11: EGRET alternative point source catalogue from

[Casandjian, J.-M. and Grenier, I. A., 2008]

as well as the predicted dark matter contribution. The dark matter contribution is calculated as

$$\Phi_\gamma(E_\gamma, \psi) = \frac{1}{2} \langle \sigma v \rangle \frac{dN_\gamma}{dE_\gamma} \frac{1}{4\pi m_{\mathrm{dm}}^2} \int \rho(r)^2 \, dl(\psi)\, d\psi, \qquad (2.3)$$

with ⟨σv⟩ the expected dark matter self-annihilation cross-section multiplied by velocity and dN_γ/dE_γ the γ-ray spectrum per annihilation. The integral over the squared dark matter density ρ(r) is taken over the line of sight, with ψ the angle from the galactic center. The best fit to the data for this model was obtained considering a slightly modified NFW halo profile ([Navarro et al., 1996]) with γ = 1.1 and a dark matter particle mass of m_dm = 28 GeV. This profile assumes that the dark matter density falls off with a radial power law ρ(r) ∝ r^{−γ} near the GC. The fit of this model to the observed data is illustrated in figure 2.12 for the inner 0.5° (left) and the inner 3° (right). The cumulative model, including the modeled diffuse background (dot-dashed line), the dark matter contribution (dashed line) and the GC point source (dotted line), is illustrated with the solid line. The model closely resembles the data in the 1–10 GeV region; however, [Goodenough and Hooper, 2009] concluded that the model does not differentiate between dark matter and a possible underlying astrophysical background with a similar spectral shape. Since then several papers have argued that the


astrophysical interpretation is more likely, due to spatial differences between the excess and the morphology predicted by the NFW density ([Bartels et al., 2017], [Clark et al., 2016]). The support for an, as of yet, unresolved underlying millisecond pulsar (MSP) population has since grown due to publications like [Bartels et al., 2016], [Macias et al., 2018], [Bartels et al., 2018] and [Ploeg et al., 2017]. These publications state that the morphology more closely resembles that of a population of unresolved MSPs. One essential ingredient of this hypothesis is the luminosity function used to represent the population [Petrovic et al., 2015]. The luminosity function dN/dL gives the expected number of MSPs when integrated over a luminosity interval. Bear in mind that a luminosity function used to represent the unresolved MSPs in the GC must agree with the observed luminosity function of resolved MSPs in the galactic disk. Research by [Ploeg et al., 2017] has shown that the luminosity function of resolved MSPs in the galactic disk could, within the current uncertainties, be used to model the unresolved population in the GC. Despite the efforts made to understand the GCGRE, a firm conclusion remains absent and requires a more resolved image of the GC.

Figure 2.12: NFW halo profile model with γ = 1.1 compared to observed data from [Goodenough and Hooper, 2009]. The cumulative model, including the modeled diffuse background (dot-dashed line), the dark matter contribution (dashed line) and the GC point source (dotted line), is illustrated with the solid line.


CHAPTER 3

Statistics and entropy

In this chapter the statistics that are essential to the research method are discussed. Probabilistic programming languages are built on the theory of Bayesian inference, which is discussed in sections 3.2 and 3.3. An important connection between statistics and entropy is discussed in the second half of the chapter.

3.1

The two philosophies of inferential statistics

Inferential statistics allows one to make statements about the properties of a population from observations of only a subset of the population. The three main goals one might want to achieve are parameter estimation, data prediction and model comparison. Within inferential statistics there are two distinct branches: frequentist inference and Bayesian inference. Both branches try to achieve the same goals by means of different approaches, and can be distinguished by their somewhat subtly different interpretations of probability. In the frequentist interpretation, probability is only assigned to events that are part of repeatable experiments with a random outcome. The probability of an event is equal to the long term frequency of the event occurring. When repeating a coin flip experiment an infinite number of times, the frequency of heads converges to the frequency of tails, and therefore a probability of 0.5 is assigned to observing heads in an experiment. It is important to note that in general the frequentist approach does not allow assigning a probability to a general hypothesis. Contrary to this, the Bayesian interpretation allows using probabilities to represent uncertainties in events or hypotheses that do not satisfy the repeatable experiment constraint: it is perfectly acceptable to assign a probability to an event that is not part of a repeatable experiment. In this assertion the probability is based on the cornerstone of the Bayesian interpretation, the prior beliefs. These priors, together with the observed data, contain all the information necessary for inference.

3.2

Bayes’ theorem

The most important concept of Bayesian statistics is Bayes' theorem, which converts the observed results from a test into probabilities for the underlying hypothesis. If, for example, one wanted to say something about the mean of a population, a frequentist would state that the mean has a fixed value; because frequentists only assign probability to randomly distributed quantities, they would give you a single number, namely the most likely one given the data. Bayes' theorem would however, given the observed data, give you a probability for every possible value of the mean. Bayes' theorem can be written as

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}, \qquad (3.1)$$

with A a hypothesis of any form, B the observed data or evidence, P(A|B) the posterior, P(A) the prior, P(B|A) the likelihood and P(B) the total probability of B, or the global likelihood. From 3.1 it follows that only the numerator depends on the hypothesis being tested. Equation


3.1 defines the posterior distribution as a function of the hypothesis A. Considering n mutually exclusive hypotheses for the observed data, normalization enforces that

$$\sum_{i=1}^{n} P(A_i|B) = 1. \qquad (3.2)$$

For a composite hypothesis the parameters of the hypothesis need to be integrated over their allowed values in the likelihood P(B|A_i),

$$P(B|A_i) = \int P(\theta|A_i)\, P(B|A_i, \theta)\, d\theta = \mathcal{L}(B), \qquad (3.3)$$

such that

$$P(A_i|B) = \frac{P(A_i) \int P(\theta|A_i)\, P(B|A_i, \theta)\, d\theta}{P(B)}, \qquad (3.4)$$

with P(θ|A_i) the parameter prior and P(B|A_i, θ) the likelihood. Based on equation 3.1, the Bayes estimate of an unknown parameter becomes the average over the posterior,

$$\hat\theta = \int \theta\, P(\theta|X)\, d\theta, \qquad (3.5)$$

with now θ the hypothesis and X the data or evidence.

3.3

Bayesian inference

Bayesian inference is the deduction of the posterior using Bayes' theorem. Specifically, it assigns probability to competing hypotheses in the light of the observed data and the prior beliefs. It must be noted that the probabilities in Bayes' theorem can implicitly depend on prior information I,

$$P(\theta|X, I) = \frac{P(X|\theta, I)\, P(\theta|I)}{P(X|I)}, \qquad (3.6)$$

which simply encodes prior available information about the hypothesis or data. Frequently the model of interest has multiple parameters. Parameters that are not of direct interest are called nuisance parameters and should be integrated out,

$$P(\theta|X, I) = \int P(\theta, \phi|X, I)\, d\phi. \qquad (3.7)$$

The process of integrating out these nuisance parameters is called marginalization, and the result P(θ|X, I) is called the marginal posterior pdf of θ.

3.3.1

Maximum a posteriori estimation

In Bayesian inference maximum a posteriori (MAP) estimation is the process of identifying a parameter value that is the mode of the posterior distribution. The mode is the parameter value that maximizes the posterior distribution

$$\theta_{\mathrm{mode}} = \arg\max_\theta P(\theta|X),$$

$$\hat\theta_{\mathrm{MAP}}(X) = \arg\max_\theta P(\theta|X) = \arg\max_\theta \frac{P(X|\theta, I)\, P(\theta|I)}{P(X|I)}. \qquad (3.8)$$

3.3.2

A physics example

A simplified physics variation of a famous application of Bayes' theorem could be the following. Assume we are observing a 100×100 patch of sky and that the probability of an observable source in this patch is P = 0.40. The probability of observing a photon in our detector if there is an observable source is P = 0.95. On the other hand, the probability of observing a photon if there is no source is P = 0.07. These numbers form our prior beliefs about the problem. Now suppose that the detector measures a photon; what is the probability that there is a source? This question seems simple, but can actually be quite hard to get right. Applying Bayes' theorem helps to get the right answer. First note that we are searching for P(A|B), the probability of a source given a detected photon. The reverse, the probability of a detection given a source, is P(B|A) = 0.95. The probability of an observable source in the detector scope is P(A) = 0.4. This leaves only the total probability of a photon being detected, P(B) = 0.40 · 0.95 + 0.60 · 0.07 (the probability of a source times the probability of a detected photon, plus the probability of no source times the probability of still observing a photon). Filling in equation 3.1,

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} = \frac{0.95 \times 0.40}{0.40 \times 0.95 + 0.60 \times 0.07} = \frac{0.38}{0.422} \approx 0.90, \qquad (3.9)$$

yields a probability of 90% that the observed photon actually comes from a source.
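As a quick numerical check of this example (a minimal sketch; the probabilities are the ones assumed above):

```python
# Bayes' theorem for the source/photon example above.
p_source = 0.40              # prior probability of an observable source, P(A)
p_photon_if_source = 0.95    # P(B|A): photon detected given a source
p_photon_if_empty = 0.07     # P(B|not A): photon detected without a source

# Total probability of detecting a photon, P(B).
p_photon = p_source * p_photon_if_source + (1 - p_source) * p_photon_if_empty

# Posterior probability of a source given a detected photon, P(A|B).
p_source_given_photon = p_photon_if_source * p_source / p_photon
print(f"P(source | photon) = {p_source_given_photon:.2f}")   # ~0.90
```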

3.4

Likelihood principle

The likelihood principle states that, given a sample from a statistical model, all information necessary for the inference of the model parameters is contained in the likelihood function L(θ|X = x). Specifically, likelihood functions give insight into the likelihood of model parameters given observed data and a model. Considering a sample coming from a distribution with probability density function (pdf) p(x; µ, σ), then for a specific sample X = x the likelihood function L(µ, σ|X = x) = p(X = x|µ, σ) describes how likely different parameter values for µ, σ are given the data.

3.4.1

Binomial likelihood function

The binomial distribution describes the probability of the outcome of a sequence of independent experiments which each succeed with probability p (and thus fail with probability 1 − p). The pdf of the binomial distribution can be written as

$$\mathrm{pdf}(k, n, p) = \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k} = \binom{n}{k} p^k (1-p)^{n-k}, \qquad (3.10)$$

with n the number of experiments, k the number of successes and p the probability of an independent success. Say a sample of size n = 120 with k = 30 successes comes from a binomial test without prior knowledge of the probability of success. The likelihood function becomes

$$L(p\,|\,n = 120, k = 30) = \binom{120}{30}\, p^{30} (1-p)^{90}. \qquad (3.11)$$

This equation encapsulates how likely values for p are relative to other values. The function in 3.11 is plotted in figure 3.1 together with another likelihood function for a different sample size and number of successes. From the figure it follows that as the sample size grows the maximum likelihood peak becomes more localized.
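A short numerical sketch of the likelihood curves of figure 3.1 (assuming scipy for the binomial pmf; the sample sizes are the ones from the figure):

```python
import numpy as np
from scipy.stats import binom

# Binomial likelihood L(p | n, k) evaluated on a grid of p values,
# for the two samples shown in figure 3.1.
p_grid = np.linspace(0.01, 0.99, 500)

for n, k in [(12, 3), (120, 30)]:
    likelihood = binom.pmf(k, n, p_grid)       # C(n, k) * p^k * (1-p)^(n-k)
    p_mle = p_grid[np.argmax(likelihood)]
    print(f"n={n:4d}, k={k:3d}: likelihood peaks at p ≈ {p_mle:.2f}")
```

Both curves peak at p ≈ 0.25, but the n = 120 curve is much more sharply localized, matching the observation above.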

3.4.2

Maximum likelihood estimator

The maximum likelihood estimator is the parameter value that maximizes the likelihood function for a given sample of data, so that


Figure 3.1: Likelihood function of the binomial distribution for n = 12, k = 3 and n = 120, k = 30.

under the assumed model the data are the most probable, thus

$$\hat\theta = \arg\max_\theta L(\theta|X). \qquad (3.12)$$

Note that if flat priors are taken in equation 3.8, MAP estimation and maximum likelihood estimation coincide.

3.5

Wilks’ theorem

Wilks' theorem provides a statement about the asymptotic distribution of the log-likelihood ratio statistic. The log-likelihood ratio statistic is a ratio between the likelihoods under two different composite hypotheses, Θ0 and Θa. Under Θa all n model parameters are free and should be optimized,

$$L_{\Theta_a} = L(\hat\theta_1, ..., \hat\theta_n | X), \qquad (3.13)$$

whereas Θ0 allows for n − k free parameters and fixes k parameters,

$$L_{\Theta_0} = L(\theta_1, ..., \theta_k, \hat\theta_{k+1}, ..., \hat\theta_n | X). \qquad (3.14)$$

Then the log-likelihood ratio is defined as

$$\Lambda(\theta_1, ..., \theta_k) = -2 \ln \frac{L_{\Theta_0}}{L_{\Theta_a}} = -2 \ln \frac{L(\theta_1, .., \theta_k, \hat\theta_{k+1}, .., \hat\theta_n | X)}{L(\hat\theta_1, .., \hat\theta_n | X)}. \qquad (3.15)$$

It is in [Wilks, 1938] that Wilks proves, under the assumption of normality of the $\hat\theta_i$, unbounded model parameters and the large sample limit, that for the true model parameters $\theta_1, ..., \theta_k$

$$\Lambda(\theta_1, ..., \theta_k) \sim \chi^2(k). \qquad (3.16)$$

Instantiating this for the one dimensional case,

$$\Lambda(\theta|X) = -2 \ln \frac{L(\theta, \hat\theta_2, .., \hat\theta_n | X)}{L(\hat\theta_1, .., \hat\theta_n | X)} \sim \chi^2(1), \qquad (3.17)$$

one can construct confidence intervals by first maximizing $L_{\Theta_a}$ with respect to all model parameters and then repeating the fit for values of θ around $\hat\theta$. The result of such a process is illustrated in figure 3.2, where confidence intervals based on the χ²(1) distribution are included.
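A minimal sketch of this procedure for a toy one-parameter Gaussian model (not the thesis model; it only illustrates the profile scan and the χ²(1) threshold):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, norm

# Toy data: one free parameter mu, known sigma = 1.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.4, scale=1.0, size=200)

def neg_log_like(mu):
    return -np.sum(norm.logpdf(data, loc=mu, scale=1.0))

# Global fit (all parameters free).
fit = minimize_scalar(neg_log_like)
nll_hat = fit.fun

# Scan the parameter and compute the log-likelihood ratio statistic.
mu_grid = np.linspace(fit.x - 0.5, fit.x + 0.5, 200)
lam = 2.0 * (np.array([neg_log_like(m) for m in mu_grid]) - nll_hat)

# 95% interval: all mu with Lambda below the chi2(1) quantile.
threshold = chi2.ppf(0.95, df=1)
inside = mu_grid[lam < threshold]
print(f"mu_hat = {fit.x:.3f}, 95% CI ≈ [{inside.min():.3f}, {inside.max():.3f}]")
```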

Figure 3.2: Confidence intervals for model parameter θ as follows by Wilks’ theorem.

3.6

Entropy

The notion of entropy emerged from the study of thermodynamics. In this context it can be thought of as encoding the amount of disorder within a physical state and is written as

$$S = k_B \ln(\Omega(s)), \qquad (3.18)$$

with k_B the Boltzmann constant and Ω(s) the number of microstates available for a macrostate s. Microstates in this context can be interpreted as a redundancy measure for the state concerned: the multiplicity Ω(s) counts the number of underlying configurations that correspond to the same macrostate.

3.6.1

Shannon Entropy

In the mid-20th century the concept of entropy was introduced into information theory by Claude Shannon in [Shannon, 1948]. It can be interpreted as a measure of the uncertainty or information that can be attributed to the possible outcomes of a random variable. Shannon developed the theory in the context of telecommunication while researching the encoding, compressing and transmitting of data.


In the source coding theorem, which also appeared in [Shannon, 1948], entropy is related to the limit of compression possible without information loss. For a random variable X with outcomes x_i and corresponding probabilities P(x_i), the entropy H(X) is given by

$$H(X) = -\sum_i P(x_i) \log P(x_i) = \sum_i P(x_i)\, I(x_i) = \langle I(X) \rangle, \qquad (3.19)$$

with I(x_i) the self-information contained in the outcome x_i. The self-information is therefore equal to the logarithm of the inverse probability, −log(P(x_i)) = log(1/P(x_i)), of an event. When the stochastic process produces an outcome with a low probability, the event carries more information than an outcome with a high probability. Shannon proposed entropy in the unit of bits, setting the logarithmic base in 3.19 to 2. For random variables taking continuous values, 3.19 needs a slight modification,

$$H(X) = -\int f(x) \log f(x)\, dx. \qquad (3.20)$$

3.6.2

Information of heads and tails

Uniform probability yields maximum uncertainty and therefore maximum entropy. This statement of uniformity coincides with the notion of entropy in physics. Take for instance a set of N particles that can each be in a spin up or down state s_i = ±1. If the multiplicity of the macroscopic property $S = \sum_i s_i$ is considered, then it follows from 3.18 that the entropy is maximized for N(−1) = N(+1) = N/2. This is due to the fact that the multiplicity for such a system is equal to

$$\Omega = \binom{N}{k}, \qquad (3.21)$$

which is maximized for k = N(−1) = N(+1) = N/2. Returning to the statement of uniform probability, one can analyse the information contained in a toss of a coin. Assuming the coin is fair, $P_H = P_T = \tfrac{1}{2}$, the entropy contained in a single toss equals

$$H(X) = -\sum_{i=0}^{1} P(x_i) \log_2 P(x_i) = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1 \text{ bit}. \qquad (3.22)$$

Repeating the calculation for an unfair coin, as illustrated in figure 3.3, always yields an entropy smaller than the information generated by tossing a fair coin. From the figure an observation that can be made, in line with the uncertainty interpretation of entropy, is that when there is no uncertainty (P_H = 1 or P_H = 0) the entropy is zero.
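A small sketch reproducing a few points of the curve in figure 3.3:

```python
import numpy as np

def coin_entropy(p_heads):
    """Shannon entropy (in bits) of a single toss of a coin with P(heads) = p_heads."""
    probs = np.array([p_heads, 1.0 - p_heads])
    probs = probs[probs > 0]           # 0 * log(0) is taken to be 0
    return -np.sum(probs * np.log2(probs))

for p in [0.5, 0.9, 1.0]:
    print(f"P(heads) = {p:.1f}  ->  H = {coin_entropy(p):.3f} bits")
# P(heads) = 0.5 gives the maximum of 1 bit; P(heads) = 1.0 gives 0 bits.
```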

3.6.3

Kozachenko-Leonenko entropy estimation

With the notion of entropy of a stochastic process came estimators for it. Estimators make predictions about the properties of the underlying stochastic process given a data set; estimates of H(X) predict the entropy of the process responsible for creating the data set. These predictions are particularly useful for estimating other properties of the distribution, like the mutual information of two random vectors, which is widely used in, among other fields, machine learning and data analysis. The Kozachenko-Leonenko (KL) entropy estimator proposed in [Kozachenko and Leonenko, 2016] only depends on the distance to the nearest neighbour of every element in the random sample. To give a general definition of the estimator, let $X_1, ..., X_n$ be independent, identically distributed random vectors sampled from a distribution f on $\mathbb{R}^d$. For every element in the sample i = 1, ..., n, let

$$\rho_i = \min_{j \neq i} |X_j - X_i| \qquad (3.23)$$

be the distance to the nearest neighbour of random vector i.


Figure 3.3: The information contained in a toss of a coin as a function of the fairness [Tran et al., 2015].

The KL entropy estimator is then given by

$$\hat H(X_1, ..., X_n) = \frac{1}{n} \sum_{i=1}^{n} \log\!\left( \rho_i^d \, V_d \, (n-1)\, e^{\gamma} \right), \qquad (3.24)$$

where γ is the Euler–Mascheroni constant, $\gamma = -\int_0^\infty e^{-x} \log x \, dx \approx 0.577$, and $V_d = \pi^{d/2} / \Gamma(1 + d/2)$ is the volume of a d-dimensional unit ball (r = 1). Note that the estimator is defined for random samples from a multidimensional distribution and depends only on nearest-neighbour distances. For a more elaborate discussion of the variance and bias of the estimator in multiple dimensions see [Fournier and Delattre, 2016]. Since [Kozachenko and Leonenko, 2016], more general forms of 3.24 have among others been proposed by [Singh et al., 2003]. This generalizes the nearest neighbour approach to the kth nearest neighbour. Building on the definitions proposed, let $X_{1,i}, ..., X_{n-1,i}$ be a permutation of the set $\{X_1, ..., X_n\} \setminus \{X_i\}$ such that $|X_{1,i} - X_i| \leq ... \leq |X_{n-1,i} - X_i|$ is the ordered set of nearest neighbours of random vector i. Then defining

$$\rho_{k,i} = |X_{k,i} - X_i| \qquad (3.25)$$

as the distance to the kth nearest neighbour of random vector i, 3.24 generalizes to

$$\hat H(X_1, ..., X_n, k) = \frac{1}{n} \sum_{i=1}^{n} \log\!\left( \rho_{k,i}^d \, V_d \, \frac{n-1}{e^{\Psi(k)}} \right), \qquad (3.26)$$

with $\Psi(k) = -\gamma + \sum_{j=1}^{k-1} 1/j$. Note that if the distribution lives on $\mathbb{R}$, equation 3.26 becomes

$$\hat H(x_1, ..., x_n, k) = \frac{1}{n} \sum_{i=1}^{n} \left[ \log(\rho_{k,i}) + \log 2 + \log(n-1) - \Psi(k) \right], \qquad (3.27)$$

and thus

$$\hat H(x_1, ..., x_n, k) \sim \frac{1}{n} \sum_{i=1}^{n} \log(\rho_{k,i}). \qquad (3.28)$$
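A small numerical sketch of the kth nearest neighbour estimator in equation 3.26 (assuming scipy's KD-tree for the neighbour search; the standard-normal check at the end is only an illustration):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma as gamma_fn

def kl_entropy(samples, k=1):
    """Kozachenko-Leonenko kth nearest neighbour entropy estimate (in nats),
    following equation 3.26; `samples` has shape (n, d)."""
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    # Distance of every sample to its kth nearest neighbour (excluding itself).
    tree = cKDTree(samples)
    dist, _ = tree.query(samples, k=k + 1)
    rho_k = dist[:, -1]
    v_d = np.pi ** (d / 2) / gamma_fn(1 + d / 2)    # volume of the d-dim unit ball
    # digamma(k) = -gamma + sum_{j=1}^{k-1} 1/j, i.e. Psi(k) in the text.
    return np.mean(np.log(rho_k ** d * v_d * (n - 1)) - digamma(k))

# Differential entropy of a standard normal is 0.5*log(2*pi*e) ≈ 1.42 nats.
rng = np.random.default_rng(0)
print(kl_entropy(rng.normal(size=(5000, 1)), k=3))
```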


3.6.4

Weighted KL entropy estimation

The way that entropy is defined in the models used in this thesis is a variation of the KL kth nearest neighbour estimation of the previous section. Instead of choosing a single kth nearest neighbour, a set of multiple nearest neighbours is weighted and summed for every element of the random sample. Specifically, a weight vector $w_{k_{\min}, k_{\max}} = (w_{k_{\min}}, ..., w_{k_{\max}})$ with $\sum_{k=k_{\min}}^{k_{\max}} w_k = 1$ is used to average over a number of KL kth nearest neighbours,

$$\hat H(x_1, ..., x_n) \sim \sum_{i=1}^{n} \sum_{k=k_{\min}}^{k_{\max}} w_k \log(\rho_{k,i}). \qquad (3.29)$$

Generalisations like this have been proposed by [Sricharan et al., 2012] and [Moon et al., 2016] as a solution to the inefficiency of KL estimators as the dimension grows. Whereas entropy estimators are mostly used to infer properties of random variables, here the estimator will be used as a penalty on the degeneracy of source fluxes: the entropy of a sample of fluxes will be added to the loss function.
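Since this term is added to a gradient-based loss, a differentiable form is the natural one. The following is a minimal PyTorch sketch of equation 3.29, not the thesis implementation; the weight values in the example are hypothetical:

```python
import torch

def weighted_knn_entropy(fluxes, weights, k_min=1):
    """Weighted kth nearest neighbour entropy term of equation 3.29 (up to constants),
    written with torch ops so it can be added to a differentiable loss.
    `fluxes` is a 1-d tensor of source fluxes; `weights` should sum to one."""
    x = fluxes.reshape(-1, 1)
    n = x.shape[0]
    # Pairwise distances, with the diagonal masked so a point is not its own neighbour.
    dists = torch.cdist(x, x)
    dists = dists.masked_fill(torch.eye(n, dtype=torch.bool), float("inf"))
    sorted_d, _ = torch.sort(dists, dim=1)
    k_max = k_min + len(weights) - 1
    rho = sorted_d[:, k_min - 1:k_max]              # neighbours k_min .. k_max
    w = torch.as_tensor(weights, dtype=rho.dtype)
    return (torch.log(rho.clamp_min(1e-12)) * w).sum()

# Hypothetical weight vector over the 2nd..4th nearest neighbours.
fluxes = torch.rand(100, requires_grad=True)
entropy_term = weighted_knn_entropy(fluxes, weights=[0.5, 0.3, 0.2], k_min=2)
entropy_term.backward()   # gradients flow back to the fluxes
```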


CHAPTER 4

Research method

4.1

Probabilistic programming models

The models in this research project are written in the probabilistic programming language (PPL) Pyro [Bingham et al., 2018], which is a library built on the PyTorch backend [Paszke et al., 2017]. Pyro allows building complex probabilistic models from Python functions and inferring the distribution of latent variables. Latent variables are variables that are unobserved in the model. In a physics context latent variables might refer to quantum state amplitudes, which are an unobservable property but can be inferred using experiments. In the context of probabilistic models latent variables refer to parameters that cannot be observed but can be inferred, like probability distribution parameters. Latent variables are inferred under evidence lower bound (ELBO, equation 4.3) optimization of the chosen guide. The guide is another Python function that samples every latent variable from a chosen distribution. The guide approximates the posterior distribution, since the posterior is in practice intractable. In terms of Bayesian statistics, a Pyro model defines

$$P(\theta|X) = \frac{P(X|\theta)\, P(\theta)}{P(X)}, \qquad (4.1)$$

where θ are the latent variables; the guide is used to approximate the posterior distribution P(θ|X). This approximation is performed by the inference algorithms contained in Pyro. Variational inference is the process of finding the guide, from a given family of guides K, that is the best approximation according to some loss function. The quality of the approximation is measured using the Kullback-Leibler (KL) divergence between the approximation and the exact posterior. Inference comes down to solving

$$q^*(\theta) = \arg\min_{q(\theta) \in K} \mathrm{KL}\big(q(\theta)\,||\,P(\theta|X)\big), \qquad (4.2)$$

with KL(q(θ)||P(θ|X)) = E_q[log q(θ)] − E_q[log P(θ|X)]. The fact that P(θ|X) appears in the above equation is the reason that approximate inference is needed. Instead, a different measure is used that is equivalent to the KL divergence up to a constant term: the evidence lower bound (ELBO),

$$\mathrm{ELBO} = E_q[\log P(\theta, X)] - E_q[\log q(\theta)] = E_q[\log(P(\theta)\,P(X|\theta))] - E_q[\log q(\theta)]. \qquad (4.3)$$

Note that the second term in the equation, $-E_q[\log q(\theta)] = H(q)$, is the Shannon entropy of the distribution q as in equation 3.20. Stochastic variational inference is the process of finding q(θ) such that the above ELBO is maximized; maximizing the ELBO minimizes the KL divergence and thus gives the best representation of the posterior. Say one uses a delta distribution as the guide,

$$P(y) = \begin{cases} 1 & \text{if } y = x, \\ 0 & \text{otherwise}, \end{cases} \qquad (4.4)$$


then, using the statement that the entropy measures uncertainty made in section 3.6.1 and the example in section 3.6.2, the second term in equation 4.3 is always zero. This is the case because the delta distribution does not carry any uncertainty. Rewriting equation 4.3 for the delta function therefore yields

$$\mathrm{ELBO} = E_q[\log P(\theta, X)] + H(q(\theta)) = E_q[\log(P(\theta)\,P(X|\theta))], \qquad (4.5)$$

which, when maximized, yields the MAP estimates of the latent variables.

4.1.1

A simple example

Consider the project [Stam, 2020] as an example. In this model it is assumed that the price of avocados is log-normally distributed. The log-normal parameters are the latent variables of the model and can be inferred from the observed data under the assumption that both parameters have independent Gamma(7.5, 1) priors. Here the guide samples the latent variables log mean and log var from a δ(x) distribution. Choosing the δ distribution to sample from in the guide corresponds to MAP estimation of the latent variables, as becomes clear from section 4.1. Hence, here the MAP estimates of the log-mean and log-variance are inferred from the dataset. For more examples, consider the Pyro documentation [Uber Technologies, 2018].
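A minimal Pyro sketch of such a model (this is not the code from [Stam, 2020]; the data below is synthetic and only stands in for the avocado prices):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDelta
from pyro.optim import Adam

prices = torch.rand(500) * 2.0 + 0.5    # placeholder data, strictly positive

def model(data):
    # Latent variables: parameters of the log-normal price distribution,
    # each with an independent Gamma(7.5, 1) prior as described above.
    log_mean = pyro.sample("log_mean", dist.Gamma(7.5, 1.0))
    log_var = pyro.sample("log_var", dist.Gamma(7.5, 1.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.LogNormal(log_mean, log_var.sqrt()), obs=data)

# An AutoDelta guide places a delta distribution on every latent variable,
# so maximizing the ELBO yields MAP estimates (equation 4.5).
guide = AutoDelta(model)
svi = SVI(model, guide, Adam({"lr": 0.02}), loss=Trace_ELBO())
for step in range(1000):
    svi.step(prices)
print({name: value.item() for name, value in pyro.get_param_store().items()})
```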

4.2

Model components

This section discusses the properties of the components that are used in the models to generate the results of section 5. The models are trained against mock data, of which the components are known. Using SVI, the MAP estimates of the source fluxes are inferred under the entropy loss constraint.

4.2.1

Mock data

The mock data is generated on a 100 by 100 pixel grid. For each pixel a sample is taken from a standard normal distribution,

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. \qquad (4.6)$$

This sample is then transformed to a physical flux using the luminosity function transformation

$$F(X) = 10^{aX - b}, \qquad X \sim N(0, 1). \qquad (4.7)$$

For different values of a and b = 1.2 the luminosity function is plotted in figure 4.1 in log-log scale. As a reference, two power laws are also displayed in the plot. This sample from the luminosity function F(X), X ∼ N(0, 1), is added to a background bg(x_i), which could be the zero background or any other distribution. The cumulative signal is then convolved with the hypothetical IPSF g(x), which can be a delta function (no convolution) or a Gaussian kernel with varying standard deviation σ. The cumulative convolved signal F_{c,c}(X) is lastly passed through a noise distribution such as N(F_{c,c}(X), ·), POIS(F_{c,c}(X)) or δ(F_{c,c}(X)). An example mock photon map, generated with the configuration in table 4.3, is illustrated in figure 4.2.
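A sketch of this mock-data pipeline (the flat background here is only a placeholder for the Fermi-LAT map used in the thesis):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
a, b = 1.4, 1.2
grid = (100, 100)

x = rng.standard_normal(grid)             # one standard-normal draw per pixel
fluxes = 10.0 ** (a * x - b)              # luminosity function transformation (eq. 4.7)
background = np.full(grid, 5.0)           # placeholder diffuse background bg(x_i)
signal = fluxes + background

smeared = gaussian_filter(signal, sigma=0.5)   # IPSF: Gaussian kernel, sigma in pixels
counts = rng.poisson(smeared)                  # Poisson photon shot noise
print(counts.shape, counts.mean())
```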

4.2.2

The model

The model is composed of the same components present in the mock data, applied in the same order. The only latent variables to be inferred are the fluxes that are sampled from the normal distribution. The guide used to MAP estimate the fluxes is the AutoDelta guide from the Pyro library. The model is conditioned on the generated mock data. When applicable, the weighted entropy term defined in section 3.6.4 is added to the loss.


Figure 4.1: Luminosity functions for several values of the luminosity parameter a compared with two power laws.

Figure 4.2: Example mock data set split into the different components.

Noise: POIS(F(x_i))
Background: bg(x_i) = Fermi-LAT 20° × 20°
IPSF: g(x) = Gauss(0.5)
Luminosity function: 10^{1.5 f(X) − 1.2}

Figure 4.3: Mock data configuration used to generate figure 4.2.

4.3

Obtaining statistical results

From the definition of the ELBO using a delta function in equation 4.5, setting the luminosity parameter a to a constant while keeping the fluxes as free latent variables in the SVI,

$$\mathrm{ELBO}(a, F) = E_q[\log(P(F, a)\, P(X|F, a))], \qquad (4.9)$$

when maximized for the best approximation to the posterior


yields an expression proportional to the likelihood of the luminosity parameter. This log-likelihood can be used to construct confidence intervals for the luminosity parameter using Wilks’ theorem from section 3.5.
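A sketch of this scan (the `make_model` and `make_guide` factories are hypothetical stand-ins for the Pyro model and guide of section 4.2, not functions defined in this thesis):

```python
import numpy as np
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def loglike_ratio_scan(make_model, make_guide, a_values, data, n_steps=2000):
    """Fix the luminosity parameter `a` to each value in `a_values`, refit the fluxes
    by SVI, and treat the best ELBO found as a profile log-likelihood for `a`.
    `make_model(a)` and `make_guide()` are hypothetical factory functions."""
    profile = []
    for a in a_values:
        svi = SVI(make_model(a), make_guide(), Adam({"lr": 0.01}), loss=Trace_ELBO())
        losses = [svi.step(data) for _ in range(n_steps)]
        profile.append(-min(losses))              # best ELBO found during the fit
    profile = np.array(profile)
    # Log-likelihood ratio statistic of equation 3.15, to be compared with chi2(1).
    return 2.0 * (profile.max() - profile)
```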


CHAPTER 5

Results

Now that the theoretical background and the model definitions are in place, it is time to present the obtained results. In this chapter increasingly complex models are considered. The source fluxes in mock data are inferred and confidence intervals for the luminosity parameter are constructed. In section 5.1 a model without entropy loss is considered as a performance baseline. This model only contains Poisson noise applied to a set of source fluxes. The latter will illustrate the problem that arises when constructing confidence intervals for the luminosity parameter. In the next section entropy loss is added to the model, and different entropy weights are evaluated for efficiency in constraining the luminosity parameter. The most effective weights are considered for all the following sections. An IPSF and a diffuse background are added to the model in sections 5.3 and 5.4 respectively.

5.1

No entropy optimization

As a baseline this section contains the results obtained in an entropy-free context. The model configuration is displayed in table 5.1. Figure 5.2 shows the MAP estimated fluxes and the log-likelihood ratio for different luminosity parameter values. When constructing a confidence interval, ideally the log-likelihood ratio would look similar to figure 3.2. From figure 5.2(b) it becomes clear that the log-likelihood ratio does not minimize around the right luminosity parameter at a = 1.4. The log-likelihood ratio instead slopes down and clearly favours higher luminosity parameter values over lower values. The test statistic of ∼1200 at a = 1.4 is almost three orders of magnitude larger than the 95% confidence value. The MAP estimates of the fluxes in figure 5.2 display a discrete profile, which is most clear at smaller fluxes (10⁻¹, 10⁰, 2·10⁰, 3·10⁰). This profile is generated by the Poisson nature of the added noise: due to the observed Poisson noise, the flux MAP estimates are pulled toward a discrete-like space. To inspect the global under- or overestimation of the fluxes by the model, the sum of the residuals,

$$r(a) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat F_i - F_i \right) = \frac{1}{N} \big( \hat F_{\mathrm{total}} - F_{\mathrm{total}} \big), \qquad (5.1)$$

is plotted in figure 5.3 as a function of the assumed luminosity parameter. As the assumed luminosity parameter is increased, the total MAP estimated flux slowly becomes less underestimated.

Noise: POIS(F(x_i))
Background: 0
IPSF: g(x) = δ(x)
Luminosity function: 10^{1.4 f_i − 1.2} + bg(x_i), f_i ∼ N(0, 1)

Figure 5.1: Model configuration for figure 5.2.


Figure 5.2: (a) MAP estimation of fluxes without entropy weights at a = 1.4. The MAP estimates of the fluxes are quantized due to the discrete nature of Poisson noise. (b) Log-likelihood ratio as a function of the luminosity parameter, illustrated in blue. The yellow line represents the test statistic of the 95% confidence interval. Both panels are generated by a model without entropy loss.

Figure 5.3: The weighted sum of residuals is plotted against the assumed value of the luminosity parameter in blue. The green line represents the real luminosity parameter used to generate the data.

5.2

Tuning entropy weights

In this section the model will only contain Poisson noise applied to the observed fluxes. The model configuration is summarized in figure 5.4. To illustrate the difference between entropy weights, several different entropy configurations are used to find the flux MAP estimates and the log-likelihood ratio; the results are collected in appendix 7.2. Scanning through the log-likelihood ratios for the different weights in appendix 7.2, it can be observed that the luminosity parameter is nicely constrained from below, the only exception being the symmetric weights in figure 7.2. The log-likelihood ratio in figure 7.2 follows a trend similar to the model without entropy loss in section 5.1. The other weights


Noise: POIS(F(x_i))
Background: 0
IPSF: g(x) = δ(x)
Luminosity function: F(x_i) = 10^{1.4 f_i − 1.2}, f_i ∼ N(0, 1)

Figure 5.4: Model configuration for figure 5.5.

considered can now be assessed by observing how well the minimum is localized and the size of the fluctuations that are present in the log-likelihood ratio. The better the minimum is localized, the smaller the confidence interval around the luminosity parameter. This localization should be combined with small fluctuations, or deviations from the trend, in the log-likelihood ratio, to avoid multiple acceptable confidence regions. Linear weights (figures 7.7, 7.8 and 7.9) seem to perform poorly, both in the localization of the minimum and in the deviation from the trend. The same holds for the weights in figures 7.6, 7.1 and 7.4. The weights in figure 7.5 do obtain a very smooth log-likelihood ratio, but lack a strongly localized minimum. The entropy weights in equation 7.4 seem to result in the smoothest and most localized log-likelihood ratio, which is again illustrated in figure 5.5. To be of value the model must at least produce similar results for varying luminosity parameters. Figure 5.6 illustrates how the model performs for different underlying luminosity functions, which are plotted in figure 4.1. The figures show that the log-likelihood minimizes around the actual parameter, but can display order of magnitude deviations from the trend.

Figure 5.5: (a). MAP estimation of fluxes with entropy weights from equation 7.4 at a = 1.4. (b) Likelihood ratio as a function of the luminosity parameter. (c) Model weights.


Figure 5.6: Log-likelihood ratio for different luminosity parameters is plotted in blue. The actual value of the luminosity parameter is illustrated by the green vertical line. The 95% confidence interval based on Wilks’ theorem is displayed in orange.

5.3

Applying IPSF

In this section the IPSF is added to the model. The IPSF is set to a Gaussian kernel with varying standard deviation σ_IPSF, where the standard deviation is defined in number of pixels. In this setup, setting σ_IPSF = 0.5 results in a distribution with 95% of the flux contained in the source pixel and the 8 pixels around it. In the model, and in the generation of the mock data before the noise is added, the signal is convolved with this kernel. The convolution results in a smearing of the signal, which can be seen in figure 4.2. Figure 5.7 shows the behaviour of the log-likelihood ratio under different values of σ_IPSF. It is clear that, although a trend persists, the smoothness of the log-likelihood ratio disappears when increasing σ_IPSF.

5.4

Adding background

In this section the model contains a diffuse background component generated from the Fermi-LAT data. The background signal of the inner 20 by 20 degrees is added to the physical flux as in figure 4.2. In figure 5.9 the MAP estimated fluxes for this model are shown. Below mock fluxes of O(10²) the MAP estimates seem to be distributed relatively randomly. Looking at the log-likelihood ratio for this model in figure 5.10, it is clear that no minimum is obtained around the mock data luminosity parameter at a = 1.4. The log-likelihood ratio also displays large fluctuations over the inspected interval.


Figure 5.7: Behaviour of the likelihood ratio under standard deviations σ = 0.5, 0.25, 0.35.

Noise: δ(F(x_i))
Background: bg(x_i) = Fermi-LAT 20° × 20°
IPSF: g(x) = δ(x)
Luminosity function: F(x_i) = 10^{1.4 f_i − 1.2} + bg(x_i), f_i ∼ N(0, 1)

Figure 5.8: Model configuration for figures 5.9 and 5.10.

Figure 5.9: Mock physical flux function and the MAP fluxes at the real luminosity parameter a = 1.4

5.5

Discussion

From figure 5.3 it follows that the total MAP estimate of the fluxes is underestimated at the real luminosity parameter when entropy loss is not incorporated. It is also clear from figure 5.2 that the most likely luminosity parameter for this model is biased toward a larger value. This is expected, because the model needs to compensate for the underestimated sources. This is the mechanism that is preventing the inference of a proper confidence interval for the luminosity


Figure 5.10: Likelihood ratio for luminosity parameter with diffuse background component

parameter. When adding a weighted kth nearest neighbour entropy loss term to the model, as in figure 5.5, the lower bound for the luminosity parameter can be constrained relatively effectively and does not seem to depend on the chosen weights too much. One exception is the Gaussian weights in figure 7.2, which are symmetric around the central nearest neighbour. These weights result in a log-likelihood ratio that, just as with the entropy-free model, is biased toward larger values of the luminosity parameter. This might indicate that fluxes are underestimated in this model as well. The construction of the two-sided confidence region is however strongly dependent on the used weights, as can be seen in appendix 7.2. Overall it is clear that the upper bounds are less strong than the lower bounds. The fluctuations in the log-likelihood ratio for models with entropy loss might be a product of the entropy loss term adding more complexity to the loss surface. One can imagine that the addition of such a term can result in many local optima, which might explain the deviations from the trend of the log-likelihood ratio. Incremental complexities added to the model, like introducing an IPSF or a diffuse component, reduce the smoothness of the log-likelihood significantly for the chosen weights. This makes it almost impossible to infer a proper confidence interval for the luminosity parameter. One important thing to note is that when experimenting with different entropy weights for the same model, the results can change significantly, as can be seen in appendix 7.2. This suggests that the effectiveness of the model strongly depends on the weights. It is likely that there are weights that perform better for the more complex models, and there might be a better way to choose weights for a specific model.


CHAPTER 6

Conclusion

In section 5.1 it is shown that a model without entropy loss results in a bias in the luminosity function due to the underestimation of sources. This bias favours higher values of the luminosity parameter. Adding entropy loss to the model can reduce this bias, as long as the entropy weights are chosen effectively and the model remains relatively simple. From appendix 7.2 it becomes clear that most weights can improve the lower bound of the confidence region of the luminosity parameter in a relatively simple model. The quality of the upper limit, however, depends strongly on the choice of the weights. As long as the model remains relatively simple, like the model in section 5.2, the results are promising: the model is relatively stable in the inference of different luminosity parameters. Introducing more model complexities, like IPSFs or diffuse backgrounds, reduces the smoothness and the localisation of the minimum of the likelihood function drastically, making it hard to give a good confidence interval for the luminosity parameter. The effectiveness of the entropy weights is very unstable between models. To allow the application of the entropy loss to more complex models, like a diffuse GC model, in order to infer the underlying luminosity function of a source population, more research toward the mechanics of the weights is needed.

6.1 Future research

As explained in the discussion, the ability of a model to provide a good log-likelihood ratio for the luminosity parameter depends strongly on the used weights and the complexity of the model. Although different families of weights have been considered, the studied set was not very large. In this research no quantitative statements are made regarding the relation between the weights and the model's effectiveness in producing a good log-likelihood ratio for the luminosity parameter, nor is there a quantitative measure for a good log-likelihood ratio. To allow for a better understanding of weight versus model effectiveness, a study could be conducted to assess this relationship. Such a study might focus on the relation between the entropy weights and the smoothness of the loss surface they induce. Another point of interest is the ability of the entropy term to produce an unbiased minimum in the log-likelihood ratio. A further study that might be of interest is determining efficient weights using some form of machine learning. Although this might improve individual models, it does not directly improve the understanding of the weight versus model effectiveness relationship.


CHAPTER 7

Appendix

7.1 Synchrotron

In general, moving charged particles accelerated in magnetic fields will radiate. This radiation is called cyclotron radiation when the particles are non-relativistic. The emitted power can be derived by considering Larmor's formula,

P = 2 q^4 β^2 B^2 / (3 M^2 c^3). (7.1)

Lighter particles will radiate more, because these particles lose more energy in the magnetic field. A property of cyclotron radiation is that its spectrum is very discrete and only has several peaks. At relativistic speeds the emission is subject to relativistic beaming, which morphs the emission into a cone with opening angle θ ∼ γ^{-1}. Due to the redshift as the cone of the beam moves over our line of sight, the observed spectrum of synchrotron radiation is very broad and smooth compared to that of cyclotron radiation.

7.2 Comparing entropy weights

In figures 7.1 to 7.9 the influence of several types of entropy weights is illustrated. Note that the weights are all normalized such that the sum over all the weights adds up to one; this property is enforced by the normalization constant N. The first subfigure of each plot shows the MAP estimates of the fluxes when the luminosity parameter corresponds to the actual value used to generate the mock data. In the second subfigure the likelihood ratio as a function of θ_0 is plotted. The last subfigure shows the entropy weights that were used to obtain the results. The equations that correspond to the entropy weights are included in the figures.
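As an example of this normalization, the snippet below builds the weight family of equation (7.2) on the open interval 4 < k < 50 and rescales it to sum to one (the role of the constant N); the helper name is an illustrative assumption. A dictionary of this form could be passed directly to the weighted entropy sketch shown in the discussion.

```python
import numpy as np

def normalized_weights(k_min, k_max, shape):
    """Entropy weights w_e(k) on the open interval k_min < k < k_max, rescaled to sum to one."""
    ks = np.arange(k_min + 1, k_max)
    raw = np.array([shape(k) for k in ks], dtype=float)
    return dict(zip(ks.tolist(), (raw / raw.sum()).tolist()))

# Equation (7.2): w_e(k) = N k^2 exp(-k^2 / 1000) on 4 < k < 50
weights_eq72 = normalized_weights(4, 50, lambda k: k**2 * np.exp(-(k**2) / 1000.0))
```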


w_e(k) = N k^2 e^{-k^2/1000}, 4 < k < 50 (7.2)


w_e(k) = N e^{-(k-84)^2/40^2}, 4 < k < 50 (7.3)


w_e(k) = N (k-4) e^{-(k-4)^2/20^2}, 4 < k < 64 (7.4)


w_e(k) = N (k-4) e^{-(k-4)^2/60^2}, 4 < k < 184 (7.5)


w_e(k) = N k e^{-k/10}, 4 < k < 50 (7.6)

Figure 7.5:

w_e(k) = N k e^{-k/35}, 4 < k < 50 (7.7)


w_e(k) = N k, 4 < k < 14 (7.8)

Figure 7.7:

w_e(k) = N k, 4 < k < 40 (7.9)


w_e(k) = N k, 4 < k < 80 (7.10)


