

MSc Physics and Astronomy

Gravitation, Astro-, and Particle Physics

Master Thesis

Deep Neural Networks for Position Reconstruction in XENON1T

by

Lucas de Vries

10650881

July 2020

60 ECTS

September 2019 - July 2020

Supervisor/Examiner: Prof. Dr. M.P. Decowski
Daily Supervisor: Dr. S.A. Brünner
Second Examiner: Dr. I.B. van Vulpen


Abstract

Dark matter (DM) accounts for 26.5% of the energy content of the Universe, but its nature remains unknown. The XENON1T direct detection experiment holds the most stringent upper limit on the DM-nucleon spin-independent elastic scattering cross-section for DM masses larger than 3.5 GeV/c2. The xenon-based dual-phase time projection chamber (TPC) has a 2-tonne active mass. Particle interactions yield primary scintillation signals (S1s) and secondary scintillation signals due to ionization (S2s), which we observe with arrays of photomultiplier tubes (PMTs) at the top and bottom of the TPC. The time delay between S1 and S2 allows us to reconstruct the depth of the interaction, and we apply neural networks for horizontal position reconstruction using the observed S2 light distribution on the top PMT array. Position reconstruction plays an important role in background mitigation and is therefore a key point in our dark matter search analysis.

In this work, we show the advantage of a new method to generate data for training neural networks used for position reconstruction. Additionally, we show how over-fitting parameters affect the reconstruction precision and how tuning these parameters can obscure limitations in our reconstruction performance. We validate our neural networks on simulated data and on real detector calibration data and improve the position reconstruction performance significantly.

Furthermore, we designed a new 1-dimensional convolutional neural network for the reconstruction of the depth of an interaction, using only the S2 signal. We apply this method to simulated data and real detector calibration data and compare the reconstruction performance to the typical S2-only depth reconstruction procedure that is currently used. Our new method reconstructs the depth of a low-energy event with greater certainty when the detector efficiency drops below 100% and the S1 is not observed.


Contents

1 Introduction

2 Dark matter
  2.1 Observational evidence for dark matter
  2.2 Dark matter candidates
  2.3 Dark matter detection strategies
    2.3.1 WIMP interaction rates
    2.3.2 Exclusion limits
  2.4 The future of dark matter search

3 Enlightening the dark: XENON1T fundamentals
  3.1 The XENON1T detector
    3.1.1 TPC detection principle
    3.1.2 Eyes in the dark: photomultiplier tubes
  3.2 Signals in XENON1T
    3.2.1 Scintillation and ionization mechanism
    3.2.2 ER backgrounds
    3.2.3 NR backgrounds
    3.2.4 Detector calibration
  3.3 Simulation
    3.3.1 Optical Monte Carlo simulation
    3.3.2 WFSim: Waveform simulator for LXe TPCs
  3.4 Position reconstruction

4 Machine learning and deep neural networks
  4.1 Machine learning basics
  4.2 Feedforward neural networks
    4.2.1 Mathematical description of neural networks
    4.2.2 Training neural networks
    4.2.3 Over-fitting reduction techniques
    4.3.1 1D Convolutional neural networks

5 New neural networks for a better position reconstruction in XENON1T
  5.1 Validation metrics
  5.2 General approach to time-independent position reconstruction
    5.2.1 Overview of time-independent neural networks
    5.2.2 Training procedures
  5.3 Performance of time-independent position reconstruction
    5.3.1 Reconstruction precision on simulated data
    5.3.2 Stability and uniformity on 83mKr calibration data
  5.4 Time-dependent position reconstruction
  5.5 Outlook

6 S2-only depth reconstruction
  6.1 General approach
  6.2 S2 diffusion model
  6.3 Neural network architecture
  6.4 Performance on simulated mono-energetic ERs
  6.5 Performance on ERs in 220Rn calibration data
    6.5.1 220Rn cuts and efficiencies
    6.5.2 Training and evaluation
  6.6 Identification of accidental coincidences
  6.7 Outlook

7 Conclusions

8 Recommendations

Bibliography


1. Introduction

We have come a long way in understanding the laws of physics that govern our world. Yet, we have not been able to resolve the nature of the bulk of the matter in the Universe. Observational evidence suggests that dark matter accounts for ∼85% of the mass in the Universe. Thus far, we do not know what constitutes this matter component.

XENON1T is the leading experiment aiming to observe dark matter and currently holds the most stringent upper limit on the spin-independent elastic scattering cross-section for a large range of dark matter masses. The XENON collaboration operated the detector with a 2-tonne liquid xenon target in the underground laboratory Laboratori Nazionali del Gran Sasso (LNGS) in Italy. Equipped with light sensors (photomultiplier tubes), it detects two types of scintillation signals induced by a single particle interaction: the prompt scintillation signal S1 and the time-delayed ionization signal S2. The dark matter search analysis relies greatly on the 3D position reconstruction of signals, because it is our strongest weapon to mitigate backgrounds and thereby increases our sensitivity to potential dark matter signals. The pattern of light sensors which detect the ionization signal S2 is typically used to reconstruct the horizontal position of the event. The time difference between the S1 and S2 signals provides information about the depth of the interaction in the detector.

The rise of deep learning in fundamental physics suggests further investigation of the application of deep learning techniques in the XENON1T experiment, specifically for the improvement of its position reconstruction. The first objective is to improve the horizontal position reconstruction using new methods to generate better training data for position reconstruction algorithms. Secondly, we aim to reconstruct the depth of a low-energy interaction when the S1 signal is not observed by the detector. We apply deep learning techniques to reconstruct the depth, since the time delay between the S1 and S2 signals is not known.

We first introduce dark matter and discuss the candidates and detection strategies in Chapter 2. We then explain the working principle of the XENON1T detector in Chapter 3. In Chapter 4, we present an overview of machine learning and deep neural networks. We investigate the performance of various deep neural networks for position reconstruction in Chapter 5. Thereafter, we apply deep learning techniques to reconstruct the depth of an interaction using incomplete detector signals in Chapter 6. Finally, we present our conclusions in Chapter 7, and our recommendations in Chapter 8.


2. Dark matter

The dark matter puzzle poses one of the greatest questions in the history of physics. The universe appears to be dominated by dark energy and non-luminous matter: we consider only < 5% of the energy content of our universe to be luminous baryonic matter. Temperature anisotropies in the cosmic microwave background give rise to a dark energy density fraction of 68.6% in the leading cosmological model, leaving 26.5% for the so-called dark matter [1]. We call this matter dark because it does not interact with light in any way, even though it plays a critical role in confining galaxies and in the formation of large-scale structures in the universe. Dark energy, on the other hand, drives the expansion of the universe. Still, our knowledge about this mysterious matter component of the universe is very limited, as dark matter does not show itself easily. Even if we assume that it is a particle, we do not know its composition, its way of interacting with standard model particles, or its mass: we are in the dark.

The observation of dark matter would mark the first detection of a particle beyond the standard model. Therefore, the physics community as a whole, theorists and experimentalists included, is eager to get a glimpse of this particle. These days, many experiments aim to observe dark matter to get a better understanding of the universe.

We first briefly introduce the main observational evidence for dark matter in Section 2.1. Then, we discuss potential dark matter candidates, in particular the candidate of interest for the XENON1T experiment, in Section 2.2. Section 2.3 presents various detection strategies and the interaction rates and limits we expect for direct detection experiments such as XENON1T. Finally, we discuss the next era of dark matter detectors in Section 2.4.

2.1 Observational evidence for dark matter

We present a brief overview of the main evidence for the existence of dark matter. For an extensive review of how dark matter became widely adopted in the scientific world, we refer to [2].

Zwicky first mentioned dark matter after measuring unexpectedly large velocities of galaxies in the Coma cluster, indicating a non-luminous matter component [3]. Half a century later, Rubin observed a discrepancy in the rotational velocity of stars around the centers of some galaxies: at large radii, the velocity remains constant, independent of the distance to the center [4]. These observations led to the hypothesis that dark matter was uniformly distributed in a halo extending far beyond the galaxy.

Other astronomical evidence came from the theory of general relativity. We can estimate the gravitational potential of massive objects in the universe through gravitational lensing, i.e., by measuring the amount of deformation of light coming from behind the object [5, 6]. These measurements show that some objects are more massive than their expected luminous mass, i.e., their mass-to-light ratio is large. Probing galaxy-cluster collisions, the authors of [7] show that the X-rays originating from the gaseous interactions do not come from the same location as the centers of mass of the collided clusters. The centers of mass of both clusters have moved onwards, while the gas interacts and therefore lags behind. This indicates a low dark matter self-interaction probability, as the dark matter haloes seem to continue their path freely, without regard to the colliding gas.

Recent measurements of the cosmic microwave background (CMB) show O(10−5) temperature and density anisotropies at the time of recombination, when the universe became transparent [8]. The anisotropies depend on the angular scale of the examined patch of the sky. The ΛCDM (Λ cold dark matter) cosmological model fits the oscillating pattern in the CMB power spectrum precisely, and includes a cold (thus non-relativistic) dark matter component with a density of ΩDM = 0.265 [1]. The baryon and dark energy densities in this flat universe are Ωb = 0.049 and ΩΛ = 0.686, respectively. Simulations show how dark matter density fluctuations led to the formation of the large-scale structure in the universe today, i.e., galaxies and clusters of galaxies. Dark matter was able to accumulate due to gravity, while luminous matter experienced heavy self-interaction. Large-scale structures are thus a result of small density perturbations in the early universe.

This astronomical and cosmological evidence forms the basis of the dark matter mystery and started the search for potential explanations. We discuss a few in the following section.

2.2 Dark matter candidates

We discuss several dark matter models in this section. We also describe the expected properties of a new dark matter particle and discuss the particle that is of interest for the XENON1T experiment.

Modifications of conventional Newtonian dynamics are able to describe the discrepant radial velocity observations in galaxies without introducing dark matter. However, these models struggle to fit other observations. TeVeS [9] is currently the primary modified gravitational model, having replaced the flawed MOND model [10]. Still, TeVeS is only able to fit either gravitational lensing or the rotational curves, and it leads to an unstable universe. MOND and TeVeS are both unable to describe the universe that we observe and are unlikely to solve the dark matter problem.

The universe is packed with objects with a large mass-to-light ratio, e.g., black holes, neutron stars, and non-radiative exo-planets. These massive astrophysical compact halo objects (MACHOs) are certainly not the primary dark matter component in our universe, as they contribute at most 20% to the observed dark matter density [11]. Furthermore, the predicted baryon density from the ΛCDM cosmological model is in strong agreement with the theory of big bang nucleosynthesis [12]. Therefore, we expect the nature of dark matter to be non-baryonic, in contradiction to the baryonic content of MACHOs. If dark matter coupled strongly, baryon-dark matter interactions would also have had an influence on big bang nucleosynthesis and thus on the observed baryon density [13]. Big bang nucleosynthesis theory without strongly interacting dark matter agrees with the observed baryon density, and therefore dark matter is not expected to couple strongly.

To explain the observational evidence, we thus expect dark matter candidates to have the following properties:

1. Dark matter must be a stable particle, relative to the lifetime of the universe.

2. Dark matter is cold (non-relativistic).

3. The interaction cross-section must be very low and the dark matter particle does not couple with the strong or electromagnetic force. Interaction with the gravitational and weak force is possible.

4. Dark matter is non-baryonic.

5. The dark matter creation mechanism must explain the relic density we observe.

Many theories and corresponding new particles have been proposed to solve the dark matter puzzle. We discuss candidates that arise from theories developed to solve other physical phenomena, but that elegantly also provide a well-motivated dark matter candidate.

The neutrino is the only standard model particle that is non-baryonic, weakly interacting, and appears to have a mass [14]. However, it is in disagreement with the ΛCDM model, as neutrinos were relativistic in the beginning of the universe. Hot dark matter could not have seeded the large-scale structures that we observe in the universe [15]. This suggests that dark matter particles probably constitute new physics.

Sterile (right-handed) neutrinos could in principle be a dark matter candidate with masses of 1-10 keV, if they were non-relativistic at creation. However, we cannot probe these particles in our experiment due to their extremely low cross-section and mass. Sterile neutrinos have not yet been observed by experiments that are sensitive to their parameter space [16].

We expect that the dark matter particles decoupled from the thermal plasma when the universe expanded and the dark matter self-interaction rate dropped below the Hubble rate [17]. Within this freeze-out model for the creation of dark matter, we can estimate the cross-section based on the thermal relic density and the current dark matter density. We usually denote this new dark matter particle by χ and estimate the self-annihilation cross-section to be ⟨σv⟩ ≃ 3 × 10−26 cm3 s−1 [18]. This cross-section is on the scale of the weak force; specifically, it is what we expect for a massive particle of mχ ≃ 100 GeV/c2 that interacts weakly. Therefore, we call this dark matter candidate a Weakly Interacting Massive Particle (WIMP). We arrive at the total thermal relic density of dark matter particles only by assuming thermal freeze-out of a ∼100 GeV/c2 to ∼1 TeV/c2 particle with a cross-section on the weak scale. This is known as the WIMP miracle.
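As a rough numerical check of this argument, we can apply the standard textbook freeze-out approximation Ωχh2 ≈ 3 × 10−27 cm3 s−1 / ⟨σv⟩ to the weak-scale cross-section above; a minimal sketch, with an approximate value of the reduced Hubble constant:

```python
# Order-of-magnitude check of the WIMP miracle, using the standard freeze-out
# approximation Omega_chi * h^2 ~ 3e-27 cm^3 s^-1 / <sigma v>. This is a rough
# estimate only; the prefactor and h are approximate.

sigma_v = 3e-26        # weak-scale self-annihilation cross-section [cm^3/s]
h = 0.674              # reduced Hubble constant (approximate)

omega_h2 = 3e-27 / sigma_v           # ~0.1
omega_chi = omega_h2 / h**2
print(f"Omega_chi ~ {omega_chi:.2f}")  # ~0.22, the ballpark of the observed 0.265
```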

Two hypothetical particles emerge as side products of two theoretical models aiming to solve the hierarchy problem and the unification of forces, and they align beautifully with the WIMP characteristics. The neutralino from supersymmetry theory and the Lightest Kaluza–Klein Particle from extra-dimensional Kaluza–Klein theory are both weakly interacting and have predicted masses in the correct mass range.

Other dark matter candidates with a different physical origin are, for example, the non-thermally produced WIMPzillas (motivated by ultra high-energy cosmic rays) and axions (solving the strong CP problem). Axions are non-relativistic, light (O(10−6 eV/c2)) particles that are collision-less, i.e., axions interact only through gravity. They can account for all observed dark matter in the universe. For a review on axion dark matter, we refer to [19]. WIMPzillas [20] are heavy, stable particles, motivated by the observation of unexpected ultra high-energy cosmic rays. Their mass is expected to be in the range of 1012−1016 GeV/c2.

2.3 Dark matter detection strategies

We discuss various dark matter detection strategies in this section. Particularly, we explain the expected WIMP interaction rates in direct detection experiments and the exclusion limits that we set on the cross-section.

Experimental evidence for the existence of dark matter could come from three detection approaches. We could observe dark matter indirectly by means of probing the annihilation products of dark matter particles. Additionally, we could create dark matter particles in particle collider experiments. Furthermore, we can observe dark matter directly through scattering dark matter particles on standard model matter.

Indirect detection The dark matter density is expected to be large in areas with a large gravitational potential, such as the center of the galaxy, or in other objects with large mass-to-light ratios, such as dwarf galaxies. Indirect detection experiments such as Fermi-LAT probe these high-density regions, aiming to observe γ-rays originating from dark matter self-annihilation and decay. Fermi-LAT currently poses the most stringent limits on the self-annihilation cross-section, as no evidence has yet been found [21].

Production at colliders In collider experiments, we probe dark matter by searching for missing momentum after the collision of two standard model particles, e.g., protons. Dark matter particles created in a collision would leave the detector unnoticed. We also expect momentum loss from standard model processes; we can only infer the potential creation of dark matter if we observe an excess of events with missing momentum. Even then, it still has to be proven that the produced particle is stable enough to be dark matter. The CMS and ATLAS experiments at the Large Hadron Collider (LHC) have not yet found evidence for dark matter and put limits on the dark matter production cross-section in [22].

Direct detection Direct detection experiments such as XENON1T aim to detect the interaction of dark matter with standard model particles. The dark matter particle deposits some energy in the target material. We can measure the energy deposition by observing, e.g., a charge (ionization), light (scintillation), or heat (phonon) signal. Different types of direct detection detectors are, e.g., solid-state cryogenic detectors [23], superheated liquid detectors [24], directional detectors [25], and more. We will focus on liquid noble gas detectors, like LUX [26], PandaX-II [27], and specifically XENON1T [28]. In these experiments, we expect the nuclei to recoil with an energy of 1-100 keV, depending on the mass of the incoming WIMP [29].

2.3.1 WIMP interaction rates

Direct detection dark matter experiments aim to observe dark matter in our galaxy through elastic scattering. This is the most common and simplest model that we test. We assume that the dark matter particles are locally distributed with a density ρ = 0.3 GeV/cm3 [30], based on measurements in combination with simulations. Furthermore, we assume that WIMPs follow a velocity distribution in the galactic frame that is best characterized by a Maxwell-Boltzmann distribution, with a cut-off where the velocity of the dark matter particle exceeds the escape velocity (vesc ≃ 544 km/s) in the galactic frame [30]. To estimate the interaction rate on Earth, we must take the velocity of the Earth in the galactic frame into account, which is on average vE ≃ 232 km/s [30]. We neglect the modulation caused by the Earth revolving around the Sun.

The scattering rate of a WIMP off a target nucleus depends on the WIMP's number density and speed, and on the interaction cross-section between the WIMP and the target, often called the WIMP-nucleus cross-section. We express this rate in differential form per unit of recoil energy ER and per kg of target material. Let f(v) and v denote the velocity distribution and the observed velocity of the WIMPs in the detector frame; we integrate over all velocities that can induce a recoil of energy ER, up to the maximum WIMP velocity vesc + vE. Furthermore, we denote the dark matter mass as mχ, the target nucleus mass as mN, and the local dark matter density as ρ. The differential elastic scattering rate is then:

$$\frac{dR}{dE_R} = \frac{\rho}{m_\chi m_N} \int v\, f(v)\, \frac{d\sigma}{dE_R}(v)\, dv, \qquad (2.3.1)$$

with $\frac{d\sigma}{dE_R}$ the differential cross-section for interactions between a WIMP and a nucleus of the target material.


The differential cross-section is proportional to the sum of the spin-independent (SI) and spin-dependent (SD) cross-sections, weighted by the squared form factors $F^2_{SI,SD}$. Assuming cross-sections at zero momentum transfer $\sigma_0^{SI,SD}$, we have:

$$\frac{d\sigma}{dE_R} \propto F_{SI}^2(E_R)\,\sigma_0^{SI} + F_{SD}^2(E_R)\,\sigma_0^{SD}. \qquad (2.3.2)$$

For details about the form factors, we refer to [31]. We will concentrate on the spin-independent part of the differential cross-section, which is the most discussed and simplest WIMP-nucleus interaction. We can write it in terms of the mass number A, the atomic number Z, and the reduced mass $\mu_N = m_N m_\chi / (m_N + m_\chi)$, with $m_N$ the mass of the nucleus:

$$\sigma_0^{SI} = \frac{4\hbar^2 c^2}{\pi}\,\mu_N^2\,[f_p Z + f_n (A - Z)]^2, \qquad (2.3.3)$$

where $f_p$ and $f_n$ denote the contributions from the protons and neutrons. If we now assume that the protons and neutrons contribute equally to the coupling ($f_p \approx f_n$), we find that

$$\sigma_0^{SI} = \sigma_n^{SI}\,\frac{\mu_N^2}{\mu_n^2}\,A^2, \qquad \text{with} \qquad \sigma_n^{SI} = \frac{4\hbar^2 c^2}{\pi}\,\mu_n^2 f_n^2. \qquad (2.3.4)$$

Here, $\sigma_n^{SI}$ denotes the spin-independent cross-section for a WIMP scattering off a single nucleon (neutron or proton), and $\mu_n$ the reduced mass of the nucleon and the WIMP. We thus find that the WIMP-nucleus spin-independent cross-section, and thereby the differential cross-section and the differential elastic scattering rate, are proportional to A2. The number of nucleons in the target material is therefore of great importance. Figure 2.3.1a presents the expected event rate per tonne for a xenon (A = 131) and an argon (A = 40) target, for an exposure time of one year and a dark matter mass of mχ = 100 GeV/c2. For WIMPs, we expect low recoil energies, and the expected event rate for xenon is larger in this low-energy domain. Xenon detectors therefore have a larger potential to find WIMPs than argon-based detectors.
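To make the A2 coherent enhancement of Eq. (2.3.4) concrete, the sketch below compares the $A^2 \mu_N^2$ scaling of $\sigma_0^{SI}$ for xenon and argon, approximating the nuclear masses as A times the nucleon mass. The numbers are illustrative only: the event rate per kg additionally carries a factor 1/mN and the velocity integral, neither of which is included here.

```python
# Compare the A^2 * mu_N^2 scaling of the spin-independent WIMP-nucleus
# cross-section (Eq. 2.3.4) for xenon (A = 131) and argon (A = 40).
# Nuclear masses are approximated as A * m_nucleon; illustrative only.

M_CHI = 100.0        # WIMP mass [GeV/c^2]
M_NUCLEON = 0.9315   # nucleon mass [GeV/c^2]

def coherent_factor(A, m_chi=M_CHI):
    """A^2 * mu_N^2 scaling of sigma_0^SI for a nucleus with mass number A."""
    m_N = A * M_NUCLEON                  # approximate nuclear mass [GeV/c^2]
    mu_N = m_N * m_chi / (m_N + m_chi)   # WIMP-nucleus reduced mass
    return A**2 * mu_N**2

ratio = coherent_factor(131) / coherent_factor(40)
print(f"sigma_0_SI(Xe) / sigma_0_SI(Ar) ~ {ratio:.0f}")   # roughly 40
```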

2.3.2 Exclusion limits

We can set a limit on the WIMP-nucleon cross-section if we do not observe any WIMPs during a certain exposure (mass × time). All cross-sections above this limit are excluded: the WIMP-nucleon cross-section is only allowed to be in the parameter space below the limit. We aim to explore the complete parameter space, and therefore we try to push the limit down gradually by building more sensitive detectors with more exposure. We are limited by the so-called neutrino floor: when we reach O(10−49 cm2) cross-sections, coherent neutrino-nucleus interactions will overrule any dark matter signals for mχ > 10 GeV/c2. For low-mass WIMPs with mχ < 10 GeV/c2, solar neutrinos limit the sensitivity to cross-sections above O(10−45 cm2) [34].

The exclusion limit depends greatly on the detector's characteristics, e.g., the target material, the energy threshold, and the exposure. We can increase the exposure by including a larger target mass in the analysis, the fiducial mass, or by increasing the live-time of the experiment.


Figure 2.3.1: (a) The expected WIMP scattering event rate per unit of recoil energy for an argon (red) and a xenon (black) target, assuming a WIMP mass of 100 GeV/c2. For the dark matter search, we are interested in the low-energy (< 40 keV) domain; the higher event rate in xenon is one of the arguments to favor xenon as a target material. Figure generated with code from [32]. (b) The change in the shape and location of the exclusion limit, in comparison to a reference exclusion limit (black). Lowering the energy threshold increases the sensitivity for the entire mass range (blue), while increasing exposure only shifts the limit down (green). With a lighter target, the maximum sensitivity shifts to lower masses (red). Figure from [33].

Figure 2.3.1b presents how the limit typically changes for a different set of detector parameters with respect to a reference limit. Increasing the exposure yields an overall lower limit. With a lighter target material, we become more sensitive to smaller WIMP masses, because slower incoming WIMPs can still yield a detectable recoil off lighter targets; in turn, we lose sensitivity to more massive WIMPs. Furthermore, decreasing the energy threshold results in an increase in sensitivity for all WIMP masses: the exclusion limit is lower over the entire WIMP mass range.

The shape of the exclusion limit is typically as depicted in Figure 2.3.1b. The exclusion limit rises for larger WIMP masses, because the event rate depends inversely on the WIMP mass (1/mχ): we expect fewer events at large mχ, and therefore the limit on the cross-section is higher. For lower WIMP masses, the limit shoots up due to the limited energy threshold of the detector.

To make a fair comparison between the exclusion limits of various dark matter direct detection experiments, all collaborations use the same velocity distribution and local dark matter density to pose their limits. Figure 2.3.2 presents the exclusion limits from the leading noble gas direct detection experiments at the time of writing. In Table 2.3.1, we summarize the three most stringent exclusion limits on the WIMP-nucleon spin-independent cross-section for high-mass (> 5 GeV/c2) WIMPs, from the xenon-based LUX (33.5 tonne×day) [26], PandaX-II (54 tonne×day) [27], and XENON1T (1 tonne×year) [28] experiments. The confidence bands in the figure denote the 1σ/2σ confidence interval for the predicted sensitivity of XENON1T. The limits are set at 90% confidence. The exposure of the XENON1T experiment is larger than that of the other experiments, so its better sensitivity is expected.

Figure 2.3.2: The current exclusion limits of the leading direct detection experiments XENON1T, LUX, and PandaX-II for the WIMP-nucleon spin-independent cross-section. The confidence level is 90% for the upper limit, and the green and yellow bands show the 1σ/2σ confidence interval for the predicted sensitivity. We refer to Table 2.3.1 for the exact cross-sections. Figure from [28].

Experiment       mχ [GeV/c2]   σSI_WIMP−n [cm2]
LUX [26]         50            1.1 × 10−46
PandaX-II [27]   40            8.6 × 10−47
XENON1T [28]     30            4.1 × 10−47

Table 2.3.1: The maximum sensitivity of the limits in Figure 2.3.2. XENON1T poses the most stringent limits on the WIMP-nucleon spin-independent cross-section for a large range of WIMP masses, reaching a minimum at 30 GeV/c2.

2.4 The future of dark matter search

Because dark matter has not yet been observed, the race continues. Both the XENON collaboration and the LZ collaboration [35] (a merger of the LUX and ZEPLIN [36] experiments) are currently assembling the next generation of xenon detectors. Both increase the exposure significantly, by increasing the live-time and the target mass of the detectors.

XENONnT will replace XENON1T. With a fiducial mass of 4 tonne and a presumed live-time of five years, XENONnT will be able to set an exclusion limit on the spin-independent cross-section of 1.4 × 10−48 cm2 for mχ = 50 GeV/c2 [37]. LZ reaches a similar limit, excluding dark matter spin-independent cross-sections above 1.6 × 10−48 cm2 for mχ = 40 GeV/c2 [38]. The fiducial mass in the LZ experiment is 5.6 tonne and the expected live-time is 1000 days. Both predicted exclusion limits are at 90% confidence.

While we are hopeful to finally discover dark matter with one of the experiments above, the DARWIN collaboration [39] is presently designing an even larger dark matter detector. With an expected fiducial mass of 30 tonne, it will further explore the parameter space until the neutrino floor is reached.


3. Enlightening the dark: XENON1T fundamentals

XENON1T is the third-generation direct detection experiment of the XENON collaboration. ZEPLIN [36] and XENON10 [40] were the first liquid noble gas detectors using highly purified xenon as the target material in a dual-phase time projection chamber (TPC). The working principle of these xenon detectors is similar and is based on the scattering of dark matter particles off a target nucleus. The liquid xenon (LXe) and gaseous xenon (GXe) both yield scintillation signals that we observe after such a scattering. Today, LXe TPCs are the state-of-the-art direct detection experiments. As new detectors step up in size, the sensitivity increases by up to two orders of magnitude with every new generation. The upgrade from XENON10 to XENON100, the first- and second-generation xenon detectors operated by the XENON collaboration, reduced the background by a factor of 100 while the target mass increased by only a factor of 10 [41]. Similarly, the XENON1T detector is able to put a limit on the spin-independent WIMP-nucleon cross-section a factor of 100 lower than its predecessor XENON100.

LXe TPCs are very efficient in reaching extremely low background rates, making them sensitive to very small O(10−47 cm2) cross-sections. We expect that the XENON1T successors XENONnT and DARWIN will reach spin-independent WIMP-nucleon cross-sections of O(10−48 cm2) and O(10−49 cm2), respectively. Thus, xenon detectors have a bright future and will remain prominent in the field of dark matter direct detection experiments.

We discuss the fundamentals of XENON1T in this chapter. We present the design and the detection principle of the XENON1T TPC in Section 3.1. Then, we explain the expected signals and backgrounds in Section 3.2. Thereafter, we elaborate in Section 3.3 on how we simulate the optical detector response to interactions. Finally, we discuss various position reconstruction methods in Section 3.4.

3.1 The XENON1T detector

We present the design and detection principle of XENON1T in this section. For an extensive review of the detector and its subsystems, we refer to [42].

The XENON1T detector is located at Laboratori Nazionali del Gran Sasso (LNGS), in the Gran Sasso massif in Italy. The lab is situated in the center of the mountain below 1300 m of solid rock, which provides an equivalent shielding of 3600 m of water [42]. This shields the experiments at LNGS from cosmic radiation and reduces the observed background in the detectors.

A water tank filled with 700 tonnes of deionized water surrounds the detector. This water shield acts as a muon-veto, because muons, or particle showers induced by muons, yield Cherenkov radiation that we detect when they pass through the water at speeds larger than the speed of light in water. Thus, we can veto events that happen coincidentally with a muon detection.

Next to the water tank, there is a three-story building hosting various systems, e.g., the xenon purification, xenon storage, data acquisition, and detector monitoring systems [42]. The xenon purification system removes impurities from the LXe and thus increases the lifetime of electrons. This is essential, because the limited electron lifetime in xenon affects the signals we observe considerably.

The water tank accommodates the cryostat. The double-walled cryostat insulates the TPC, keeping the thermal losses as low as possible and maintaining an LXe temperature of −96 °C. The inner vessel contains the xenon and the TPC. The cylindrical TPC contains 2.0 tonnes of liquid xenon and has a radius of 47.9 cm and a height of 96.9 cm. Teflon (PTFE) panels enclose the TPC to maximize the light collection efficiency.

We apply a drift field of 82 V/cm between the gate and the cathode by biasing the cathode. We establish a considerably larger extraction field of 8.1 kV/cm between the anode and the gate. Figure 3.1.1 shows the arrangement of the electrodes. We define the 3D coordinate system of the TPC with z = 0.0 cm corresponding to the position of the gate electrode (the top of the TPC) and x, y = 0.0 cm at the radial center of the TPC. The cathode is at z = −96.9 cm. The gas-liquid interface is at z = +2.5 mm and is regulated by a stainless steel 'diving bell'.

There are many advantages to using a noble gas as the target material [43]. Noble gases have a full outer electron shell and are good scintillators in the UV. Liquid xenon and argon have relatively high boiling points and are also easy to ionize, making them ideal candidates for dual-phase TPCs. We select xenon because, as opposed to argon, it does not have any long-lived radioactive isotopes that cause significant backgrounds. Additionally, the WIMP interaction rate scales with the mass number squared (A2) and is larger for low-energy recoils when we use xenon, as discussed in Section 2.3.1. Furthermore, the xenon isotope mixture contains isotopes with odd mass numbers; these unpaired spin components make us sensitive to spin-dependent dark matter-nucleon interactions. LXe also has a high density of 2.9 g/cm3, making it difficult for γ-rays and other backgrounds from outside of the detector to penetrate; the inner part of the TPC has a very low event rate due to this 'self-shielding' property of xenon. Moreover, xenon is completely transparent to its own scintillation wavelength of 178 nm, as the emission energy is smaller than the absorption energy.

3.1.1 TPC detection principle

The detection principle of the XENON1T detector is based on two different types of scintillation signals: primary scintillation (S1) and secondary scintillation due to ionization (S2). We discuss the microphysics of the scintillation and ionization signals in Section 3.2. Figure 3.1.1 presents a schematic overview of a particle interacting with the xenon in the TPC. When a particle interacts in the detector, it deposits some energy, which causes ionization and excitation of the target. The excited states cause the prompt scintillation signal S1, and we detect the photons with the photomultiplier tubes (PMTs) below and above the xenon. The electric field of 82 V/cm causes the freed electrons to drift upwards to the gate with a drift velocity of vd = 1.335 mm/µs [44]. We apply a stronger electric field of 8.1 kV/cm between the gate and the anode to extract the electrons from the liquid xenon and accelerate them into the gaseous xenon. This gives rise to the secondary scintillation signal S2, which we predominantly observe with the top PMT array. We refer to the S1-area and S2-area (or sometimes simply S1 and S2) to indicate the size of the S1 and S2 signals. The area is in terms of the number of observed photoelectrons (PE), which we discuss in more detail in Section 3.1.2. The relative size of the S1 and S2 conveys a lot of valuable information about the interactions that happen in the TPC. S2s are typically much larger than S1s, because the accelerated electrons that hit the gaseous xenon generate O(100) photons each. The S1 and S2 shapes and sizes differ between interaction types, and the fraction S2/S1 can identify the type of interaction. The S2/S1 fraction is large for electronic recoils, because the number of ionization electrons is relatively large. Nuclear interactions have smaller S2/S1 fractions, as do potential WIMP interactions. WIMPs are very unlikely to interact, and therefore we assume that they interact only once in the TPC. In this manner, we differentiate between potential WIMP signals and nuclear backgrounds by identifying events with multiple S2s as background.
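As a toy illustration of this discrimination, the sketch below draws hypothetical ER and NR populations in log10(S2/S1) and applies a single cut. The band centers, widths, and cut value are invented for illustration only; they are not the measured XENON1T bands.

```python
import numpy as np

# Toy S2/S1 discrimination: ERs sit at a higher log10(S2/S1) than NRs, so a
# cut on this ratio separates the two populations. All numbers are invented
# for illustration; they are not the XENON1T ER/NR bands.

rng = np.random.default_rng(seed=42)
log_s2_s1_er = rng.normal(2.2, 0.15, 100_000)   # hypothetical ER band
log_s2_s1_nr = rng.normal(1.7, 0.15, 100_000)   # hypothetical NR band

cut = 1.95                                      # illustrative discrimination cut
er_leakage = np.mean(log_s2_s1_er < cut)        # ERs misidentified as NR-like
nr_acceptance = np.mean(log_s2_s1_nr < cut)     # NRs kept below the cut
print(f"ER leakage: {er_leakage:.2%}, NR acceptance: {nr_acceptance:.2%}")
```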

Figure 3.1.1: Schematic visualization of the XENON1T dual-phase time projection chamber (left). A particle scatters in the detector and we observe the S1 and S2 signals. The signatures (right) of various interaction types are different and we use this to discriminate between WIMP dark matter and backgrounds. Figure from [42].


In addition to the large discriminative power of the S2/S1 ratio, we also infer other information from these signals. In the dark matter search, it is essential to know the interaction positions. Position reconstruction allows us to define an LXe volume that is best shielded from external backgrounds to use for the dark matter analysis. This is the so-called fiducial volume, and it is our strongest background discriminator. We infer the position in the horizontal plane from the S2: the light distribution on the upper PMT array is the input to various position reconstruction algorithms that calculate the (x, y)-position. We calculate the event interaction depth, or the z-coordinate of the interaction, using the uniform drift velocity and the difference in time between the S1 and S2 signals. The drift time is typically in the range of microseconds to milliseconds, for shallow and deep events, respectively.

3.1.2 Eyes in the dark: photomultiplier tubes

For signal read-out we convert S1 and S2 photons into electrons. We do this with photomultiplier tubes (PMTs) which are placed at the top and bottom of the TPC.

Figure 3.1.2: Schematic visualization of a photomultiplier tube. When an incident photon hits the photocathode, it emits a photoelectron. The repeated arrangement of the dynodes increases the electron current and this current is collected at the anode. Figure from [45].

Figure 3.1.2 presents the working principle of PMTs schematically. When an incoming photon hits the photocathode, it produces a primary electron as a consequence of the photoelectric effect. The primary electron passes through the focusing electrode and is accelerated by an electric field in the direction of the first dynode. When a primary electron hits the surface of a dynode, it emits secondary electrons. The potential difference inside the PMT accelerates the electrons from dynode to dynode, resulting in a cumulative increase of secondary electrons. This electron current reaches the anode, and the signal is converted to ADC counts. For an extensive review of the data acquisition system that follows, we refer to [46]. Evidently, the electron current is proportional to the number of incident photons. However, not all photons cause photoelectrons. The quantum efficiency (QE) is a measure of the sensitivity of a PMT: the probability that a photon will create a photoelectron. The PMT gain is the amount of current amplification in a particular PMT.
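A minimal numerical sketch of these two quantities: with n dynodes that each multiply the electron current by a secondary-emission factor δ, the gain is δ^n, while the QE sets the expected number of photoelectrons. The values of δ and n below are illustrative assumptions, not the Hamamatsu R11410-21 specifications.

```python
# Minimal PMT model: the gain of the dynode chain and the expected number of
# photoelectrons from the quantum efficiency. delta and n_dynodes are
# illustrative values, not the Hamamatsu R11410-21 specifications.

quantum_efficiency = 0.325   # photon -> photoelectron probability (Sec. 3.1.2)
delta = 3.5                  # hypothetical secondary-emission factor per dynode
n_dynodes = 12               # hypothetical number of dynodes

gain = delta ** n_dynodes                       # ~3e6 current amplification
n_photons = 100
expected_pe = n_photons * quantum_efficiency    # mean photoelectrons produced
print(f"gain ~ {gain:.1e}, expected PE for {n_photons} photons: {expected_pe:.1f}")
```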


The XENON1T detector uses Hamamatsu R11410-21 PMTs, which were designed together with the XENON collaboration to minimize the radioactivity and maximize the sensitivity at a wavelength of 178 nm [47]. The PMTs have a diameter of 76.2 mm and cover the top and bottom of the TPC. We place 121 PMTs in such a way as to maximize the light collection efficiency on the bottom array, and we arrange the remaining 127 PMTs in a radial pattern at the top to increase the position reconstruction performance. For the Hamamatsu R11410-21 PMTs, the mean QE is 0.325 at 175 nm.

3.2 Signals in XENON1T

We have not yet observed a dark matter signal with XENON1T. Nevertheless, we do observe many signals in our detector. We explain the scintillation and ionization principle of particle interactions in the detector in this section. Furthermore, we discuss the backgrounds we observe and how we calibrate the detector.

3.2.1 Scintillation and ionization mechanism

A particle interaction in the detector leads to energy transfer: the incoming particle deposits some energy on the target nucleus or electrons. The recoil energy dissipates into atomic motion, excitation, or the ionization of xenon atoms. We use the different signals from these dissipation modes to identify the nature of the recoil. LXe direct detection experiments use both the scintillation and ionization signals to assess the interaction type. WIMPs have very small cross-sections, and therefore we expect them to interact only with the nucleus. As opposed to these nuclear recoils (NRs), βs and γ-rays induce electronic recoils (ERs). Both types of particle interactions leave a track of excitation and ionization in the detector.

Figure 3.2.1 presents a schematic view of the production of scintillation and ionization signals after an energy deposition on the target. First, excited and neutral xenon atoms form diatomic molecules called excimers. The excimers Xe∗2 emit 178 nm scintillation light when the diatomic molecule decays to neutral xenon atoms in the ground state. This is the excitation scintillation.

The freed electrons typically produce the S2 signal, the charge signal, when they are accelerated into the GXe. The ion drift velocity is only O(mm/s) in liquid xenon, while the electron drift velocity is six orders of magnitude larger [44]. Thus, the electric field only transports electrons upwards and the ions stay near the interaction site. However, some electrons do not reach the liquid-gas interface, but recombine with the ionized xenon atoms. The ionized xenon atom pairs with a neutral xenon atom and together they form a charged diatomic molecule. This diatomic molecule then recombines with an electron, resulting in a twofold excited state Xe∗∗ and a neutral xenon atom. The excited state Xe∗∗ then decays into a onefold excited Xe∗ atom and produces scintillation light following the excitation scintillation principle. This recombination scintillation contributes to the S1. We accelerate the electrons that do reach the liquid-gas interface with the electric field between the gate and the anode. The potential ejects the electrons into the gas phase, where the electrons excite gaseous xenon atoms. One electron typically yields 21-22 detected photoelectrons in the top PMT array [44, 49].

Figure 3.2.1: The production mechanism of the primary scintillation signal S1 through excitation and recombination. Charged diatomic molecules Xe+2 recombine with electrons and contribute to the S1 by de-excitation. The secondary ionization signal S2 is the result of freed electrons that drift to the top of the TPC. Figure from [48].

The track of energy depositions depends on the recoiling particle. Electrons tend to leave sparser tracks: the energy depositions are more spread out. NRs typically have denser ionization tracks; therefore, the recombination probability is larger and fewer electrons escape the interaction site. This leads to a smaller S2 signal for NRs. The S1 shape also depends on the track of energy depositions [50, 51], but this effect is less pronounced. Accordingly, we use the S2/S1 ratio to differentiate between ER and NR.

3.2.2 ER backgrounds

The S2/S1 ratio is not always a good discriminator between ERs and NRs. We expect WIMPs that interact with the target to induce NRs with energies mainly < 50 keV. Low-energy ERs (< 12 keV) can have S2/S1 ratios comparable to those of NRs (< 50 keV), because of the different light yields of ERs and NRs in xenon. This results in a leakage of ERs into the NR event population. We discuss the most dominant backgrounds in this section.

We differentiate between three sources of ER backgrounds: the LXe, the detector materials, and the Sun. Additionally, we briefly discuss > 12 keV ERs near the edge of the detector that also mimic NRs. For a complete review of all expected backgrounds in XENON1T, we refer to [52].

Radioactive isotopes in the LXe While we constantly purify the xenon, it is still contaminated by radioactive isotopes. The LXe has three intrinsic background sources that together account for 91% of all backgrounds in the [1, 12] keV energy range. The largest contribution comes from β-decays of 222Rn daughters. The β-decay from 214Pb to 214Bi (Q = 1.02 MeV) is particularly critical, because the β-spectrum is flat at low energies [53]. The other β-decay in the chain is easily detectable because of an accompanying α-decay.

The second largest intrinsic background source is the β-emitter 85Kr (Q = 687 keV) [54]. Krypton is naturally present in the produced xenon with a concentration of O(ppm), and by distillation we are able to decrease the abundance to O(0.1 ppt). 85Kr is present in the krypton isotope mixture due to human nuclear activities, with an abundance of O(10−11). The contribution of 85Kr to the total ER background is 4.3%.

Lastly, 136Xe is naturally present in the xenon isotope mixture with an abundance of 8.9%. The two-neutrino double-β (2νββ) decay of 136Xe (Q = 2.46 MeV) has a small contribution of 1.4% to the background rate at low energies [55].

Contaminated detector materials Radioactive contaminants in detector materials, e.g., the shells and flanges of the cryostat, the PMTs, and also the PTFE, account for 4% of the ER backgrounds. Contributions to the background come from the 228Th, 232Th, 235U, 238U, and 226Ra decay-chains, and from 40K, 60Co, and 137Cs.

Scattering of solar neutrinos Neutrinos from the Sun (mostly pp and 7Be) can penetrate the detector and interact with electrons of the xenon. In the [1, 12] keV energy range, solar neutrinos contribute 5% to the background. We can only reduce this background contribution through more effective ER rejection.

Charge collection at the wall Events close to the edge of the TPC can lose several electrons to the PTFE panels enclosing the xenon. This reduces the number of surviving electrons, and thereby part of the S2 signal is not observed. The signature of wall events is similar to that of NRs, and therefore they are a relevant background for the dark matter search; identifying them is important.

3.2.3 NR backgrounds

NR signals are very similar to the expected WIMP signal in the detector. Therefore, NR backgrounds are important and need to be identified and modeled. We estimate the background contribution with simulations and we consider the energy range [4, 50] keV (corresponding to [1, 12] keV for ERs).

Similar to the ER backgrounds, NR backgrounds also come from contaminated detector materials that contain traces of the 238U, 235U, 226Ra, 232Th, and 228Th chains. Neutrons originate from spontaneous fission with energies of O(MeV). The background rate for a 1 tonne fiducial volume is 0.5 ± 0.1 events per year. This is the largest NR background contribution.

Solar 8B neutrinos are the dominant coherent neutrino-nucleus scattering contribution to the NR background. However, the event rate only becomes important for a low-energy threshold < 4 keV. Other NR backgrounds are muon-induced neutrons (< 0.01 events per year), which are tagged with the active muon-veto, i.e., the PMTs in the water tank. This contribution is so low that we can safely neglect it.

3.2.4 Detector calibration

Detector calibration is important to understand the signals that we observe in the detector. Noble gases are particularly useful for calibration because they distribute uniformly in the xenon. We inject 220Rn and 83mKr during calibration campaigns. Neither has any long-lived radioactive decay products that induce a background; therefore, we can quickly resume data-acquisition after a calibration campaign.

The 212Pb β-continuum (Q = 574 keV) in the 220Rn decay-chain is flat at low energies, and therefore it is suitable to calibrate the low-energy ER response of the detector [56]. Furthermore, we can use it to define data-cuts that are later used in the dark matter analysis.

We use 83mKr calibration data to determine, e.g., the electron lifetime, the uniformity of reconstructed events, the field distortion correction, and the charge and light yields. The 83mKr double-decay emits two internal conversion electrons of 32.1 keV (half-life 1.83 h) and 9.4 keV (half-life 154 ns). We can easily select 83mKr events due to the time-coincidence of the peaks. The S1s are generally resolved, but the S2s of the double-decay are often merged, so we observe a single S2 corresponding to 41.5 keV.
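A minimal sketch of such a coincidence selection, with an illustrative time window and the assumption that the larger (32.1 keV) S1 comes first; these are hypothetical choices, not the actual XENON1T selection cuts:

```python
import numpy as np

# Sketch of a 83mKr time-coincidence selection: two S1s separated by a delay
# consistent with the 154 ns half-life of the intermediate state. The window
# and the area ordering are illustrative choices only.

def is_kr83m_candidate(s1_times_ns, s1_areas_pe, dt_min=100.0, dt_max=2000.0):
    """True if two S1s form a plausible 32.1 keV + 9.4 keV double decay."""
    times = np.asarray(s1_times_ns, dtype=float)
    areas = np.asarray(s1_areas_pe, dtype=float)
    if times.size != 2:
        return False
    order = np.argsort(times)
    dt = times[order[1]] - times[order[0]]
    # the first S1 (32.1 keV) should be the larger of the two
    return bool(dt_min < dt < dt_max) and areas[order[0]] > areas[order[1]]

print(is_kr83m_candidate([1000.0, 1400.0], [120.0, 40.0]))   # True
```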

For the calibration of the NR response of XENON1T, we place a deuterium-deuterium neutron generator next to the cryostat. Alternatively, we use an 241AmBe source to model the expected NR signals.

3.3 Simulation

We employ simulations to determine the expected background and projected WIMP sensitivity for the XENON detectors by modeling all the components in the detectors. For details regarding this class of simulations, we refer to [37, 52]. Here, we focus on optical simulations, i.e., modeling the optical response of the detector.

To understand the optical signals in the detector, we mostly rely on simulations and calibration data. For position reconstruction specifically, we depend on optical simulations, because they are often the only way to know the true position of an interaction. If there is no truth position, it is not possible to accurately train an algorithm that reconstructs the event's position. Hence, optical simulations play an important role in the development of position reconstruction algorithms. In the following sections, we discuss two methods for simulating the response of the PMTs.


3.3.1 Optical Monte Carlo simulation

GEANT4 (GEometry ANd Tracking 4) [57] was developed by CERN as a tool to simulate particles moving through and interacting with a medium. In high energy particle physics, we specifically use it to track particles in a simulated detector.

For the XENON experiments, we use GEANT4 to perform an optical Monte Carlo (MC) simulation that tracks photons from the moment they are created until they hit a PMT. We implemented the XENON1T detector geometry in GEANT4 for this purpose. All simulated photons have an energy of 7 eV, corresponding to the xenon scintillation wavelength of 178 nm, and we generate them in a thin GXe layer near the anode. The track of a photon depends on many parameters, such as the PTFE reflectivity in the gas phase and the liquid phase, the LXe refraction index, the attenuation lengths of LXe and GXe, and the transparency of the anode electrodes and of the screening mesh that protects the PMTs from the high voltage. The values of these parameters are only estimates; consequently, the simulations are only valid up to some precision.

The photons are generated in the gas gap just below the anode at z = +4 mm. We simulate photons originating from all possible positions on a grid in the horizontal plane to get a light collection efficiency map (LCE-map) per PMT: the probability that the PMT observes a photon, as a function of the photon's emission position. So, for an interaction at a certain position, we can determine the S2 hit-pattern on the top PMT array. This hit-pattern is simply the probability map over all 127 top PMTs, with for each PMT the normalized probability that it was hit. We use these hit-patterns to train an algorithm for position reconstruction, because the true position of the interaction is known. The precision is limited by the grid-spacing on which the hit-patterns are generated.
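A minimal sketch of the hit-pattern lookup described above, with a hypothetical grid and randomly filled placeholder LCE-maps standing in for the actual optical MC output:

```python
import numpy as np

# Build a normalized S2 hit-pattern from per-PMT LCE-maps: look up the light
# collection probabilities at the grid point nearest to the true (x, y) and
# normalize over the 127 top-array PMTs. The grid and lce_maps below are
# placeholders, not the actual simulation output.

N_TOP_PMTS = 127
grid = np.linspace(-48.0, 48.0, 97)                          # hypothetical 1 cm grid [cm]
lce_maps = np.random.rand(N_TOP_PMTS, grid.size, grid.size)  # placeholder LCE-maps

def hit_pattern(x, y):
    """Normalized hit probability per top PMT for an S2 at (x, y)."""
    ix = np.abs(grid - x).argmin()    # nearest grid point in x
    iy = np.abs(grid - y).argmin()    # nearest grid point in y
    p = lce_maps[:, ix, iy]
    return p / p.sum()

pattern = hit_pattern(10.0, -25.0)    # training input; the label is the true (x, y)
```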

Alternatively, we generate similar LCE-maps from 83mKr calibration data. The krypton is uniformly distributed in the xenon, and therefore we expect events to generate photons at all possible positions. We use these events to determine the LCE-map per PMT without utilizing a simulation. We only use data from the top 10 cm of the TPC to reduce potential effects of electric field distortion. We can use both types of LCE-maps as the input for the waveform simulator that we describe in the following section.

3.3.2 WFSim: Waveform simulator for LXe TPCs

During the XENONnT detector design and construction phase, we developed FAX (a FAke Xenon experiment) and WFSim (WaveForm SIMulator) to simulate particle interactions in LXe TPCs.

The working principle of FAX and WFSim is similar, but WFSim was specifically designed for XENON1T and XENONnT and is considerably faster than FAX. Therefore, we use WFSim for the simulations in this thesis.

The main analysis software was also updated from the Processor for Analyzing Xenon (PAX) to straxen. The output of the simulation has the same data-structure as the output of the real DAQ reader. Therefore, we can use the same analysis software to do analyses on simulated and detector data. Hence, we can have important analyses ready before the new detector data are available.

In the most general case, WFSim works as follows.

1. We define the simulation instructions in WFSim or pass a GEANT4 root file from which the instructions can be inferred. The instructions are, e.g., the position, energy, type, number of events, and time spacing between the simulated events. See Appendix A for an example of the simulation instruction file used in this research; a sketch of such an instruction array follows this list.

2. We use the Noble Element Simulation Technique (NEST) [58] to get the electron and photon yields of an event based on the microphysics of the interaction type. These electron and photon yields differ for various interactions, e.g., γ-rays, βs, and 83mKr events. At the time of writing, it is only possible to simulate electronic recoils using WFSim.

3. We then determine the times at which the photons are generated and observed. For the S1, the recombination time and the microphysics of light production are taken into account [50, 51]. For the S2, we drift the electrons upwards and model the longitudinal diffusion based on the depth of the event and a diffusion model. Additionally, some electrons will not reach the interface due to the limited electron lifetime. We then calculate the number of photons created by each electron that does survive, utilizing a luminescence model. We save the photon arrival times and the PMT channels that have been hit.

4. Using either the LCE-maps from the optical MC simulation or the data-driven LCE-maps deduced from krypton calibration data, we produce a PMT hit-pattern and determine the corresponding current in the PMTs.

5. We then create ADC waveforms with the (fake) digitizer response of the PMTs. These form the raw records of the simulation and are in the same data-structure as raw records from the real detector.
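The sketch below shows what such an instruction array (step 1) could look like as a NumPy structured array. The field names, units, and type codes are assumptions modeled on the WFSim style and should be checked against the WFSim version at hand; they are not guaranteed to match.

```python
import numpy as np

# Hypothetical WFSim-style instruction array for one event with one S1 and one
# S2. Field names, units, and type codes are assumptions; check them against
# the actual WFSim instruction dtype before use.

instruction_dtype = np.dtype([
    ('event_number', np.int32),
    ('type', np.int8),     # assumed convention: 1 = S1, 2 = S2
    ('time', np.int64),    # [ns]
    ('x', np.float32),     # [cm]
    ('y', np.float32),     # [cm]
    ('z', np.float32),     # [cm], negative below the gate
    ('amp', np.int32),     # number of photons (S1) or electrons (S2)
    ('recoil', np.int8),   # interaction type, e.g. an ER code
])

instructions = np.zeros(2, dtype=instruction_dtype)
instructions['type'] = [1, 2]                    # one S1 and one S2
instructions['time'] = [1_000_000, 1_300_000]    # S2 delayed by the drift time
instructions['x'], instructions['y'], instructions['z'] = 10.0, -25.0, -40.0
instructions['amp'] = [100, 2500]
```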

We use straxen to do further analysis of the raw records, such as peak clustering, position reconstruction, and the creation of many other event variables.

One of the main problems with the simulation is that it does not match the real detector in some aspects. For example, we are not able to model the observed electric field distortion in the detector [59]. This is one of the largest problems for position reconstruction, as deep events are drifted to inner radii in the real detector due to the electric field distortion. We discuss this in more detail in Section 3.4. Furthermore, WFSim does not simulate PMT after-pulses induced by primary S2s.


3.4 Position reconstruction

We explain the impact of the event position on detector signals in this section. Furthermore, we discuss the general approach to position reconstruction in XENON1T.

Determining the particle interaction position is of great importance for dark matter experiments to mitigate backgrounds. Decreasing the uncertainty of the position reconstruction could increase the fiducial volume for the analysis. We also use position reconstruction to make corrections to the data, based on the position. We correct the S1 and S2 signals to account for position-dependent signal losses due to, e.g., electrode sagging and LXe impurities. Electrode sagging results in a better electron extraction efficiency at low radii, and the impurities in the xenon reduce the number of electrons that reach the surface. Furthermore, we correct for the S1 light collection efficiency, which is position-dependent due to photo-absorption by surfaces and LXe impurities. The corrected, position-independent signals are referred to as cS1 and cS2.

We determine the interaction depth z separately, using the drift time of the event: the difference between the detection times of the S1 and S2 signals. The drift velocity of electrons in LXe at an electric field of 82 V/cm is vd = 1.335 mm/µs [44]. The uncertainty on the z-position is only O(0.1 cm), because the ADC samples the signal every 10 ns. The drift time is therefore a good measure of the interaction depth. In case the S1 signal is missing, i.e., at very low energies, alternative methods are needed to estimate the interaction depth; these are discussed in Chapter 6. The top and bottom PMT arrays of the TPC collect different light signals. For the prompt scintillation signal S1 we mainly use the bottom array for light collection. S2 signals are significantly brighter on the top PMT array, since the proportional scintillation light is produced in the gas phase just below it. The top-array S2 pattern is also much more localized than the light on the bottom array, and it therefore serves as the main measure for position reconstruction in the horizontal (x, y) or (R, φ) plane. A high-energy event can saturate PMT channels in the top array, and we therefore use the bottom PMT array for S2 energy reconstruction.
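As a minimal sketch of this drift-time method (the constant is the drift velocity quoted above; the function name is hypothetical):

```python
V_DRIFT = 1.335e-4  # cm per ns, electron drift velocity at 82 V/cm [44]

def depth_from_drift_time(t_s1_ns: float, t_s2_ns: float) -> float:
    """Reconstruct the interaction depth z (in cm, negative below the
    liquid-gas interface) from the S1-S2 time difference."""
    drift_time_ns = t_s2_ns - t_s1_ns
    return -V_DRIFT * drift_time_ns

# An S2 arriving 500 us after the S1 corresponds to z = -66.75 cm.
print(depth_from_drift_time(0.0, 500_000))  # -66.75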

We observe a non-uniform distribution of reconstructed positions in 83mKr calibration data [59]. While the krypton is uniformly distributed in the xenon, we observe more events at lower radii, an effect that increases with depth. The freed electrons are shifted inwards by the distorted electric field. Additionally, electrons accumulate on the PTFE surface of the detector, which also pushes the electron cloud inwards. We correct for the field distortion in such a manner that the events are uniformly distributed in R2 for 2.5 cm bins in z and 2° bins in φ. In this thesis we refer to uncorrected positions unless stated otherwise.
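The uniformization in R2 can be illustrated with a quantile-mapping sketch for the events in a single (z, φ) bin. This is a schematic stand-in, not the actual XENON1T correction map, and the helper name is hypothetical:

```python
import numpy as np
from scipy.stats import rankdata

def uniformize_r2(r, r_max):
    """Map observed radii in one (z, phi) bin onto radii that are uniformly
    distributed in R^2 up to r_max, while preserving the rank ordering of
    the events. Empty-bin handling is omitted for brevity."""
    quantiles = rankdata(r) / (len(r) + 1)  # empirical CDF value per event
    return r_max * np.sqrt(quantiles)       # uniform in R^2 -> sqrt in R
```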

We use two algorithms to achieve accurate position reconstruction in XENON1T. During and after the three science runs of XENON1T, we put a lot of effort into evaluating and improving these algorithms. The algorithms described below were implemented while XENON1T was still running.

We trained a simple neural network architecture on hit-patterns from the optical MC simulation. We used the Fast Artificial Neural Network Library (FANN) [60] to train the network and implemented it in the XENON1T raw data processor PAX. While the FANN architecture was implemented when XENON1T was still running, we updated the architecture in the new analysis software straxen, where the network is implemented using the TensorFlow framework [61]. The new neural network performs better than the FANN network; however, the main SR1 analysis was already finalized using the FANN positions [28, 59]. The TensorFlow neural network is now the default position reconstruction network in straxen. We discuss neural networks in depth in Chapter 4 and explain the architecture of the straxen network in Chapter 5.

• MaxPMT (seed fraction 0.022): The reconstructed position of the event is the coordinate of the PMT with the maximum area (PE).

• WeightedSum (0.001): The reconstructed position of the event is the weighted sum of the positions of the contributing PMTs, each weighted by its fractional area (PE). This method has a large inward bias for S2s at outer radii, because there are always more PMTs towards the center of the TPC.

• RobustWeightedSum (0.238): Applies the WeightedSum algorithm iteratively on a small subset of the PMTs in the neighborhood of the center of the hit-pattern. Thereby, we neglect large S2 signals in PMTs far from the hit-pattern and reduce the inward bias of the WeightedSum algorithm.

• NeuralNet (0.739): The basic FANN neural network, with the area per PMT as input and the (x, y)-position as output.

Table 3.4.1: Four position reconstruction algorithms are used to find a seed position for the Top Pattern Fit algorithm. The seed fraction denotes the relative number of times that each algorithm is used to set the seed; the NeuralNet is used most often.

Additionally, we use the Top Pattern Fit (TPF) algorithm to find the optical MC hit-pattern that corresponds best to the observed light pattern on the top PMT array. It does so by calculating the likelihood that the hit-patterns are identical. We give the algorithm a seed position to limit the search space and the computational time. Table 3.4.1 presents the algorithms we use to determine the seed positions. We determine the likelihood of each of the seed positions and use the most likely seed as the input to the TPF algorithm. The fractional number of times that each algorithm sets the seed is noted in the table: TPF predominantly uses the NeuralNet and RobustWeightedSum seed positions.

Thus, we reconstruct the event position by comparing the observed and expected hit-patterns. The latter are only defined for certain values on a grid and we use interpolation to accommodate this limitation. TPF was the primary position reconstruction algorithm in the first science run (SR0) of XENON1T.
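Schematically, the likelihood comparison at the heart of TPF can be sketched as follows. The actual XENON1T likelihood, grid, and interpolation are more involved, and the function names below are hypothetical:

```python
import numpy as np
from scipy.special import gammaln

def log_likelihood(observed_pe, lce_pattern):
    """Poisson log-likelihood that the observed top-array hit-pattern was
    produced at the position whose optical-MC light collection fractions
    are given by lce_pattern (one entry per top PMT)."""
    mu = observed_pe.sum() * lce_pattern / lce_pattern.sum()  # expected PE
    mu = np.clip(mu, 1e-9, None)                              # avoid log(0)
    return np.sum(observed_pe * np.log(mu) - mu - gammaln(observed_pe + 1))

def top_pattern_fit(observed_pe, lce_patterns, grid_xy):
    """Return the grid position whose expected pattern is most likely;
    in practice the scan is restricted to the neighborhood of the seed."""
    ll = [log_likelihood(observed_pe, p) for p in lce_patterns]
    return grid_xy[int(np.argmax(ll))]
```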


4 Machine learning and deep neural networks

The success of machine learning (ML) in modern life is indisputable. Algorithms greatly impact ordinary life and practically every industry one can think of, e.g., finance, healthcare, logistics, assembly, and the entire domain of the sciences. Developments in the field of image recognition were among the first demonstrations of the broad capability of ML algorithms. These days, artificial intelligence remains a flourishing field of research because of the vast range of applications for ML algorithms and their proven success. The application of machine learning tools in fundamental research gains progressively more attention. In physics, for example, the domains of statistical physics, cosmology, particle physics, quantum computing, and many-body quantum matter all apply various ML methods [62].

Where ordinary computer tasks rely on fixed rules defined by the developer, machine learning is based on obtaining knowledge from examples. Machine learning is defined as the ability of a machine to improve its performance at a certain task from experience. By examining the data, often referred to as training examples, ML algorithms are able to find patterns and make decisions based on features of the data. Therefore, we do not need to define features beforehand. Likewise, ML algorithms are able to discover features that researchers would not have thought of.

The advances in deep learning contributed most to the fame of machine learning. Deep learning is a subgroup of ML algorithms that share a similar structure: they essentially apply multiple consecutive multivariate linear transformations, each followed by a non-linearity. This results in a hierarchical layered structure, where the input to each layer is the output of the previous layer. The working principle of these deep neural networks (DNNs) is loosely based on how the human brain processes information. It is even possible to approximate human performance, because each layer has the potential to derive unique features at different levels of granularity [63, 64, 65]. These achievements enhance the adoption of deep learning algorithms in fundamental science as well. In experimental physics, deep learning is used for tasks like track or position reconstruction in EXO-200 [66], XENON1T [42], and some LHC experiments [67, 68]. The ATLAS and CMS experiments at the LHC use DNNs for particle identification and for identifying new physics [69].

In this chapter, we first introduce the fundamentals and taxonomy of machine learning in Section 4.1. We discuss deep neural networks in more detail in the course of Section 4.2. Then, we present convolutional neural networks, a specialized kind of DNN for structured data, in Section 4.3.


4.1 Machine learning basics

We discuss the basic machine learning principles in this section. Additionally, we explain how we can assess whether machine learning algorithms are training appropriately.

An algorithm is learning when it successfully finds patterns in the data and thereby increases the performance on a certain task. We can evaluate this task using a loss function, which characterizes the performance of the algorithm and depends on the type of problem. Training is a method to minimize the loss, and therefore maximize the performance on the task. Despite the very broad range of machine learning applications, we can differentiate between two types of learning.

Unsupervised learning is learning from data without any knowledge about the true label of the data. We aim to find structures or patterns in the data. Unsupervised learning is perhaps best exemplified by clustering algorithms, where the goal is to find k clusters in the data based on the similarity between data-points.
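As a concrete example of the clustering task, a minimal k-means sketch is shown below; this helper is purely illustrative and not part of any XENON software:

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """Minimal k-means: alternately assign each point to its nearest
    centroid and recompute the centroids, until they stop moving.
    Empty clusters are not handled, for brevity."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        distances = np.linalg.norm(x[:, None, :] - centroids, axis=2)
        labels = distances.argmin(axis=1)  # cluster index per point
        new = np.array([x[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```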

In supervised learning, on the other hand, each data-point or input has a label and we aim to find a mapping from the input to the label based on all examples in the data. Here, the algorithm represents the function f that maps the d-dimensional input x to the label y. The label can be a numerical value such as a detector coordinate (regression) or a class or category (classification), e.g., a type of particle interaction in the detector. We focus on supervised learning algorithms for regression in this thesis.

For later use, it is helpful to define some notation.

• xi ∈ Rd with i = 1, ..., M is the d-dimensional input from the ith data-point of the set of M data-points.

• yi ∈ Rk with i = 1, ..., M is the k-dimensional label from the ith data-point of the set of M data-points. For classification tasks this label represents a category. For regression tasks this label represents any k-dimensional number.

• The pairs {xi, yi} with i = 1, ..., M together form the complete data-set of size M. We use the data-set to train the algorithm. We often refer to this as the training examples or the training set.

• The parameters of a function, or the weights w, define a function fw : xi → yi. For elementary algorithms such as polynomial or linear regression, these weights correspond to the regression coefficients. For neural networks, these weights define the mapping between the various layers.

• The loss L = L[fw(xi), yi] evaluated on a single example quantifies the performance of the algorithm on one training example. Here fw(xi) denotes the reconstructed label or continuous variable of the input xi. The most typical loss functions are the mean squared error for regression problems and the categorical cross-entropy loss for classification problems. The loss is used during the training process to tune the weights.

• The main goal is to minimize the loss L over all examples {fw(xi), yi} in the training set. There exist many definitions in literature to indicate the loss of all examples. The cost C, the objective function J, or the empirical risk R of the algorithm are all terms to indicate the same concept. Additionally, the generalization error E is sometimes used to indicate the cost of new examples that were not present in the set used for training. For the sake of clarity, we define C = (1/M) Σi L[fw(xi), yi] as the cost of the algorithm for regression tasks (a short numerical sketch follows this list).

• Training an algorithm is determining the optimal set of weights w that minimizes the cost C of the algorithm.

• Each algorithm has some tunable hyper-parameters that affect the training process and the performance of the algorithm. We will discuss neural-network-specific hyper-parameters in the course of this chapter.
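To make the cost definition above concrete, a minimal sketch with the squared-error loss; f_w stands for any vectorized model and is a placeholder here:

```python
import numpy as np

def cost(f_w, x, y):
    """C = (1/M) sum_i L[f_w(x_i), y_i] with the squared-error loss.
    f_w is any vectorized model mapping inputs (M, d) to labels (M, k)."""
    per_example_loss = np.sum((f_w(x) - y) ** 2, axis=1)  # L per example
    return per_example_loss.mean()                        # average over M
```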

The goal of training a machine learning algorithm is not only to learn from training data, but also to generalize well on new data that the algorithm has not seen before. In the context of position reconstruction, we train an algorithm and we aim to reconstruct the position of new unseen events. The performance of the algorithm is only good if the algorithm is able to reconstruct the position of events that were not used in the training process.

To achieve this, we need three complementary and identically distributed data-sets: a training, a validation, and a test set. The split fractions depend on the availability of data, but a typical split is 80%, 10%, and 10% for the training, validation, and test set, respectively. Evidently, we use the training data-set to train the algorithm. We use the validation set to measure the performance of the algorithm during training. We never touch the test set during training; we only use it to evaluate the performance of the final algorithm. An algorithm should have enough complexity to model the training data. Additionally, the model should perform well on new, unseen inputs in the test set, i.e., the algorithm should have a low test cost Ctest. While the algorithm is training, we determine the cost Cvalidation on the validation set as an indication of the expected Ctest.
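A minimal sketch of such an 80/10/10 split (the helper name is hypothetical):

```python
import numpy as np

def train_val_test_split(x, y, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the data-set once and cut it into training, validation,
    and test sets with the given fractions."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_train = int(fractions[0] * len(x))
    n_val = int(fractions[1] * len(x))
    parts = np.split(idx, [n_train, n_train + n_val])
    return [(x[p], y[p]) for p in parts]  # [(train), (validation), (test)]
```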

If the model is too complex or has too many degrees of freedom, it has the ability to fit the training data perfectly. However, such a model fails to generalize to new inputs: the difference between Cvalidation and Ctrain (sometimes called the generalization gap) is large and the model is over-fitting. On the other hand, a model could be too simple and unable to fit the training set: both Cvalidation and Ctrain are large in this case and the model is under-fitting. The complexity or capacity of the algorithm regulates this trade-off between over-fitting and under-fitting, as depicted in Figure 4.1.1. The capacity refers to the number of parameters or weights in the algorithm: a polynomial model with many parameters has a large capacity, as opposed to a simple linear model. We prefer a capacity at which the gap between Cvalidation and Ctrain is small. The final model should be evaluated on the test set.
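The capacity trade-off can be seen numerically with polynomial regression on toy data (a noisy sine curve, an assumption for illustration): the low-degree fit has large Ctrain and Cval (under-fitting), while the high-degree fit has a small Ctrain but a much larger Cval (over-fitting).

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, 50)
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=50)  # noisy toy data
x_val = rng.uniform(-1, 1, 50)
y_val = np.sin(3 * x_val) + 0.1 * rng.normal(size=50)

for degree in (1, 4, 15):  # under-fitting, adequate, over-fitting capacity
    coeffs = np.polyfit(x_train, y_train, degree)
    c_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    c_val = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: C_train = {c_train:.3f}, C_val = {c_val:.3f}")
```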


Figure 4.1.1: The capacity of an algorithm regulates the trade-off between over-fitting and under-fitting. We aim to find the right complexity of the algorithm where the generalization gap, the difference between the training cost and validation cost, is small. The red line denotes the point where the capacity is optimal. Here, the generalization error denotes the validation cost, and the training error the training cost. Figure from [70].

Each kind of algorithm has various hyper-parameters that affect its performance. Evaluating the performance during training with the validation set also provides an opportunity to tune the hyper-parameters such that the cost converges.

The previous concepts apply to machine learning and algorithms in general. The primary focus in this thesis lies on a specific kind of algorithm: neural networks.

4.2 Feedforward neural networks

We introduce feedforward neural networks in this section. We explain the mathematical repre-sentation of such neural networks, and discuss how they learn. Furthermore, we discuss how to prevent neural networks from over-fitting.

Feedforward neural networks are the most basic form of artificial neural networks. Their architecture is inspired by nervous systems such as the human brain. The mathematical analogies for the dendrites and synapses are the weights w and activation functions g. A neuron is a cell in the artificial neural network that takes an input. We apply a non-linear activation function g to the neuron, and the neuron fires only if the threshold of the activation function is reached. Then, the signal is propagated to the next neuron. Stacking multiple layers of neurons mimics the hierarchical structure of the nervous system. Feedforward neural networks are an extremely simplified model of the human brain. Due to the hierarchical structure of the neurons and the non-linear activation functions, it is possible to find highly non-linear features in the data. A feedforward neural network with a single hidden layer can even approximate any continuous function to arbitrary precision, given enough neurons [71, 72].
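As an illustration, a feedforward network of the kind discussed here can be written in a few lines with the TensorFlow framework mentioned in Section 3.4. The input size corresponds to the 127 top-array PMTs of XENON1T; the layer sizes are illustrative assumptions, not the straxen defaults.

```python
import tensorflow as tf

# A small fully-connected network mapping the 127 top-array PMT areas
# to an (x, y) position; layer sizes are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(127,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2),  # linear output: reconstructed (x, y)
])
model.compile(optimizer='adam', loss='mse')
# model.fit(hitpatterns_train, xy_train,
#           validation_data=(hitpatterns_val, xy_val), epochs=50)
```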
