
Optimisation of XAMS data processing software Amstrax

Report Bachelor Project Physics and Astronomy

Conducted between 09-11-2020 and 22-02-2021


Name Maurice Geijsen

Student number 11328126

Institute Nikhef

University University of Amsterdam & Vrije Universiteit Amsterdam

Examiner dr. Ivo van Vulpen

Supervisor prof. dr. Auke Pieter Colijn

Daily supervisor Peter Gaemers


Abstract

Over the last few decades, the search for dark matter (DM) has become a major topic within physics. One of the proposed DM candidates is the Weakly Interacting Massive Particle (WIMP), which could theoretically be detected using dual-phase Time Projection Chambers (TPCs). XENONnT is a xenon-based TPC with an active volume of 5.9 t of liquid xenon that is set to start running this year (2021). For research and development on this detector, a small-scale version named XAMS has been built in Amsterdam. To process the data coming from XAMS, a software package called Amstrax has been developed. Amstrax is specific to the XAMS set-up and uses plugins to process the data step by step. This thesis describes the workings and calculations of the plugins that I have helped set up.

Popular science summary

Since the beginning of the last century, there has been agreement that the amount of visible matter is not enough to properly describe the motions of celestial bodies in the universe. Although we can measure the gravity of this unknown matter, it has no interaction with light. This unknown matter has therefore been given the name 'dark matter'. One proposed candidate for what dark matter could be are Weakly Interacting Massive Particles (WIMPs). WIMPs interact only through gravity and the weak nuclear force, but not with light, which makes their detection difficult. Fortunately, interactions through the weak nuclear force should in theory be detectable with the help of dual-phase Time Projection Chambers (TPCs). For this purpose, a large-scale TPC named XENONnT has been built in the mountains of northern Italy. To study and further develop this detector, a small version named XAMS has been built in Amsterdam. To extract anything meaningful from the signals XAMS produces, the signals have to be processed. We do this by processing the data step by step with software tuned specifically to XAMS, named Amstrax. In this thesis I will explain the workings and calculations used in Amstrax, on which I have worked over the past months.


Contents

1 Introduction
2 Theory
   2.1 Dark Matter
      2.1.1 Rotation Curve
      2.1.2 Gravitational lensing
      2.1.3 Cosmic Microwave Background
      2.1.4 Bullet Cluster
   2.2 Weakly Interacting Massive Particles (WIMPs)
      2.2.1 WIMP production in the Early Universe
      2.2.2 Detection
3 Dual-phase xenon Time Projection Chambers & XAMS
   3.1 Working Principles of Time Projection Chambers
      3.1.1 Liquid noble elements as target material
   3.2 XAMS
      3.2.1 DAQ and trigger using 22-Na gamma-ray source
4 Reconstruction Software
   4.1 Data Collection
   4.2 Pulse Processing
   4.3 Peak Processing & Classification
      4.3.1 Peak Classification
   4.4 Peak Positions
      4.4.1 SiPM geometry
      4.4.2 Center of Gravity method
      4.4.3 χ² method
      4.4.4 Log Likelihood method
   4.5 Event composition
5 Discussion
   5.1 Limitations
      5.1.1 Peak and Event Clustering
      5.1.2 Peak Reconstruction
      5.1.3 Chunking Issues


1 Introduction

Over the past few decades, dark matter (DM) has become a hot topic in physics. While we have repeatedly measured its gravitational effects [1], it has no interaction with electromagnetic radiation, making detection challenging. One of the hypothesized candidates for DM are Weakly Interacting Massive Particles (WIMPs). These particles interact only through gravity and the weak force, and are an attractive candidate for several reasons. The WIMPs' thermal production can be explained using known mechanisms, and they are predicted to have the right properties to fit in the generally accepted Λ Cold Dark Matter (ΛCDM) model. Since WIMPs are able to interact through the weak force, it is theoretically possible to detect them through direct detection [7]. This can be done using dual-phase Time Projection Chambers (TPCs). These detectors are filled with a purified liquid noble element, such as xenon, with a gaseous layer on top. When a particle interacts with the target material in the TPC, scintillation light is produced and detected using photomultiplier tubes (PMTs) at the top and bottom of the TPC. This is known as the S1. During the initial interaction, the target material is also ionized, and the liberated electrons drift upwards in the TPC due to an applied electric field. When they eventually reach the gaseous layer, they produce more scintillation light, which is detected by the PMTs and is known as the S2. The time it takes for the liberated electrons to reach the gaseous layer is known as the drift time and is directly proportional to the depth of the interaction in the TPC. Based on the properties of the S1 and S2, the type and source of the interaction can be determined.

XENONnT is a large-scale xenon-based dual-phase TPC with an active volume of 5.9 t of liquid xenon and is set to start running this year (2021). In order to do research and development for it, a small-scale version has been built in Amsterdam, named Xenon AMSterdam (XAMS). XAMS uses an array of silicon photomultipliers (SiPMs) at the top of the detector. The data processing used to be done with the Processor for Analysing XENON (PAX) [13], but with the introduction of Strax [14], data processing software specific to the XAMS set-up had to be implemented. This software is called Amstrax [16], and in this thesis I will explain the workings and calculations used by the plugins implemented in Amstrax that I have set up. During this set-up, I have used a data-set from June 2019 in which a radioactive 22Na source was used to induce events. Unfortunately, one of the SiPMs was broken during data collection; its signals have been ignored throughout the data processing.


Amstrax makes use of plugins to process the data step by step. It starts out with the most basic information, such as the time, the sensor, and the raw ADC counts detected. The data is subsequently flipped and baselined, before 'hits' are searched for by setting up threshold values. When the hits have been found, they are assembled into peaks. The properties of these peaks, such as their area and interquartile range, are saved and then used to classify the peaks into S1s and S2s. With the use of individual SiPMs in the array at the top of the TPC, we can measure the distribution of light emitted by the S2. This enables us to reconstruct the lateral positions of the S2s by minimizing a χ² or likelihood distribution. Finally, S1s and S2s are paired into events, allowing us to see the S1 and S2 properties of each event.


2 Theory

2.1 Dark Matter

2.1.1 Rotation Curve

Over the past century, a consensus has grown about the existence of a type of matter that does not interact with light, while its gravitational participation has become increasingly clear [1]. Talks suggesting its existence can be traced back to the late 1800s, but it was the Swiss astrophysicist Fritz Zwicky who dubbed it 'dark matter' after looking at stars' velocities at the edges of galaxies in 1933. Classical Newtonian mechanics predicts the velocity of stars to fall off as $1/\sqrt{r}$ by simply equating gravity to the centripetal force:

$$\vec{F}_G = \vec{F}_c \implies \frac{GM(r)\,m}{r^2} = \frac{m\,v(r)^2}{r} \implies v(r) = \sqrt{\frac{GM(r)}{r}} \sim \frac{1}{\sqrt{r}}. \tag{1}$$
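As a quick numerical illustration of Equation (1) (not from the thesis; the enclosed mass is an assumed, purely illustrative value), the sketch below shows the Keplerian $1/\sqrt{r}$ falloff that a constant enclosed mass predicts, and that measured rotation curves contradict:

```python
import numpy as np

G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
M = 1.0e41         # enclosed (visible) mass [kg]; illustrative value only

def v_circ(r):
    """Circular velocity from Eq. (1): v(r) = sqrt(G M(r) / r)."""
    return np.sqrt(G * M / r)

# With M(r) constant outside the visible matter, v falls off as 1/sqrt(r):
for r_kpc in (5, 10, 20, 40):
    r_m = r_kpc * 3.086e19          # kpc -> m
    print(f"r = {r_kpc:2d} kpc -> v = {v_circ(r_m) / 1e3:5.1f} km/s")
```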

By comparing the mass needed to reach these velocities with an estimate of the galaxy's mass based on its brightness, Zwicky realised there had to be far more matter than was visible. His suspicions were proven correct after more research was done on the rotation curves of galaxies in the 60s and 70s, such as shown in Figure 1. These show the velocities at large radii to be constant [2], which strongly contradicts the classical prediction. By the 80s, dark matter had established itself as one of astronomy's major problems.

2.1.2 Gravitational lensing

In the presence of a gravitational field, light traveling through a distribution of mass to an observer is deflected according to the theory of relativity. This bending of the light results in a distorted image of its source, an effect known as gravitational lensing (Figure 2). Gravitational lensing can be divided into two classes: strong and weak lensing. Whereas the effects of strong lensing are clearly noticeable, in the form of Einstein rings, arcs and multiplied images, the effects of weak lensing are more subtle, often relying on statistical methods to highlight them. The behavior of this phenomenon is well understood, and analysis of gravitational lensing concludes that the luminous matter is insufficient to account for the observed amount of light bending.


Figure 1: The solid line represents the measured galaxy rotation velocities as a function of distance from the center of the Milky Way. The dashed and dotted lines for the bulge, disk and their sum represent the baryonic components. The dashed line representing the halo component is dark matter. Notice that dark matter only starts dominating at larger distances. Figure taken from [2].

2.1.3 Cosmic Microwave Background

Nowadays, we have found mounting evidence supporting the existence of dark matter. With the mapping of the cosmic microwave background radiation (CMB) and its fluctuations by the Planck telescope and the Wilkinson Microwave Anisotropy Probe over the past decades (Figure 3), the Λ Cold Dark Matter (ΛCDM) model has cemented itself as the model best fitting this data [4]. This model suggests a spatially flat and ever-expanding universe, dominated by the cosmological constant Λ and cold dark matter (CDM). The fluctuations are a result of quantum fluctuations inducing density variations in the early universe, to which baryonic and dark matter were attracted. As the gravitational potential grew, so did the temperature and therefore the radiation pressure. This pressure affected baryonic matter by pushing it back into less dense regions, while the unaffected dark matter kept accumulating in the potential wells. The density fluctuations at the time of the decoupling of photons have permanently been imprinted in their energies and can still be measured, together with the sizes of the dense regions, providing the ratio of baryonic and dark matter. The best fits of the ΛCDM model suggest there to be (only) 5% baryonic matter, 26% dark matter and 69% dark energy, of which the latter would be responsible for the expansion of the universe [5].

Figure 2: Gravitational lensing caused by the galaxy cluster known as SDSS J1152+3313. The blue light on the top right side comes from galaxies whose light is bent around the cluster by its gravitational field. Picture taken by the Hubble Wide Field Camera 3 [3].

2.1.4 Bullet Cluster

The best empirical evidence to date for the existence of dark matter is the Bullet Cluster (1E 0657-56). The Bullet Cluster consists of two colliding galaxy clusters and shows the different behaviors of stars, gas, and dark matter during the collision, allowing them to be examined separately [6]. Whereas the stars in the colliding clusters are virtually non-interacting, the intergalactic gas is not: it heats up, emitting X-rays in the process. Dark matter is harder to trace, but its gravitational effect can be obtained by looking at the gravitational lensing effect. This shows the spatial center of the gravitational potential to be separated from that expected from the baryonic matter, further strongly supporting the existence of dark matter.


Figure 3: Cosmic Microwave Background temperature fluctuations taken by the Wilkinson Microwave Anisotropy Probe (2010). The average temperature is 2.725 K with fluctuations of up to 0.0002 K, where red and blue regions indicate warmer and cooler regions respectively.

Figure 4: Bullet Cluster 1E 0657-56 displayed with the gravitational lensing map (blue) overlaid on optical and X-ray data (pink). While the X-ray data shows the majority of the baryonic matter to be in the middle after the collision, the lensing effect is strongest at the sides, suggesting that non-interacting dark matter is mostly responsible for the observed gravitational effects.


2.2 Weakly Interacting Massive Particles (WIMPs)

Naturally, the search for possible dark matter candidates began. Three conditions have to be met for a particle to be considered: no color charge, no electromagnetic charge, and the particle should be stable. Of all the particles known to the Standard Model, only neutrinos would qualify. However, given the number of neutrinos in the universe and the current limits set on their masses, they cannot account for all the dark matter [7]. This means we have to continue our search beyond the currently known Standard Model.

One of the proposed candidates are the (as of the present) hypothetical particles known as Weakly Interacting Massive Particles (WIMPs) [7]. These particles interact only through gravity and the weak force, and are an attractive candidate for several reasons. Known mechanisms shortly after the Big Bang explain their thermal production in the early universe. Supersymmetric extensions of the Standard Model also predict stable new particles that have the right properties to be dark matter candidates. Lastly, since WIMPs are able to interact through the weak force, their presence can theoretically be detected.

2.2.1 WIMP production in the Early Universe

The early universe was a very hot and dense place, with temperatures exceeding the rest energies of all known particles. This allows for a thermal and chemical equilibrium between particles. We assume there exists a supersymmetric particle χ, with a mass that is irrelevant at these temperatures (as $T \gg m_\chi$), which can be produced in collisions between Standard Model particles. The opposite process also works, with χ annihilating with its antiparticle $\bar{\chi}$:

$$\chi\bar{\chi} \leftrightarrow e^{+}e^{-},\ \mu^{+}\mu^{-},\ \tau^{+}\tau^{-},\ \nu\bar{\nu},\ q\bar{q} \tag{2}$$

Meanwhile, the universe kept expanding rapidly. This expansion lowered the temperature, and as a consequence the creation of χ halted. Not long after, the χ concentration was low enough that virtually no annihilation took place anymore, freezing the particle out. All of this happened within roughly the first nanosecond of the universe.

2.2.2 Detection

With a very small cross-section of the order of $10^{-39}\ \mathrm{cm}^2$, one of the only ways of detecting WIMPs is through direct detection. WIMPs can transfer energy in collisions with ordinary matter, and the resulting recoil energy could be detectable. The velocity of WIMPs is expected to be around 232 km s⁻¹; combined with the velocity of our Solar System around the galactic center of 220 km s⁻¹, this would be sufficient for detection. This is currently being attempted with the use of dual-phase xenon Time Projection Chambers (TPCs).


3 Dual-phase xenon Time Projection Chambers & XAMS

3.1 Working Principles of Time Projection Chambers

One of the current efforts to detect dark matter is through the use of dual-phase xenon Time Projection Chambers (TPCs). These chambers are filled with liquid xenon (LXe) with a layer of gaseous xenon (GXe) on top. When particles interact with the LXe, we see this in the form of two distinct signals measured by photomultiplier tube (PMT) arrays placed at the top and bottom of the TPC. The first signal, known as the S1, is a product of scintillation light produced by the excitation of atoms during the initial interaction. The second signal is due to the electrons liberated by ionization during the interaction. They are drifted to the top of the TPC by an electric field produced by grid wires. When the electrons reach the GXe they are subjected to an even stronger electric field, producing the second, often much larger, scintillation signal S2 [8]. All of this is visualized in Figure 5. From the S1 and S2 signals we can gather a lot of information. Firstly, we can reconstruct the z-position of the interaction from the drift time of the electrons, and the lateral position from the hit distribution of the S2 signal on the top photosensor array. The electrons travel up the TPC with a constant drift velocity due to the applied electric field, which means the z-position of the interaction is directly proportional to the drift time. To calculate the x- and y-position, a χ² or likelihood function is used to determine the most probable position based on the number of hits on each PMT, which will be discussed further in Chapter 4.4. Additionally, we can use the relative S2 and S1 sizes to distinguish whether the signals are of nuclear recoil (NR) or electronic recoil (ER) origin. While S1s from both NR and ER are expected to be of similar size, the S2s from NR sources are smaller due to a mechanism called nuclear quenching, in which part of the energy is lost through atomic collisions. ER, therefore, has a larger S2/S1 ratio than NR. Most background interactions produce an ER, which enables us to filter these interactions out based on their S2/S1 ratio [8].
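As a minimal sketch of the depth reconstruction described above: the drift velocity below is an assumed placeholder, since the actual value depends on the applied drift field and is not given in the text.

```python
# Hypothetical numbers for illustration; the real drift velocity depends on
# the applied field and must be measured for the actual set-up.
DRIFT_VELOCITY_MM_PER_US = 1.5   # assumed LXe drift velocity [mm/us]

def interaction_depth_mm(t_s1_ns: float, t_s2_ns: float) -> float:
    """Depth below the liquid level, proportional to the S1-S2 drift time."""
    drift_time_us = (t_s2_ns - t_s1_ns) / 1e3
    return DRIFT_VELOCITY_MM_PER_US * drift_time_us

print(interaction_depth_mm(t_s1_ns=0.0, t_s2_ns=20_000.0))  # 20 us drift -> 30.0 mm
```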

Background sources can be split into intrinsic and extrinsic sources. Intrinsic sources are due to impurities in the xenon. In order to cut back on these, the xenon is continuously filtered by circulating it through a chemical filter that removes electronegative impurities [12]. One major intrinsic contamination source is radon, a decay product of the uranium and thorium chains. Extrinsic sources are those not coming from the target material, such as contaminants on the detector surfaces. One way of cutting back on extrinsic backgrounds is to use only the inner LXe volume for measurements, relying on the outer volume's self-shielding properties to keep most of the backgrounds from reaching the middle of the TPC. This process of maximizing the active volume while simultaneously minimizing the amount of background is known as fiducialization. Afterwards, position reconstruction can be used to check whether an event indeed happened within the active or fiducial volume.

Figure 5: When a particle interacts in the LXe, we measure its scintillation light as the S1. The collision ionizes the xenon and the liberated electrons are drifted to the top of the TPC. When they reach the GXe, they produce another scintillation signal, known as the S2. The light distribution on the light sensors allows for lateral position reconstruction. The time measured between the two peaks is the drift time and corresponds to the depth of the interaction.

3.1.1 Liquid noble elements as target material

Besides xenon, other possible target materials for WIMP searches should be considered. Liquid noble gases such as xenon, argon and neon each have their own advantages and disadvantages, summarized in Table 1.


Property                               Xenon        Argon        Neon
Atomic Number Z                        54           18           10
Atomic Mass A                          131.3        40.0         20.2
Boiling Point Tb [K]                   165.0        87.3         27.1
Liquid Density at Tb [g/cm³]           2.94         1.40         1.21
Fraction in Earth's Atmosphere [ppm]   0.09         9340         18.2
Price                                  $$$$         $            $$
Scintillator                           ✓            ✓            ✓
Wph (α, β) [eV]                        17.9 / 21.6  27.1 / 24.4  –
Scintillation Wavelength [nm]          178          128          78
Ionizer                                ✓            ✓            –
W (E to generate e-ion pair) [eV]      15.6         23.6         –

Table 1: Properties of the liquid noble gases as target materials in time projection chambers [9][10].

Heavier elements bring a number of conveniences compared to their lighter counterparts. A higher atomic weight and liquid density mean a shorter path length for background radiation, so heavier elements have better self-shielding properties and allow larger fiducial volumes to be used. With neon's low boiling point of 27.1 K, its liquefaction process faces a few challenges. This is easier and cheaper for argon and xenon, as their boiling points are high enough for liquid nitrogen (Tb = 77.4 K) to be used. While all three elements are great scintillators with high light yields, only neon is not considered a good ionizer, due to its low charge yield. Neon would, therefore, usually not be considered for a TPC. With a scintillation wavelength of 178 nm, xenon has another benefit over argon and neon: its light can be detected using commercially available photocathodes, while LAr-based TPCs require wavelength shifters to be implemented [9].

Besides having better self-shielding properties, heavier elements also tend to have larger cross-sections, as these scale with $A^2$, making WIMP-nucleus interactions much more common when a heavier target material is used. However, because more energy is lost in large momentum transfers, the expected nuclear recoil spectrum is form-factor suppressed at high energies [9]. This requires lower detection thresholds in the case of LXe. While this would not be needed for LAr, the overall interaction rate would always be lower. The price of xenon is high due to its low natural abundance, whereas that of argon is quite modest. Xenon, however, is radioactively clean in the sense that no decay modes have energy levels within our region of interest, while argon is not: in natural argon, the radioactive ³⁹Ar isotope is present at 1 Bq/kg and has a low-energy β decay mode. This can lead to unwanted backgrounds that have to be taken into account during data processing, as well as requiring ways to suppress their presence [11].

Figure 6: The expected nuclear recoil spectra from WIMP interactions with LXe and LAr. The WIMP mass is assumed to be 100 GeV/c², with a cross-section of $\sigma = 10^{-43}\ \mathrm{cm}^2$. At lower energies, a higher interaction rate is expected for LXe. At higher energies, the interaction rate is form-factor suppressed for LXe, but not for LAr; LXe therefore requires a low detection threshold. Coloured areas indicate experimentally achieved thresholds. Image taken from [9].

3.2 XAMS

To better understand and improve the workings of dual-phase xenon TPCs, a small-scale version named Xenon AMSterdam (XAMS) was built at the National Institute for Subatomic Physics (Nikhef) in Amsterdam. This allows new hardware and software to be tested, which would often be too costly or risky to do on large-scale detectors. XAMS has an active volume of 154 cm³, in which 434 g of liquid xenon is held at −90 °C. At the top and bottom of the detector, arrays of photosensors have been installed. The top array has been upgraded to eight silicon photomultipliers (SiPMs), able to detect light individually, which allows position reconstruction of S2s. The top SiPM array is connected to a CAEN mod. V1730 digitizer and the bottom PMT array is connected to a CAEN mod. V1724 digitizer.

3.2.1 DAQ and trigger using 22-Na gamma-ray source

In order to induce events, a radioactive ²²Na source of (368 ± 11) kBq is used. ²²Na has a half-life $t_{1/2}$ of 2.6 years, with its main decay channels being positron emission (90.4%) and electron capture (9.6%). In the case of positron emission, the positron annihilates with an electron, producing two back-to-back 511 keV gamma-rays, which travel in opposite directions due to conservation of momentum:

$$^{22}_{11}\mathrm{Na} \rightarrow\ ^{22}_{10}\mathrm{Ne}^{*} + \beta^{+} + \nu \tag{3}$$

$$\beta^{+} + \beta^{-} \rightarrow 2\gamma\ (511\ \mathrm{keV}) \tag{4}$$

The excited $^{22}_{10}\mathrm{Ne}^{*}$ subsequently decays to its stable ground state, emitting a gamma-ray of 1.275 MeV:

$$^{22}_{10}\mathrm{Ne}^{*} \rightarrow\ ^{22}_{10}\mathrm{Ne} + \gamma\ (1.275\ \mathrm{MeV}) \tag{5}$$

Directly behind the source, a sodium iodide (NaI) crystal is placed that is used as a trigger in data acquisition (DAQ) (see Figure 7). On the opposite side, between the source and the TPC, a collimator consisting of two lead blocks is placed. The lead blocks have a cavity of ∼0.5 mm diameter drilled into them, so that only gamma-rays travelling along the cavity's axis pass through. This increases the chance that the oppositely travelling gamma-ray hits the NaI crystal, and it improves the accuracy of the drift-time distribution measured with the same set-up.


Figure 7: The radioactive source is held in the source holder (red). Between the source and the TPC vessel, two lead blocks acting as a collimator are placed. The cavity is 0.5 mm in diameter and allows one of the 511 keV gamma-ray pair to pass. Behind the source, the NaI crystal is placed to catch the other gamma-ray travelling away from the TPC. Image taken from [12].

In order to maximize the usefulness of the collected data, the set-up works with a built-in trigger (see Figure 8). If the observed events coming from the bottom PMT array and the NaI crystal fall within a coincidence window of 120 ns, they are likely due to a 511 keV gamma-ray pair. This triggers the data acquisition for the V1724, to which both are connected, and tells the V1730 digitizer to record the SiPM array's signals. The signals are then stored on the DAQ computer for processing.


Figure 8: Trigger set-up for data acquisition. The NaI crystal and bottom PMT array are both connected to the v1724 digitizer. The top SiPM array is connected to the v1730 digitizer. When the NaI crystal and bottom PMT array both observe an event within a 120 ns time window, the data acquisition for the v1724 is activated. Only then is the v1730 told to record the signals for the next 40950 ns. Data from both digitizers are then stored on a DAQ computer. Image taken from [12].


4 Reconstruction Software

4.1 Data Collection

For the digital signal processing of XENON100 and XENON1T's raw data, the Processor for Analysing XENON (PAX) was used [13]. With the upgrade of XENON1T to XENONnT, this processing software has been replaced by the much faster Strax [14]. Strax works as an analysis framework for pulse-only digitization data and is primarily developed for the XENONnT experiment. The package can also be used by similar experiments, such as XAMS, in combination with processing software specific to their set-up [15]. In the case of XAMS, this is Amstrax [16].

Figure 9: This dependency graph shows how each plugin relies on other plugins. The dependencies are denoted by the arrows between the plugins.


Before the plugins perform their calculations, the data is fragmented into chunks, which are sorted based on time. Each plugin returns an array containing the information it has calculated, with appropriate data types, and relies on the previous plugins' results, as shown in Figure 9. After the data has been stored on the DAQ computer, it is converted into collections of samples known as records. Each record contains the most basic information about the samples, such as the start time of the record, the length of each sample, the channel the data came from, and the data in raw ADC counts, as shown in Figure 10. Once the data has been converted into these 'raw records', it is ready for processing.
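To make the plugin structure concrete, here is a schematic sketch of what a Strax-style plugin looks like; the class name, dtype fields and body are illustrative and are not the actual Amstrax code:

```python
import numpy as np
import strax

class PulseProcessing(strax.Plugin):
    """Illustrative skeleton: consumes 'raw_records', provides 'records'.

    The field layout below is a made-up minimal example; the real Amstrax
    dtype contains more bookkeeping fields.
    """
    depends_on = ('raw_records',)
    provides = 'records'
    dtype = [
        (('Start time of the record [ns]', 'time'), np.int64),
        (('Channel number', 'channel'), np.int16),
        (('Baselined waveform data [ADC]', 'data'), np.float32, 200),
    ]

    def compute(self, raw_records):
        records = np.zeros(len(raw_records), dtype=self.dtype)
        # ... flip, baseline and copy the waveforms here ...
        return records
```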

Figure 10: Raw data of a sample plotted over time.

To ensure the Amstrax software is correctly implemented, we make use of a real dataset collected on the 4th of June 2019, the same dataset used by A.A. Loya Villalpando in [12]. All figures of waveforms and peaks shown in the following segments are from this dataset.

4.2 Pulse Processing

In order to get anything useful out of the data, a baseline has to be determined, since the raw digitizer signals contain electronic noise from the hardware used. To determine the baseline level, the data is first flipped, after which an average and standard deviation are taken over the first forty samples of each channel. This average is then subtracted from all data samples. Now we can look for hits in the baselined data. We set up a threshold that determines the minimum height for the data to be considered a 'hit'; this is set to 15 ADC counts above the baseline. If a hit is found, the next sample is checked to see if it also exceeds the threshold and is part of the same hit. When this is no longer the case, the hit's properties, such as the time, total length, sample length, channel and height, are saved. Each hit is then integrated over all the contributing sample heights to get its total area. An example of a hit is shown in Figure 11.

Figure 11: Plotted baselined data of a hit found in the signals of one sensor.

4.3 Peak Processing & Classification

To rebuild the signal peaks from the hits, it has to be determined which hits contribute to a peak. This is done by setting up a few criteria, such as the minimum number of contributing SiPMs, the minimum area of the hit, and a time window within which a hit is considered part of the peak. The values for these criteria are set at peak_min_pmt = 6, min_area = 100 PE and peak_right_extension = 300 ns. A hit is selected and, if it meets the minimum-area and contributing-SiPM criteria, a peak is started with the hit's properties. If the next hit is found within the time window and meets the other criteria, it is considered part of the peak and its properties are added to the peak. If it falls outside of the time window, the peak has ended. The final quantities, such as the time, the total length and the area in each SiPM, are saved. Then, a single waveform is created from all the hits' sample data and saved as the peak's data. Lastly, an interquartile range is calculated by slicing up the data and returning at which points from the middle of the peak a certain area fraction is reached. This is commonly known as the width and can later be used for filtering. A sketch of this clustering loop is given below.
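The following is a minimal sketch of the clustering described above, assuming time-sorted hits as dicts with illustrative field names (time, length, channel, area); for brevity the criteria are applied per assembled peak rather than per hit:

```python
PEAK_MIN_PMT = 6          # minimum number of contributing SiPMs
MIN_AREA = 100            # minimum peak area [PE]
RIGHT_EXTENSION = 300     # time window to extend a peak [ns]

def cluster_hits(hits):
    """Group time-sorted hits into peaks; each peak is a list of hits."""
    peaks, current, t_end = [], [], None
    for hit in hits:
        if current and hit['time'] <= t_end:
            current.append(hit)           # hit falls inside the extension window
        else:
            if current:
                peaks.append(current)     # the previous peak has ended
            current = [hit]               # start a new candidate peak
        t_end = hit['time'] + hit['length'] + RIGHT_EXTENSION
    if current:
        peaks.append(current)

    # Keep only peaks meeting the area and SiPM-multiplicity criteria:
    return [p for p in peaks
            if sum(h['area'] for h in p) >= MIN_AREA
            and len({h['channel'] for h in p}) >= PEAK_MIN_PMT]
```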

Figure 12: Plotted data of a built peak.

While building the waveforms of the peaks, the data is also converted from raw ADC counts to photoelectrons [PE]. One PE corresponds to the charge induced when a single photon hits the sensor. The conversion factor depends on the SiPM array and digitizer properties: the sample duration, the digitizer voltage range, the number of digitizer bits, the load resistance of the SiPM circuit and the total amplification:

$$\mathrm{ADC\ to\ PE} = \frac{\mathrm{sample\ duration} \cdot \mathrm{digitizer\ voltage\ range}}{2^{\,\mathrm{digitizer\ bits}} \cdot \mathrm{SiPM\ circuit\ load\ resistor} \cdot \mathrm{total\ amplification}} \tag{6}$$

The total amplification depends on the gain of each SiPM and the amplification factor as total amplification = gain · factor. The gain is the multiplication factor of the current induced by incident light and can differ between light sensors. The values used for these terms are displayed in Table 2.
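As a worked example of Equation (6) with the values from Table 2 (shown below); note that the division by the elementary charge, needed to turn the integrated charge into a photoelectron count, is left implicit in Equation (6) and is assumed here:

```python
# Worked example of Eq. (6) with the Table 2 values. The division by the
# elementary charge e (charge -> number of photoelectrons) is not written
# out in Eq. (6); it is assumed here to make the units work out.
E_CHARGE = 1.602e-19          # elementary charge [C]

sample_duration = 2e-9        # [s]
voltage_range = 2.0           # [V]
n_bits = 13
load_resistor = 50.0          # [Ohm]
gain = 3.16e5                 # SiPM gain (mid-range of Table 2)
amp_factor = 10
total_amplification = gain * amp_factor

# Charge seen by the digitizer for one ADC count lasting one sample:
charge_per_adc = sample_duration * voltage_range / (2**n_bits * load_resistor)
adc_to_pe = charge_per_adc / (total_amplification * E_CHARGE)
print(f"{adc_to_pe:.3f} PE per (ADC count x sample)")   # ~0.019
```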

ADC to PE conversion values

Sample duration              2 · 10⁻⁹ s
Digitizer voltage range      2 V
Digitizer bits               13
SiPM circuit load resistor   50 Ω
Gain                         (3.06 – 3.26) · 10⁵
Amplification factor         10

Table 2: Values used in calculating the conversion factor from raw ADC counts to PE.

4.3.1 Peak Classification

After the waveforms and their properties have been built and calculated, it is possible to start labelling them as S1s or S2s. This will be used to identify S1 and S2 pairs, and in the position reconstruction of S2s. If a peak's width is below 100 ns, it is considered an S1, while those above 100 ns are considered S2s. Additionally, both S1s and S2s require a minimum area of 4 PE. In PAX, the limit for S1 widths was set at 60 ns; however, our peak clustering method indicates that this limit is too low, so for the time being we changed it to 100 ns. Figure 13 shows the classification areas of S1 and S2 on an area [PE] versus width [ns] plot of our dataset. In this case, the width is the length of the middle 50% of the peak's area. The S1 band is visible at the bottom of the figure. This figure will be discussed further in Chapter 5.1.1. A sketch of the classification rule is given below.
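A minimal sketch of this classification rule, with the thresholds quoted above:

```python
S1_MAX_WIDTH = 100   # width boundary between S1 and S2 [ns]
MIN_AREA = 4         # minimum peak area for classification [PE]

def classify_peak(area_pe, width_ns):
    """Label a peak as 'S1', 'S2', or 'unknown' (below the area threshold)."""
    if area_pe < MIN_AREA:
        return 'unknown'
    return 'S1' if width_ns < S1_MAX_WIDTH else 'S2'
```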

4.4 Peak Positions

With the use of separate sensors, it is possible to estimate the S2 position based on the distribution of hits on the SiPM array. At the moment, three different methods can be used in the lateral position reconstruction. All of them assume an isotropic distribution of light emitted from a single location just above the liquid level.

4.4.1 SiPM geometry

XAMS has eight SiPMs installed in its top array. Their geometry and locations are shown in Figure 14. The SiPM holder is placed 10 mm above the LXe level. At the time of data collection, SiPM 8 was not working properly; the data collected by this sensor has therefore not been taken into account throughout the entirety of the data processing. This has a negative effect on the accuracy of the position reconstruction near the top of the array, but is barely noticeable at the lower parts of the array.

Figure 13: The total area [PE] of all peaks plotted against their 50% area widths [ns]. The red lines indicate the classification criteria, currently set at min/max width = 100 ns and min area = 4 PE. The denser the peak population, the brighter yellow the logarithmic bins.

4.4.2 Center of Gravity method

The 'Center of Gravity' (COG) method is the simplest, but also the most flawed of the three. It takes the position and number of hits of each SiPM into consideration and calculates a center of gravity as

$$\vec{R} = \frac{1}{N}\sum_{i=1}^{m} n_i \vec{r}_i, \tag{7}$$

where $N = \sum_{i=1}^{m} n_i$ is the total number of hits.

The COG method is not considered accurate when it comes to position reconstruction. This is mainly because it does not take the light intensity's inverse-square behaviour into account: while the photon flux decreases as $1/r^2$, the model effectively assumes a $1/r$ decrease when calculating the positions. This tends to reconstruct events too far inwards in the detector, especially for events at the edge of the TPC. For this reason, the COG method will not be discussed further in this section. One thing it has going for it, given the simplicity of the calculation, is that it is quick and requires little computation time.
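A minimal sketch of Equation (7), using the seven working SiPM positions from Figure 14; the hit counts in the example call are illustrative:

```python
import numpy as np

# SiPM (x, y) positions from Figure 14 [mm]; SiPM 8 is excluded (broken).
SIPM_XY = np.array([
    [0, -15], [-13, -7.5], [13, -7.5], [-4, 0],
    [4, 0], [-13, 7.5], [13, 7.5],
])

def center_of_gravity(hits_per_sipm):
    """Eq. (7): hit-weighted average of the SiPM positions."""
    n = np.asarray(hits_per_sipm, dtype=float)
    return (n[:, None] * SIPM_XY).sum(axis=0) / n.sum()

print(center_of_gravity([37, 30, 4, 13, 8, 6, 2]))  # counts are illustrative
```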


SiPM   Location (x, y, z) [mm]
1      (0, -15, 10)
2      (-13, -7.5, 10)
3      (13, -7.5, 10)
4      (-4, 0, 10)
5      (4, 0, 10)
6      (-13, 7.5, 10)
7      (13, 7.5, 10)
8*     (0, 15, 10)

Figure 14: Top view of the XAMS SiPMs. Each SiPM has an area of 3×3 mm and their geometry is shown on the right. (0, 0, 10) is considered the center of the SiPM holder. At the time of data collection, SiPM 8 was not working properly; its signals have been ignored throughout the data processing.

4.4.3 χ² method

Both the χ² and the log likelihood methods work by simulating events at various positions in the TPC and checking how well each position fits the observed hit distribution of the S2. For each simulated position, an expected number of hits is calculated. To determine which position fits the observed number of hits best, the χ² of each reconstructed position is calculated as a goodness-of-fit measure. In the case of the log likelihood method, this is done by calculating the likelihood.

Firstly, the number of observed photon hits on each channel depends on the hit probability of each SiPM. This hit probability is the fraction of the total emitted photons that hit the SiPM, corrected for the SiPM's photon detection efficiency ε (25%):

$$p_{\mathrm{hit},i} = \frac{n_{\mathrm{hits},i} \cdot \varepsilon}{n_{UV}}, \tag{8}$$

with $n_{UV}$ being the total number of UV photons generated in the S2. From this, the number of photons simulated to be observed by each SiPM is easily calculated as

$$n_{\mathrm{obs},i} = p_{\mathrm{hit},i} \cdot n_{UV}. \tag{9}$$


From each simulated position, a different number of photons is expected to hit the SiPMs. This expected number of hits $n_{\mathrm{exp},i}$ depends on the solid angle $d\Omega$ between the simulated event position and each SiPM's area, and on ε:

$$n_{\mathrm{exp},i} = n_0 \cdot \varepsilon \cdot d\Omega. \tag{10}$$

The solid angle is defined as

$$d\Omega = \frac{A_{\mathrm{eff}}}{4\pi |\vec{\Delta}|^2} = \frac{A_{\mathrm{eff}}}{4\pi |\vec{X}_{UV}(x,y,z) - \vec{X}_{\mathrm{SiPM}}(x,y,z)|^2}. \tag{11}$$

Thus, Equation 10 becomes

$$n_{\mathrm{exp},i} = n_0 \cdot \varepsilon \cdot \frac{A_{\mathrm{eff}}}{4\pi |\vec{X}_{UV}(x,y,z) - \vec{X}_{\mathrm{SiPM}}(x,y,z)|^2} = \mu. \tag{12}$$

To find out which simulated position fits the observed number of hits best, the χ² of each position is calculated. It describes the goodness of fit, or how well the expected hits agree with the observed hits. χ² is defined as

$$\chi^2 = \sum_i \frac{(n_{\mathrm{exp},i} - n_{\mathrm{obs},i})^2}{n_{\mathrm{exp},i}}. \tag{13}$$

The best-fitting model is the one where the expected number of hits differs the least from the observed number of hits, i.e. the one with the lowest χ² value. In order to find this, the χ² distribution is minimized. This is visualised in Figure 15, which shows the χ² distribution as a hyperplane, of which the minimum represents the best model. All of this is implemented with iminuit, a Python interface for function minimization [17].

After determining the best reconstructed position, the x- and y-positions, as well as the distance to the middle of the TPC, are returned for each S2. The reconstructed positions of all S2-classified peaks can be seen in Figure 16.
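A minimal sketch of this χ² fit using iminuit, with the seven working SiPM positions from Figure 14; the effective area and photon count are illustrative assumptions, and the real Amstrax plugin differs in detail:

```python
import numpy as np
from iminuit import Minuit

SIPM_XYZ = np.array([[0, -15, 10], [-13, -7.5, 10], [13, -7.5, 10],
                     [-4, 0, 10], [4, 0, 10], [-13, 7.5, 10], [13, 7.5, 10]])
EFF = 0.25          # photon detection efficiency
A_EFF = 9.0         # SiPM effective area [mm^2] (3x3 mm), illustrative
N0 = 1e5            # UV photons generated in the S2, illustrative

def n_expected(x, y):
    """Eq. (12): expected hits per SiPM for an S2 at (x, y) on the liquid surface."""
    d2 = ((SIPM_XYZ - np.array([x, y, 0.0]))**2).sum(axis=1)
    return N0 * EFF * A_EFF / (4 * np.pi * d2)

def fit_position(n_obs):
    """Minimize Eq. (13) over (x, y) and return the best-fit position."""
    def chi2(x, y):
        mu = n_expected(x, y)
        return ((mu - n_obs)**2 / mu).sum()
    m = Minuit(chi2, x=0.0, y=0.0)
    m.errordef = Minuit.LEAST_SQUARES   # proper error definition for a chi^2 cost
    m.migrad()                          # run the minimization
    return m.values['x'], m.values['y']
```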


Figure 15: Visualisation of the χ² distribution of simulated positions. The deeper the purple, the better the goodness of fit. The number of simulated observed hits on each SiPM is indicated by the size of the SiPM squares. The simulated event position is marked by the blue cross, while the best reconstructed position is marked by the white dot. The hit probabilities calculated from the simulated event position are given in Table 3 and are used to calculate the simulated observed hits via Equation 9. The χ² distribution takes an almost uniform shape.

Number of SiPMs = 7, hits generated from [-7.5, -15, 0]

SiPM   [x, y, z] of SiPM   Total p_hit   Normalized p_hit   Efficiency
1      [0, -15, 10]        0.00092       0.374              ε = 0.25
2      [-13, -7.5, 10]     0.00073       0.300              ε = 0.25
3      [13, -7.5, 10]      0.00010       0.041              ε = 0.25
4      [-4, 0, 10]         0.00033       0.134              ε = 0.25
5      [4, 0, 10]          0.00019       0.077              ε = 0.25
6      [-13, 7.5, 10]      0.00014       0.057              ε = 0.25
7      [13, 7.5, 10]       0.00005       0.020              ε = 0.25

Table 3: Simulated hit probabilities of each SiPM using $n_{UV} = 10^5$ simulated photons.


Figure 16: Position reconstruction of S2s using a χ² minimization. Within the active volume of the TPC, the pattern of the anode mesh shows through. Unfortunately, many S2s are reconstructed outside of the TPC radius. There also seem to be more reconstructed positions at the middle of the TPC. These indicate that the position reconstruction is not yet working optimally, which will be discussed further in Chapter 5.1.2.

4.4.4 Log Likelihood method

The log likelihood method works similarly to the χ² method, but uses a more complex and accurate calculation to determine the goodness of fit of each reconstructed position. The number of hits on each SiPM follows a Poisson distribution, and the product over all SiPMs equals the likelihood:

$$\mathcal{L} = \prod_i P(n_{\mathrm{obs},i}\,|\,n_{\mathrm{exp},i}) = \prod_i P(n\,|\,\mu) = \prod_i \frac{\mu^n e^{-\mu}}{n!} \tag{14}$$

For convenience, we take $-\log\mathcal{L}$ and approximate using Stirling's formula. This simplifies the expression and allows us to minimize the likelihood distribution at a later stage:

$$-\log\mathcal{L} = -\sum_i \log\left(\frac{\mu^n e^{-\mu}}{n!}\right) = \sum_i \log\left(\frac{n!}{\mu^n e^{-\mu}}\right) \tag{15}$$

$$-\log\mathcal{L} = \sum_i \left(\mu - n\log\mu + \log n!\right) \tag{16}$$


Similarly to the χ² method, we use iminuit to minimize the log likelihood distribution in order to find the best reconstructed position. Figure 17 shows the distribution visualised. Again, a hyperplane is described, of which the minimum represents the best model. Looking at the distribution, we can see that the likelihood method takes the geometry of the SiPMs into account, unlike the χ² method. For example, because the hits on SiPM 7 (top right) are comparatively low, it is recognized that it is very unlikely the event happened around this lateral location. This is reflected in the likelihood distribution by giving this area a higher value.
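The corresponding cost function is a small change to the χ² sketch above; the position-independent log n! term of Equation (16) is dropped since it does not affect the minimum (n_expected is the hypothetical helper from the χ² sketch):

```python
import numpy as np

def neg_log_likelihood(x, y, n_obs):
    """Eq. (16) without the position-independent log(n!) term."""
    mu = n_expected(x, y)            # expected hits per SiPM, Eq. (12)
    return np.sum(mu - n_obs * np.log(mu))
```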

Figure 17: Similar plot to Figure 15, but obtained by minimizing the negative log likelihood of each reconstructed position. The same simulated event position and per-SiPM hit probabilities are used, as shown in Table 3. In contrast to the χ² method, the likelihood method takes the geometry of the SiPMs into account.

After the best reconstructed positions have been determined, the lateral position coordinates and the distances to the middle of the TPC are returned for each S2. The reconstructed positions of all S2-classified peaks, using the log likelihood method, can be seen in Figure 18.


Figure 18: Position reconstruction of S2s using a negative log likelihood minimization. The anode mesh appears more clearly than in Figure 16. Similarly to the χ² method, S2s are still reconstructed outside of the TPC, and there also seem to be more events reconstructed at the center of the TPC. This will be discussed further in Chapter 5.1.2.

4.5 Event composition

Once all the peaks have been classified, it is possible to determine which S1s and S2s are considered pairs, originating from the same event. This is done by first determining the number of nearby peaks for each peak by looking within a time window of $3\cdot10^6$ ns. These peaks are grouped before the main S1 and S2 are determined. The largest S2 peak is determined first; if its area is larger than 100 PE, an event is built with this peak as its main S2. The main S1 is currently chosen by looking left of the main S2 and taking the largest S1. Once the main S1 and S2 have been determined, the event's properties are calculated and returned. These include the drift time between the main S1 and S2, their areas and widths, and the S2's reconstructed position. If more S1s or S2s are included in the event, their areas and widths are also returned as alternatives. A sketch of this pairing logic is given below.


It is now possible to select events based on their S1 and S2 areas. This allows us to find which events are induced by the 511 keV gamma-rays from the ²²Na source. The S1s should be most populated around the area associated with their maximum energy deposition, and should plateau at lower area values for Compton-scattered gamma-rays. The S2 properties are expected to be mostly affected by differing drift times: the longer the drift time, the more likely it is that electrons are absorbed on their way through the LXe. Thus, events deeper in the TPC tend to have lower S2 areas and longer S2 widths. The identification of 511 keV events will be discussed further in Chapter 5.1.1.

Figure 19: Events' main S1 and S2 areas plotted against each other. The events with an S1 area of 0 show that a significant portion of the events are built without a main S1 found. No statement about the identification of 511 keV gamma-rays can be made from this plot with certainty. The event clustering does not seem to be working optimally yet; this will be discussed further in Chapter 5.1.1.


5 Discussion

5.1 Limitations

5.1.1 Peak and Event Clustering

While the data processing seems successful for a good chunk of the dataset, we do come across some limitations. One indicator can be spotted in the area-versus-width plot, in the region marked by '?' in Figure 20. The properties of the peaks in this region do not align with what we would expect for usual 511 keV S1s or S2s: while their areas are quite low, the peaks are very stretched out, which is unusual for peaks with low areas. Even though these could be caused by single electrons, it is more likely that this is due to poor peak clustering, possibly caused by the values chosen for the thresholds and time windows used during the data processing. These values should therefore be optimized more carefully in order to find the source of these discrepancies. Other indications of poor peak clustering can be seen in Figure 21. What stands out is the sudden drop in S2 population around the 200 PE area, along the whole S2 width range, on the left side of the figure.

Figure 20: Area versus width (middle 50% area) of peaks, focused on the lower area and width values. The classification criteria are indicated by the red lines. We are unsure of the nature and source of the peaks found in the region marked by '?'.


Figure 21: The total area [PE] of all peaks plotted against their 50% area widths [ns]. Notice the sudden drop in S2 peak population for peak areas around 200 PE and the stretched-out S1 band at the bottom.

Lastly, the S1-classified peaks seem to be stretched out considerably over the area range, as can be seen in Figure 21. This causes problems in event identification. The characteristics of the 511 keV gamma-ray induced S1s and S2s have been mentioned in Chapter 4.5. While they allow for some leniency, the fact that these S1s are so spread out in their areas means there is not enough certainty in identifying the 511 keV events, as shown in Figure 22. These high-area S1s could simply come from background radiation, but more likely point to a peak or event clustering process that is not functioning optimally. They could also be linked to the unknown region previously mentioned in Figure 20. The suspicion that the event clustering is not working optimally is also supported by some events having no main S1, as can be seen in Figure 22. Due to limited time and the reasons discussed in Chapter 5.1.3, not a lot of time has been spent on optimizing the event building criteria and process.

5.1.2 Peak Reconstruction

There is still much room for improvement in the peak position reconstruction. The two main issues are its speed and its sensitivity to noise and disturbances. Minimizing χ² and likelihood distributions using iminuit are intensive processes that take time. One option to improve this is to make a quick grid scan of the χ² or likelihood distribution before narrowing down on a region of interest and iterating again. This should be done using the log likelihood method, as it narrows the region of interest down much more effectively, as can be seen from comparing Figure 15 with Figure 17. This would simultaneously speed up and improve the accuracy of the calculation, as the position reconstruction can be focused on a smaller region.

Figure 22: Events' main S1 and S2 areas plotted against each other. The events with an S1 area of 0 show that a significant portion of the events are built without a main S1 found. No statement about the identification of 511 keV gamma-rays can be made from this plot with certainty.

An issue noticeable in Figure 23 is that some S2s are reconstructed outside of the TPC radius. This is likely due to the models not being robust enough: disturbances or noise in SiPMs that do not align with the signals in the other SiPMs can throw off the position reconstruction and cause the S2 to be reconstructed outside of the TPC. While this is less extreme using the likelihood method, it is still very much present. It could be avoided by implementing a way of determining which signals are most likely to be from the same source; signals determined to be counterproductive to the position reconstruction would then be ignored, improving the reconstruction accuracy. There also seem to be more S2s positioned at the center of the TPC (0, 0). This is likely due to failed position reconstructions, which are then given the x = 0 and y = 0 positions. The cause of this is currently unknown, but it suggests the calculations are still not working optimally.

Figure 23: Position reconstruction of S2s using the (a) χ² and (b) log likelihood method. Positions reconstructed outside the TPC radius are likely due to disturbances or noise throwing off the position reconstruction.

5.1.3 Chunking Issues

While the data is processed in each plugin, it is often split up into 'chunks'. When a plugin relies on processed data from multiple plugins, these chunks are merged. This is an easy way of assembling the properties of each data sample, but it requires the chunks to be of equal size. While re-chunking is allowed in certain plugins, such as 'Peaks' and 'Events', the chunking is supposed to stay unchanged throughout the rest. It seems, however, that the data is sometimes unintentionally re-chunked, causing merging issues in plugins that rely on these chunks. This has mostly come up in the plugin 'n_competing', or in plugins relying on it. These issues have caused delays in accessing much of the event processing plugins.


References

[1] R.L. Liboff, Generalized Newtonian Force Law and Hidden Mass, Astrophysical Journal Letters, v.397, 1992

[2] Klypin, Zhao & Somerville, ΛCDM-based models for the Milky Way and M31, I: Dynamical Models, 2002, https://arxiv.org/abs/astro-ph/0110390

[3] ESA/Hubble & NASA, Hubble Space Telescope Wide Field Camera 3, https://www.nasa.gov/content/hubble-space-telescope-wide-field-camera-3

[4] Planck collaboration, Planck 2018 results I. Overview, and the cosmological legacy of Planck, 2020

[5] Planck collaboration, Planck 2015 results XIII. Cosmological Parameters, 2016

[6] D. Paraficz et al., The Bullet Cluster at its best: weighing stars, gas and dark matter, 2016

[7] G. Bertone, D. Hooper & J. Silk, Particle Dark Matter: Evidence, Candidates and Constraints, 2008, https://arxiv.org/abs/hep-ph/0404175

[8] E. Hogenbirk, A spark in the dark: Scintillation time dependence and neutron-induced signals in dual-phase xenon TPCs, Ipskamp, 2019

[9] M. Schumann, Dark Matter Search with Liquid Noble Gases, 2012, https://arxiv.org/abs/1206.2169

[10] E. Aprile & L. Baudis, in Particle Dark Matter, ed. G. Bertone (Cambridge University Press, 2010)

[11] ArDM collaboration, Backgrounds and pulse shape discrimination in the ArDM liquid argon TPC, 2017

[12] A.A. Loya Villalpando, Characterization of Silicon Photomultipliers for Event Position Reconstruction in a Dual-Phase Xenon Time Projection Chamber, Master's Thesis

[13] PAX on GitHub, https://github.com/XENON1T/pax

[14] Strax on GitHub

[15] Strax documentation on readthedocs.io, https://strax.readthedocs.io

[16] Amstrax on GitHub, https://github.com/XAMS-nikhef/amstrax

[17] iminuit on GitHub


Acknowledgements

Since the start of November, I have been fortunate enough to be able to join the Nikhef dark matter group to work on my Bachelor’s Project. I want to thank Auke-Pieter Colijn and Peter Gaemers for setting up an entirely remote project for me during the current Covid-19 pandemic and making me feel so welcome. I also want to thank Ivo van Vulpen for being my second examiner.

Joining the daily coffee meetings in the morning has been really fun and motivating. They were a great way of staying grounded and connected to the group while working from home every day. Thank you to the rest of the group members: Joran, Serena, Stefan, Alessia, Barbara, Leonora, Tina and Patrick.

I want to thank Peter especially, for helping me through the project and answering my ignorant Python questions. You were a great help. I thought it was funny that throughout the project you called me dude twice; once out of frustration and once out of excitement.
