
HiSPARC data acquisition

Designing and implementing a new baseline determining algorithm

Jorian van Oostenbrugge 6306500

Supervised by:

Bob van Eijk
Arne de Laat

Second assessor:

Marcel Vreeswijk

Nikhef

Natuur- en Sterrenkunde Universiteit van Amsterdam

August 29, 2014


Abstracts

Abstract

This research examines the analysis software of the HiSPARC project. In this project high schools collaborate closely with scientific institutes to form a large network for measuring (extensive) air showers. High school students get the chance to participate in scientific research and build their own detectors, which are deployed at their school. These detectors are connected to a computer that acts as a buffer: it temporarily stores and analyzes the measured data before it is sent to a central server located at the Dutch National Institute for Subatomic Physics (Nikhef). The software used to collect and analyze the data, called the HiSPARC DAQ, is written in LabVIEW, a graphical programming language specifically designed for data acquisition. The goal of this research is to develop software which improves the analysis and data selection procedures. To do this, the current implementation first needed to be understood, to find out how the analysis was done. Here we focused on determining the baseline and standard deviation of an event. The next step was to take a more detailed look at the software in order to find bugs or errors in the code. The last step was rewriting the whole analysis in the low-level programming language C. This was done because it increases the performance of the HiSPARC DAQ, since the C code is compiled into a DLL. This DLL is callable from LabVIEW, so LabVIEW has access to all exported functions. Tests were written to make sure the new software behaves as expected and returns the correct error messages; when future adaptations are made, the tests will show whether they still work correctly. The final step was to compare the current and new implementations. First, the current implementation was rewritten in C to verify that both gave the same output; a matching percentage of more than 99% was reached. The second step was to update the code to fix all errors present in the current implementation. After that, a matching percentage of 83% was found, and the new analysis software will be implemented in the upcoming version of the HiSPARC DAQ.


Popular scientific abstract

This research is part of the HiSPARC project, a project in which secondary schools and scientific institutes work together, forming a large network to measure cosmic rays of extremely high energy. The students of the secondary schools build the detection setups themselves, and these are placed on the roof of their school. The data collected by the setups are sent over the internet to a central server located at the scientific institute Nikhef. At Nikhef these data are analyzed to learn more about the origin of the cosmic particles. This report looks specifically at the software that runs on the computers connected to the detection setups. It is the task of these computers to temporarily store the measured data before forwarding them to the central server. The software that controls the detectors, reads them out and performs a first analysis on the data is written in the graphical programming language LabVIEW. The first analysis of a measurement consists, among other things, of determining the baseline and the standard deviation with respect to this baseline. The goal of this research is to improve this analysis software. The first step was to understand how the software currently works, which means going through all of the code bit by bit until you understand exactly what happens. The next step was to go through the software critically once more in order to find possible errors. Finally, the whole baseline analysis was rewritten in the programming language C.

This programming language has the advantage that, once the source code is turned into an executable, it runs very fast, and in addition it gives full control over every aspect of the program. From this C code a so-called library, a DLL, was made, which can be called by LabVIEW. LabVIEW thus has access to all functions needed for the baseline analysis.

To make sure the code works, and to make sure it still works after modifications, several tests were written that test the various components of the C code. The new code was also compared with the original implementation to ensure it functions properly; the results of both implementations agreed well enough that the new code will be implemented in the new version of the analysis software.


Contents

Introduction
    Motivation
    Theory
        Data acquisition
        LabVIEW terminology
    Approach

Results
    Describing the current setup
        A high level overview
        The mean filter
        Calculating the baseline and standard deviation
    Problems and issues with the current setup
    Solving the issues
    Describing the new implementation
        An overview of the DLL
        The new algorithm
        Compiling the DLL
        Calling the DLL from LabVIEW
    Comparing the current and new implementation

Discussion

Conclusion

Appendix A

Appendix B

Appendix C


Introduction

Motivation

In the reconstruction of the shower direction, which can be done by determining the arrival times in three detectors, the shower core plays an important role, since the shower front is not a flat plane but rather has a certain thickness. This thickness introduces an uncertainty, because it is unknown in advance where in the shower front a particle is measured: near the front of the shower or lagging behind. As you move further away from the shower core, the median arrival time delay increases, as can be seen in figure 1 [2].

[Figure 1 below reproduces Figure 4.5 of [1] (page 87 of the dissertation, including its surrounding text on the propagation of timing uncertainties); only the plot itself is relevant here. Axes: core distance [m] versus arrival time difference |t2 - t1| [ns].]

Figure 1: The arrival time distributions of simulated vertical 1 PeV proton showers. The difference in arrival time between two detectors is plotted against the distance to the shower core. The dots show the median arrival time and the gray bands contain 50 percent of the events, evenly distributed around the median. The thickness of the shower front thus grows as you move away from the core. [1]

The particle density at a point indicates where that point is located relative to the core, since particle densities in a shower are highest near the shower core and fall off steeply, as shown in figure 2. Knowing the particle density therefore gives us an idea of the uncertainty in the arrival times and hence of the uncertainty in the reconstructed shower direction.

[Figure 2 below reproduces Figure 2.1 of [3] (page 31 of the dissertation, including its surrounding text). Axes: core distance [m] versus particle density [m⁻²]. The left panel shows LDFs summed over electrons and positrons for primary energies of 10¹⁴ eV to 10¹⁸ eV, with horizontal lines at 1.39 m⁻² and 2.46 m⁻² marking the 50 % detection probabilities for one and two detectors; the right panel shows the electron and muon LDFs for a primary energy of 10¹⁶ eV.]

Figure 2: On the left the lateral distribution functions (LDFs) for proton-induced extensive air showers (EAS) are shown. The function is summed over electrons and positrons for primary energies ranging from 10¹⁴ eV to 10¹⁸ eV. On the right the lateral distribution function is shown for electrons and muons for a primary energy of 10¹⁶ eV. [3]

Thus the particle density yields important information, and the more accurately this density is known, the better. This is where the baseline comes in: the particle density is calculated using the pulse integral, which in turn depends on the baseline. The pulse integral is calculated by first shifting all values so that the baseline becomes zero, and then summing all values that are more than the threshold away from zero, i.e. away from the baseline. A more precise baseline yields a more accurate pulse integral and hence leads to more accurate shower direction reconstructions. In addition, describing both the current and the new baseline determining implementation here creates a future reference in which the workings of both implementations are documented.
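To make the role of the baseline concrete, the short C sketch below computes a pulse integral in the way just described: each sample is shifted so that the baseline sits at zero, and only samples further than the threshold away from zero contribute to the sum. This is an illustrative sketch written for this report, not the HiSPARC DAQ code; the function name, the types and the handling of pulse polarity are assumptions.

    #include <math.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hedged sketch: integrate a pulse in a trace of raw ADC counts.
     * 'baseline' is the determined baseline and 'threshold' the minimum
     * excursion (in ADC counts) that counts as signal. */
    static double pulse_integral(const uint16_t *trace, size_t length,
                                 double baseline, double threshold)
    {
        double integral = 0.0;

        for (size_t i = 0; i < length; i++) {
            /* Shift the sample so that the baseline sits at zero. */
            double shifted = (double)trace[i] - baseline;

            /* Only samples further than the threshold away from the
             * baseline contribute to the pulse integral. */
            if (fabs(shifted) > threshold)
                integral += shifted;
        }
        return integral;
    }

A more precise baseline therefore translates directly into a more precise integral.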


Theory

Data acquisition

The current HiSPARC data acquisition unit (HiSPARC III) supports two photomultiplier tubes (PMTs). Each PMT is connected by two cables: one cable is used to control and provide power to the PMT and one cable carries the (negative) analog signal. The signal is then converted by an analog to digital converter (ADC). The millivolts coming from the PMT are converted to ADC counts with a 12-bit output. The HiSPARC unit uses two ADCs per PMT; each ADC is driven at 200 MHz and, by using a stable crystal that acts as a 200 MHz clock, the two ADCs are read out alternately. This means the PMT signal is sampled at 400 MHz, i.e. every 2.5 nanoseconds. A careful ADC alignment procedure is performed after setting up the unit to make sure that the baselines for both ADCs are the same and there is no offset between the two ADCs.
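As an illustration of this alternate read-out, the fragment below interleaves two 200 MHz ADC streams into a single 400 MHz trace, so that sample i corresponds to a time of i x 2.5 ns. The array names, and the assumption that the even samples come from the first ADC, are ours; the actual firmware and DAQ code may organize this differently.

    #include <stddef.h>
    #include <stdint.h>

    /* Hedged sketch: merge two 200 MHz ADC streams into one 400 MHz trace. */
    static void interleave_adcs(const uint16_t *adc1, const uint16_t *adc2,
                                size_t n, uint16_t *trace /* length 2 * n */)
    {
        for (size_t i = 0; i < n; i++) {
            trace[2 * i]     = adc1[i];  /* samples at 0 ns, 5 ns, 10 ns, ... */
            trace[2 * i + 1] = adc2[i];  /* samples at 2.5 ns, 7.5 ns, ... */
        }
    }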

On the other end, each PMT is connected to a rectangular 0.5 m² scintillator. Scintillators are cheap but efficient devices: they emit light when a particle travels through them. This light is detected by the PMT, which sends a signal back to the HiSPARC unit.

LabVIEW terminology

LabVIEW1 is a graphical programming language used to create programs called Virtual Instruments (VIs). Each VI created in LabVIEW consists of three components: a front panel - the user interface of the program - built up from controls and indicators; the block diagram - the graphical source code of the program - built from nodes, i.e. objects (analogous to functions) that receive input, perform an operation and return the result; and a connector pane. The connector pane is used when a VI, in that case called a subVI, is used inside another VI. The controls and indicators of the subVI are used to receive data from and return data to the main VI; the connector pane is the graphical representation of the controls and indicators used to wire everything together. LabVIEW follows a dataflow model rather than the control flow model used in text-based programming languages: a node only executes when all its required inputs have been received and, when finished, passes its result on to the next node[5, 6]. This means that the placement order of nodes in a VI is of no importance, as a node only executes when it has received all required inputs. See figure 3 for an example.

1 http://netherlands.ni.com/LabVIEW


(a) Nodes placed in sequential order. (b) Nodes not placed in sequential order.

Figure 3: A simple LabVIEW VI which first adds two numbers, then subtracts a third and shows the result. Comparing (a) to (b) shows that the result is the same no matter where the nodes are placed relative to each other. The dataflow programming model does not execute the nodes sequentially; rather, the minus function node has to wait until it has all required inputs before it can execute. So in both (a) and (b) the plus function node executes first, passing its result to the minus function node, which then returns the result to the user.


Approach

The first step in improving the current baseline determining algorithm is to understand the current algorithm and its limitations. In order to do so, we first took a detailed look at the current implementation to see what it does. After understanding the Calculate Baseline VI, the next step was to build our own implementation in LabVIEW, fixing and improving the current implementation. To compare both implementations, a subVI was created which turned a comma separated values (CSV) file into a one dimensional array of unsigned 16-bit integers2, and the same trace array, as a CSV file, was supplied to both implementations (a rough C equivalent of this conversion is sketched below). After confirming both implementations gave the same answer, the new implementation in LabVIEW could be turned into a DLL. As the source code for the DLL was written in C, we were not limited to LabVIEW structures and their properties; using the freedom C gave us, we could write a more elegant algorithm3. After the algorithm was finished and all components were tested, the source code was compiled into the DLL, ready to be implemented into the main LabVIEW data acquisition setup.
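To illustrate the kind of conversion the test subVI performs, the sketch below parses one comma separated line of ADC values into an array of unsigned 16-bit integers. This is a rough C equivalent written for this report, not the LabVIEW subVI itself; the assumed file format (one trace per line, plain decimal values) is ours.

    #include <stdint.h>
    #include <stdlib.h>

    /* Hedged sketch: parse a comma separated line of ADC counts into a
     * uint16_t array. Returns the number of values read, at most 'max'. */
    static size_t parse_trace_csv(const char *line, uint16_t *trace, size_t max)
    {
        size_t n = 0;
        const char *p = line;

        while (n < max && *p != '\0') {
            char *end;
            long value = strtol(p, &end, 10);   /* read one decimal value */
            if (end == p)
                break;                          /* no digits left */
            trace[n++] = (uint16_t)value;
            p = (*end == ',') ? end + 1 : end;  /* skip the separator */
        }
        return n;
    }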

2 This is because the current implementation expects exactly this data type.

3 For example, loops in LabVIEW have to start at zero.[4]


Results

Describing the current setup

A high level overview

In the HiSPARC VI the data processing begins when data - a one dimensional array of unsigned 8-bit integers, from now on referred to as the 'trace' - from either the master or the slave DAQ is passed from the Create Event VI to the Process Traces VI. After receiving the data and confirming it comes from the master, the data array is split in half, each half corresponding to one of the two channels in the HiSPARC unit. For each channel the data is sent to the Process Trace VI; besides the data, this VI is also told from which channel the data came, some data filter options and a peak threshold. The Process Trace VI is responsible for filtering the data, to remove the periodic clock synchronization pulse, and for returning all properties of the trace. After receiving the trace, the data is converted from 8-bit integers to 16-bit integers4 and the next step is to actually filter the data. This filtering only happens if the Filter Data boolean is set to true; otherwise the unfiltered data is sent to the Calculate Baseline VI and all properties of the unfiltered trace are computed and returned. The next subsection gives a thorough description of how the data filter works.

The mean filter

After receiving the converted trace, this filter does the following. First the data array is split in two: one part containing all even-indexed values and the other part all odd-indexed values, so that each half consists of the data of one ADC. Each half is then passed to the Mean Filter VI, where the actual filtering happens, after which the halves are merged back into one array, preserving the original order, and filtered once more by the Mean Filter VI to completely remove the synchronization pulse. The mean filter is supplied with the trace, a threshold and a 'use threshold' boolean, and consists of a for loop which iterates over all elements of the data array. Starting with the zeroth iteration, it takes the first four elements and computes their average. If the filter with threshold boolean is true, then for each of the four elements the absolute value of the difference between the element and the average is computed and compared with the supplied threshold. The four booleans that follow from this comparison are combined with a logical AND: if the result is true the average is stored, and if the result is false the four original values are stored. If the filter with threshold boolean is false, the computed average is stored regardless. So after the zeroth iteration the output consists of four values; see Table 1 for all possible outcomes.

4 Remember the HiSPARC ADC stores all data as 12-bit integers. So by using the nearest integer type that can hold these values, i.e. 16 bits, one array element corresponds to a single measured value.

    filter with threshold    All four within threshold    Output
    true                     yes                          the average
    true                     no                           the four original values
    false                    (not checked)                the average

Table 1: A summary of the possible values of the first four output elements after the zeroth iteration of the Mean Filter VI.

For all other iterations, if filter with threshold is false, the filter takes four elements starting from the current index and XNOR5 compares the last two of them with the average of all four. If both values lie on the same side of the average, i.e. both below or both above it, the last element is added to the output. So, in essence, if no threshold is used, all oscillations around an average are filtered out, while the significant pulses are not modified. If a threshold is set and filter with threshold is true, the filter again takes four elements starting from the current index and computes the absolute value of the difference between the third and fourth element. If this value is larger than two times the threshold, the fourth element is added to the output. If the value is less than the threshold, the third and fourth element are XNOR compared with the average of all four; if the result is true, the fourth element is added to the output. If this is not the case, and the last two elements lie on opposite sides of the average, the last element is added to the output only if the absolute value of the difference between the last element and the average so far is bigger than the threshold; otherwise the average of the four elements is added to the output. So if a threshold is used, all oscillations around the average that stay within the threshold are filtered out, while the significant pulses are not modified; see Table 2 for a summary. A sketch in C of the zeroth-iteration logic is given after the table.

    filter with threshold    Effect
    true                     All oscillations around the average that stay within the threshold are filtered out.
    false                    All oscillations around the average are filtered out.
    (both)                   The significant pulses are not modified.

Table 2: The two possible filtering options of the Mean Filter VI, listed with their effect.
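The sketch below renders the zeroth-iteration logic of the mean filter, as described above and summarized in Table 1, in C. It only covers the first four samples; the rules for the remaining iterations are omitted. Names are ours, not those of the Mean Filter VI.

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hedged sketch of the zeroth iteration: average the first four samples
     * and decide whether to store the average or the original values. */
    static void mean_filter_first_four(const uint16_t in[4], double out[4],
                                       double threshold, bool use_threshold)
    {
        double average = (in[0] + in[1] + in[2] + in[3]) / 4.0;
        bool all_within_threshold = true;

        for (int i = 0; i < 4; i++) {
            if (fabs((double)in[i] - average) > threshold)
                all_within_threshold = false;
        }

        for (int i = 0; i < 4; i++) {
            if (!use_threshold || all_within_threshold)
                out[i] = average;        /* store the average */
            else
                out[i] = (double)in[i];  /* store the original values */
        }
    }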

Calculating the baseline and standard deviation

The next step, after filtering the trace, is to calculate the baseline and standard deviation. The trace, its length and a threshold are sent to the Calculate Baseline VI. Here, in a while loop, three booleans are AND compared. Firstly, the difference between the current and previous element is compared with the threshold; secondly, the difference between the current element and the average of all elements up to and including the current element is compared to the threshold; lastly, the current index is compared against the length of the trace array. If all three booleans are true, the while loop continues and the three new booleans are compared again. If the result is false, however, the loop stops and the current index is compared to fifty. If the loop iterated over more than fifty elements, the average over all elements up to the current element6 is returned as the baseline. Using this baseline, the standard deviation is then determined by taking the squared difference between all elements - again up to the current element - and the baseline, averaging the result and taking the square root. This standard deviation is also returned. If, however, the loop iterated over fewer than fifty elements, a value of -999 is returned for the baseline to indicate that an error occurred, and 32767 is returned for the standard deviation, also indicating an error; see Table 3. Besides the baseline and standard deviation, the threshold and the original trace are also returned and the Calculate Baseline VI is finished. A C sketch of this procedure is given after Table 3.

5 XNOR is the logical negation of the exclusive OR (XOR). This means that if both inputs are the same, i.e. both true or both false, the output is true; otherwise the output is false.

    Return value of the Calculate Baseline VI
                           No error                                                     Error
    Baseline               the average of all elements up to the point of failure       -999
    Standard deviation     the standard deviation of all elements, with respect to      32767
                           the baseline, up to the point of failure

Table 3: All possible return values of the Calculate Baseline VI for both the baseline and the standard deviation.
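The following sketch renders the procedure of the Calculate Baseline VI, as described above, in C. It is a reading aid written for this report, not the LabVIEW code or the new DLL; names, the exact boundary of the fifty-element check and the treatment of the very first sample are assumptions.

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hedged sketch of the current baseline calculation: walk the trace while
     * the step and the distance to the running average stay below the
     * threshold, then return the average as baseline if more than fifty
     * samples were included, or the error values otherwise. */
    static void calculate_baseline_current(const uint16_t *trace, int32_t length,
                                           double threshold,
                                           double *baseline, double *stdev)
    {
        double sum = trace[0];
        int32_t i = 1;

        while (i < length) {
            double avg = (sum + trace[i]) / (double)(i + 1);   /* including element i */
            bool step_ok = fabs((double)trace[i] - (double)trace[i - 1]) < threshold;
            bool avg_ok  = fabs((double)trace[i] - avg) < threshold;
            if (!(step_ok && avg_ok))
                break;
            sum += trace[i];
            i++;
        }

        if (i > 50) {
            double avg = sum / (double)i;        /* elements before the failing one */
            double var = 0.0;
            for (int32_t j = 0; j < i; j++)
                var += ((double)trace[j] - avg) * ((double)trace[j] - avg);
            *baseline = avg;
            *stdev = sqrt(var / (double)i);
        } else {
            *baseline = -999.0;   /* error: not enough samples for a baseline */
            *stdev = 32767.0;     /* error value for the standard deviation */
        }
    }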

In the Process Trace VI the next step, after returning from the Calculate Baseline VI, is to check whether the value of the baseline equals -999. If this is not the case, the baseline and standard deviation are passed on to the Calculate Trace Variables VI. After this VI finishes its execution it returns all properties of the trace for further analysis. If, however, the value of the baseline equals -999, the VI uses a clever trick to still try and calculate the baseline: it reverses the trace array, so that the last element becomes the first element and so on, and passes the new array to the Calculate Baseline VI once more. For the standard deviation nothing changes; the original value is always passed on to the Calculate Trace Variables VI, whatever the value of the baseline may be.

Problems and issues with the current setup

Sometimes a single trace contains not one but two pulses; if these pulses lie at either end of the trace, a problem can occur. The Calculate Baseline VI starts at the beginning of the trace and starts computing the average. Say the while loop quits because of the first pulse, before iterating over more than fifty elements; then a value of -999 is returned. The Process Trace VI applies its clever trick, reverses the trace array and starts computing the baseline once more. Now say the while loop quits again before iterating over more than fifty elements, this time because of the second pulse. For the second time a value of -999 is returned. Because of the lack of a baseline, the Calculate Trace Variables VI will also return the -999 error value for some other properties of the trace that depend on the baseline. Thus, unless someone manually inspects the trace, it does not contribute to the further data analysis. This scenario is one of the biggest problems of the current implementation. Another issue is that the value of the standard deviation never gets updated when the baseline is calculated a second time after returning -999 the first time. This is undesirable, since the standard deviation could easily change after reversing the trace array. The last implementation issue is found in the Calculate Baseline VI itself. When comparing the value of the current element, in the main while loop, against the average, the average is taken over all elements including the current element. This means that this difference is not the real distance between the baseline up to the current element and the element itself. This can affect the value of the baseline and thus forms an issue in the current implementation. A LabVIEW related issue is that LabVIEW, being a graphical programming language, follows a dataflow model for running VIs [6]. This, combined with the sparse use of comments, makes the code hard to understand and to maintain.

6 Usually this element is the start of a pulse and its value lies outside of the threshold, so the AND comparison returns false.

Solving the issues

In order to solve the aforementioned issues, the decision was made to remove the Calculate Baseline VI from the Process Trace VI and replace it with a DLL8[7]. Using a DLL instead of a LabVIEW VI has several advantages. The first is that the functions in the DLL can be written in a low-level programming language, giving full control over every aspect of the program and, if done right, delivering code that is easy to read and maintain. The second is that the DLL, because it is precompiled, executes faster than the original VI, so the overall program gains performance. The third is that LabVIEW shows the imported DLL as a 'Call Library Function Node'; this node only shows the incoming and outgoing parameters and the return value, which reduces visual clutter[8]. How the DLL is implemented in the new setup and how the implementation specific problems are solved is explained in the next section.

8 A dynamic-link library or DLL is a set of functions, grouped together in a file, which can be used and reused by applications.


Describing the new implementation

An overview of the DLL

The DLL was implemented in the programming language C9, a mature, relatively "low level" general-purpose language[9], and contains all functions used to calculate the baseline and standard deviation. All the source code in the DLL is plain C, with only two exceptions. The first is that the main function that is used from the outside - i.e. the function that receives the trace array and, after some computation, returns its properties - and its declaration are preceded by the special __declspec(dllexport) keyword[11][12]. This is needed to export the function, so that LabVIEW is able to call it and get the results. The second exception is that, because this DLL is used by LabVIEW, the standard C data types cannot always be used. LabVIEW uses its own manager functions and requires the extcode.h header file and its own data types[13]. So, for example, instead of the int data type, the int32_t data type should be used. This makes for a less portable DLL, but because it is only used by LabVIEW this is not a problem. The new implementation and the compilation of the source code into the DLL are described in the following sections.
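To make these two exceptions concrete, the fragment below shows a minimal exported function using fixed-width types. It is a generic illustration written for this report, not code from the HiSPARC DLL; extcode.h is only included to show where the LabVIEW data types and manager functions come from.

    #include <stdint.h>
    #include "extcode.h"   /* LabVIEW manager functions and data types (cintools) */

    /* The __declspec(dllexport) keyword makes the function visible to LabVIEW;
     * fixed-width types such as int32_t are used instead of plain int. */
    __declspec(dllexport) int32_t sumCounts(const uint16_t *trace, int32_t size)
    {
        int32_t sum = 0;
        for (int32_t i = 0; i < size; i++)
            sum += trace[i];
        return sum;
    }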

The new algorithm

The source code consists of three files: the main.c file, containing the main function which calculates and returns the baseline; a library file called sequence.c, containing all functions used by the compareSequences function - which will be described in more detail later on; and the sequence.h header file, containing the function declarations for sequence.c. The main.c file contains one 'main' function, called findBaseline; this is the recursive function that receives the trace array and returns a baseline and standard deviation. The function works as follows. First, the supplied arguments are checked to make sure valid data is received. Second, the function calculateBaseline is called. This function receives a start, end and threshold value and tries to calculate the baseline. It iterates over the elements as long as the current element lies less than the threshold away from the average, the previous and current element lie less than twice the threshold away from each other, and the end of the array has not been reached. While iterating, it calculates and stores the average of all elements up to the current element. When the loop ends, for one of the three reasons mentioned above, a check is performed to see how many elements were included in the baseline. If more than fifty elements were included, a pointer to the rounded off baseline is set, the standard deviation - with respect to the baseline - is calculated and a pointer is assigned to this value10. The findBaseline function then returns -1. Sometimes a pulse occurs so early in the trace that fewer than fifty elements end up in the baseline calculation; instead of reversing the array and trying to calculate the baseline again, the findBaseline function then returns the index of the element where the loop failed. Because array indices are always equal to or bigger than zero, the decision was made to let -1 mean that a baseline was found, so that every non-negative value can be used to indicate where the baseline calculation stopped. Table 4 summarizes the return values of the findBaseline function, and a C sketch of this control flow is given after the table. In the code the return value is stored in a variable, startOfError, and the next step is to compare this value to -1. If startOfError equals -1, the function returns and all is done; if it is less than -1, the baseline and standard deviation are both set to -999 and the original error value is returned. LabVIEW then uses the pointers to get access to the calculated values and uses them for further processing.

9 For brevity we write C, but in fact we use the standard ANSI C99[10].

10 Strictly speaking, the pointer points to the memory address where the value of the standard deviation is stored.

    Return value of the findBaseline function
    < -1    Some specific error occurred.
    -1      A baseline and standard deviation were found and assigned to the pointers.
    > -1    No baseline was found; fewer than fifty elements were included in the baseline. The index of the element where the loop failed is returned.

Table 4: A summary of the possible return values of the findBaseline function.
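The sketch below condenses the control flow just described: calculateBaseline walks the trace under the loop conditions above, and findBaseline interprets its result as listed in Table 4. It is a simplified reading aid based on this section, not the actual main.c; rounding, the LabVIEW-specific types, the error reporting arguments and the recursion via compareSequences are left out.

    #include <math.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MIN_POINTS 50

    /* Hedged sketch of calculateBaseline: iterate while the sample stays within
     * 'threshold' of the running average and within twice the threshold of the
     * previous sample. Fills in the average of the samples seen so far and
     * returns the index where the loop stopped. */
    static int32_t calculate_baseline(const uint16_t *trace, int32_t start,
                                      int32_t end, double threshold,
                                      double *average)
    {
        double sum = trace[start];
        int32_t i = start + 1;

        while (i < end) {
            double avg = sum / (double)(i - start);
            if (fabs((double)trace[i] - avg) >= threshold)
                break;
            if (fabs((double)trace[i] - (double)trace[i - 1]) >= 2.0 * threshold)
                break;
            sum += trace[i];
            i++;
        }
        *average = sum / (double)(i - start);
        return i;
    }

    /* Hedged sketch of findBaseline's dispatch: -1 on success, the failing index
     * when too few points were found, a value below -1 on invalid arguments. */
    static int32_t find_baseline(const uint16_t *trace, int32_t start, int32_t end,
                                 double threshold, double *pBaseline, double *pStdev)
    {
        if (trace == NULL || end <= start)
            return -2;                       /* some specific error occurred */

        double avg;
        int32_t stop = calculate_baseline(trace, start, end, threshold, &avg);

        if (stop - start > MIN_POINTS) {
            double var = 0.0;
            for (int32_t j = start; j < stop; j++)
                var += ((double)trace[j] - avg) * ((double)trace[j] - avg);
            *pBaseline = avg;
            *pStdev = sqrt(var / (double)(stop - start));
            return -1;                       /* baseline found */
        }
        return stop;                         /* too few points: report where we stopped */
    }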

So at this point the only possible value of startOfError is positive, indicating the element where a pulse began. This value is given as an argument to the function compareSequences, defined in sequence.c. Another argument this function needs is the width of a sequence; by a sequence we mean a subset of the trace array. compareSequences then does exactly what its name implies: it defines two sequences, one starting at startOfError and one starting at startOfError + width, both extending width elements. For each sequence the average and standard deviation are calculated, grouped together and stored in a struct. Next, after making sure the standard deviation of the first sequence falls within a limit,11 it looks for a12 smooth sequence whose following sequence is less smooth, by comparing the standard deviations of both sequences, and returns the starting index of that sequence. If there is no such sequence, a value of INT_MAX13 is returned, indicating the inability to find a baseline and hence a standard deviation. This is a recursive function: if the second sequence is smoother than the first, it simply updates the starting positions of the sequences and calls itself again, comparing the next two sequences. After it has returned either a new starting index or INT_MAX, the final step is to check whether this value equals INT_MAX. If so, -999 is returned; otherwise findBaseline calls itself again with the new starting index and everything starts from the beginning. Note that, to limit complexity, we chose not to let compareSequences calculate the baseline itself; it just returns the starting position of a sequence it considers to contain a smooth baseline. It is the findBaseline function that, in the next call, calculates the baseline - if possible, of course. A sketch in C of this search follows below.
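To illustrate the search just described, the sketch below gives a possible shape of compareSequences and its helper. The struct, the recursion step and the handling of the smoothness limit are reconstructed from the description above and are assumptions, not the contents of sequence.c.

    #include <limits.h>
    #include <math.h>
    #include <stdint.h>

    typedef struct {
        double average;
        double stdev;
    } seq_stats;

    /* Average and standard deviation of 'width' samples starting at 'start'. */
    static seq_stats sequence_stats(const uint16_t *trace, int32_t start, int32_t width)
    {
        seq_stats s = {0.0, 0.0};
        for (int32_t i = start; i < start + width; i++)
            s.average += trace[i];
        s.average /= (double)width;
        for (int32_t i = start; i < start + width; i++)
            s.stdev += ((double)trace[i] - s.average) * ((double)trace[i] - s.average);
        s.stdev = sqrt(s.stdev / (double)width);
        return s;
    }

    /* Hedged sketch: find the start of a smooth sequence whose successor is not
     * smoother, or return INT_MAX when no such sequence exists in the trace. */
    static int32_t compare_sequences(const uint16_t *trace, int32_t size,
                                     int32_t start, int32_t width, double limit)
    {
        if (start + 2 * width > size)
            return INT_MAX;                  /* no smooth sequence found */

        seq_stats first  = sequence_stats(trace, start, width);
        seq_stats second = sequence_stats(trace, start + width, width);

        /* A very noisy first sequence cannot contain a baseline: shift on. */
        if (first.stdev > limit)
            return compare_sequences(trace, size, start + width, width, limit);

        /* The first sequence is smooth and the next one is not smoother:
         * a new baseline calculation can start here. */
        if (first.stdev <= second.stdev)
            return start;

        /* Otherwise the second sequence is smoother: move on by one sequence. */
        return compare_sequences(trace, size, start + width, width, limit);
    }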

The whole process is represented graphically in appendix A, figure 5.

11 This is done because we want a smooth baseline; if the first sequence already contains very random data, we had better shift to the next pair of sequences and compare those.

12 Note: it does not say 'the', because we are not interested in the smoothest sequence in the trace; just a smooth sequence suffices.

13 This is the biggest value an integer can have and should therefore never be mistaken for a point where the baseline calculation could possibly fail.


Compiling the DLL

After finishing the source code, the next step is to compile it to create the DLL. Microsoft Visual Studio 2013 was used as the compiler, because the building and compiling process is well documented[12]. The first step was to create a new Win32 Project, specifying the application type as DLL and marking the project as empty. After moving all source files into the project directory, the LabVIEW cintools directory was added to the Include Directories and Library Directories under Project > Properties > Configuration Properties > VC++ Directories, because the DLL needs access to the extcode.h header file, which is stored in the cintools directory[14]. The directory containing the sequence.h header should also be specified in the Additional Include Directories under Project > Properties > Configuration Properties > C/C++ > General. The last step before compiling is to set, in the Release configuration under Project > Properties > Configuration Properties > C/C++, the optimization to Maximize Speed (/O2) and to set Enable Intrinsic Functions to Yes (/Oi), ensuring the compiler generates the fastest DLL possible[15]. After building the DLL, it is ready to be imported into LabVIEW.

Calling the DLL from LabVIEW

Within LabVIEW the DLL is called using a Call Library Function node. After specifying the path to the DLL and setting the calling convention to C, the next step is to declare all parameters: every parameter that the exported function in the DLL expects should be declared. The Call Library Function node itself looks like a table where each row corresponds to a function parameter. A wire coming in from the left represents a value being passed to the DLL and a wire coming out of the right side represents a value being returned, either directly or via a pointer. The Call Library Function node with the DLL is shown in figure 4.

Figure 4: The DLL as called by the Call Library Function node; all parameters of the exported function are shown as rows.


Because the DLL should run without any user intervention, it is not possible to direct errors or warnings to any sort of popup message: in order to let the DLL continue its execution, someone would have to manually close that message. To solve this problem all errors and warnings, for any VI in current use, are recorded in a log file by using the standard error out interface present in every VI. The format of this error out interface is a cluster containing a boolean to indicate the presence of an error, an integer containing the specific error value and a string containing the specific error message. We were not able to get this error out interface working properly in combination with the DLL, but because of the standard error format we could build our own. In the Call Library Function node, see figure 4, the first three arguments represent the separate error specifiers. The first argument, errorValue, is simply the return value of the exported function and thus specifies the specific error value. The second argument, errorBoolean, indicates the occurrence of an error: it is zero - or, by comparing it to 'bigger than zero', false - if no error occurred, and one, or true, if an error occurred. The third argument, errorMessage, provides some debugging information by giving more detail about the specific error. These three arguments are then combined into a cluster, which now has the standard error format, for further processing. Since these three arguments must be returned after the function finishes, they are all pointers14. This also means LabVIEW must know in advance how much memory to allocate for the errorMessage; in other words, we must initialize a string of the appropriate length to store the errorMessage in. Since LabVIEW has no built-in support for initializing a string, the string is initialized as an array of unsigned bytes15 with a length bigger than the longest errorMessage, as can be seen on the left side of the errorMessage row in figure 4. The next seven arguments, from startOfBaseline to minPointsInBaseline, all represent values passed to the DLL, since they have no output wires. The first argument, startOfBaseline, determines the starting element from which the baseline should be calculated. The second argument, endOfBaseline, determines the first element not to be included in the baseline, i.e. if the calculation of the baseline does not stop before reaching this element, the calculation stops when that element is reached. This value can be set to the size of the trace array, so that the baseline is calculated until the first sufficiently big pulse is reached, as is currently implemented. The next argument, array, specifies the trace array. The size argument specifies the length of the array16 and the threshold argument specifies the threshold used in determining the baseline. The next argument, widthOfSequence, specifies the width of a sequence, as used when there are too few elements in the baseline, as specified by the minPointsInBaseline argument. So, as the baseline gets determined, the moment a value is bigger than the threshold the calculation stops, and if the baseline consists of fewer elements than minPointsInBaseline, the element where the calculation stopped is returned. Then, using sequences of width widthOfSequence, a new starting point for the calculation is searched for. If a baseline and standard deviation are found, these values are stored in the pBaseline and pStdev arguments. These two arguments are pure return values, as can be seen from the fact that they have no input wires. A possible C prototype collecting these parameters is sketched after the footnotes below.

14Since after the function finishes execution we no longer have access to any local variables.

15Since a string can be represented by an array of eight bit ASCII values.

16Necessary because in C we have no way of knowing the length of any array unless specified.
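Collecting the parameters listed above, the exported function's declaration might look roughly as follows. This prototype is reconstructed from the parameter names in figure 4 and the text; the exact types - in particular how the error message string is passed - are assumptions.

    #include <stdint.h>

    /* Hedged reconstruction of the exported signature. The error message is
     * shown as a plain byte buffer; the real DLL may use a LabVIEW string
     * handle from extcode.h instead. */
    __declspec(dllexport) int32_t findBaseline(
        int32_t        *errorValue,          /* copy of the return value, as error code */
        int32_t        *errorBoolean,        /* 1 if an error occurred, 0 otherwise */
        uint8_t        *errorMessage,        /* pre-allocated buffer for the error text */
        int32_t         startOfBaseline,     /* first element of the baseline calculation */
        int32_t         endOfBaseline,       /* first element not included in the baseline */
        const uint16_t *array,               /* the trace */
        int32_t         size,                /* length of the trace array */
        int32_t         threshold,           /* threshold used to determine the baseline */
        int32_t         widthOfSequence,     /* sequence width used by compareSequences */
        int32_t         minPointsInBaseline, /* minimum number of points in a baseline */
        double         *pBaseline,           /* output: the calculated baseline */
        double         *pStdev);             /* output: the calculated standard deviation */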


Comparing the current and new implementation

In order to test the algorithm and compare it to the current implementation, not only traces were needed but also the values of the baseline and standard deviation as calculated by the current implementation. As HiSPARC stores all its raw data in the datastore in the binary Hierarchical Data Format, version 5 (HDF5), it can easily be accessed by using a high level Python library, PyTables[2]. By using PyTables, the traces and the matching baselines and standard deviations could be retrieved. Using another library, ctypes[16], a wrapper was created which is able to call the DLL, passing it a trace array and retrieving the baseline and standard deviation. Combining both libraries, a program was written which uses PyTables to retrieve traces and then uses the ctypes wrapper to calculate the baseline and standard deviation of each trace with the DLL. By also retrieving the original baseline and standard deviation, these original values could be compared to the calculated ones. A log file was created which records the matches, i.e. both values match, and the mismatches by type, e.g. only the baselines do not match. For a few of these comparisons the log files are shown in appendix B.


Discussion

After comparing the log files containing the outcome of the comparison of the current and new implementation (see Appendix B for the actual logs), it can be seen that around 83 percent of all calculations match. This means that the value calculated with the DLL and the stored value are the same for both the baseline and the standard deviation. On the other hand, this also means that around 17 percent of the time a mismatch occurred. A closer look at the log files shows that this is mainly due to mismatches in only the standard deviation. The mismatches in only the baseline, or in both the baseline and the standard deviation, are almost negligible, although the latter is what one would expect to see most often: since the standard deviation depends on the baseline, a difference between the baselines should result in a mismatch between the standard deviations. So the most common case is that the calculated baselines are the same while the standard deviations are not. This could be explained by noting that the current implementation, if it fails to determine the baseline from the front of the trace, returns the standard deviation as calculated from the front of the trace array, while the baseline is then calculated starting from the back of the trace array. At that point the standard deviation no longer corresponds to the 'new' baseline. In the new implementation, if a new baseline is calculated, the standard deviation is updated as well. If the baseline is fairly constant along the whole trace, this could result in baselines that match but standard deviations that differ. To test this theory, in all log files from Appendix B it was counted how often the number 32767 occurred, shown in the log files as "Total stdev unable to calculate". This number indicates an error in calculating the standard deviation, which is what would happen if a baseline could not be found. If this number occurred often, it would support the reasoning above and would be a reasonable explanation of the results. However, on average the number 32767 occurred only 8.2 times. As this number is so low, it seems unlikely that the reasoning above is the main cause of the high number of standard deviation mismatches. A closer look at a plot of fifty traces where standard deviation mismatches occurred (see Appendix C, figure 7) also does not show any particularly abnormal traces compared to a plot of fifty traces where both values match (see Appendix C, figure 8). Another point of attention is that LabVIEW uses a rounding method consistent with IEEE Standard 754, also known as banker's rounding[17]. This means that a floating point number is rounded to the nearest integer, with the exception that numbers exactly halfway between two integers are rounded to the even integer. For random data this can reduce, or for truly random data remove, the statistical error that would otherwise be introduced by constantly rounding up. We therefore made sure the DLL uses the same rounding method, to avoid rounding differences.
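As an illustration of this rounding rule, the example below uses C's rint(), which rounds according to the current floating point rounding mode; the default mode is round-to-nearest-even, the same behaviour as LabVIEW's banker's rounding. The snippet only demonstrates the rule and is not the DLL's actual rounding code.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Halfway cases are rounded to the nearest even integer. */
        printf("rint(2.5) = %.0f\n", rint(2.5));   /* 2 */
        printf("rint(3.5) = %.0f\n", rint(3.5));   /* 4 */
        printf("rint(2.4) = %.0f\n", rint(2.4));   /* 2 */
        printf("rint(2.6) = %.0f\n", rint(2.6));   /* 3 */
        return 0;
    }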


Conclusion

The new algorithm described in this report solves the problems that were encountered in the current baseline determining implementation. These problems were solved by designing a new baseline calculating algorithm, which was then built into a DLL. By using the "low level" programming language C, a faster, more readable and more maintainable solution was built. A comparison between the new and the current baseline determining implementation revealed mismatches between the two implementations; however, these mismatches could be implementation related and are not necessarily errors. The next step would be to implement the new DLL in the full HiSPARC data acquisition setup, so that it completely replaces the current baseline determining implementation. After a successful implementation, it is recommended to reposition the Mean Filter VI in the full setup. Repositioning it would speed up the whole data analysis, because it currently iterates three times over a trace array17 and it is only used to provide a smoother baseline for viewing, which is not necessary for any further analysis or calculation. If the filter is repositioned so that it is only called when a user wants to look at a trace, it speeds up the whole process and removes complexity from the main program.


Bibliography

[1] Fokkema, D., The HiSPARC Experiment: data acquisition and reconstruction of shower direction. Dissertation, Universiteit Twente, p. 87, Figure 4.5.

[2] Fokkema, D., The HiSPARC Experiment: data acquisition and reconstruction of shower direction. Dissertation, Universiteit Twente.

[3] Fokkema, D., The HiSPARC Experiment: data acquisition and reconstruction of shower direction. Dissertation, Universiteit Twente, p. 31, Figure 2.1.

[4] National Instruments, 2010, viewed 10 July 2014, http://zone.ni.com/reference/en-XX/help/371361G-01/lvconcepts/for_loop_and_while_loop_structures/

[5] National Instruments, 2012, viewed 11 July 2014, http://www.ni.com/gettingstarted/labviewbasics/environment.htm

[6] National Instruments, 2012, viewed 30 June 2014, http://www.ni.com/gettingstarted/labviewbasics/dataflow.htm

[7] Microsoft Developer Network, 2014, viewed 30 June 2014, http://msdn.microsoft.com/en-us/library/ms682589.aspx

[8] National Instruments, 2013, viewed 01 July 2014, http://zone.ni.com/reference/en-XX/help/371361K-01/glang/call_library_function/

[9] Kernighan, B. W. & Ritchie, D. M. 1988, The C Programming Language, Second Edition, Prentice Hall, New Jersey.

[10] Kochan, S. G. 2005, Programming in C, Third Edition, Sams Publishing, Indiana.

[11] Microsoft Developer Network, 2014, viewed 01 July 2014, http://msdn.microsoft.com/en-us/library/a90k134d.aspx

[12] National Instruments, 2013, viewed 01 July 2014, http://www.ni.com/white-paper/3056/en/

[13] National Instruments, 2013, viewed 01 July 2014, http://zone.ni.com/reference/en-XX/help/371361K-01/lvhowto/completing_c_file/

[14] LabVIEW Tricks 2013, Extending LabVIEW: creating a DLL in Visual Studio, online video, viewed 23 May 2014, http://youtu.be/1k1-YPu1ZAc

[15] Microsoft Developer Network, 2014, viewed 02 July 2014, http://msdn.microsoft.com/en-us/library/k1ack8f1.aspx

[16] Python documentation, 2014, viewed 03 July 2014, https://docs.python.org/2/library/ctypes.html#module-ctypes

[17] National Instruments, 2014, viewed 17 August 2014, http://digital.ni.com/public.nsf/allkb/7ED5A95B08D7DF37862565A800819D2D


Appendices

Appendix A

Figure 5: A flowchart representation of the findBaseline function. [The flowchart itself is not reproduced here. Its nodes read: findBaseline -> Error? (yes: return error value; no: calculateBaseline) -> return value (< -1: return specific error; -1: found baseline, return 0; > -1: compareSequences) -> compareSequences result (INT_MAX: no baseline found; < INT_MAX: update start and end values and call findBaseline again).]

Appendix B

#

# This file contains the outcome of the comparison of the

# old and new implementation of the baseline filter

#

# 17:23:50 15-08-2014

# Station 501 at 18-06-2014

#

*---*

| statistics |

*---*

Total baseline only errors: 63
Total stdev only errors: 39042
Total baseline and stdev errors: 43
Total matches: 195668
Total traces looked at: 234816
Total stdev unable to calculate: 6

Percentage of baseline only errors: 0.0268295175797%

Percentage of stdev only errors: 16.626635323%

Percentage of baseline and stdev errors: 0.0183122104116%

Percentage of matches: 83.328222949%

(a) Station 501 at 18-06-2014

#

# This file contains the outcome of the comparison of the

# old and new implementation of the baseline filter

#

# 15:44:54 15-08-2014

# Station 501 at 10-06-2014

#

*---*

| statistics |

*---*

Total baseline only errors: 55
Total stdev only errors: 39072
Total baseline and stdev errors: 47
Total matches: 195854
Total traces looked at: 235028
Total stdev unable to calculate: 9

Percentage of baseline only errors: 0.0234014670592%

Percentage of stdev only errors: 16.6244021989%

Percentage of baseline and stdev errors: 0.0199976173052%

Percentage of matches: 83.3321987167%

(b) Station 501 at 10-06-2014

#

# This file contains the outcome of the comparison of the

# old and new implementation of the baseline filter

#

# 15:16:19 15-08-2014

# Station 501 at 03-06-2014

#

*---*

| statistics |

*---*

Total baseline only errors: 55
Total stdev only errors: 39072
Total baseline and stdev errors: 47
Total matches: 195854
Total traces looked at: 235028
Total stdev unable to calculate: 9

Percentage of baseline only errors: 0.0234014670592%

Percentage of stdev only errors: 16.6244021989%

Percentage of baseline and stdev errors: 0.0199976173052%

Percentage of matches: 83.3321987167%

(c) Station 501 at 03-06-2014

#

# This file contains the outcome of the comparison of the

# old and new implementation of the baseline filter

#

# 15:03:52 15-08-2014

# Station 501 at 31-05-2014

#

*---*

| statistics |

*---*

Total baseline only errors: 31
Total stdev only errors: 37398
Total baseline and stdev errors: 36
Total matches: 183815
Total traces looked at: 221280
Total stdev unable to calculate: 6

Percentage of baseline only errors: 0.0140093998554%

Percentage of stdev only errors: 16.9007592191%

Percentage of baseline and stdev errors: 0.0162689804772%

Percentage of matches: 83.0689624006%

(d) Station 501 at 31-05-2014

#

# This file contains the outcome of the comparison of the

# old and new implementation of the baseline filter

#

# 11:35:41 15-08-2014

# Station 501 at 05-04-2014

#

*---*

| statistics |

*---*

Total baseline only errors: 51
Total stdev only errors: 41677
Total baseline and stdev errors: 66
Total matches: 208142
Total traces looked at: 249936
Total stdev unable to calculate: 11

Percentage of baseline only errors: 0.0204052237373%

Percentage of stdev only errors: 16.6750688176%

Percentage of baseline and stdev errors: 0.0264067601306%

Percentage of matches: 83.2781191985%

(e) Station 501 at 05-04-2014

Figure 6: A collection of log files for station 501 at different days showing the outcome of the comparison between the current and new implementation.

Appendix C

Figure 7: A plot of fifty traces, for which the calculated and stored baselines are the same but the standard deviations differ.


Figure 8: A plot of fifty traces, for which both the calculated and stored baselines and standard deviations match.
