PySciDON: a Python scientific framework for development of ocean network applications


by

Nathan Vandenberg

B.Sc., University of Victoria, 2011

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE

in the Department of Computer Science

© Nathan Vandenberg, 2016
University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.


PySciDON: A Python Scientific Framework for Development of Ocean Network Applications

by

Nathan Vandenberg
B.Sc., University of Victoria, 2011

Supervisory Committee

Dr. Y. Coady, Co-Supervisor (Department of Computer Science)

Dr. I. Barrodale, Co-Supervisor (Department of Computer Science)

Dr. M. Costa, Outside Member (Department of Geography)


Supervisory Committee

Dr. Y. Coady, Co-Supervisor (Department of Computer Science)

Dr. I. Barrodale, Co-Supervisor (Department of Computer Science)

Dr. M. Costa, Outside Member (Department of Geography)

ABSTRACT

The Salish Sea is an ecologically important coastal region located on the southwest coast of British Columbia. Optical measurements were taken using a set of hyperspectral radiometers, the SAS Solar Tracker developed by Satlantic. This sensor is installed on the Queen of Oak Bay ferry, which runs between Nanaimo and Vancouver, as part of the Ferry Ocean Colour Observation Systems (FOCOS) project. We developed a computer program to process the raw sensor data and generate remote sensing reflectance (Rrs) values. It performs similar functions to Prosoft, Satlantic's own software for processing the data. However, we added new features, such as an additional preprocessing step to filter the data based on longitude, new meteorological flag testing, and wind speed calculations. The system was tested using Pearson correlation to compare our output with the output from Satlantic Prosoft. Testing helped us identify a few issues. One was the need for longitude flags to remove data at the start and end of the trip, where the sensor could produce inaccurate results by aiming at land instead of water. Another was that the SAS Solar Tracker does not update its pointing angle quickly enough when the ferry makes sharp turns, which can result in inaccurate data.


Contents

Supervisory Committee ii

Abstract iii

Table of Contents iv

List of Tables vi

List of Figures vii

Acknowledgements ix

Dedication x

1 Introduction 1

1.1 Study Area and Data Acquisition . . . 3

1.2 Problem Statement . . . 4

1.3 Document Structure . . . 4

2 Design 5

2.1 System Architecture . . . 6

2.2 File Format Specification . . . 7

2.2.1 Raw Data File . . . 7

2.2.2 Calibration File . . . 7

2.2.3 Hierarchical Data Format (HDF) Files . . . 12

2.2.4 Context Files . . . 12

2.3 Data Processing . . . 13

2.3.1 Level 0 - Data Preprocessing . . . 13

2.3.2 Level 1a - Raw Data Extraction . . . 13


2.3.4 Level 2 - Dark Data Correction . . . 14

2.3.5 Level 2s - SAS Time Interpolation . . . 15

2.3.6 Level 3a - Wavelength Interpolation . . . 15

2.3.7 Level 4 - Remote Sensing Reflectance Generation . . . 15

2.4 Design Tradeoffs . . . 17

2.4.1 HDF4 vs HDF5 . . . 17

2.4.2 Other Tradeoffs . . . 18

2.5 Summary . . . 18

3 Implementation 19

3.1 Main Class . . . 19

3.2 Controller Class . . . 20

3.3 HDF Classes . . . 22

3.4 Process Level 0 . . . 26

3.5 Process Level 1a . . . 30

3.5.1 Reading Calibration Files . . . 30

3.5.2 Reading Raw Files . . . 32

3.6 Process Level 1b . . . 38

3.7 Process Level 2 . . . 41

3.8 Process Level 2s . . . 43

3.9 Process Level 3a . . . 45

3.10 Process Level 4 . . . 47

3.11 Summary . . . 53

4 Evaluation 54

4.1 Comparison with Prosoft . . . 54

4.2 Testing . . . 55

4.3 Screenshots . . . 62

4.4 Summary . . . 66

5 Conclusions and Future Work 67

Bibliography 69

References . . . 69


List of Tables

Table 2.1 Calibration file sensor definition line definition . . . 8

Table 2.2 Calibration file data type keywords definition . . . 9

Table 2.3 Data calibration equations . . . 14

Table 4.1 Test Dataset Information . . . 56

Table 4.2 Pearson Correlation: Level 1a . . . 56

Table 4.3 Pearson Correlation: Level 1b . . . 57

Table 4.4 Pearson Correlation: Level 2 . . . 57

Table 4.5 Pearson Correlation: Level 2s . . . 58


List of Figures

1.1 Example of in situ Rrs data . . . 2

1.2 The Queen of Oak Bay showing SAS Solar Tracker in circle . . . 3

1.3 The SAS Solar Tracker . . . 3

1.4 Route taken by the Queen of Oak Bay ferry (in yellow) . . . 4

2.1 System flow diagram . . . 5

2.2 System architecture diagram . . . 6

2.3 Raw Binary File opened in Notepad++ . . . 7

2.4 Fixed frame calibration file opened in Notepad++ . . . 10

2.5 Variable frame calibration file opened in Notepad++ . . . 11

2.6 HDF Class diagram . . . 12

2.7 Raw File Header Example . . . 13

2.8 Before NIR correction . . . 17

2.9 After NIR correction . . . 17

3.1 Main-Controller Interface . . . 19

3.2 Code for Controller.processFilesMultiLevel() . . . 20

3.3 Code for Controller.processMultiLevel() . . . 20

3.4 Code for Controller.processL1a() . . . 21

3.5 Code for Controller.processL1b() . . . 21

3.6 HDFRoot Class . . . 22

3.7 HDF Group Class . . . 23

3.8 HDF Dataset Class . . . 24

3.9 Code to convert between Python dictionaries and NumPy arrays . . 25

3.10 Code for longitude flags, part 1 . . . 26

3.11 Code for longitude flags, part 2 . . . 27

3.12 Code for longitude flags, part 3 . . . 28

3.13 Code for longitude flags, part 4 . . . 28


3.15 Code for reading a calibration file . . . 30

3.16 Code for reading a sensor definition line . . . 31

3.17 Code for reading a raw file . . . 32

3.18 Code for reading raw file header . . . 33

3.19 Code to read raw file, message frames . . . 34

3.20 Code for converting raw binary data to data type . . . 35

3.21 Code to build the HDFDataset from raw data . . . 36

3.22 Code to read DATETAG/TIMETAG2 data . . . 37

3.23 Code to retrieve datasets for calibration . . . 38

3.24 Code to perform calibration depending on fit-type . . . 39

3.25 Code to perform calibration . . . 40

3.26 Code to perform dark correction . . . 41

3.27 Code to perform interpolation using interp1d() . . . 42

3.28 Code to perform interpolation using InterpolatedUnivariateSpline() 42

3.29 Code to convert datasets format . . . 43

3.30 Code to perform time interpolation . . . 44

3.31 Code to choose dataset to interpolate to . . . 45

3.32 Code to perform wavelength interpolation . . . 46

3.33 Code to split dataset based on time interval . . . 47

3.34 Code to calculate Rrs, part 1 . . . 48

3.35 Code to calculate Rrs, part 2 . . . 49

3.36 Code to calculate Rrs, part 3 . . . 50

3.37 Code to check meteorological flags . . . 51

3.38 Code to calculate Rrs, part 4 . . . 52

3.39 Code to calculate Rrs, part 5 . . . 53

4.1 PySciDON’s plot of Level 2 LT values for June 21 . . . 60

4.2 PySciDON’s plot of Level 2s LT values for June 21 . . . 60

4.3 Prosoft’s plot of Level 2s LT values for June 21 . . . 60

4.4 HDF File Example . . . 62

4.5 HDF File Level 1a to Level 2 Datasets . . . 63

4.6 HDF File Level 2s to Level 3a Datasets . . . 64


ACKNOWLEDGEMENTS

I would like to thank: My parents for their support. My supervisors, Dr. Yvonne Coady and Dr. Ian Barrodale, for their support and patience. Dr. Maycira Costa for the opportunity to do this project. And the people in the Remote Sensing and Spectral Lab for their encouragement and support. BC Ferries for providing access and allowing us to install the SAS Solar Tracker. Ocean Networks Canada for providing the interface to access the data from the ferries. And Satlantic for providing support for their instruments.


DEDICATION

To my mom and dad.


Chapter 1

Introduction

Remote sensing reflectance (Rrs) is commonly used in oceanographic research. Rrs measures the above-water spectral distribution of visible light reflected back from below the ocean surface. It is used to determine chlorophyll a concentration, a proxy for phytoplankton (Sauer, Roesler, Werdell, & Barnard, 2012), in order to estimate primary productivity: the rate at which energy is converted into organic compounds through photosynthesis. Primary production is used as an indicator of the general health of an ecosystem. It is particularly important for determining the health of coastal zones, which contribute 90% of the world's fisheries (Pauly et al., 2002). In general, the large spatial and temporal coverage provided by ocean colour satellite remote sensing allows for more accurate estimates of phytoplankton primary production over a larger area.

Ocean colour satellite remote sensing began in the late 1970s with the Coastal Zone Color Scanner (CZCS) on the Nimbus-7 satellite (McClain, Hooker, Feldman, & Bontempi, 2006). Since then there have been several other satellite launches with improved sensor technology (McClain, 2009), including the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) and the Moderate Resolution Imaging Spectroradiometer (MODIS). The newest is the Sentinel-3 mission, launched earlier in 2016 (Donlon et al., 2012).

Using satellites for remote sensing may work well for open waters, as shown, for example, in a study of the Atlantic Meridional Transect (Brewin, Dall'Olmo, Pardo, van Dongen-Vogels, & Boss, 2016). But recent papers show that they are not as effective in coastal zones: there have been noted problems with atmospheric correction algorithms as well as with detecting chlorophyll concentrations in coastal waters (Chen, Zhang, Cui, & Wen, 2013). There is also a need for new tools and applications to deal with gaps in the current technology (Mouw et al., 2015).

Some of these problems are due to the lack of sufficient in situ reflectance data for validation of atmospheric correction of the satellite imagery (O'Reilly et al., 1998). Specifically, in the Salish Sea on the southwest coast of British Columbia, Canada, reflectance data are limited to infrequent cruises by the Department of Fisheries and Oceans / Pacific Biological Station during the spring and summer. In situ reflectance data from the Ferry Ocean Color Observation Systems (FOCOS) recently installed on BC Ferries should help address these issues by allowing data to be collected daily. However, new tools are needed in order to batch process and apply corrections to the data. Figure 1.1 shows an example plot of in situ data, where the satellite data are shown with solid lines and ferry data are shown with dashed lines.

The goal of the PySciDON project is to set up a framework for evaluating remote sensing techniques. This can be expanded in the future for testing various atmospheric correction algorithms for satellite data and comparing them with shipboard data. The first part of the project, described in this thesis, is developing the algorithm for calibrating shipboard data acquired with the sensors (FOCOS) aboard the Queen of Oak Bay ferry.


1.1 Study Area and Data Acquisition

The Salish Sea is located in the southwest portion of British Columbia, Canada, between Vancouver Island and the mainland. It is an ecologically important area of study due to its importance as a habitat for many fish species, including salmon (Beamish, Neville, Sweeting, & Lange, 2012). Sediment from several rivers, including the Fraser River, flows into the Salish Sea, resulting in a nutrient-rich environment (Thomson, 1981). The timing of the phytoplankton blooms in this area can vary from year to year. It has been shown that the spring phytoplankton bloom is linked to fish productivity, including salmon (Malick, Cox, Mueter, & Peterman, 2015) and herring (Schweigert et al., 2013).

Figure 1.2: The Queen of Oak Bay, showing the SAS Solar Tracker (circled)

Figure 1.3: The SAS Solar Tracker

Data are collected using the SAS Solar Tracker (Figs. 1.2, 1.3) installed on the Queen of Oak Bay ferry that runs between Vancouver and Nanaimo (Fig. 1.4). The SAS Solar Tracker is a radiometer system developed by Satlantic for measuring above-water ocean colour. It is equipped with a set of hyperspectral radiometer sensors, a pyrometer, GPS, an electronic tilt and compass sensor, and a ship navigation feed. The hyperspectral sensors are capable of measuring light between 350 and 800 nm (approximately 136 individual channels). They are used to measure spectral sea-surface radiance (Lt), sky radiance (Li), and irradiance (Es), which are required for computing remote sensing reflectance (Rrs). The SAS Solar Tracker is mounted on a rotating platform that allows for autonomous tracking of the sun, providing the best viewing angle to minimize the effects of sun glint on the data.


Figure 1.4: Route taken by the Queen of Oak Bay ferry (in yellow)

1.2 Problem Statement

The goal of this thesis is to create an automated system to process raw data acquired by the three sensors aboard the Queen of Oak Bay ferry, generate remote sensing reflectance measurements, and perform data quality checking on the results to detect erroneous data and remove sky-glint.

1.3 Document Structure

This section describes the structure of this document.

Chapter 1 contains the introduction, problem statement, and document structure.
Chapter 2 describes the design of our system.
Chapter 3 describes implementation details of our system.
Chapter 4 provides an evaluation of our system.


Chapter 2

Design

This section describes the design of PySciDON. This program is designed to be an alternative to Prosoft (Satlantic, 2016), the application developed by Satlantic to process raw data from their sensors into a final data product. Having an alternative allows us to extend it with additional features specific to our system, such as meteorological flags. The main difference is that PySciDON only needs to support the SAS Solar Tracker, while Prosoft needs to work with all of Satlantic's other instruments. PySciDON is designed to work in a similar way to Prosoft so that it is possible to evaluate how well PySciDON works compared to Prosoft. There are multiple levels of processing, as follows (Fig. 2.1):


Data are passed from one level to the next. Each level reads input files, performs processing, then generates output files. For level 1a, the input files are raw (*.raw) and calibration (*.cal/*.tdf) files. Otherwise, the input and output files are Hierarchical Data Format (HDF) files (The HDF Group, 1997-2016). HDF files are a common method of storing scientific data.

2.1 System Architecture

The system architecture (Fig. 2.2) was designed to follow the Model-View-Controller (MVC) design pattern (Krasner & Pope, 1988). The MVC design pattern is a method used to split up the problem and simplify the design of the system. It has a model which represents the data, a view that represents the user interface, and a controller that modifies the model (data) based on user input on the view (interface).

Figure 2.2: System architecture diagram

The above (Fig. 2.2) shows how PySciDON is split into the MVC format. The model represents an HDF file using the HDFRoot, HDFGroup, and HDFDataset classes. The Main class functions as the view and calls the Controller to process the data. The Controller class links to several processing classes (one per level, e.g., ProcessL1a) used to process data for each level.


2.2 File Format Specification

This section describes information on the file format of the input/output files used by PySciDON.

2.2.1 Raw Data File

Raw data files (Fig. 2.3) are binary files received as output from the instrument (SAS Solar Tracker). Raw data files are downloaded from the Ocean Networks Canada website (ONC, 2016). Each file consists of multiple message frames containing data from the instrument. Header frames appear at the top of the file and are followed by multiple data frames. Every message frame consists of multiple fields of data; the exact format of each message frame is described by the sensor definition lines in the calibration files.

Figure 2.3: Raw Binary File opened in Notepad++

2.2.2 Calibration File

Calibration files are ASCII text files describing the format of the message frames that appear in raw data files. These files are provided by Satlantic for use with their instruments. The calibration file format is described in the Satlantic Instrument File Standard document (Satlantic, 2011). Each file consists of multiple comment or sensor definition lines. Comment lines start with '#' and are ignored. Sensor definition lines are used to determine the format of each message frame in the raw


data file. The Satlantic document describes each sensor definition line as a separate "sensor".

A sensor definition line consists of seven fields delimited by white space, and may be followed by multiple lines of coefficients used during calibration. A sensor definition line has the following format:

TYPE ID UNITS FIELD-LENGTH DATA-TYPE CAL-LINES FIT-TYPE

The following table (Table 2.1) gives the definitions of each field in the sensor definition line.

Table 2.1: Calibration file sensor definition line definition

Field Description

TYPE Describes the type or name for the sensor described in this section of the message frame

ID Extra information about the sensor

UNITS Physical units after data are calibrated. This field is enclosed by single quotes.

FIELD-LENGTH For fixed length message frames, this is the size (in bytes) of the field in the message frame. If the message frame has variable size, this is set to ’V’.

DATA-TYPE Data keyword describing the format of the data.

CAL-LINES Number of calibration lines immediately following the sensor definition line.

FIT-TYPE Fit-type keyword used to describe how the data should be processed during calibration.
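As a concrete illustration, a sensor definition line can be split into its seven named fields with standard-library Python. This is a sketch, not PySciDON's own parser; the helper name and the example line are hypothetical, though the field layout follows Table 2.1.

```python
import shlex  # shlex honours the single quotes around the UNITS field

def parse_sensor_definition(line):
    """Split one sensor definition line into a dict keyed by the
    seven field names from Table 2.1 (illustrative helper)."""
    parts = shlex.split(line)
    if len(parts) != 7:
        raise ValueError("expected 7 fields, got %d" % len(parts))
    keys = ("TYPE", "ID", "UNITS", "FIELD-LENGTH",
            "DATA-TYPE", "CAL-LINES", "FIT-TYPE")
    return dict(zip(keys, parts))

# Hypothetical example line in the format described above
sensor = parse_sensor_definition("ES 8525 'uW/cm^2/nm' 2 BU 1 OPTIC3")
```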


The following table (Table 2.2) describes the keywords used by the data type field:

Table 2.2: Calibration file data type keywords definition

Data-Type Description

BU Unsigned binary integer in Big-Endian format (most significant byte first). Field length must be between 1 and 4 bytes inclusive.

BULE Unsigned binary integer in Little-Endian format (most significant byte last). Field length must be between 1 and 4 bytes inclusive.

BS Signed binary integer in Big-Endian format (most significant byte first). Field length must be between 1 and 4 bytes inclusive.

BSLE Signed binary integer in Little-Endian format (most significant byte last). Field length must be between 1 and 4 bytes inclusive.

BF IEEE single-precision binary floating point number. Field length must be set to 4 bytes.

BD IEEE double-precision binary floating point number. Field length must be set to 8 bytes.

AI ASCII integer number.

AF ASCII floating point number.

AS ASCII text string.
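The binary keywords map naturally onto Python's int.from_bytes and the struct module. The sketch below is illustrative, not PySciDON code: the ASCII types (AI/AF/AS) are omitted and the byte order of BF/BD is assumed big-endian here.

```python
import struct

def decode_field(keyword, raw):
    """Decode one binary field according to its Table 2.2 keyword.
    Illustrative only; ASCII types are omitted and BF/BD byte order
    is an assumption."""
    if keyword in ("BU", "BULE", "BS", "BSLE"):
        byteorder = "little" if keyword.endswith("LE") else "big"
        signed = keyword.startswith("BS")
        if not 1 <= len(raw) <= 4:
            raise ValueError("integer fields are 1 to 4 bytes")
        return int.from_bytes(raw, byteorder, signed=signed)
    if keyword == "BF":
        return struct.unpack(">f", raw)[0]   # 4-byte IEEE single
    if keyword == "BD":
        return struct.unpack(">d", raw)[0]   # 8-byte IEEE double
    raise ValueError("unsupported data type: " + keyword)
```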


Message Frame Header

Each calibration file describes a message frame in the raw file. A message frame is a block of binary data output from an instrument. In order to differentiate between different instruments, every Satlantic instrument outputs a different frame synchronization or frame header string. The frame header is the first series of bytes to appear in the message frame. The frame header can also be used to differentiate between fixed length and variable length message frames.

Fixed Length Frames

Fixed length frames (Fig. 2.4) start with a sensor definition line where the TYPE field is set to INSTRUMENT for the instrument name. The ID field gives the frame header string that should appear at the start of the message frame. The INSTRUMENT sensor definition line may optionally be followed by an SN line for the serial number, whose ID field gives the serial number string appended to the frame header. In the example, the frame header string is "SATHSE0526".


Variable Length Frames

Variable length frames (Fig. 2.5) begin with sensor definition lines with TYPE of VLF_INSTRUMENT and optional VLF_SN, for the instrument name and serial number respectively. These are used to generate the frame header string in the same way as fixed length frames. In the example, the frame header is "$GPRMC", as there is no VLF_SN sensor.

Variable length frames specify sensors in a similar way to fixed length frames. However, because the length is variable, a FIELD delimiter sensor is used to specify the character(s) used as the delimiter. Every FIELD delimiter sensor precedes a variable length sensor whose FIELD-LENGTH is set to 'V'. Finally, the last sensor should be a TERMINATOR sensor that specifies the characters that identify the end of a variable length frame.
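A minimal sketch of delimiter-based parsing for a variable length frame, assuming a comma delimiter and a CR/LF terminator as in an NMEA-style $GPRMC frame. The helper and its defaults are illustrative; real frames use whatever delimiter and terminator their FIELD and TERMINATOR sensors specify.

```python
def parse_variable_frame(frame, header, delimiter=b",", terminator=b"\r\n"):
    """Split one variable length message frame into its fields
    (illustrative; delimiter/terminator normally come from the
    FIELD and TERMINATOR sensor definitions)."""
    if not frame.startswith(header):
        raise ValueError("frame header mismatch")
    body = frame[len(header):]
    end = body.find(terminator)
    if end < 0:
        raise ValueError("terminator not found")
    # drop empty splits, e.g. the one before the leading delimiter
    return [f for f in body[:end].split(delimiter) if f]

fields = parse_variable_frame(b"$GPRMC,123519,A,4807.038,N\r\n", b"$GPRMC")
```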


2.2.3 Hierarchical Data Format (HDF) Files

HDF files are generated after each processing level. HDF5 files contain two types of data structures: datasets and groups. Datasets are multidimensional arrays for storing data. Groups are containers that hold datasets or other groups. Each calibration file is mapped to a single group, and every unique TYPE from the calibration file is mapped to a dataset in that group. HDF also has the concept of attributes that can be attached to groups or datasets to store additional information.

HDF files are converted to an in-memory representation by the HDFRoot, HDFGroup, and HDFDataset classes. The HDFRoot object stores multiple HDFGroups, and each HDFGroup stores multiple HDFDatasets, as shown in the following diagram (Fig. 2.6).
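The in-memory layout can be sketched with three small classes. These mirror the class names in Fig. 2.6, but the internals shown here (plain dicts and lists) are a simplification for illustration, not PySciDON's actual implementation.

```python
class HDFDataset:
    """Leaf node: a named array of data (a plain list here)."""
    def __init__(self, name):
        self.name = name
        self.data = []

class HDFGroup:
    """Container for datasets; the full design also allows
    attributes and nested groups."""
    def __init__(self, name):
        self.name = name
        self.datasets = {}
        self.attributes = {}
    def add_dataset(self, name):
        self.datasets[name] = HDFDataset(name)
        return self.datasets[name]

class HDFRoot:
    """Top-level container mirroring one HDF file in memory."""
    def __init__(self):
        self.groups = {}
        self.attributes = {}
    def add_group(self, name):
        self.groups[name] = HDFGroup(name)
        return self.groups[name]

root = HDFRoot()
group = root.add_group("HSE0526")   # e.g. one group per calibration file
group.add_dataset("ES").data.append([1.0, 2.0])
```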

Figure 2.6: HDF Class diagram

2.2.4 Context Files

Prosoft has the ability to read a context file (*.cfs), which allows setting additional parameters on the context for each instrument. The instrument context describes additional information about the calibration files. For example, it would be possible to set which sensor calibration file uses light/dark data for doing shutter dark data correction. PySciDON features a limited ability to store context information as part of its config file, for describing whether the calibration file has light or dark data.


2.3 Data Processing

This section describes the design of the data processing component of PySciDON.

2.3.1 Level 0 - Data Preprocessing

This is an additional preprocessing step (not performed in Prosoft) to prepare the data. If a raw data file contains multiple ferry trips, it is split into one raw file per trip. Any trips where the ferry is travelling west are ignored, as the SAS Solar Tracker is then facing the sun and the data are likely compromised by glint.

Some issues with the data are also fixed. If the ferry passes too close to land, the SAS Solar Tracker detects a radiance signal from land (the adjacency effect), resulting in contaminated data. If the ferry makes a sharp turn, the SAS Solar Tracker does not react quickly enough, resulting in poor data. Both of these cases occur at the start and end of the trip, so a longitude flag is added to remove the bad data.

2.3.2 Level 1a - Raw Data Extraction

The raw data file is converted into an HDF file. The top of the raw file consists of multiple header messages, followed by multiple message frames (Fig. 2.7). All header messages begin with a "SATHDR" string identifier and are 128 bytes in length. Once a header is read, it gets stored as an attribute attached to the root HDF object.
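The header-scanning step can be sketched as follows: consume 128-byte blocks while they begin with the "SATHDR" marker, and return the offset where the message frames start. The helper name, the sample bytes, and keeping headers as opaque blocks are simplifications for illustration.

```python
def read_headers(raw, marker=b"SATHDR", header_len=128):
    """Collect the 128-byte SATHDR blocks at the top of a raw file
    and return them with the offset of the first message frame."""
    headers = []
    offset = 0
    while raw[offset:offset + len(marker)] == marker:
        headers.append(raw[offset:offset + header_len])
        offset += header_len
    return headers, offset

# One hypothetical header block followed by a message frame header
data = b"SATHDR" + b"\x00" * 122 + b"SATHSE0526" + b"\x00\x01"
headers, start = read_headers(data)
```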


Before the message frames can be read, the format of the calibration files is determined. Every message frame begins with a frame header string that corresponds to INSTRUMENT and SN for fixed length frames, or VLF_INSTRUMENT and VLF_SN for variable length frames. Fixed length frames can be read directly using the calibration file because it specifies the number of bytes to be read as part of the FIELD-LENGTH field in the sensor definition lines. Variable length frames, however, are read using delimiters specified by FIELD delimiter sensors.

2.3.3 Level 1b - Data Calibration

Raw data are further calibrated according to the calibration file. The FIT-TYPE value from the sensor definition line in the calibration file is used to determine the method of calibration. Each sensor definition line has a CAL-LINES field that determines the number of coefficient lines following the sensor. Coefficients are constant values, and the number of coefficients in each line depends on the method of calibration. The main calibration equations used in PySciDON are listed below (Table 2.3); for the full list, see the Satlantic Instrument File Standard (Satlantic, 2011).

Table 2.3: Data calibration equations

Fit Type: OPTIC3
Coefficients: a0 a1 Im CInt
Equations:
  y = Im · a1 · (x − a0) · (CInt / AInt), if the sensor is immersed
  y = a1 · (x − a0) · (CInt / AInt), if the sensor is not immersed
where Im is the immersion coefficient used when the sensor is immersed, CInt is the integration time, and AInt is the adaptive integration time used during sensor sampling (in seconds).

Fit Type: POLYU
Coefficients: a0 a1 ... an
Equation: y = sum_{k=0}^{n} a_k x^k = a0 + a1·x + ... + an·x^n

Fit Type: COUNT
Coefficients: N/A
Equation: y = x
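The two main fit types reduce to short functions. This is a sketch of the equations in Table 2.3; the function names are illustrative and not PySciDON's own.

```python
def calibrate_optic3(x, a0, a1, im, cint, aint, immersed=False):
    """OPTIC3 fit: offset, scale, and integration-time ratio, with
    the immersion coefficient applied only when immersed."""
    y = a1 * (x - a0) * (cint / aint)
    return im * y if immersed else y

def calibrate_polyu(x, coefficients):
    """POLYU fit: polynomial with ascending coefficients a0..an."""
    return sum(a * x ** k for k, a in enumerate(coefficients))

y_dry = calibrate_optic3(10.0, 2.0, 3.0, 1.5, 1.0, 1.0)            # 3*(10-2)*1
y_wet = calibrate_optic3(10.0, 2.0, 3.0, 1.5, 1.0, 1.0, immersed=True)
```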

2.3.4 Level 2 - Dark Data Correction

Shutter dark data correction is a technique to remove sensor measurement noise from the data. Each sensor has both a light dataset and a dark dataset (taken when the shutter is closed). A dark measurement is taken once per several frames of light data. Frame rates depend on the integration time of the device, which specifies how long the instrument waits while collecting light measurements in order to maximize signal. To perform dark correction, the dark data must first be interpolated so that the number of dark measurements matches the number of light measurements. Cubic spline interpolation (Dierckx, 1995) is used to generate a curve that more closely matches the optical properties of light. Dark correction is then performed by subtracting the dark data from the light data at each wavelength.
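For a single wavelength channel, this step can be sketched with SciPy's spline interpolation (the thesis itself uses InterpolatedUnivariateSpline; see the figure list in Chapter 3). The function name and sample arrays here are illustrative, not PySciDON's actual routine.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def dark_correct(light, light_times, dark, dark_times):
    """Cubic-spline the sparse dark samples onto the light
    timestamps, then subtract (one wavelength channel)."""
    spline = InterpolatedUnivariateSpline(dark_times, dark, k=3)
    return np.asarray(light) - spline(light_times)

corrected = dark_correct([5.0, 6.0, 7.0], [0.0, 1.0, 2.0],
                         [1.0, 1.0, 1.0, 1.0], [0.0, 0.7, 1.4, 2.0])
```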

2.3.5 Level 2s - SAS Time Interpolation

Each Li/Lt/Es sensor has LI/LT/ES datasets that include the data, time stamps, and wavelength values. In order to perform the Rrs calculation, the time stamps and wavelength values must match. Because the sensors are calibrated differently, they have different time stamps and wavelength values. At this level, the data for two of the sensors are interpolated to match the time stamps of the third sensor. The dataset chosen to interpolate to is the one with the slowest frame rate (in our case, the LT dataset). This method matches an unreleased beta version of Prosoft as well as (Brewin et al., 2016).
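The idea can be sketched with linear interpolation onto the LT timestamps. np.interp is a simplification of the interpolation helpers shown in Chapter 3, and the sample arrays are hypothetical.

```python
import numpy as np

# Timestamps of the slowest sensor (LT) become the common time base
lt_times = np.array([0.0, 2.0, 4.0])

# A faster sensor (ES here) is resampled onto those timestamps
es_times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
es_vals = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

es_on_lt = np.interp(lt_times, es_times, es_vals)
```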

2.3.6 Level 3a - Wavelength Interpolation

Each Li/Lt/Es sensor stores data for approximately 136 channels between 350 and 800 nm. Because each sensor is calibrated differently, the wavelengths stored are different for every sensor, yet each sensor needs to have matching wavelength values before the Rrs calculation can be performed. At this level, the data for each sensor are interpolated to matching wavelength values. The new wavelength values chosen to interpolate to are determined by the lowest and highest wavelength values contained in all three sensors and a user-specified interval.
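A sketch of grid construction and resampling, reading "contained in all three sensors" as the overlapping range (maximum of the minima to minimum of the maxima). The helper name and example wavelengths are illustrative.

```python
import numpy as np

def common_wavelength_grid(sensor_wavelengths, interval=1.0):
    """Shared grid over the range covered by every sensor, stepped
    by the user-specified interval."""
    low = max(min(w) for w in sensor_wavelengths)
    high = min(max(w) for w in sensor_wavelengths)
    return np.arange(low, high + 1e-9, interval)

grid = common_wavelength_grid(
    [[350.0, 400.0], [352.0, 398.0], [351.0, 399.0]], interval=1.0)

# Resample one sensor's channels onto the shared grid
es_grid = np.interp(grid, [352.0, 398.0], [10.0, 20.0])
```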

2.3.7 Level 4 - Remote Sensing Reflectance Generation

In the final level, Remote Sensing Reflectance (Rrs) values are calculated for every channel. However, PySciDON does not use the same method as Prosoft, because Prosoft only calculates one Rrs value per channel for the entire dataset. PySciDON instead slices the dataset into pieces according to a user-specified interval and calculates Rrs for each piece. In addition, this level implements meteorological flags, wind speed calculations, and near infrared correction. Rrs is calculated using the formula of Mobley (1999), see (Eq. 2.1). Based on work done by Hooker and Morel, the Rrs calculation uses values for the lowest 5% of Li so that only data with minimum glint are considered (Hooker & Morel, 2003).

Rrs = (Lt − ρ_sky · Li) / Es (2.1)
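For one wavelength slice, Eq. 2.1 combined with the lowest-5%-of-Li selection can be sketched as below. The function name is illustrative and this is not PySciDON's exact routine.

```python
import numpy as np

def compute_rrs(lt, li, es, rho_sky):
    """Apply Eq. 2.1 over the 5% of samples with the lowest Li,
    so glint-affected samples are excluded."""
    lt, li, es = (np.asarray(a, dtype=float) for a in (lt, li, es))
    n = max(1, int(len(li) * 0.05))
    keep = np.argsort(li)[:n]
    return np.mean((lt[keep] - rho_sky * li[keep]) / es[keep])

rrs = compute_rrs(np.full(100, 2.0), np.arange(1.0, 101.0),
                  np.full(100, 4.0), rho_sky=0.5)
```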

Meteorological Condition Flags

Meteorological flags are used to detect when data are affected by undesirable meteorological conditions. This is based on work done by (Garaba, Schulz, Wernand, & Zielinski, 2012) and modified by (Wang et al., 2016). The following flags are implemented, where λ is the specified channel:

Es(λ=480) > 2 µW cm⁻² nm⁻¹ (2.2)

Threshold for significant Es (Eq. 2.2).

Es(λ=470)/Es(λ=680) > 1 (2.3)

For masking spectra affected by dawn/dusk radiation (Eq. 2.3).

Es(λ=720)/Es(λ=370) > 1.095 (2.4)

For masking spectra affected by rainfall or high humidity (Eq. 2.4).

Wind Speed Calculation

Wind speed calculations are used to compute ρ_sky for the Rrs equation. This is based on work done by (Ruddick, De Cauwer, Park, & Moore, 2006), who give two equations depending on whether the sky is clear (Eq. 2.5) or cloudy (Eq. 2.6), where λ is the specified channel and W is the wind speed in metres per second:

ρ_sky = 0.0256 + 0.00039·W + 0.000034·W² for Li(λ=750)/Es(λ=750) < 0.05 (2.5)
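The three flag inequalities and the clear-sky ρ_sky formula translate directly into code. This sketch evaluates each test exactly as written in Eqs. 2.2-2.5; how each result is used to mask data, and the cloudy-sky case (Eq. 2.6), are not shown here. The function names and the dictionary lookup of Es by wavelength are illustrative.

```python
def met_flags(es):
    """Evaluate Eqs. 2.2-2.4 for one spectrum; `es` maps wavelength
    (nm) to downwelling irradiance in uW cm^-2 nm^-1."""
    return {
        "significant_es": es[480] > 2.0,        # Eq. 2.2
        "dawn_dusk": es[470] / es[680] > 1.0,   # Eq. 2.3
        "humidity": es[720] / es[370] > 1.095,  # Eq. 2.4
    }

def rho_sky_clear(wind_speed):
    """Clear-sky rho_sky (Eq. 2.5); wind speed W in m/s."""
    return 0.0256 + 0.00039 * wind_speed + 0.000034 * wind_speed ** 2

flags = met_flags({480: 3.0, 470: 2.0, 680: 1.0, 720: 1.0, 370: 1.0})
```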


Near Infrared Correction

This level also implements near infrared (NIR) correction. It was shown that for clear waters, reflectance values in the NIR can be taken as zero (Gordon & Wang, 1994). NIR correction can be applied at this level to remove contamination from sky and sun glint. Based on (Brewin et al., 2016), an average Rrs value is calculated over the wavelengths from 750 to 800 nm and then subtracted from the entire spectrum. Figures 2.8 and 2.9 show an example of an Rrs plot before and after applying NIR correction, respectively.
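The correction itself amounts to one subtraction over the spectrum; a sketch with an illustrative function name and example values:

```python
import numpy as np

def nir_correct(wavelengths, rrs):
    """Subtract the mean Rrs over 750-800 nm from the whole
    spectrum to remove residual sky/sun glint."""
    wavelengths = np.asarray(wavelengths)
    rrs = np.asarray(rrs, dtype=float)
    nir = rrs[(wavelengths >= 750) & (wavelengths <= 800)]
    return rrs - nir.mean()

corrected = nir_correct([400.0, 500.0, 750.0, 800.0],
                        [0.01, 0.008, 0.002, 0.002])
```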

Figure 2.8: Before NIR correction

Figure 2.9: After NIR correction

2.4 Design Tradeoffs

This section describes some of the design tradeoffs encountered.

2.4.1 HDF4 vs HDF5

There are two versions of Hierarchical Data Format (HDF) files still in use, versions 4 and 5 (The HDF Group, 1997-2016). Unfortunately, HDF5 is not compatible with the older HDF4, as the two support different features. Most Python libraries are being developed for HDF5, but Satlantic Prosoft only outputs files in HDF4. The ability to read HDF4 files would be useful for comparing PySciDON's output with Prosoft's during testing. Unfortunately, it turns out Prosoft uses a very old version of HDF4 (HDF 4.1 Release 3, May 1999) that is not compatible with the newer Python libraries for HDF4.


2.4.2 Other Tradeoffs

Because the system is designed to work with only a single instrument (SAS Solar Tracker), it does not have the full feature set of Satlantic Prosoft. Additional features were not implemented because we had no ability to test them. These include the following:

• Support for unused calibration fit-types
• CAL, BIN, or NULL dark correction
• Profiler instrument support (Level 2)
• Depth based interpolations (Level 2s)

• Support for other data products besides Remote Sensing Reflectance

Some features should be relatively easy to add support for if required (additional calibration fit-types, and methods of dark correction). Other features may be more difficult (support for profiler instruments, depth-based interpolations, and new data products).

2.5 Summary

In summary, PySciDON is designed using the Model-View-Controller (MVC) design pattern. The Main class calls the Controller in order to process the data. PySciDON implements everything from Level 1 to Level 3a based on the implementation of Prosoft, which also allows us to compare the output of both programs for testing. However, some features are specific to PySciDON. PySciDON implements a new Level 0 preprocessing step that performs longitude flag detection and SAS Solar Tracker angle detection. Level 4 is implemented based on Hooker and Morel, with additional slicing and quality flags. The next chapter goes into detail on how PySciDON is implemented.


Chapter 3

Implementation

PySciDON is implemented using the Anaconda distribution of Python 3. Python was chosen because it is a general-purpose language with good support for data analysis, and it has access to several scientific libraries (Numpy, Scipy, etc.). PySciDON is implemented based on information from the Prosoft User Manual (Satlantic, 2016). The source code for PySciDON is available on GitHub (Vandenberg, 2016).

3.1 Main Class

The Main function acts as a view. It allows the user to process the data by calling the appropriate functions from the Controller class (Fig. 3.1).

# Calls Controller to process calibration files
calibrationMap = Controller.processCalibrationConfig(configName, calibrationFiles)

# Calls Controller to perform single-level processing for the specified level
Controller.processFilesSingleLevel(fileNames, calibrationMap, level)

# Calls Controller to perform multi-level processing up to the specified level
Controller.processFilesMultiLevel(fileNames, calibrationMap, level)

Figure 3.1: Code for the Main function


3.2 Controller Class

The Controller calls functions to process the data, and these functions all follow the same general format. The program first reads data in from an HDF file (or a raw file at Level 1a). The data are then processed according to the current level. Finally, the processed data are written back to disk as an HDF file.

Calling the processFilesMultiLevel() function causes the Controller to perform multi-level processing on all the files in a list of file paths (Fig. 3.2).

# Used to process every file in a list of file paths
@staticmethod
def processFilesMultiLevel(files, calibrationMap, level=4):
    for fp in files:
        Controller.processMultiLevel(fp, calibrationMap, level)

Figure 3.2: Code for Controller.processFilesMultiLevel()

The processMultiLevel() function performs multi-level processing on the given raw file (Fig. 3.3). This processes the file up to Level 4, depending on the level parameter.

@staticmethod
def processMultiLevel(fp, calibrationMap, level=4):
    Controller.processL1a(fp, calibrationMap)
    Controller.processL1b(fp, calibrationMap)
    Controller.processL2(fp)
    if level >= 2:
        Controller.processL2s(fp)
    if level >= 3:
        Controller.processL3a(fp)
    if level >= 4:
        windSpeedData = Controller.processWindData(fp)
        Controller.processL4(fp, windSpeedData)

Figure 3.3: Code for Controller.processMultiLevel()


The Controller calls the processLX() functions (where LX is the level, e.g., L1a, L1b) in order to process the data. Each of these functions follows the same general format (Fig. 3.4). They first read the input file, then call the corresponding ProcessLX class to process the data, and finally write the output file. All input and output files are HDF files except at Level 1a, where the input file is a raw file.

@staticmethod
def processL1a(fp, calibrationMap):
    (dirpath, filename) = os.path.split(fp)
    name = os.path.splitext(filename)[0]
    filepath = os.path.join(dirpath, filename)
    if not os.path.isfile(filepath):
        return None
    print("ProcessL1a")
    root = ProcessL1a.processL1a(calibrationMap, filepath)
    if root is not None:
        root.writeHDF5(os.path.join(dirpath, name + "_L1a.hdf"))

Figure 3.4: Code for Controller.processL1a()

Level 1b and above use HDF files as the input file (Fig. 3.5). The code reads the HDF file using HDFRoot.readHDF5(), calls the corresponding ProcessLX processing function to process the data, then calls HDFRoot.writeHDF5() to write the output HDF file.

@staticmethod
def processL1b(fp, calibrationMap):
    (dirpath, filename) = os.path.split(fp)
    filename = os.path.splitext(filename)[0]
    filepath = os.path.join(dirpath, filename + "_L1a.hdf")
    if not os.path.isfile(filepath):
        return None
    print("ProcessL1b")
    root = HDFRoot.readHDF5(filepath)
    root = ProcessL1b.processL1b(root, calibrationMap)
    if root is not None:
        root.writeHDF5(os.path.join(dirpath, filename + "_L1b.hdf"))

Figure 3.5: Code for Controller.processL1b()


3.3 HDF Classes

The HDF classes (HDFRoot, HDFGroup, and HDFDataset) are used to store the in-memory representation of an HDF file. A single HDFRoot object contains multiple HDFGroup objects, and each HDFGroup contains multiple HDFDataset objects. Each class contains read and write functions that are called to read and write HDF files on disk. The HDFRoot class represents the root object of the HDF file (Fig. 3.6).

class HDFRoot:
    def __init__(self):
        self.id = ""
        self.groups = []
        self.attributes = collections.OrderedDict()

    # Reading HDF5 file
    @staticmethod
    def readHDF5(fp):
        root = HDFRoot()
        with h5py.File(fp, "r") as f:
            # set name to text after last '/'
            name = f.name[f.name.rfind("/")+1:]
            if len(name) == 0:
                name = "/"
            root.id = name
            # Read attributes
            for k in f.attrs.keys():
                root.attributes[k] = f.attrs[k].decode("utf-8")
            # Read groups
            for k in f.keys():
                item = f.get(k)
                if isinstance(item, h5py.Group):
                    gp = HDFGroup()
                    root.groups.append(gp)
                    gp.read(item)
                elif isinstance(item, h5py.Dataset):
                    print("HDFRoot should not contain datasets")
        return root

    # Writing to HDF5 file
    def writeHDF5(self, fp):
        with h5py.File(fp, "w") as f:
            # Write attributes
            for k in self.attributes:
                f.attrs[k] = np.string_(self.attributes[k])
            # Write groups
            for gp in self.groups:
                gp.write(f)

Figure 3.6: Code for the HDFRoot class


The HDFGroup class represents HDF group objects (Fig. 3.7).

class HDFGroup:
    def __init__(self):
        self.id = ""
        self.datasets = collections.OrderedDict()
        self.attributes = collections.OrderedDict()

    def read(self, f):
        name = f.name[f.name.rfind("/")+1:]
        self.id = name
        # Read attributes
        for k in f.attrs.keys():
            if type(f.attrs[k]) == np.ndarray:
                self.attributes[k] = f.attrs[k]
            else:
                # string attribute
                self.attributes[k] = f.attrs[k].decode("utf-8")
        # Read datasets
        for k in f.keys():
            item = f.get(k)
            if isinstance(item, h5py.Group):
                print("HDFGroup should not contain groups")
            elif isinstance(item, h5py.Dataset):
                ds = HDFDataset()
                self.datasets[k] = ds
                ds.read(item)

    def write(self, f):
        try:
            f = f.create_group(self.id)
            # Write attributes
            for k in self.attributes:
                f.attrs[k] = np.string_(self.attributes[k])
            # Write datasets
            for key, ds in self.datasets.items():
                ds.write(f)
        except:
            e = sys.exc_info()[0]
            print(e)

Figure 3.7: Code for the HDFGroup class


The HDFDataset class represents HDF dataset objects (Fig. 3.8).

class HDFDataset:
    def __init__(self):
        self.id = ""
        self.attributes = collections.OrderedDict()
        self.columns = collections.OrderedDict()
        self.data = None

    def read(self, f):
        name = f.name[f.name.rfind("/")+1:]
        self.id = name
        # Read attributes
        for k in f.attrs.keys():
            if type(f.attrs[k]) == np.ndarray:
                if type(f.attrs[k].tolist()[0]) == bytes:
                    self.attributes[k] = [k.decode("utf-8") for k in f.attrs[k]]
                else:
                    self.attributes[k] = [k for k in f.attrs[k]]
            else:
                if type(f.attrs[k]) == bytes:
                    self.attributes[k] = f.attrs[k].decode("utf-8")
                else:
                    self.attributes[k] = f.attrs[k]
        # Read dataset
        self.data = f[:] # Gets converted to numpy.ndarray

    def write(self, f):
        if self.data is not None:
            dset = f.create_dataset(self.id, data=self.data, dtype=self.data.dtype)
        else:
            print("Dataset.write(): Data is None")

Figure 3.8: Code for the HDFDataset class


The HDFDataset class also contains functions for converting between Python dictionaries and Numpy arrays (Fig. 3.9).

# Converts numpy array into columns (stored as a Python dictionary)
def datasetToColumns(self):
    self.columns = collections.OrderedDict()
    for k in self.data.dtype.names:
        self.columns[k] = self.data[k].tolist()

# Converts columns into numpy array
def columnsToDataset(self):
    if not self.columns:
        print("Warning - columnsToDataset: columns is empty")
        return
    dtype = []
    for name in self.columns.keys():
        # Numpy dtype column name cannot be unicode in Python 2
        if sys.version_info[0] < 3:
            name = name.encode('utf-8')
        item = self.columns[name][0]
        if isinstance(item, bytes):
            dtype.append((name, "|S" + str(len(item))))
        # Note: hdf4 only supports 32 bit int, convert to float64
        elif isinstance(item, int):
            dtype.append((name, np.float64))
        else:
            dtype.append((name, type(item)))
    shape = (len(list(self.columns.values())[0]), )
    self.data = np.empty(shape, dtype=dtype)
    for k, v in self.columns.items():
        self.data[k] = v

Figure 3.9: Code for converting between columns and datasets
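As a usage sketch of the columns-to-dataset conversion, the following standalone example (with hypothetical column names and values) turns an ordered dictionary of equal-length columns into one numpy structured array, as columnsToDataset() does:

```python
import collections
import numpy as np

# Hypothetical columns: each key becomes a field in the structured array
columns = collections.OrderedDict()
columns["NONE"] = [1.0, 2.0, 3.0]
columns["COUNT"] = [10.0, 20.0, 30.0]

# Build the dtype from the first item of each column, then fill the array
dtype = [(name, type(vals[0])) for name, vals in columns.items()]
data = np.empty((len(columns["NONE"]),), dtype=dtype)
for k, v in columns.items():
    data[k] = v
```

Each column can then be accessed by name, e.g. data["NONE"], which is what the processing levels rely on.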


3.4 Process Level 0

Level 0 performs additional preprocessing to prepare the data. The processRawFile() function reads a raw file and removes any data not within the specified startLongitude and endLongitude, as well as any data where the ferry is travelling in the unwanted direction (to avoid glint). The code first reads the raw binary file until a "$GPRMC" string is found (Fig. 3.10).

@staticmethod
def processRawFile(filepath, dataDir, calibrationMap, startLongitude, endLongitude, direction):
    # gpsState: set to 1 if within start/end longitude, else 0
    gpsState = 0
    # Saves the starting and ending GPS HDF Group
    gpGPSStart = None
    gpGPSEnd = None
    # Saves the raw file header
    header = b""

    # Index of start/end message block for start/end longitude
    iStart = -1
    iEnd = -1

Figure 3.10: Code for longitude flags, part 1

The program starts reading the raw file in blocks of size MAX_TAG_READ = 32 (Fig. 3.11). It searches each block for a string that starts with "$GPRMC". If found, it reads the GPS message; otherwise the file position is advanced by RESET_TAG_READ = 16 bytes before reading the next block.


# Start reading raw binary file
with open(filepath, 'rb') as f:
    while 1:
        # Reads a block of binary data from file
        pos = f.tell()
        b = f.read(PreprocessRawFile.MAX_TAG_READ)
        f.seek(pos)
        if not b:
            break
        # Search for frame tag string in block
        for i in range(0, PreprocessRawFile.MAX_TAG_READ):
            testString = b[i:].upper()

            # Reset file position on max read (frame tag not found)
            if i == PreprocessRawFile.MAX_TAG_READ-1:
                f.read(PreprocessRawFile.RESET_TAG_READ)
                break
            # Reads header message type from frame tag
            if testString.startswith(b"SATHDR"):
                if i > 0:
                    f.read(i)
                header += f.read(PreprocessRawFile.SATHDR_READ)
                break
            # Reads messages that start with $GPRMC and
            # retrieves the CalibrationFile from map
            else:
                num = 0
                for key in calibrationMap:
                    cf = calibrationMap[key]
                    if testString.startswith(b"$GPRMC") and \
                       testString.startswith(cf.id.upper().encode("utf-8")):
                        if i > 0:
                            f.read(i)

Figure 3.11: Code for longitude flags, part 2

After the GPS tag is found, the program reads a block of size MAX_BLOCK_READ = 1024 (Fig. 3.12). This is a large value that should contain the full GPS message. The GPS message is converted to an HDFGroup, which contains HDFDatasets storing the GPS data.


# Read block from start of message
pos = f.tell()
msg = f.read(PreprocessRawFile.MAX_BLOCK_READ)
f.seek(pos)
# Convert message to HDFGroup
gp = HDFGroup()
num = cf.convertRaw(msg, gp)
# Set file pointer to end of message
if num >= 0:
    f.read(num)

Figure 3.12: Code for longitude flags, part 3

The GPS dataset is read to determine if the data are within the start and end longitude (Fig. 3.13). The file position of the first and last GPS message within the start/end longitude is saved for future use.

if gp.hasDataset("LONPOS"):
    lonData = gp.getDataset("LONPOS")
    longitude = lonData.columns["NONE"][0]
    # Detect if we are in specified longitude
    if longitude > startLongitude and longitude < endLongitude:
        if gpsState == 0:
            iStart = pos
            gpGPSStart = gp
        else:
            iEnd = f.tell()
            gpGPSEnd = gp
        gpsState = 1
    # Not within start/end longitude
    else:
        if gpsState == 1:
            PreprocessRawFile.createRawFile(dataDir, gpGPSStart, gpGPSEnd,
                                            direction, f, header, iStart, iEnd)
        gpsState = 0

Figure 3.13: Code for longitude flags, part 4

If the data being read is no longer within the specified start/end longitude then the following code (Fig. 3.14) executes. This writes a new raw file containing all the data from within the start and end longitude to disk. It also checks if the ferry is travelling in the specified direction.


# Converts date from integer
@staticmethod
def dateFromInt(d):
    s = str(int(d)).zfill(6)
    return time.strftime("%Y%m%d", time.strptime(s, "%d%m%y"))

# Creates a raw file containing data within start/end longitude
@staticmethod
def createRawFile(dataDir, gpGPSStart, gpGPSEnd, direction, f, header, iStart, iEnd):
    # Determine filename from date/time
    startDate = str(int(gpGPSStart.getDataset("DATE").columns["NONE"][0]))
    endDate = str(int(gpGPSEnd.getDataset("DATE").columns["NONE"][0]))
    startTime = str(int(gpGPSStart.getDataset("UTCPOS").columns["NONE"][0])).zfill(6)
    endTime = str(int(gpGPSEnd.getDataset("UTCPOS").columns["NONE"][0])).zfill(6)
    # Reformat date
    startDate = PreprocessRawFile.dateFromInt(startDate)
    endDate = PreprocessRawFile.dateFromInt(endDate)
    # Determine direction
    lonStart = gpGPSStart.getDataset("LONPOS").columns["NONE"][0]
    lonEnd = gpGPSEnd.getDataset("LONPOS").columns["NONE"][0]
    course = 'W'
    if lonStart > lonEnd:
        course = 'E'
    if direction == course:
        # Copy block of messages between start and end
        pos = f.tell()
        f.seek(iStart)
        message = f.read(iEnd - iStart)
        f.seek(pos)
        filename = startDate + "T" + startTime + "_" + endDate + "T" + endTime + ".raw"
        print("Write: " + filename)
        # Write file
        data = header + message
        with open(os.path.join(dataDir, filename), 'wb') as fout:
            fout.write(data)

Figure 3.14: Code for longitude flags, part 5
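To illustrate the date conversion used when naming output files, the following standalone sketch repeats the dateFromInt() logic on a hypothetical GPS DATE value (10516, i.e. ddmmyy for 1 May 2016):

```python
import time

def dateFromInt(d):
    # GPS DATE values are ddmmyy integers; zfill restores the leading zero
    s = str(int(d)).zfill(6)
    return time.strftime("%Y%m%d", time.strptime(s, "%d%m%y"))

print(dateFromInt(10516))  # ddmmyy 010516 -> "20160501"
```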

The output for this stage is a raw file containing data from the middle of the trip (between the start and end longitudes). The cleaned raw files are now processed in level 1a.


3.5 Process Level 1a

Level 1a converts a raw data file into an HDF file. Before the raw data can be processed, the program must first read the calibration files to determine the format.

3.5.1 Reading Calibration Files

The program reads a calibration file (Fig. 3.15) and processes each line to determine whether it is a comment line or a sensor definition line. Comment lines start with a '#' character and are ignored. Sensor definition lines are read into a list of CalibrationData structures. The program exits the loop after reaching the end of the file.

class CalibrationFile:
    # Reads a calibration file and generates calibration data
    def read(self, f):
        (dirpath, filename) = os.path.split(f.name)
        self.name = filename
        while 1:
            line = f.readline()
            line = line.decode("utf-8")
            if not line:
                break
            line = line.strip()
            # Ignore comments and empty lines
            if line.startswith("#") or len(line) == 0:
                continue
            cd = CalibrationData()
            cd.read(line)
            cdtype = cd.type.upper()
            # Determines the frame synchronization string by appending
            # ids from INSTRUMENT and SN lines
            if cdtype == "INSTRUMENT" or cdtype == "VLF_INSTRUMENT" or \
               cdtype == "SN" or cdtype == "VLF_SN":
                self.id += cd.id
            # Read in coefficients
            for i in range(0, cd.calLines):
                line = f.readline()
                cd.readCoefficients(line)
            self.data.append(cd)

Figure 3.15: Code for reading a calibration file


The following code is used to read a sensor definition line (Fig. 3.16). A sensor definition line consists of seven parts separated by white space. The program sets the field length to -1 if the message is a variable length frame; otherwise it is the size of the message in bytes. If there are additional coefficient lines, they are read next and stored in a list of coefficients.

class CalibrationData:
    # Reads a sensor definition line from a cal file
    def read(self, line):
        parts = line.split()
        self.type = parts[0]
        self.id = parts[1]
        self.units = parts[2][1:-1]
        self.fieldLength = -1 if parts[3].upper() == 'V' else int(parts[3])
        self.dataType = parts[4]
        self.calLines = int(parts[5])
        self.fitType = parts[6]

    # Reads a coefficients line in the cal file
    def readCoefficients(self, line):
        self.coefficients = line.split()

Figure 3.16: Code for reading a sensor definition line
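To make the seven-part format concrete, the following sketch parses a hypothetical sensor definition line the same way CalibrationData.read() does (the specific line shown is invented for illustration, not taken from an actual calibration file):

```python
# Hypothetical sensor definition line; the seven whitespace-separated parts
# are: type, id, units (quoted), field length ('V' for variable), data type,
# number of calibration lines, fit type.
line = "ES 412.5 'uW/cm^2/nm' 2 BU 1 OPTIC3"
parts = line.split()
sensorType = parts[0]
sensorId = parts[1]
units = parts[2][1:-1]  # strip the surrounding quotes
fieldLength = -1 if parts[3].upper() == 'V' else int(parts[3])
dataType = parts[4]
calLines = int(parts[5])
fitType = parts[6]
```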

After reading the calibration files to determine the format of the raw files, the program moves on to processing the raw files.


3.5.2 Reading Raw Files

Reading a raw data file (Fig. 3.17) is very similar to how the file is processed in the preprocessing step, except it will search for the message ids for every calibration file. The code also appends a POSFRAME dataset containing the index of the message in the raw file, which is also added by Prosoft.

# Detects message type from frame tag
if testString.startswith(b"SATHDR"):
    if i > 0:
        f.read(i)
    hdr = f.read(RawFileReader.SATHDR_READ)
    (k, v) = RawFileReader.readSATHDR(hdr)
    root.attributes[k] = v
    break
else:
    num = 0
    for key in calibrationMap:
        cf = calibrationMap[key]
        if testString.startswith(cf.id.upper().encode("utf-8")):
            if i > 0:
                f.read(i)
            pos = f.tell()
            msg = f.read(RawFileReader.MAX_BLOCK_READ)
            f.seek(pos)
            gp = contextMap[cf.id]
            if len(gp.attributes) == 0:
                gp.id = key
                gp.attributes["CalFileName"] = key
                gp.attributes["FrameTag"] = cf.id
            num = cf.convertRaw(msg, gp)
            if num >= 0:
                # Generate POSFRAME
                ds = gp.getDataset("POSFRAME")
                if ds is None:
                    ds = gp.addDataset("POSFRAME")
                ds.appendColumn(u"COUNT", posframe)
                posframe += 1
                f.read(num)
            break
    if num > 0:
        break

Figure 3.17: Code for reading raw data files


Raw data files begin with multiple header messages that start with a "SATHDR" identifier. Header messages are in the format "SATHDR <value> (<name>)\r\n" and are 128 bytes in length. The following code is used to read a header (Fig. 3.18).

def readSATHDR(hdr):
    end = hdr.find(bytes(b"\x0D\x0A".decode("unicode_escape"), "utf-8"))
    sp1 = hdr.find(b" ")
    sp2 = hdr.rfind(b" ")
    str1 = hdr[sp1+1:sp2].decode('utf-8')
    str2 = hdr[sp2+2:end-1].decode('utf-8')
    if len(str1) == 0:
        str1 = "Missing"
    return (str2, str1)

Figure 3.18: Code for reading raw file header
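To make the header parsing concrete, here is a standalone sketch that mirrors the logic of readSATHDR() and applies it to a hypothetical 128-byte header (the header contents and null padding are our own illustration):

```python
def readSATHDR(hdr):
    # Header format: b"SATHDR <value> (<name>)\r\n", padded to 128 bytes.
    # The value sits between the first space and the last space; the name
    # sits inside the parentheses before the \r\n terminator.
    end = hdr.find(b"\r\n")
    sp1 = hdr.find(b" ")
    sp2 = hdr.rfind(b" ")
    str1 = hdr[sp1+1:sp2].decode('utf-8')
    str2 = hdr[sp2+2:end-1].decode('utf-8')
    if len(str1) == 0:
        str1 = "Missing"
    return (str2, str1)

# Hypothetical header message, padded to the fixed 128-byte length
hdr = b"SATHDR Satlantic SAS (INSTRUMENT_TYPE)\r\n".ljust(128, b"\x00")
print(readSATHDR(hdr))  # ('INSTRUMENT_TYPE', 'Satlantic SAS')
```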

After the header messages, the program loops to read all data message frames (Fig. 3.19). There is a small difference depending on whether the message frame is variable or fixed length. For variable-length frames, a message is read until a specified delimiter is found. A fixed-length frame can be read directly. After a message frame is read, it is processed according to its data type.


class CalibrationFile:
    def convertRaw(self, msg, gp):
        nRead = 0
        instrumentId = ""
        for i in range(0, len(self.data)):
            v = 0
            cd = self.data[i]
            # Read variable length message frames (field length == -1)
            if cd.fieldLength == -1:
                delimiter = self.data[i+1].units
                delimiter = delimiter.encode("utf-8").decode("unicode_escape").encode("utf-8")
                end = msg[nRead:].find(delimiter)
                if end == 0:
                    v = 0.0
                else:
                    b = msg[nRead:nRead+end]
                    v = cd.convertRaw(b)
                    nRead += end
            # Read fixed length message frames
            else:
                if cd.fitType.upper() != "DELIMITER":
                    if cd.fieldLength != 0:
                        b = msg[nRead:nRead+cd.fieldLength]
                        v = cd.convertRaw(b)
                    nRead += cd.fieldLength

Figure 3.19: Code for reading message frames


The following code converts binary data according to the data type (Fig. 3.20). See the Satlantic Instrument File Standard for the list of data types (Satlantic, 2011).

class CalibrationData:
    def convertRaw(self, b):
        v = 0
        dataType = self.dataType.upper()
        if dataType == "BU":
            v = int.from_bytes(b, byteorder='big', signed=False)
        elif dataType == "BULE":
            v = int.from_bytes(b, byteorder='little', signed=False)
        elif dataType == "BS":
            v = int.from_bytes(b, byteorder='big', signed=True)
        elif dataType == "BSLE":
            v = int.from_bytes(b, byteorder='little', signed=True)
        elif dataType == "BF":
            v = struct.unpack("f", b)[0]
        elif dataType == "BD":
            v = struct.unpack("d", b)[0]
        elif dataType == "HS":
            v = int(b, 16)
        elif dataType == "HU":
            v = int(b, 16)
        elif dataType == "AI":
            if self.type.upper() == "NMEA_CHECKSUM":
                v = int(b, 16)
            else:
                v = int(b)
        elif dataType == "AU":
            v = int(b)
        elif dataType == "AF":
            v = float(b)
        elif dataType == "AS":
            v = b
        else:
            v = -1
            print("dataType unknown:", dataType)
        return v

Figure 3.20: Code for converting binary data according to data type
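A few standalone examples of these conversions, using hypothetical byte values: binary types are decoded with int.from_bytes() or struct.unpack(), while ASCII types are parsed as text.

```python
import struct

# BU: big-endian unsigned integer
assert int.from_bytes(b"\x01\x00", byteorder='big', signed=False) == 256
# BULE: little-endian unsigned integer (same bytes, different order)
assert int.from_bytes(b"\x01\x00", byteorder='little', signed=False) == 1
# BS: big-endian signed integer
assert int.from_bytes(b"\xff\xff", byteorder='big', signed=True) == -1
# BF: 4-byte IEEE float (1.5 is exactly representable)
assert struct.unpack("f", struct.pack("f", 1.5))[0] == 1.5
# HS/HU: hexadecimal ASCII
assert int(b"1A", 16) == 26
# AI/AF: plain ASCII integer and float
assert int(b"42") == 42 and float(b"3.5") == 3.5
```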


The converted data are stored into HDF datasets depending on the data type (Fig. 3.21).

# Stores the instrument id to check for DATETAG/TIMETAG2
if cd.type.upper() == "INSTRUMENT" or cd.type.upper() == "VLF_INSTRUMENT":
    instrumentId = cd.id
# Stores raw data into hdf datasets according to type
if cd.fitType.upper() != "NONE" and cd.fitType.upper() != "DELIMITER":
    cdtype = cd.type.upper()
    if cdtype != "INSTRUMENT" and cdtype != "VLF_INSTRUMENT" and \
       cdtype != "SN" and cdtype != "VLF_SN":
        ds = gp.getDataset(cd.type)
        if ds is None:
            ds = gp.addDataset(cd.type)
        ds.appendColumn(cd.id, v)
    else:
        if sys.version_info[0] < 3: # Python 2
            gp.attributes[cdtype.encode('utf-8')] = cd.id
        else: # Python 3
            gp.attributes[cdtype] = cd.id
# None types are stored as attributes
if cd.fitType.upper() == "NONE":
    cdtype = cd.type.upper()
    if cdtype == "SN" or cdtype == "DATARATE" or cdtype == "RATE":
        if sys.version_info[0] < 3:
            gp.attributes[cdtype.encode('utf-8')] = cd.id
        else:
            gp.attributes[cdtype] = cd.id

Figure 3.21: Code for storing converted data into HDF datasets


Some message frames also contain DATETAG and TIMETAG2 values that are not specified by sensor definition lines in the calibration files. They are read in by the following code (Fig. 3.22).

# Some instruments produce additional bytes for
# DATETAG (3 bytes), and TIMETAG2 (4 bytes)
if instrumentId.startswith("SATHED") or \
   instrumentId.startswith("SATHLD") or \
   instrumentId.startswith("SATHSE") or \
   instrumentId.startswith("SATHSL") or \
   instrumentId.startswith("SATPYR") or \
   instrumentId.startswith("SATNAV"):
    # Read DATETAG
    b = msg[nRead:nRead+3]
    if sys.version_info[0] < 3:
        v = int(binascii.hexlify(b), 16)
    else:
        v = int.from_bytes(b, byteorder='big', signed=False)
    nRead += 3
    ds1 = gp.getDataset("DATETAG")
    if ds1 is None:
        ds1 = gp.addDataset("DATETAG")
    ds1.appendColumn(u"NONE", v)
    # Read TIMETAG2
    b = msg[nRead:nRead+4]
    if sys.version_info[0] < 3:
        v = int(binascii.hexlify(b), 16)
    else:
        v = int.from_bytes(b, byteorder='big', signed=False)
    nRead += 4
    ds1 = gp.getDataset("TIMETAG2")
    if ds1 is None:
        ds1 = gp.addDataset("TIMETAG2")
    ds1.appendColumn(u"NONE", v)

Figure 3.22: Code to read DATETAG/TIMETAG2 data


3.6 Process Level 1b

After reading the raw data into an HDF file, the data still need to be calibrated according to the FIT-TYPE from the calibration file. The program first calls processDataset() on the INTTIME dataset to obtain the calibrated integration time values. This must be done first because the calibrated integration time values are used when processing datasets with the OPTIC3 fit-type. After processing the INTTIME dataset, the program calls processDataset() to process all the other datasets (Fig. 3.23).

@staticmethod
def processGroup(gp, cf):
    inttime = None
    for cd in cf.data:
        if cd.type == "INTTIME":
            ds = gp.getDataset("INTTIME")
            ProcessL1b.processDataset(ds, cd)
            inttime = ds
    for cd in cf.data:
        if gp.hasDataset(cd.type) and cd.type != "INTTIME":
            ds = gp.getDataset(cd.type)
            ProcessL1b.processDataset(ds, cd, inttime)

Figure 3.23: Code for ProcessL1b.processGroup()


Datasets are processed according to fit-type (Fig. 3.24). However, some fit-types are not used by our system, so they are not implemented. There is also an optional immersed parameter, which specifies whether the instrument is immersed underwater.

@staticmethod
def processDataset(self, cd, inttime=None, immersed=False):
    if cd.fitType == "OPTIC1":
        pass # not implemented
    elif cd.fitType == "OPTIC2":
        pass # not implemented
    elif cd.fitType == "OPTIC3":
        self.processOPTIC3(cd, immersed, inttime)
    elif cd.fitType == "OPTIC4":
        pass # not implemented
    elif cd.fitType == "THERM1":
        pass # not implemented
    elif cd.fitType == "POW10":
        self.processPOW10(cd, immersed)
    elif cd.fitType == "POLYU":
        self.processPOLYU(cd)
    elif cd.fitType == "POLYF":
        self.processPOLYF(cd)
    elif cd.fitType == "DDMM":
        pass
    elif cd.fitType == "HHMMSS":
        pass
    elif cd.fitType == "DDMMYY":
        pass
    elif cd.fitType == "TIME2":
        pass
    elif cd.fitType == "COUNT":
        pass
    elif cd.fitType == "NONE":
        pass
    else:
        print("Unknown Fit Type:", cd.fitType)

Figure 3.24: Code for processing datasets according to fit-type


The following functions are used to calibrate the data (Fig. 3.25). Hyperspectral sensor data (ES/LI/LT) are processed using processOPTIC3().

    @staticmethod
    def processOPTIC3(self, cd, immersed, inttime):
        a0 = float(cd.coefficients[0])
        a1 = float(cd.coefficients[1])
        im = float(cd.coefficients[2]) if immersed else 1.0
        cint = float(cd.coefficients[3])
        k = cd.id
        for x in range(self.data.shape[0]):
            aint = inttime.data[cd.type][x]
            self.data[k][x] = im * a1 * (self.data[k][x] - a0) * (cint / aint)

    @staticmethod
    def processPOW10(self, cd, immersed):
        a0 = float(cd.coefficients[0])
        a1 = float(cd.coefficients[1])
        im = float(cd.coefficients[2]) if immersed else 1.0
        k = cd.id
        for x in range(self.data.shape[0]):
            self.data[k][x] = im * pow(10, ((self.data[k][x] - a0) / a1))

    @staticmethod
    def processPOLYU(self, cd):
        k = cd.id
        for x in range(self.data.shape[0]):
            num = 0
            for i in range(0, len(cd.coefficients)):
                a = float(cd.coefficients[i])
                num += a * pow(self.data[k][x], i)
            self.data[k][x] = num

    @staticmethod
    def processPOLYF(self, cd):
        a0 = float(cd.coefficients[0])
        k = cd.id
        for x in range(self.data.shape[0]):
            num = a0
            for a in cd.coefficients[1:]:
                num *= (self.data[k][x] - float(a))
            self.data[k][x] = num
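The OPTIC3 branch applies the calibration equation value = im * a1 * (raw - a0) * (cint / aint): subtract the dark offset a0, scale by a1, and normalize by the ratio of calibration integration time to actual integration time. A minimal numeric sketch; the coefficients and raw count below are invented purely for illustration:

```python
def optic3(raw, a0, a1, cint, aint, im=1.0):
    # Subtract dark offset a0, apply scale a1, then normalize by the
    # ratio of calibration integration time to actual integration time.
    return im * a1 * (raw - a0) * (cint / aint)

# Invented example: raw count 2100, offset 2000, scale 0.01,
# calibrated at 256 ms but sampled at 512 ms.
value = optic3(raw=2100, a0=2000, a1=0.01, cint=256, aint=512)
print(value)  # 0.5
```

Doubling the actual integration time halves the calibrated value, which is why the calibrated INTTIME dataset has to be available before OPTIC3 data can be processed.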


3.7 Process Level 2

This level applies shutter dark correction to the data (Fig. 3.26). In order to perform shutter dark correction, the raw file should contain both a light dataset and a dark dataset. Dark data frames are recorded while the sensor's shutter is closed and are used to determine a threshold value for ambient noise. Typically, the SAS Solar Tracker records a dark frame after every few light frames. After interpolating the dark data to match the light data, the data can be corrected by subtracting the interpolated dark data from the light data.

    @staticmethod
    def darkCorrection(darkData, darkTimer, lightData, lightTimer):
        if (darkData is None) or (lightData is None):
            print("Dark Correction, dataset not found:", darkData, lightData)
            return
        # Interpolate dark dataset to match number of elements in light dataset
        newDarkData = np.copy(lightData.data)
        for k in darkData.data.dtype.fields.keys():
            x = np.copy(darkTimer.data["NONE"]).tolist()
            y = np.copy(darkData.data[k]).tolist()
            new_x = lightTimer.data["NONE"]
            newDarkData[k] = Utilities.interp(x, y, new_x, 'cubic')
        darkData.data = newDarkData
        # Correct light data by subtracting interpolated dark data
        for k in lightData.data.dtype.fields.keys():
            for x in range(lightData.data.shape[0]):
                lightData.data[k][x] -= newDarkData[k][x]

Figure 3.26: Code to perform dark correction
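The idea can be illustrated on plain NumPy arrays: dark frames, recorded less often than light frames, are interpolated onto the light timestamps and then subtracted. The timestamps and values below are invented for illustration, and linear interpolation stands in for the cubic interpolation used above:

```python
import numpy as np

# Light frames every second; dark frames only at the start and end.
light_time = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
light = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
dark_time = np.array([0.0, 4.0])
dark = np.array([2.0, 4.0])

# Interpolate dark values onto the light timestamps, then subtract.
dark_on_light = np.interp(light_time, dark_time, dark)
corrected = light - dark_on_light
print(corrected)  # dark-subtracted values: 8.0, 8.5, 9.0, 9.5, 10.0
```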

Linear interpolation uses the Scipy interp1d function (Fig. 3.27). Because bounds errors can occur if the new_x values fall outside the range of the original x values, the following code wraps interp1d to append additional x,y values when required (it duplicates the closest x,y pair).


    @staticmethod
    def interp(x, y, new_x, kind='linear'):
        n0 = len(x) - 1
        n1 = len(new_x) - 1
        if new_x[n1] > x[n0]:
            x.append(new_x[n1])
            y.append(y[n0])
        if new_x[0] < x[0]:
            x.insert(0, new_x[0])
            y.insert(0, y[0])
        new_y = scipy.interpolate.interp1d(x, y, kind=kind,
                                           bounds_error=False,
                                           fill_value=0.0)(new_x)
        return new_y

Figure 3.27: Code to perform interpolation using interp1d()
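The effect of the endpoint padding can be seen in isolation: without it, points of new_x outside [x[0], x[-1]] would receive fill_value = 0.0; with it, they receive the nearest endpoint's y value. A self-contained sketch of just the padding step, with invented values:

```python
def pad_endpoints(x, y, new_x):
    # Duplicate the closest (x, y) pair so every new_x falls inside
    # the x range and interp1d never hits its fill_value.
    if new_x[-1] > x[-1]:
        x.append(new_x[-1])
        y.append(y[-1])
    if new_x[0] < x[0]:
        x.insert(0, new_x[0])
        y.insert(0, y[0])
    return x, y

x, y = pad_endpoints([1.0, 2.0], [10.0, 20.0], [0.5, 1.5, 2.5])
print(x)  # [0.5, 1.0, 2.0, 2.5]
print(y)  # [10.0, 10.0, 20.0, 20.0]
```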

Spline interpolations are done using the Scipy InterpolatedUnivariateSpline() function (Fig. 3.28). The k value sets the degree of the interpolation and is set to 3 for cubic spline interpolation. This function uses a compiled library, FITPACK, making it much faster than interp1d.

    @staticmethod
    def interpSpline(x, y, new_x):
        n0 = len(x) - 1
        n1 = len(new_x) - 1
        if new_x[n1] > x[n0]:
            x.append(new_x[n1])
            y.append(y[n0])
        if new_x[0] < x[0]:
            x.insert(0, new_x[0])
            y.insert(0, y[0])
        new_y = scipy.interpolate.InterpolatedUnivariateSpline(x, y, k=3)(new_x)
        return new_y
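As a quick sanity check on the spline's behavior, an interpolating spline passes exactly through its sample points, and a cubic (k=3) spline reproduces low-degree polynomials between them. The sample values below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])  # y = x**2 sampled at the knots

spline = InterpolatedUnivariateSpline(x, y, k=3)

# The spline interpolates: it passes exactly through the sample points.
print(float(spline(2.0)))  # 4.0 (up to floating point)
# A cubic spline reproduces this quadratic between knots as well.
print(float(spline(1.5)))  # 2.25 (up to floating point)
```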
