
Spatio-temporal modelling of urban sensor network data: mapping air quality risks in Eindhoven, the Netherlands


Academic year: 2021


SPATIO-TEMPORAL MODELLING OF URBAN SENSOR NETWORK DATA: MAPPING AIR QUALITY RISKS IN EINDHOVEN, THE NETHERLANDS

Veronella Maria van Zoest

SPATIO-TEMPORAL MODELLING OF URBAN SENSOR NETWORK DATA: MAPPING AIR QUALITY RISKS IN EINDHOVEN, THE NETHERLANDS

DISSERTATION

to obtain the degree of doctor at the University of Twente, on the authority of the rector magnificus, prof. dr. T. T. M. Palstra, on account of the decision of the Doctorate Board, to be publicly defended on Wednesday, January 15, 2020 at 14.45

by

Veronella Maria van Zoest
born on August 5, 1992 in Voorburg, the Netherlands

This dissertation is approved by:
prof. dr. ir. A. Stein (supervisor)
dr. F.B. Osei (co-supervisor)
dr. ir. G. Hoek (co-supervisor)

ITC dissertation number 372
ITC, P.O. Box 217, 7500 AE Enschede, The Netherlands

ISBN: 978–90–365–4929–5
DOI: http://dx.doi.org/10.3990//1.9789036549295
Printed by: ITC Printing Department, Enschede, The Netherlands

© Veronella Maria van Zoest, Enschede, The Netherlands
All rights reserved. No part of this publication may be reproduced without the prior written permission of the author.

Graduation committee

Chair
prof. dr. ir. A. Veldkamp    University of Twente

Supervisor
prof. dr. ir. A. Stein    University of Twente

Co-supervisors
dr. F.B. Osei    University of Twente
dr. ir. G. Hoek    Utrecht University

Members
prof. dr. R.V. Sliuzas    University of Twente
prof. dr. R. Zurita Milla    University of Twente
prof. dr. D.P. Lee    University of Glasgow
prof. dr. ir. A.K. Bregt    Wageningen University

This research was conducted under the auspices of the Graduate School for Socio-Economic and Natural Sciences of the Environment (SENSE).

Summary

Low-cost urban air quality sensor networks are increasingly used to study the spatio-temporal variability in air pollutant concentrations. In Eindhoven, the Netherlands, a low-cost air quality sensor network was set up in 2013 as part of the civil initiative AiREAS. The aim of this thesis is to evaluate the quality of the collected data and its usability in spatio-temporal modelling and health effect estimation.

The first research objective addresses outliers. These may reflect measurement errors or unusually high or low air pollution events. In this first chapter I present a novel outlier detection method based upon a spatio-temporal classification. The focus is on hourly nitrogen dioxide (NO2) concentrations, as NO2 has a large spatio-temporal variability and a strong association with health effects. Different spatio-temporal classes are defined, reflecting urban background vs. urban traffic stations, weekdays vs. weekends and four periods per day. Truncated normal distributions are used to set thresholds for the definitions of outliers in each spatio-temporal class. Based on this study, I conclude that this method is able to detect outliers while maintaining the spatio-temporal variability of air pollutant concentrations in urban areas.

The second research objective addresses the calibration of low-cost sensor networks. Field calibration is typically performed at one location, while little is known about the spatial transferability of correction factors. This chapter evaluates three calibration methods: (1) an iterative Bayesian approach for daily estimation of the parameters in a multiple linear regression model, (2) a daily updated correction factor and (3) a correction factor updated only when concentrations are uniformly low. Performance of the calibration methods is compared in terms of temporal stability, spatial transferability, and sensor specificity. A poor spatial transferability of the calibration parameters was found for all methods. This is consistent with different responses of individual sensors to environmental factors such as temperature and relative humidity. Due to their spatial and temporal variability, calibration parameters require regular updates and sensor-specific recalibrations.
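The class-based outlier thresholds described under the first research objective can be illustrated with a minimal sketch. It is a deliberate simplification and not the method of the thesis: it fits a plain normal distribution to square-root transformed NO2 values per class, whereas the thesis uses truncated normal distributions. The function names, the class labels and the multiplier `k` are assumptions made for illustration only.

```python
import numpy as np

def class_thresholds(no2, classes, k=3.0):
    """Upper outlier threshold per spatio-temporal class (illustrative only).

    no2     : hourly NO2 concentrations (non-negative)
    classes : one class label per observation, e.g. "traffic/weekday/morning"
    k       : assumed number of standard deviations above the class mean
    """
    no2 = np.asarray(no2, dtype=float)
    classes = np.asarray(classes)
    thresholds = {}
    for label in np.unique(classes):
        # Square-root transform to reduce the right skew of the concentrations.
        y = np.sqrt(no2[classes == label])
        # Plain normal fit (assumption: no truncation), then back-transform.
        thresholds[str(label)] = (y.mean() + k * y.std(ddof=1)) ** 2
    return thresholds

def flag_outliers(no2, classes, thresholds):
    """Boolean array: True where a value exceeds the threshold of its class."""
    no2 = np.asarray(no2, dtype=float)
    limits = np.array([thresholds[str(c)] for c in classes])
    return no2 > limits
```

Because every class gets its own threshold, a value that is unremarkable at a traffic location during rush hour can still be flagged at a background location at night, which is the property the summary refers to as maintaining spatio-temporal variability.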

The third research objective addresses prediction of air pollutant concentrations at unobserved locations. Spatio-temporal regression kriging was applied to map NO2 at a 25 m spatial resolution and hourly temporal resolution. The trend is modelled separately from autocorrelation in the residuals. The trend part of the model consists of a set of spatial and temporal covariates including population density, road type and meteorological variables. Spatio-temporal autocorrelation in the residuals is modelled by fitting a sum-metric spatio-temporal variogram model. The method provides local estimates of the strength and association of air pollution sources and sinks, and allows for near real-time prediction of air pollutant concentrations. The resulting maps visualize these in space and time and can be used to assess exposure for the evaluation of short-term health effects.

The fourth research objective addresses health effect estimates related to air pollution, focusing on daily respiratory symptoms in children with asthma. Bayesian estimates of the exposure-response function were obtained by updating a priori information from a meta-analysis with data from a panel study. Positive associations between NO2 and lower respiratory symptoms and medication use were observed. Credible intervals narrowed substantially when prior information from the meta-analysis was added. Burden of disease maps showed a strong spatial variability in the number of asthmatic symptoms associated with ambient NO2. Bayesian methods provided accurate local air pollution effect estimates and subsequent local burden of disease calculations.

To summarize, this thesis evaluates the use of low-cost air quality sensor network data from data collection to application. After careful evaluation of the data quality and removal of outliers, it shows that the data can be used to map air pollutant concentrations at a fine spatial and temporal resolution. These maps can be used to estimate burden of disease at the within-city level. Future research may address a wide range of applications, including sensor network development, policy making, and further health risk assessment.
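For reference, the sum-metric spatio-temporal variogram mentioned under the third research objective is commonly written in the following general form. This is the standard textbook formulation, given here as an aid to the reader; the notation follows the List of Symbols ($h$ spatial separation distance, $u$ temporal separation distance, $\kappa$ space-time anisotropy parameter), and the functional form chosen for each component is not stated in this summary.

```latex
\gamma(h, u) = \gamma_{s}(h) + \gamma_{t}(u) + \gamma_{joint}\!\left(\sqrt{h^{2} + (\kappa u)^{2}}\right)
```

Each of the three components (spatial, temporal and joint) has its own nugget, partial sill and range, which matches the separate $\tau^2$, $\sigma^2$ and $\phi$ entries for "joint", "s" and "t" in the List of Symbols.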

Samenvatting (Dutch summary)

Relatively low-cost urban air quality monitoring networks are increasingly used to study the spatio-temporal variation in concentrations of air pollutants. In Eindhoven, such a network was set up in 2013, initiated by the civil initiative AiREAS. The aim of this thesis is to evaluate the quality of the sensor data collected with this network, as well as the usability of these data, both for developing spatio-temporal models and for estimating the effects of air quality on health.

The first research objective addresses outliers. These can be caused by measurement errors or by events that lead to unusually high or low concentrations of air pollutants. In this first chapter I present a new detection method for outliers, based on a spatio-temporal classification. I chose to apply the method to nitrogen dioxide (NO2) concentrations, because these show a high spatio-temporal variation and have a strong association with health effects. Spatio-temporal classes are defined to reflect the daily variation in traffic intensity, weekdays vs. weekend days, and background vs. traffic locations in the city. The thresholds for the definitions of outliers in each class are based on the truncated normal distribution. The study shows that the method is able to detect outliers while preserving the spatio-temporal variation in air pollutants in an urban area.

The second research objective addresses the calibration of relatively low-cost air quality monitoring networks. Calibration is often done at one location, while little is known about the spatial transferability of the correction factors. This chapter evaluates three calibration methods: (1) an iterative Bayesian approach for the daily estimation of the parameters in a multiple linear regression model, (2) a daily updated correction factor and (3) a correction factor that is updated only when concentrations are uniform and low. The calibration methods are compared on the basis of temporal stability, spatial transferability, and sensor specificity. All methods show a limited spatial transferability, which is consistent with the different sensitivities of the individual sensors to environmental factors such as temperature and relative humidity. Based on the spatial and temporal variability in the calibration parameters, this chapter recommends regular updates and sensor-specific recalibrations.

The third research objective addresses the prediction of air pollutant concentrations at locations where no measurements have been taken. To map NO2 at a spatial resolution of 25 m and a temporal resolution of 1 hour, spatio-temporal regression kriging was applied. This method models the trend separately from the autocorrelation in the residuals. The trend part consists of spatial and temporal variables, such as population density, road type and meteorological variables. The autocorrelation in the residuals is modelled using a spatio-temporal variogram. The method improves local estimates of the strength and association of factors that influence air pollution, and enables near real-time predictions of air pollutants. The resulting maps can be used in the estimation of short-term health effects.

The fourth research objective addresses the estimation of health effects due to air pollution, with a focus on the daily variation in respiratory symptoms in children with asthma. Bayesian estimates of the exposure-response function were obtained by enriching a priori information from a meta-analysis with data from a panel study. The results suggest positive associations between NO2 and lower respiratory symptoms and medication use. The credible intervals narrowed strongly through the use of a priori information from the meta-analysis. Maps of the burden of disease show a strong spatial variability in the number of asthma symptoms related to ambient NO2. Bayesian methods give accurate estimates of the local air pollution effects and thereby more precise calculations of the burden of disease.

In summary, this thesis evaluates the use of data obtained with relatively low-cost air quality monitoring networks. After careful evaluation of the data quality and removal of outliers, the thesis shows that the data can be used to map air pollutant concentrations at a high spatio-temporal resolution.

Acknowledgements

You will have to stand on the shoulders of giants to become a giant yourself. In the past four years I grew a lot to become an independent researcher. This would not have been possible without the people around me.

My special thanks go to professor Alfred Stein, who has always supported me through the mountainous terrain of my PhD. You pushed me forward and gave me confidence during the hard times in the valleys, but also pulled me back when I ran too fast up the hill. I cannot wish for a promotor more skilled and devoted to his PhD students. When I had to spend a year without a daily supervisor, you showed your flexibility to take this role as well, efficiently and always there when needed.

Many thanks go to Frank Osei, who took over the role of daily supervisor during the last year of my PhD. You encouraged me to work quickly and efficiently and boosted my self-confidence to climb the highest hills. We never planned a meeting, because your door was always open.

Many thanks also go to Gerard Hoek, who always seemed to remember exactly what I was working on although our meetings were less frequent. You taught me many lessons in environmental epidemiology, but also in clear and transparent writing. Thanks for hosting me when I was working on the last research objective.

I thank NWO for funding this research project and putting emphasis on the usability of the results. This encouraged me to focus on the practical applicability of the research outcomes and on communication of the results with a wider audience than just the scientific community. My gratitude goes to the members of the user group, who have been actively participating in user group meetings and always accessible when I needed more information or data.

My stay at ITC has been very pleasant. For this I thank my colleagues and friends, with whom I shared many valuable experiences. I enjoyed the scientific and non-scientific discussions a lot and thank you for the memories. Because of you I have made friends all over the world, who will make me feel at home wherever I travel. Thanks especially to my officemates for creating a pleasant climate to work in. Peaceful and quiet while working, but always there to discuss problems or to have a friendly chat.

The supporting staff of ITC has also contributed to a pleasant working environment. Besides the many people that assisted me in one way or the other with administrative matters or technical issues, I would especially like to mention Roelof Schoppers for the welcoming smile every day, and Teresa Brefeld and Marga Koelen for all their support. My thanks also go to Marieke Oldenwening at Utrecht University for her support during the panel study.

Doing a PhD keeps your mind working for much longer than eight hours a day. I thank my friends and family for their understanding. I especially thank my parents for their support, contributions and valuable discussions from a local perspective. My sincerest gratitude goes to Hugo for being patient and supporting, and following me wherever my path leads me. You kept loving me in stressful times and always encouraged me to get the best out of myself.

Who knows, one day I might be a giant myself. Thanks to the support of the giants around me, I am confident that I will keep growing for many years to come.

Contents

Summary
Samenvatting
Acknowledgements
Contents
List of Figures
List of Tables
List of Symbols

1 Introduction
    1.1 Motivation
    1.2 Spatio-temporal big data analysis
    1.3 Spatial data quality
    1.4 Modelling and mapping
    1.5 Uncertainty
    1.6 Air pollution: sources and sinks
    1.7 Health effects
    1.8 Limit values and guidelines
    1.9 Problem statement
    1.10 Research objectives
    1.11 Outline
    1.12 Author contributions

2 Case study area
    2.1 Study area: the city of Eindhoven
    2.2 The ILM air quality network
    2.3 Spatial sampling scheme
    2.4 Airbox sensors
    2.5 Reference measurements

3 Outlier detection in urban air quality sensor networks
    3.1 Introduction
    3.2 Data preprocessing
    3.3 Methods
    3.4 Results
    3.5 Discussion
    3.6 Conclusions

4 Calibration of NO2 sensors in an urban air quality network
    4.1 Introduction
    4.2 Methods
    4.3 Results
    4.4 Discussion and conclusions

5 Spatio-temporal regression kriging for modelling urban NO2 concentrations
    5.1 Introduction
    5.2 Methods
    5.3 Application
    5.4 Results and discussion
    5.5 Conclusions

6 Short-term impact of NO2 exposure on local burden of asthmatic symptoms
    6.1 Introduction
    6.2 Methods
    6.3 Results
    6.4 Discussion
    6.5 Conclusions

7 Synthesis
    7.1 Main findings
    7.2 Significance
    7.3 Limitations
    7.4 Prospects

Bibliography

List of Figures

2.1 Location of Eindhoven within the Netherlands
2.2 Locations of the airboxes in the city of Eindhoven, the Netherlands
2.3 Airbox attached to light pole

3.1 Locations of the airboxes measuring NO2 at urban background locations and urban traffic locations
3.2 Distribution of NO2 concentrations, before square root transformation and after square root transformation
3.3 The truncated normal distribution of square root transformed NO2 concentrations and its underlying normal distribution
3.4 Boxplots of the outliers in each spatio-temporal class
3.5 NO2 concentrations measured at an urban background location
3.6 NO2 concentrations measured at an urban traffic location
3.7 NO2 concentrations measured by a conventional monitor at an urban traffic location
3.8 Scatterplot of traffic airbox outliers vs. the maximum NO2 concentration measured by the two conventional monitors
3.9 Comparison of NO2 concentrations measured by an airbox and a conventional monitor on a weekday at urban traffic locations

4.1 Locations of the airboxes and conventional monitors
4.2 Scatterplots of hourly NO2 values in 2016: airbox vs. conventional monitor
4.3 Difference between mean airboxes and mean conventional monitors over time and fitted smooth curves
4.4 Time series of the coefficients of the daily INLA models, using model 8
4.5 Posterior mean estimates of different airboxes, for airbox NO2 vs. covariates ‘relative humidity’ and ‘temperature’
4.6 Posterior distributions of slopes for reference monitor NO2 per covariate
A4.1 Histogram and uniform(0,1) Q-Q plot of the PIT values on June 15, 2016 at location $z_1$
A4.2 Time series of the relative correction factors $\gamma_{rel,d,z}$ and absolute correction factors $\gamma_{abs,d,z}$
A4.3 Time series of the correction factor $\gamma_{uni}$
A4.4 Time series of the correction factor $\gamma_{night,d,s}$
A4.5 Scatterplots before calibration, after calibration without random effects, and after calibration with random effects
A4.6 Residual plots for INLA models without random effects and with random effects

5.1 Locations of the airboxes used for modelling NO2 in November 2016
5.2 Spatio-temporal sample variogram and sum-metric fitted variogram model
5.3 Prediction maps of NO2 concentrations at four time stamps on Monday November 7, 2016
5.4 Prediction maps of NO2 concentrations at four Sundays in November 2016, between 5 and 6 p.m.
5.5 Kriging variance map

6.1 Flowchart of daily asthma symptom calculation
6.2 Results of the meta-analysis on NO2 and lower respiratory symptoms (LRS)
6.3 Results of the meta-analysis on NO2 and cough
6.4 (a) Number of children per neighborhood. (b) Mean NO2 exposure in 2016. (c) NO2 attributable cases of LRS per day
6.5 Posterior densities of (a) odds ratio of lower respiratory symptoms (LRS), and (b) the attributable cases of LRS in neighborhood ’t Hofke

7.1 Framework of this thesis

List of Tables

1.1 WHO guidelines in comparison with EU directives and Dutch national law on the concentration limit for different pollutants

2.1 Variables measured and instruments used in the airboxes

3.1 Upper thresholds for hourly average NO2 concentrations above which considered outliers, per spatio-temporal class
3.2 Percentage outliers per spatio-temporal NO2 concentration class for hourly values in 2016
A3.1 Mean (± standard deviation) of the distribution underlying the truncated normal distribution of each spatio-temporal class

4.1 Overview of potential covariates for the calibration model
4.2 DIC performance statistics for different models
4.3 RMSE before and after temporal and spatiotemporal calibration using different models
4.4 RMSE values before and after applying a daily correction factor
4.5 RMSE values before and after updating the correction factor $\gamma_{uni}$
4.6 RMSE before and after night-time calibration
A4.1 RMSE before and after temporal calibration, for different lengths of the calibration dataset
A4.2 RMSE for the models with random effects vs. without random effects

5.1 $\hat{\beta}$ and p-values for the fixed effects part of the regression model
5.2 Spatio-temporal variogram parameter estimates for the fitted sum-metric variogram

6.1 Uninformative priors for Bayesian estimation of the parameters in the model
6.2 Frequency of reported daily symptoms in the panel study
6.3 Association between NO2 and daily symptoms, expressed as odds ratios (95% C.I.) based on panel study without informative prior information
6.4 Comparison between prior OR based on meta-analysis, local OR based on uninformative prior, and OR based on informative prior
A6.1 Odds ratios (95% C.I.) related to a 10 µg m⁻³ increase in NO2 ambient air pollution, based on REML estimation of the model parameters
A6.2 Odds ratios (95% C.I.) related to a 10 µg m⁻³ increase in PM1 ambient air pollution, based on REML estimation of the model parameters
A6.3 Odds ratios (95% C.I.) related to a 10 µg m⁻³ increase in PM2.5 ambient air pollution, based on REML estimation of the model parameters
A6.4 Odds ratios (95% C.I.) related to a 10 µg m⁻³ increase in PM10 ambient air pollution, based on REML estimation of the model parameters
A6.5 Odds ratios (95% C.I.) related to a 10000 particle # increase in UFP ambient air pollution, based on REML estimation of the model parameters
A6.6 Comparison of June, November, mean June and November, and annual mean concentrations measured at the two RIVM reference monitors in Eindhoven, 2016
A6.7 Association between NO2 and daily symptoms, expressed as odds ratios (95% C.I.) based on panel study and prior information from the meta-analysis on LRS and cough

List of Symbols

$\#$    Number
$\alpha$    Significance level
$\beta_0$    Intercept
$\hat{\beta}_0$    Estimated intercept
$\beta_{0,d,z}$    Intercept at $d, z$
$\hat{\beta}_{0,d,z}$    Estimated intercept at $d, z$
$\beta_{c,d,z}$    Regression coefficient of covariate $c$ at $d, z$
$\beta_{NO2,y}$    NO2 regression coefficient for symptom $y$
$\hat{\beta}_{c,d,z}$    Estimated regression coefficient of covariate $c$ at $d, z$
$\gamma$    Semivariance
$\Gamma$    Semivariance matrix of all possible combinations of space-time observations
$\gamma_0$    Vector of semivariances between observation locations and prediction location
$\gamma_{abs,d,z}$    Absolute correction factor at $d, z$
$\gamma_{night,d,s}$    Nightly established correction factor at $d, s$
$\gamma_{rel,d,z}$    Relative correction factor at $d, z$
$\gamma_{uni}$    Correction factor at uniformly low concentrations
$\gamma_v$    Individual intercept per participant $v$
$\delta$    Threshold for standard deviation
$\Delta_{d,t}$    NO2 difference between mean of airboxes and mean of conventional monitors, at day $d$ and hour $t$
$\Delta x^{NO2}_{q-0}$    NO2 exposure in $q$ above baseline of zero
$\varepsilon_{d,t,z}$    Error at $d, t, z$, $\varepsilon_{d,t,z} \sim N(0, \sigma^2)$
$\zeta$    Confidence interval size indicator
$\eta$    Residual
$\hat{\eta}$    Predicted residual
$\bar{\eta}$    Vector of observed space-time residuals
$\theta_1$    Set of parameters $\{m_K^{-i}, s_K^{-i}, a, b\}$
$\theta_2$    Set of parameters $\{n_K^{-i}, t_K^{-i}\}$
$\theta_{c,d,z}$    Set of parameters $\{\mu_{\beta_{c,d,z}}, \tau_{\beta_{c,d,z}}\}$
$\kappa$    Space-time anisotropy parameter
$\lambda_0$    Vector of kriging weights
$\mu$    Mean / trend
$\hat{\mu}$    Predicted mean / trend
$\mu_{\beta_{c,d,z}}$    Mean of the posterior distribution of $\beta_{c,d,z}$
$\sigma$    Standard deviation
$\sigma^2$    Variance
$\sigma_0^2$    Kriging variance
$\sigma_\gamma$    Standard deviation of $\gamma_v$
$\sigma_c$    Standard deviation of covariate $c$
$\sigma^2_{joint}$    Joint partial sill
$\sigma^2_s$    Spatial partial sill
$\sigma^2_t$    Temporal partial sill
$\sigma_y$    Standard deviation of $\beta_{NO2,y}$
$\tau_{\beta_{c,d,z}}$    Precision of the posterior distribution of $\beta_{c,d,z}$
$\tau^2_{joint}$    Joint nugget
$\tau^2_s$    Spatial nugget
$\tau^2_t$    Temporal nugget
$\phi(\cdot)$    Probability density function
$\Phi(\cdot)$    Cumulative distribution function
$\phi_{joint}$    Joint range
$\phi_s$    Spatial range
$\phi_t$    Temporal range
$\chi$    Threshold for mean
$\psi$    Fixed intercept
$a$    Lower truncation limit
$AC_{city}$    Total number of attributable cases in the city
$AC_q$    Number of attributable cases in $q$
$AF_q$    Attributable fraction in $q$
$a_p$    Air pollutant in set {NO2, PM10, PM2.5, PM1, UFP}
$b$    Upper truncation limit
$c$    Covariate
$C$    Set of covariates
$d$    Day
$\bar{D}$    Posterior mean of deviance
$d+1$    Day after day $d$
$E$    Approximation
$E$    Expected value
$east$    Easting coordinate
$flu$    Reported flu
$g_c(\cdot)$    Covariate transformation function
$h$    Spatial separation distance
$i$    Index
$j$    Index
$k$    Observation index in $\{1 \ldots N_K\}$
$K$    Spatio-temporal class
$L(\cdot)$    Log likelihood function
$m_K^{-i}$    Mean of all observations in $K$ except the $i$th
$n_K$    Mean of underlying normal distribution of observations in $K$
$n_K^{-i}$    Mean of underlying normal distribution of observations in $K$ except the $i$th
$N$    Normal distribution
$N_{+\infty}$    Half-normal distribution truncated at zero
$N_K$    Total number of observations in spatio-temporal class $K$
$N_q^{child}$    Number of children in $q$
$N_q^{pop}$    Number of inhabitants in $q$
$N_s$    Total number of airbox locations
$N_t$    Total number of timestamps
$N_z$    Total number of conventional monitor locations
$NO2$    Nitrogen dioxide
$\widehat{NO2}$    Predicted NO2 concentration
$NO2_{ab}$    Observed airbox NO2 concentration
$NO2_k$    NO2 observation with index $k$
$NO2_{ref}$    Observed reference NO2 concentration
$O3$    Ozone
$OR$    Odds ratio
$p$    P-value
$p_{asthma}$    Proportion of children with asthma
$p_D$    Effective number of parameters
$P_q$    Population at risk in $q$
$p_{v,d}$    Probability of occurrence of symptom $y_{v,d}$
$PM1$    Particulate matter <1 µm
$PM2.5$    Particulate matter <2.5 µm
$PM10$    Particulate matter <10 µm
$pop$    Population density
$PR$    Prevalence rate
$q$    Neighborhood index
$Q$    Environmental variable
$r$    Pearson’s correlation coefficient
$\hat{R}$    Gelman-Rubin diagnostic
$RH$    Relative humidity
$road$    Road type
$RMSE_{post}$    RMSE after calibration
$RMSE_{pre}$    RMSE before calibration
$RR_q$    Relative risk in $q$
$rw1$    First-order random walk
$s$    Spatial location / airbox location
$S$    Set of airbox locations $\{s_1 \ldots s_{25}\}$
$s_0$    Spatial prediction location
$s_K^{-i}$    Standard deviation of all observations in $K$ except the $i$th
$sday$    Day of study participation
$t$    Hour / timestamp
$T$    Temperature
$t_0$    Temporal prediction location
$T_d$    Total number of non-missing hours in day $d$
$t_K$    Standard deviation of underlying normal distribution of observations in $K$
$t_K^{-i}$    Standard deviation of underlying normal distribution of observations in $K$ except the $i$th
$u$    Temporal separation distance
$UFP$    Ultrafine particles
$Unif$    Uniform distribution
$v$    Participant index
$WD$    Wind direction
$wday$    Week/weekend day factor
$WS$    Wind speed
$x_c$    Value of covariate $c$
$x_{c,d,t,z}$    Value of covariate $c$ at $d, t, z$
$\bar{x}_{NO2,d,t}$    Mean NO2 concentration of all airboxes at $d, t$
$x_{NO2,d,t,s}$    NO2 concentration value at $d, t, s$
$x_{NO2,d,t,z}$    NO2 concentration value at $d, t, z$
$x_k$    Square root transformed $NO2_k$
$x_{k,i}$    $i$th observation of $x_k$
$y$    Vector of all square root transformed NO2 observations
$y_{-d,t}$    Vector of all square root transformed NO2 observations except $d, t$
$y^*_{d,t}$    Replicate square root transformed NO2 observations at $d, t$
$y_{d,t}$    Square root transformed NO2 observation at $d, t$
$y_{d,t,z}$    Square root transformed reference NO2 concentration at $d, t, z$
$\hat{y}_{d,t,z}$    Estimate of $y_{d,t,z}$
$y_{NO2,d,t,z}$    Reference NO2 concentration at $d, t, z$
$\hat{y}_{NO2,d,t,z}$    Predicted and backtransformed NO2 concentration at $d, t, z$
$y_{v,d}$    Symptom prevalence of participant $v$ on day $d$
$z$    Location of conventional monitor
$Z$    Set of conventional monitor locations $\{z_1, z_2\}$
$z_i$    Location of conventional monitor $i$
$z_j$    Location of another conventional monitor $j$

1. Introduction

1.1 Motivation

Air pollution has major effects on human health (Cohen et al., 2017) and globally causes about 7 million premature deaths each year (World Health Organization (WHO), 2015). While air pollution levels are increasing in developing countries due to growing industry, developed countries are taking measures to reduce emissions. Even in developed countries, however, target limit values are not as strict as those suggested by the WHO to minimize health impacts (WHO, 2006). Meanwhile, the WHO states that the suggested guidelines cannot fully protect human health, as no lower limit is known below which no health effects occur. A better quantification of health effects at lower air pollution levels is therefore required.

To quantify health risks related to ambient air pollution levels, a good estimate of personal exposure is required. Since personal exposure monitoring is expensive and time-consuming (Brandt et al., 2015), it is rarely used and limited to short study periods (e.g. Linn et al., 1996; Spira-Cohen et al., 2011). Typically, health risk assessments are based on central monitors of national ambient air quality monitoring networks (Roemer et al., 1993; Van der Zee et al., 1999, 2000; Dales et al., 2009). In Europe, these monitoring networks are operated by national environmental agencies and comply with high quality standards (EC Working Group on GDE, 2010), with the aim of evaluating exceedance of limit values determined by European guidelines (European Parliament and Council of the European Union, 2008).

Although national ambient air quality monitoring networks can be used to obtain high quality measurements, their spatial coverage is limited. Due to the high costs of the instruments, maintenance and calibration, typically only one or two monitors are located in each city. However, air pollution levels typically vary strongly within short distances.
This spatial variation is strongest in urban areas, which have a wide variety of road types, traffic intensities and land uses (Hoek et al., 2008). To increase the spatial coverage of air pollution measurements within a

city, low-cost urban air quality sensor networks have recently been set up by civil initiatives (Snyder et al., 2013).

1.2 Spatio-temporal big data analysis

Smart city sensor networks can capture multivariate data at multiple spatial locations and at a high temporal frequency (e.g. every 10 minutes). Over the span of multiple years, these data add up to big data sets, which pose a number of challenges for statistical treatment. Depending on the intended use, data may first have to be selected within a spatial and temporal window of interest. The resolution should fit the intended use and data quality, for which aggregation may be needed. To detect anomalies in the data that could point at errors or observations of interest, automatic filtering techniques may be useful. Next, one may be interested in the detection of spatio-temporal patterns and relations to build prediction models. To do so, data could be analyzed in a spatio-temporal statistical framework. Finally, the results of the spatio-temporal data analysis should be visualized to present and communicate them to users and stakeholders.

We consider the continuous multivariate spatio-temporal field Q(s, t), where measurements of environmental variables Q can be taken at any spatial location s and time stamp t (Caselton and Zidek, 1984; Sølna and Switzer, 1996). Here, a spatial location is a three-dimensional set of spatial coordinates. In practice, however, the vertical height of the measurement locations in sensor networks is typically kept constant and ignored in subsequent analyses. A spatio-temporal statistical framework makes it possible to assess the data quality of a measurement at (s, t), e.g. by comparing it to reference measurements or to the expected value at (s, t) given spatio-temporal patterns in the data.
Next, it allows prediction of Q at any unobserved spatio-temporal location (s0, t0) based on spatio-temporal autocorrelation in Q and relations between Q and other variables measured at (s, t) (Cressie and Wikle, 2011; Sherman, 2011; Bivand et al., 2013).

The advantage of using low-cost sensors is that a relatively high number of sensors Ns can be used in a relatively small area. In this way the measurements are better able to reflect the spatial variability in Q, and thus more useful to model its spatial autocorrelation and predict its values at unobserved locations. There is, however, a trade-off between the relative costs and data quality (Snyder et al., 2013). Reis et al. (2015) state that the number of local sensor networks is small due to the expectation that all low-cost sensors need to function at the same quality level as the reference instruments used for legislative purposes. This leads to high costs for instrumentation, calibration and maintenance. When combining data of multiple sensors in a model, information content becomes more

important than data quality, as long as the data quality is known (Reis et al., 2015). Another advantage of a smart city sensor network is the ability of sensors to connect with each other: observations of one sensor can, for example, be used to calibrate other sensors.

In the city of Eindhoven, the Netherlands, a low-cost air quality sensor network has been set up by the AiREAS civil initiative (Close, 2016). This sensor network is used throughout this thesis. More information about this network can be found in Chapter 2.

1.3 Spatial data quality

As the development of low-cost sensor networks started only recently, their data quality is still unknown (Snyder et al., 2013). It is important to know the spatial data quality of the sensor data, however, as it will influence the quality of the output models, maps and exposure estimates. The literature offers different lists of elements that could be included in a spatial data quality assessment (Guptill and Morrison, 1995; van Oort, 2006). In order to assess spatial data quality in a transparent way, an international standard is needed. Such a standard is provided by the International Organization for Standardization (ISO) in ISO 19157 (ISO/TC 211 Secretariat, 2013). Six elements of spatial data quality are defined: completeness, logical consistency, positional accuracy, thematic accuracy, temporal quality, and the usability element (ISO/TC 211 Secretariat, 2013).

In terms of completeness, an important issue for air quality sensor networks is missing data. Several methods exist to impute missing data in air pollution time series (e.g. Basu and Meckesheimer, 2007; Nguyen and Hoogerbrugge, 2014; Harrell, 2018), which can deal with longer periods of missing data.

Logical consistency deals with the validity of attribute values and the adherence of relationships and compositions between objects to logical rules of structure and compatibility (Kainz, 1995).
Negative air pollution values should for example be removed, as they cannot occur.

Positional accuracy defines the accuracy of positions of features and is always related to some kind of spatial reference system (ISO/TC 211 Secretariat, 2013). It deals with the closeness of the observed values to the true values in this reference system (Drummond, 1995). In sensor networks where the sensors all have fixed and known locations, this is less of an issue. The reported positions, however, can be used to assess whether a sensor is at its usual location or has been moved for maintenance or calibration.

Thematic accuracy refers to classification correctness, non-quantitative attribute correctness, and quantitative attribute accuracy. Quantitative attribute accuracy refers to the closeness of the value of a quantitative attribute to the true value (ISO/TC 211 Secretariat, 2013). This ‘true value’ often

refers to a value of a reference dataset which is accepted to be true. Different calibration methods are available to calibrate low-cost sensors to reference stations, improving the quantitative attribute accuracy of the concentration values (Spinelle et al., 2015). An important step in assessing the non-quantitative attribute correctness of the raw sensor observations is detecting outliers in the data. Outliers, sometimes referred to as anomalies (Chandola et al., 2009), are those observations that differ from the expected observations (Basu and Meckesheimer, 2007; Zhang et al., 2012). Erroneous outliers are those observations that deviate from the true values. These are different from events, which are observations that can be detected as outliers, but do not deviate from the true values (Zhang et al., 2012). Events rather reflect a real change in the measured phenomenon and can therefore be of interest depending on the user perspective.

Temporal quality describes the quality of temporal attributes and relationships. It consists of the accuracy of a time measurement, temporal consistency, and temporal validity. Van Oort (2006) adds three elements which are not in ISO 19157 but which were present in the European pre-standard ENV 12656 (CEN/TC 287, 1998): last update, rate of change, and temporal lapse. The temporal lapse represents the average time between a change in the real world and its representation in the data, and is thus related to the temporal resolution used when averaging the air pollutant concentrations over a period of time (e.g. ten minutes, hourly, daily).

The usability element refers to the suitability of the data for a specific application. All previously mentioned elements can be used to describe and assess the usability of the data (ISO/TC 211 Secretariat, 2013).
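Two of the quality elements above translate directly into simple data-screening steps: logical consistency (negative concentrations cannot occur) and temporal resolution (10-minute values averaged to hourly values). The following is a minimal sketch in Python, not the procedure used in this thesis. The function name `hourly_means` and the completeness rule `min_count` (an hour is kept only if at least four of its six 10-minute values are valid) are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(records, min_count=4):
    """Aggregate 10-minute (timestamp, value) pairs to hourly means.

    Missing (None) and negative values are dropped first, since negative
    concentrations are logically inconsistent. `min_count` is an assumed
    completeness threshold, not a value taken from the thesis.
    """
    by_hour = defaultdict(list)
    for stamp, value in records:
        if value is None or value < 0:
            continue  # fails the completeness or logical consistency check
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        by_hour[hour].append(value)
    # Keep only hours with enough valid 10-minute observations.
    return {
        hour: mean(values)
        for hour, values in sorted(by_hour.items())
        if len(values) >= min_count
    }
```

Hours that fail the completeness rule are left out of the result rather than imputed, so any imputation method can be applied afterwards as a separate step.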
1.4 Modelling and mapping

As air quality can only be monitored at point locations while the true air quality changes over the continuous spatio-temporal field Q, modelling is required to map air pollutant concentrations at unobserved spatio-temporal locations (s0, t0). Many models and model classifications exist.

Dispersion models have been used for a long time as they are relatively easy to use. The Gaussian plume model, for example, is typically used for modelling air pollutant dispersion. Historically, the model was mainly used for point sources such as factory stacks (Weil et al., 1992), but it has been improved over time to be used for line sources and to be applicable even under calm and changing wind conditions (Shorshani et al., 2015).

Empirical models are based on measurements which are typically interpolated to create air quality maps. Beelen et al. (2009) compared different methods to map the background air pollution in the European Union, including kriging and a land use regression model. Interpolation with kriging of point observations at locations s is based on a stochastic process that is split into a trend, a spatially dependent error term and spatially independent noise. In ordinary kriging the trend is constant but unknown; in simple kriging the trend is constant and

known. Land use regression (LUR) is based on a regression equation between predictor variables and measured concentrations. Air pollutant concentration mapping with LUR predicts values at unsampled locations using measured concentrations at a number of locations. Those are combined with a stochastic model using predictor variables such as land use, altitude and meteorology (Hoek et al., 2008). Regression kriging combines estimation of the trend using a regression model with simple kriging on the error component, which is assumed to have a known mean of zero.

Van de Kassteele et al. (2005) evaluated methods for predicting the annual number of days that the ozone (O3) limit value is exceeded, using model-based spatial interpolation. Several other studies evaluated methods for spatial interpolation of particulate matter (PM) in Europe (Van de Kassteele et al., 2006; Hamm et al., 2015). Van de Kassteele and Stein (2006) developed a geostatistical model for mapping PM at the European scale, using error-in-variable external drift kriging (KED). In KED, secondary information is added to the statistical interpolation model (Van de Kassteele and Stein, 2006). Generally, the primary variable is expected to be the most precise while the number of locations is low, whereas the secondary variable can be less precise but is sampled more densely (Van de Kassteele et al., 2009).

In epidemiological studies, urban scale maps are often used, accounting for short-distance spatial variability. Klompmaker et al. (2015) studied the spatial variation in ultrafine particles (UFPs) and black carbon (BC) in Amsterdam and Rotterdam, the Netherlands.
LUR is often applied in urban air quality mapping, for example for mapping nitrogen dioxide (NO2) (Sahsuvaroglu et al., 2006; Jerrett et al., 2007; Hoek et al., 2008; Basagaña et al., 2012), particulate matter (PM) (Hoek et al., 2008; Saraswat et al., 2013), and more recently also for mapping UFP (Saraswat et al., 2013; Montagne et al., 2015).

Studies of health effects of long-term exposure typically take into account the spatial component only, ignoring the temporal variability or adjusting for it (Gulliver and Briggs, 2004; Gehring et al., 2010; Beelen et al., 2014). Other studies measure exposure in different transportation modes and combine it with findings from health studies to assess the health effects of using specific transportation modes, without addressing the spatial and temporal variation (Knibbs and de Dear, 2010; Knibbs et al., 2011). Sensor networks measuring at a high spatio-temporal resolution provide opportunities for assessing short-term health effects. Exposure can be estimated near the school or work address, instead of at one central location in the city. The best spatial and temporal resolution can be achieved by modelling towards the maximum scalable unit, being the maximum unit in space and time where the air pollutant concentrations are considered to be homogeneous.
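The simple kriging step described in this section can be illustrated with a short, purely spatial sketch in Python. It is not the spatio-temporal model developed later in this thesis. The exponential covariance function, its `sill` and `rng` (range) parameters, and all function names are illustrative assumptions. In regression kriging, `values` would be the residuals of the regression trend and `known_mean` would be zero.

```python
import math


def exp_cov(h, sill=1.0, rng=500.0):
    """Exponential covariance at separation distance h (assumed model)."""
    return sill * math.exp(-h / rng)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def simple_kriging(coords, values, target, known_mean=0.0, sill=1.0, rng=500.0):
    """Return the simple kriging prediction and variance at `target`."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Covariances among observations and between observations and target.
    cov = [[exp_cov(dist(p, q), sill, rng) for q in coords] for p in coords]
    c0 = [exp_cov(dist(p, target), sill, rng) for p in coords]
    weights = solve(cov, c0)
    prediction = known_mean + sum(
        w * (v - known_mean) for w, v in zip(weights, values)
    )
    variance = sill - sum(w * c for w, c in zip(weights, c0))
    return prediction, variance
```

Without a nugget term this predictor reproduces the observed value at an observed location, and far from all observations it falls back to the known mean with a variance equal to the sill.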

1.5 Uncertainty

When maps are used to assess health effects, it is important to communicate their related uncertainty. Incorrect or inaccurate dose-response characterizations may lead to overestimation or underestimation of health effects (Burns et al., 2014). The confidence and trust of the user in map products depends on the user’s awareness of the uncertainties that they bring along (Sacha et al., 2016). The U.S. National Research Council (NRC) defines uncertainty as a lack of information, incorrect information, or incomplete information (U.S. NRC, 2009). Not surprisingly, it is recognized as an important element in geo-information science (Foody, 2003), as all geo-information contains uncertainty to some degree (Hwang et al., 1998). Uncertainty leads to imperfection and is a result of vague, ambiguous, imprecise, inaccurate, or incomplete information (Tavana et al., 2016). Low spatial data quality leads to an increase in uncertainty of the output models and maps. Uncertainty depends upon the density of observations and the mapping procedure; increasing the number of observations reduces the uncertainty about the spatial variability of an attribute rather than reducing the spatial variability itself (Heuvelink, 1998).

Tavana et al. (2016) make a distinction between statistical and non-statistical methods for assessing uncertainty. Uncertainty is a result of vagueness, ambiguity, imprecision, inaccuracy, or incompleteness. Vagueness and ambiguity can be assessed using non-statistical methods such as fuzzy set and possibility theory. Imprecision, inaccuracy and incompleteness can be assessed using statistical methods, such as probability theory or Dempster-Shafer theory (Wang et al., 2005b; Tavana et al., 2016). Other methods include Monte Carlo, Taylor series expansion and Relative Variance Contribution (RVC) (Wang et al., 2005a).
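As an illustration of the Monte Carlo approach mentioned above, the sketch below propagates uncertainty in a single input through a model by repeated sampling. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical linear model rather than a model from this thesis, and the input is assumed to be normally distributed.

```python
import random
from statistics import mean, stdev


def monte_carlo(model, input_mean, input_sd, n=20000, seed=1):
    """Sample a normally distributed input and push each draw through `model`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [model(rng.gauss(input_mean, input_sd)) for _ in range(n)]


def toy_model(x):
    """Hypothetical linear model, used only to make the sketch runnable."""
    return 2.0 + 0.5 * x


# The spread of the outputs expresses the propagated input uncertainty.
outputs = monte_carlo(toy_model, input_mean=30.0, input_sd=4.0)
output_mean = mean(outputs)
output_sd = stdev(outputs)
```

For this linear toy model the result can be checked analytically: the output mean should be close to 2 + 0.5 × 30 = 17 and the output standard deviation close to 0.5 × 4 = 2. For non-linear models no such closed form is generally available, which is where sampling becomes useful.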
Fuzzy membership approaches have been used to assess uncertainty in air pollutant models (Guo et al., 2007; Shad et al., 2009), as well as probabilistic methods (Colvile et al., 2002; Yegnan et al., 2002). The latter refers to uncertainty as the variance in the input data compared to the variance in the output results.

1.6 Air pollution: sources and sinks

In order to model air pollution levels, it is important to understand the processes that lead to an increase in air pollutant concentrations (sources) and those that lead to a decrease in air pollutant concentrations (sinks). Sources can be of natural or anthropogenic origin, or the result of chemical processes in the atmosphere. Sinks can be related to meteorological and chemical processes. We differentiate between PM and gases. PM can be divided into different categories based upon their aerodynamic

diameter: PM10 (particles <10 µm), PM2.5 (<2.5 µm), PM1 (<1 µm) and UFPs (<0.1 µm). Fine particles (PM2.5, PM1 and UFPs) penetrate deepest into the gas-exchange part of the lung (Brunekreef and Holgate, 2002). Sources of fine particles include vehicular traffic and exhaust, construction activities, factories and power generation plants, wood burning, and agricultural activities (Graham, 2004). The concentration of particulate matter consists of primary and secondary pollutants. Primary pollutants include for example motor vehicle emissions. Secondary pollutants are formed by collision of smaller particles and gases (Gulliver and Briggs, 2004).

Nitric oxide (NO) is brought into the atmosphere by combustion of fossil fuels in power generators and motor vehicles (Brunekreef and Holgate, 2002; Graham, 2004). The toxic pollutant NO2 is formed when the non-toxic NO is oxidized in the atmosphere in a chemical reaction using the O3 present in the atmosphere (Brunekreef and Holgate, 2002; Fenger, 2009).

1.7 Health effects

Long-term exposure to traffic-related pollutants may have large health effects and shorten life expectancy (Hoek et al., 2002). An association has been suggested between long-term exposure to particulate matter air pollution and increased mortality from lung cancer, respiratory diseases and cardiovascular diseases (Dockery et al., 1993; Pope et al., 1995; Abbey et al., 1999; Hoek et al., 2002; Brook et al., 2010; Beelen et al., 2014). Different studies found different exposure-response associations. These differences are related to differences in methods used for exposure assessment, differences in infiltration of particles indoors, particle composition and population composition (Hoek et al., 2013).

NO2 exposure has been associated with all-cause mortality in adults (Hoek et al., 2013) and respiratory infections, lung function growth, and asthma exacerbation in children (Goldizen et al., 2016).
NO2 is also a tracer for other traffic-related air pollutants such as black carbon and UFPs (Health Effects Institute, 2010). Pollutants have an indirect effect on asthma, interacting with pollen grains and enhancing the release of antigen, causing inflammation in the airways (Graham, 2004). The number of prospective cohort studies on the relationship between traffic-related air quality and asthma is limited (Gehring et al., 2010). It is difficult to distinguish between effects of specific pollutants due to a large overlap between the symptoms of different pollutants and a high correlation between different air pollutants in space and time (WHO, 2013a,b).

Asthma exacerbation in children specifically is a highly relevant endpoint, which needs better quantification to be used in future health impact assessment of outdoor air pollution (WHO, 2013a). Young children with asthma are very sensitive to the effects of air pollution. Increased

sensitivity compared to adults is due to a combination of increased time spent exercising outside, high ventilation rates per body weight, developing lungs and immature metabolic pathways (Guarnieri and Balmes, 2014). Few epidemiological studies have directly compared effects in children and adults in the same study. In two studies conducted in the Netherlands in parallel, of which one was focused on adults and one on children, Van der Zee et al. found significant effects of increased PM10 concentration levels on lung function in children, but not in adults. Similar results were found for acute symptoms in the lower respiratory tract, which were significantly related to PM10 in symptomatic children but not in symptomatic adults (Van der Zee et al., 1999, 2000). School age children have more predictable time activity patterns, allowing more precise exposure assessment based on outdoor monitors.

1.8 Limit values and guidelines

Based on the scientific literature available on the health effects of air pollution, the WHO has provided a set of guidelines to reduce the health impacts of air pollution. The European Commission (EC) also formulated a directive in which limit values are given that should not be exceeded by the member states of the European Union (European Parliament and Council of the European Union, 2008). Table 1.1 shows an overview of the different maximum concentration values as advised by the WHO and implemented in the European directive and in the Dutch national law.

Table 1.1 WHO guidelines in comparison with EU directives and Dutch national law on the concentration limit for different pollutants (in µg m⁻³) (Ministry of Infrastructure and the Environment, 1979; WHO, 2006; European Parliament and Council of the European Union, 2008).

            PM2.5      PM2.5      PM10       PM10       O3         NO2        NO2
            (annual    (24 hr     (annual    (24 hr     (8 hr      (annual    (1 hr
            mean)      mean)      mean)      mean)      mean)      mean)      mean)
WHO         10         25         20         50         100        40         200
EC          25         -          40         50*        120**      40         200***
Dutch law   25         -          40         50*        120**      40         200***

* value may be exceeded maximum 35 times a year.
** value may be exceeded maximum 25 times a year based on a 3-year average.
*** value may be exceeded maximum 18 times a year.

The WHO guidelines concern the concentrations of PM10, PM2.5, O3, and NO2. There has been too little research on the health effects of UFPs to set guidelines for those pollutants (WHO, 2006). Unregulated pollutants, such as PM1 and UFPs, are often not measured by ambient

air quality monitoring stations (Goldizen et al., 2016). This creates a vicious circle: without monitoring sites, little research is being done, which in turn makes it difficult to come up with guidelines, and without guidelines, no monitoring is performed.

As can be derived from Table 1.1, the values from the European directive are implemented in Dutch national law without changes. For annual means, the EC limit values are 2 times (PM10) to 2.5 times (PM2.5) as high as the WHO guidelines. The WHO has also set up a set of interim targets, including a description of the difference in health effects between the interim targets (IT-x) and the air quality guidelines (AQG). The aim of the interim targets is to provide policy makers all over the world with various options for air quality management (WHO, 2006).

Even well below the EC limit values, long-term exposure to fine particulate matter causes health effects and mortality (Beelen et al., 2014). Brunekreef and Holgate (2002) mention that a threshold concentration below which no health effects occur is absent or very low, because exposure to low concentrations of air pollutants already causes damage. Acknowledging this, the WHO states that the guideline values mentioned cannot fully protect human health (WHO, 2006). As lower limits are absent, there is an interest in health effects at lower levels of air pollution. To study these, air pollutant models are often used to estimate exposure.

A distinction is made between short-term exposure and long-term exposure. Short-term exposure refers to the exposure for hours up to days (WHO, 2013b), whereas long-term exposure refers to exposures of a year or longer (Hoek et al., 2013). A distinction is made as well between short-term health effects and long-term health effects. Short-term health effects are acute effects of exposure on daily symptoms or lung function (Weinmayr et al., 2010).
Long-term health effects include the development of diseases over a longer period of time. Long-term health effects are often studied in relation to long-term exposure (Gehring et al., 2010; Hoek et al., 2013; Beelen et al., 2014), whereas short-term health effects are studied in relation to short-term exposure (Weinmayr et al., 2010; Goldizen et al., 2016). This is also reflected in the WHO guidelines (WHO, 2006) that associate daily averages with short-term mortality risks and annual averages with long-term mortality risks.

1.9 Problem statement

Recently developed low-cost urban air quality sensor networks offer the possibility for monitoring air pollution at a fine spatio-temporal resolution. However, low-cost sensors may be more prone to report outliers, and their data quality is often unknown (Snyder et al., 2013). Compared to conventional monitors, measurements of low-cost air quality sensors are more sensitive to interference effects of humidity and other pollutants, as well as a loss of sensitivity to the target pollutant over time,

referred to as sensor drift. An evaluation of the data quality of low-cost sensors is of major importance, as it determines the usefulness of the measurements in different applications. The communication of data quality issues is of importance to avoid misinterpretation of the data when open to the public, media, politicians and researchers. Next, the possibilities of using low-cost urban sensor networks for modelling and mapping air quality should be evaluated. Using those fine resolution air quality maps in combination with health data, the spatially explicit health risks related to air pollution can be visualized. Here, a neglected topic is the propagation of uncertainty from the input data to the output maps.

1.10 Research objectives

The key objectives of this thesis are:

1. To develop an outlier detection method suitable to detect outliers in space and time while accounting for the large spatio-temporal variability of air pollutant concentrations in an urban area.
2. To develop and evaluate automatic calibration methods for low-cost sensors in an urban air quality sensor network, accounting for drift and interference effects.
3. To develop a spatio-temporal kriging framework for modelling air pollutant concentrations using a low-cost sensor network.
4. To create burden of disease maps, expressing the spatial variability in health risks related to ambient air pollution, using a low-cost sensor network to allow spatially refined human exposure assessment.

1.11 Outline

Chapter 1 introduces the challenges related to spatio-temporal big data analysis and data quality issues associated with low-cost sensor network data. It provides context and background information related to statistical modelling and mapping, as well as air pollution sources and sinks, health effects and limit values. It presents research gaps in this field and the related research objectives of this thesis.
Chapter 2 gives a detailed overview of the case study area and the low-cost air quality sensor network used in this study.

Chapter 3 presents a new outlier detection method, in which observations are classified in spatio-temporal classes to determine outlier threshold levels based on the location and time at which an observation was taken. Transformations are applied to account for non-normality of air pollutant concentrations.

Chapter 4 presents a novel iterative Bayesian calibration method, and compares the method to several existing calibration methods. The methods are compared in terms of temporal stability, spatial transferability, and sensor-specificity.

Chapter 5 presents a spatio-temporal regression kriging framework to model air pollution in an urban area. The trend part of the model consists of a set of spatial and temporal covariates, and the residuals are interpolated using simple kriging.

Chapter 6 presents a burden of disease assessment based on a new panel study on asthmatic children. The panel study data is combined with a priori effect estimates from literature in a Bayesian framework. The updated effect estimates are combined with modelled air pollution concentrations to obtain a burden of disease map. The propagation of uncertainty from the input data to the burden of disease map is evaluated.

Chapter 7 provides a synthesis of this thesis. The main results are summarized, as well as the implications, limitations and suggestions for further research.

1.12 Author contributions

Chapters 3, 4 and 5 are based on published papers and Chapter 6 is currently under review. For the purpose of consistency throughout this thesis, small changes have been made compared to the published versions. Case study area descriptions have been removed from the papers and merged in Chapter 2. Variable names and symbols have been changed to avoid confusion due to multiple meanings and definitions. Chapters 3 to 6 contain references to ‘we’, referring to the authors of the publication. In these publications, VZ carried out all the scientific analyses and wrote the manuscript. FO advised on the spatio-temporal statistical framework, GH advised on air quality, sensor calibration and health effects, and AS advised on data interpretation and spatio-temporal data analysis. All authors were involved in textual editing of the final manuscripts.


2. Case study area

2.1 Study area: the city of Eindhoven

The study area is the city of Eindhoven, located in the southern part of the Netherlands (Figure 2.1). Its background concentrations of air pollutants are relatively high, mainly due to surrounding agricultural activities and to the city’s position with respect to industrial areas such as Antwerp and Ghent in Belgium, the Ruhr area in Germany and the Rotterdam area in the Netherlands. According to the Royal Netherlands Meteorological Institute (Koninklijk Nederlands Meteorologisch Instituut, KNMI), the prevailing wind direction in Eindhoven is from the south-west (KNMI, 2011), causing long range transport of pollutants. The city has a high population density and traffic intensity, elevating levels of traffic-related pollutants such as NO2. On top of the high background concentrations, there is large short-distance spatial variation, which is found not only for PM but most evidently for gases such as NO2. Traffic-related air pollutants are a major source of ambient air pollution in urban areas (Goldizen et al., 2016). Because PM2.5 and NO2 are related to traffic, their spatial variability is highest in the city (Beelen et al., 2014).

In recent years, inhabitants of the city have become more aware of the health effects of traffic-related air pollution. Because of the low density of existing monitoring networks and a general mistrust in routine dispersion models used for the evaluation of exceedance of limit values, the AiREAS civil initiative has been set up to monitor the air quality at a fine spatio-temporal resolution.

2.2 The ILM air quality network

AiREAS is a civil initiative in Eindhoven in which inhabitants cooperate with companies, universities and governmental organizations (Close, 2016). As part of this initiative, an air quality sensor network has been set up in Eindhoven, referred to as Innovatief Luchtmeetnet (ILM).
It is the first fine-resolution urban air quality sensor network in the Netherlands (Figure 2.2). It was installed in November 2013 and has been operated continuously since. The network consists of 35 weatherproof ‘airboxes’. Since the total area of the municipality of Eindhoven is approximately 90 km², the sensor network is relatively dense. The airboxes are of size 43 × 33 × 20 cm (Figure 2.3) and contain an array of sensors. They are manufactured by the former Energy Research Centre of the Netherlands (ECN), now part of the Netherlands Organisation for applied scientific research (TNO). Each airbox measures PM10, PM2.5, PM1, O3, temperature and humidity as the air flows through. Within the available budget, 25 of the airboxes have also measured NO2 since 2015. Because of the high sensor costs, UFPs are measured at six locations. The UFP sensors are installed in separate boxes which are attached to the airboxes for power supply and GPRS connection. From November 2016 to February 2017 the UFP sensors were attached to different airboxes every three weeks to cover multiple locations. All AiREAS data is publicly available (AiREAS, 2016).

Figure 2.1. Location of Eindhoven (black dot) within the Netherlands.

2.3 Spatial sampling scheme

The spatial locations of the airboxes were chosen based on several criteria (Close, 2016), following the philosophy of the ESCAPE study (Eeftens et al., 2012). Most importantly, sampling sites represent locations where humans are exposed. The airboxes are located in the built-up area of the city, near residential areas and schools. The set of locations covers urban background locations in quiet neighborhoods as well as urban traffic locations near busy roads. One airbox is located outside of the city for regional background monitoring. All airboxes are installed in fixed positions at lamp posts to supply electricity. They are located at 2.53 m height, representing human exposure as closely as possible, while minimizing the risk of access by third parties.
At two locations in the city, an airbox is collocated with a reference monitor (Section 2.5). 14.
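As a rough check on the network density described in Section 2.2, the two figures given there (35 airboxes, approximately 90 km²) can be combined as follows. This is only an order-of-magnitude sketch: it assumes, purely for illustration, that all airboxes are spread evenly over the municipal area, which ignores the airbox located outside the city.

```latex
\[
\frac{35}{90\ \mathrm{km}^2} \approx 0.39\ \mathrm{km}^{-2},
\qquad
\sqrt{\frac{90\ \mathrm{km}^2}{35}} \approx 1.6\ \mathrm{km}
\]
```

Under this assumption there is roughly one airbox per 2.6 km², with a typical spacing between neighbouring airboxes of about 1.6 km.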

Figure 2.2. Locations of the airboxes in the city of Eindhoven, the Netherlands.

2.4 Airbox sensors

Ventilation strips on the sides of the airbox allow air to flow through. A gauze protects the airbox from insects and the air is dried to minimize interference from relative humidity. An overview of the installed sensors is shown in Table 2.1.

An optical PM sensor is used to determine the size and number of particles that flow through with the help of a resistive heater. The light of an infrared LED is scattered by PM and then measured by a photo-diode detector. The raw output of the sensor consists of digital pulses proportional to particle count concentrations (Austin et al., 2015). A combination of particle size and count is used to convert the particle counts to concentrations in µg m⁻³. The particles entering the UFP sensor are charged and enter a Faraday cage, in which the deposited charge is measured using a very sensitive current meter and converted to particle number concentrations (Marra et al., 2010). O3 is measured using a metal oxide sensor after heating and ambient temperature correction. Sensor resistance is converted to O3 concentrations (Hamm et al., 2016). NO2 is measured using the electrochemical cell Citytech Sensoric NO2 3E50 in a differential measurement setup. A switching valve and reagent cartridges are used in front of the electrochemical cell to dry the air. Observations are discarded when temperature and humidity fall outside acceptable ranges.

Figure 2.3. Airbox attached to light pole.

The airboxes are attached to light poles for power supply. Data from all sensors are sent to a server every 10 minutes via a GPRS connection. After initial lab and field calibration of the sensors, data have been collected since November 2013. The data contain some gaps for periods in which the instruments were removed for testing, adjustment or calibration. The sensors were recalibrated at the end of 2015, together with the installation and calibration of the NO2 sensors. Throughout this thesis, NO2 sensor data from 2016 are used.

Table 2.1. Variables measured and instruments used in the airboxes.

Variable            Units     Instruments
PM10                µg m⁻³    Shinyei PPD42 ECN revised
PM2.5               µg m⁻³    Shinyei PPD42 ECN revised
PM1                 µg m⁻³    Shinyei PPD42 ECN revised
UFP                 # cm⁻³    Aerasense NanoMonitor PNMT 1000
O3                  µg m⁻³    E2V MICS 2610
NO2                 µg m⁻³    Citytech Sensoric NO2 3E50 ECN revised
Temperature         °C        Sensirion SHT75
Relative humidity   %         Sensirion SHT75

2.5 Reference measurements

The national air quality sensor network (LML) is maintained and operated by the National Institute for Public Health and the Environment (RIVM, 2019a). The LML sensor network consists of around 60 measurement stations throughout the Netherlands, of which two are situated in Eindhoven. Although the LML has a lower spatial and temporal resolution, the uncertainty of its measurements is expected to be lower than the uncertainty of the ILM measurements. The measurement uncertainty of the LML sensor network is about 15-20% for PM2.5 and PM10 (RIVM, 2014). The maximum uncertainty permitted by the European directives is 25% for PM10 (Nguyen and Hoogerbrugge, 2014). Although the uncertainty of the LML sensor network is below the threshold set by the European directives, it may create problems for specific applications. For example, when the concentrations of PM2.5 and PM10 are close to each other, the measured values of PM2.5 could be higher than the measured values of PM10 (RIVM, 2014). This is of course not a physically valid situation, since PM2.5 is part of PM10. This uncertainty level also allows negative values of PM2.5 and PM10 concentrations to occur. For calibration and validation purposes, airboxes are located within a few meters of the LML measurement stations.
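The remark that measured PM2.5 can exceed measured PM10 when the two are close can be illustrated with a few lines of arithmetic. The sketch below is not taken from RIVM or from this thesis: it assumes, for illustration only, that the stated uncertainty acts as a symmetric relative band around the true concentration, and the function names and example concentrations are hypothetical.

```python
def uncertainty_interval(value, rel_uncertainty):
    """Return (low, high) for a value with a symmetric relative uncertainty.

    Assumption: the uncertainty is treated as a simple +/- fraction of the value.
    """
    return value * (1.0 - rel_uncertainty), value * (1.0 + rel_uncertainty)


def inversion_possible(true_pm25, true_pm10, rel_uncertainty):
    """True if measured PM2.5 could exceed measured PM10 within the bands."""
    _, pm25_high = uncertainty_interval(true_pm25, rel_uncertainty)
    pm10_low, _ = uncertainty_interval(true_pm10, rel_uncertainty)
    return pm25_high > pm10_low


# Hypothetical concentrations in µg m-3, chosen only for illustration.
print(inversion_possible(18.0, 20.0, 0.20))  # PM2.5 close to PM10
print(inversion_possible(5.0, 20.0, 0.20))   # PM2.5 far below PM10
```

With a 20% band, the upper bound for a true PM2.5 of 18 (21.6) lies above the lower bound for a true PM10 of 20 (16.0), so an inverted pair of measurements is possible; for a true PM2.5 of 5 the bands do not overlap.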


3. Outlier detection in urban air quality sensor networks

Abstract

Low-cost urban air quality sensor networks are increasingly used to study the spatio-temporal variability in air pollutant concentrations. Recently installed low-cost urban sensors, however, are more prone to producing erroneous data, such as outliers, than conventional monitors. Commonly applied outlier detection methods are unsuitable for air pollutant measurements that have large spatial and temporal variations, as occur in urban areas. We present a novel outlier detection method based upon a spatio-temporal classification, focusing on hourly NO2 concentrations. We divide a full year’s observations into 16 spatio-temporal classes, reflecting urban background vs. urban traffic stations, weekdays vs. weekends and four periods per day. For each spatio-temporal class, we detect outliers using the mean and standard deviation of the normal distribution underlying the truncated normal distribution of the NO2 observations. Applying this method to a low-cost air quality sensor network in the city of Eindhoven, the Netherlands, we found 0.1-0.5% of outliers. Outliers could reflect measurement errors or unusually high air pollution events. Additional evaluation using expert knowledge is needed to decide on treatment of the identified outliers. We conclude that our method is able to detect outliers while maintaining the spatio-temporal variability of air pollutant concentrations in urban areas.

This chapter is published as: Van Zoest, V.M., Stein, A., Hoek, G., 2018. Outlier Detection in Urban Air Quality Sensor Networks. Water, Air, & Soil Pollution 229, 111. doi:10.1007/s11270-018-3756-7.
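The classification into 16 classes described in the abstract (2 station types × 2 day types × 4 periods per day) can be sketched as below. This is a simplified illustration under stated assumptions, not the method of this chapter: the period boundaries in `PERIODS` and the threshold `k` are hypothetical, and outliers are flagged with the plain sample mean and standard deviation of each class instead of the parameters of the normal distribution underlying a truncated normal distribution.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Hypothetical period boundaries (start hour, end hour, label); the abstract
# only states that four periods per day are used.
PERIODS = [(0, 6, "night"), (6, 12, "morning"),
           (12, 18, "afternoon"), (18, 24, "evening")]


def period_of_day(hour):
    """Map an hour (0-23) to one of the four hypothetical periods."""
    for start, end, label in PERIODS:
        if start <= hour < end:
            return label
    raise ValueError(f"hour out of range: {hour}")


def class_key(station_type, timestamp):
    """Return the (station type, day type, period) class of an observation."""
    day_type = "weekend" if timestamp.weekday() >= 5 else "weekday"
    return station_type, day_type, period_of_day(timestamp.hour)


def flag_outliers(observations, k=3.0):
    """Flag observations more than k sample standard deviations from the
    mean of their class. observations: list of (station_type, timestamp, no2).
    """
    groups = defaultdict(list)
    for station_type, timestamp, value in observations:
        groups[class_key(station_type, timestamp)].append(value)

    # Classes with fewer than two values have no standard deviation.
    stats = {key: (mean(vals), stdev(vals))
             for key, vals in groups.items() if len(vals) >= 2}

    flagged = []
    for station_type, timestamp, value in observations:
        key = class_key(station_type, timestamp)
        if key not in stats:
            continue
        class_mean, class_sd = stats[key]
        if class_sd > 0 and abs(value - class_mean) > k * class_sd:
            flagged.append((station_type, timestamp, value))
    return flagged
```

Because each observation is compared only with others of the same station type, day type and period, a high value at a traffic site during a weekday rush hour is not judged against quiet background conditions. Note that the simple rule used here is symmetric and ignores the fact that concentrations cannot be negative, which is what the truncated normal distribution in the chapter accounts for.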

3.1 Introduction

Air quality is monitored globally, with national monitoring networks being used to assess air pollution in relation to environmental limit values. In Europe, national, regional and local environmental agencies operate these monitoring networks according to EU guidelines (European Parliament and Council of the European Union, 2008), complying with high standards of equivalency (EC Working Group on GDE, 2010). Each European country has a network of air quality monitoring stations that are located in urban, suburban and rural areas.

Health effects of air pollution have attracted public and scientific attention globally, as the global burden of disease of outdoor air pollution is significant (Cohen et al., 2017). The health risks are typically highest in urban areas because of their high population density, a high density of schools and hospitals, and higher air pollution concentrations. In recent local networks, urban air quality is measured using a larger number of sensors than in national air quality networks, allowing detection of more local sources. In response to citizens' increasing interest in the air they breathe, more local initiatives have resulted in extended low-cost monitoring networks. These provide more detailed spatio-temporal data on air quality. Data from such sensor networks, however, are more prone to errors and their spatio-temporal data quality is often unknown (Snyder et al., 2013). This leads to an increased need for data evaluation. Data evaluation of low-cost air quality networks typically includes outlier detection, comparison with classical monitors, comparison of inter-sensor measurements and evaluation of the stability of sensors. In this paper we focus on outlier detection.

Outlier detection is an important part of data cleaning and particularly relevant for low-cost air quality sensor networks. Outlier detection is defined as the detection of values that are statistically significantly different from the expected value at a given time and location. Outlier detection is important for detecting air pollution events, but also for removing errors that might otherwise affect data analysis and comparison. Such errors may also cause unnecessary unrest among the population if the data are publicly available online. Errors in this context refer to inaccuracies due to air quality sensor faults, mistakes in the human handling of the sensors, or positioning of the sensors under conditions for which they are not designed. Events are valid observations of very high or low air pollutant concentrations compared to the concentrations expected at a given time in a given location (Zhang et al., 2012). True events can be related to very local sources (e.g. a small fire, a truck idling within meters of a monitor) or to very unusual weather circumstances such as low mixing height and high atmospheric stability, resulting in poor dispersion of emitted pollutants.

Functional outlier detection, as a common type of temporal outlier detection, compares various function curves of fixed time periods. In the past, this method was applied to PM10, SO2, NO, NO2, CO and O3 to detect months with unusually high air pollutant concentrations (Martínez Torres et al., 2011), or to detect working days and non-working days with outlying NOx levels (Febrero et al., 2007, 2008; Sguera et al., 2016). Functional outlier detection is used to compare entire vectors of measurements (e.g. all observations in a month) and is therefore less suitable for the detection of individual outliers. Comparing an observation only to its temporal neighborhood may also lead to the neglect of a systematic bias in the sensor.

In spatial outlier detection, an observation is compared to the observations in its spatial neighborhood. Bobbia et al. (2015) used kriging to detect outliers in PM10 concentrations on a provincial scale. Spatio-temporal outlier detection combines the spatial neighborhood with a temporal neighborhood. It has been applied to PM10 measurements at the European scale (Kracht et al., 2014). At this scale, however, only rural and urban background stations can be used, as the methods are not suitable for dealing with the wide spatial variation of air pollutants in an urban area.

For urban air quality sensor networks, both spatial and spatio-temporal outlier detection have only been applied to air pollutants that show a low spatial variation. Hamm (2016) and Shamsipour et al. (2014) applied spatial and spatio-temporal outlier detection methods to PM10, which in cities is mostly dominated by regional background concentrations from sources outside the city (Eeftens et al., 2012). Distance-weighting techniques such as kriging were successfully applied to urban PM10 for filling missing values and for outlier detection. There was no need for space-varying covariates because PM10 concentration was not related to the type of location or street (Hamm, 2016). For NO2, however, the concentrations can vary over short distances, e.g. governed by the traffic density of a street (Briggs et al., 1997; Cyrys et al., 2012). As the distances over which NO2 concentrations vary (tens of meters) are commonly shorter than the distances between sensor locations (kilometers), spatial outlier detection methods based on distance-weighting cannot be applied to NO2 measurements in cities.

The objective of this study was to develop an adequate outlier detection method for an urban air quality sensor network. Such a network is characterized by a fine-scale spatial and temporal variation in air quality. For this study, we use NO2 data from the ILM air quality sensor network located in the city of Eindhoven, the Netherlands (Chapter 2).
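The argument against distance-weighting for urban NO2 can be made concrete with a small sketch. It uses inverse distance weighting, a simpler stand-in for kriging that is not used in the studies cited above, and all coordinates and concentrations are hypothetical values chosen for illustration.

```python
from math import hypot


def idw_predict(target_xy, neighbours, power=2.0):
    """Inverse-distance-weighted prediction at target_xy.

    neighbours: list of (x, y, value), with coordinates in meters.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in neighbours:
        distance = hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0.0:
            return value  # a sensor at the target location itself
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical urban background sensors about 1 km from a traffic site.
background = [(1000.0, 0.0, 20.0), (0.0, 1000.0, 22.0), (-1000.0, 0.0, 24.0)]
predicted = idw_predict((0.0, 0.0), background)

# Hypothetical valid observation at the traffic site itself.
observed_at_traffic_site = 60.0
print(predicted, observed_at_traffic_site - predicted)
```

The prediction at the traffic site is simply a weighted average of the background values, so a valid but much higher roadside concentration leaves a large residual and would be mistaken for an outlier. The neighbours are too far away to carry information about variation that occurs within tens of meters.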
