Integration of Administrative Data for Science and Public Policy: An Experience in Brazilian Public Health




STI 2018 Conference Proceedings

Proceedings of the 23rd International Conference on Science and Technology Indicators

All papers published in this conference proceedings have been peer reviewed through a peer review process administered by the proceedings Editors. Reviews were conducted by expert referees to the professional and scientific standards expected of a conference proceedings.

Chair of the Conference: Paul Wouters

Scientific Editors: Rodrigo Costas, Thomas Franssen, Alfredo Yegros-Yegros

Layout: Andrea Reyes Elizondo, Suze van der Luijt-Jansen

The articles of this collection can be accessed at https://hdl.handle.net/1887/64521

ISBN: 978-90-9031204-0

© of the text: the authors

© 2018 Centre for Science and Technology Studies (CWTS), Leiden University, The Netherlands

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License


Integration of Administrative Data for Science and Public Policy:

An Experience in Brazilian Public Health

1

Bethania de Araújo Almeida *, Paula Xavier ** and Mauricio Barreto ***

*bethania.almeida@fiocruz.br

Center of Data and Knowledge Integration for Health (CIDACS), Fundação Oswaldo Cruz, Parque Tecnológico da Bahia - Rua Mundo 121 – Trobogy. Salvador (BA), 41745-715 (Brazil)

** paula.xavier@fiocruz.br

Vice Presidency of Education, Information and Communication (VPEIC), Fundação Oswaldo Cruz, Av Brasil, 4365-Manguinhos. Rio de Janeiro, 21040-900 (Brazil)

*** mauricio.barreto@fiocruz.br

Center of Data and Knowledge Integration for Health (CIDACS), Fundação Oswaldo Cruz, Parque Tecnológico da Bahia - Rua Mundo 121 – Trobogy. Salvador (BA), 41745-715 (Brazil)

Background

Open Science practices have so far focused primarily on two aspects: open access and, more recently, open data. Alongside the growth of open access, the production and use of data, both in the scientific arena and in other sectors, have grown tremendously, prompting debates on the importance, usage and potential of these data, as well as on the challenges they pose to, and their impact on, a globally connected society that benefits from ever more powerful and specialized digital technologies. As a result of this growth, new practices of knowledge production have arisen over the last decade to support research that uses large volumes of data, sophisticated computational methods and high-throughput technology. The resulting large-scale, data-driven science presents many challenges, including epistemological issues surrounding the conditions under which those data are interpreted, disseminated and reused (Leonelli, 2017).

Previous research has shown that funding agencies have been instrumental in shaping policies on the management, sharing and reuse of scientific data, and that complying with these policies has become a requirement for financing research projects. Policies are converging in an attempt to simplify the opening of data generated by research projects (Santos et al., 2017). However, a consensus has yet to be reached that addresses all the issues pertaining to the access, treatment, use and reuse of administrative data, especially when these data contain personal information that is used to integrate disparate datasets from varied sources for research purposes. The use of data collected or stored by governments is also relevant to open science (Foster, 2018), but these practices tend to focus on public transparency, access to information and citizen participation (Open Government Partnership, 2018).

1 This work was supported in part by the Zika Platform, a long-term surveillance platform for the Zika virus and microcephaly, Unified Health System (SUS), Brazilian Ministry of Health. Additional support was provided by CIDACS, the Center of Data and Knowledge Integration for Health.

Government data, herein referred to as administrative data, may be collected by any government department or agency in the delivery of day-to-day services. They contain information about individuals, businesses and other organisations, and may include private information such as school records and health information (Administrative Data Research Network, 2018).

The data collected by governments differ greatly from those generated by the academic community: government data are collected in a logical format over time and refer to entire populations or specific groups, whereas academic data are limited in scope, are generally collected over a defined period for specific purposes, and usually come with terms of consent.

Providing access to administrative datasets and using them for research purposes presents many challenges, most prominently the ethical and legal issues involved in protecting personal data. Researchers using administrative data work under strict conditions dictated by government departments, and these conditions have implications for study replication and the development of cumulative knowledge (Connelly et al., 2016).

Despite the lack of consensus regarding the access, treatment and use of administrative data for research purposes, particularly when these datasets contain personally identifiable information, we believe that the use of these data is justified by their potential benefits: enhancing knowledge about health issues and generating evidence that supports regulatory decisions and shapes public health policy. This holds provided that technical and administrative security measures capable of preventing unauthorized access and disclosure are in place.

Some authors, such as the bioethicist Lisa M. Lee, have even argued that not using these data is itself unethical, because it prevents society from reaping these benefits (Lee, 2017).

In addition to ethical and professional responsibilities surrounding the protection of individual and collective rights, transparency, independence in government research, opposition to commercial usage and clarity regarding potential public benefits are important aspects that must be considered in order for society to approve access to and the use of administrative datasets containing personal information (Wright, 2014).

Administrative data have been neglected in discussions regarding big data. There are multiple types of big data, each offering new opportunities in specific areas of investigation and each requiring different methodological approaches and a clear understanding of the specific nature of the data in question in order to undertake appropriate analyses (Connelly et al., 2016).

In the area of public health, the large size of many administrative data resources holds the potential to contribute to high-quality and impactful research, especially when they are integrated with other datasets. The use of linked administrative data for public health research allows, among other things, the assessment of disease development over time, cause-and-effect testing (the temporal relationship between exposure and effect), reductions in time and cost relative to field studies, and the evaluation of health determinants and policies in large-scale populations, all of which would be infeasible in studies employing traditional data collection techniques.

The use of administrative data containing personal information for research purposes is restricted in many countries. The legal basis for access, appropriate security arrangements, exclusive use for a previously specified purpose, appropriate credentials from the requesting institution and the ethical basis of the proposed study must all be considered. Furthermore, it is [...] only as well as be aware that legal action will be taken if/when data are used inappropriately or without due care (Harron et al., 2017).

The European Union General Data Protection Regulation (GDPR), which took effect on May 25, 2018, specifies the cases in which personal data may be used for research purposes across all member countries. The GDPR establishes that data warranting special protection can be processed for health-related purposes and research in relevant cases where this benefits citizens and society in general. On August 14, 2018, a bill protecting individuals’ private data was signed into law by the Brazilian president. Both regulations recognize the potential benefits of using personal data to enhance knowledge and generate evidence in support of regulatory decision-making and public policy, especially in the area of public health, provided that technical and administrative security measures are implemented to prevent privacy violations.

Purpose

Administrative data linked to data from a variety of sources hold the potential to change the research landscape in health, since linkage is a valuable tool for combining individual-level information from different sources for population-based research with implications for public policy-making. This paper presents the experience of a Brazilian data center that links large volumes of administrative data to enrich public health research and to support decision-making by public policymakers seeking to tackle health inequalities.

Methods

This paper is based on a participant observation case of managing large volumes of data containing personal information, while taking into account privacy, ethics and information security concerns, as well as technical and scientific considerations associated with linking individuals across datasets for public health research purposes.

Results

The Fiocruz Center of Data and Knowledge Integration for Health (CIDACS) was created in December 2016 to conduct interdisciplinary studies and research, develop new scientific methodology and promote professional training using large-scale databases (big data) and high-performance computational resources in a secure environment. The center’s main research projects are primarily focused on evaluating public policies to better understand their impact on health outcomes in low-income populations throughout Brazil.

Administrative data are at the core of the research conducted by CIDACS, which is designed to accelerate research activities that support public health recommendations. By analyzing linked anonymized databases, we employ a data-driven approach that is more reliable, more cost-effective and less time-consuming than traditional epidemiological methods, such as following cohorts in population-based studies.

After receiving authorization to use the administrative databases required for our proposed research projects, as previously approved by an ethics committee, these datasets are received together with all security information protocols and ingested into CIDACS’ Big Data Platform.

Our linkage environment has been designed to preserve confidentiality through a combination of physical and virtual settings that restricts the possibility of re-identification of individuals in accordance with considerations for safe data linkage environments, including the separation of linkage, access and analysis processes.


Figure 1: The CIDACS Big Data Platform

Source: CIDACS, 2017.

Big data preparation, processing, linkage and dataset extraction, which includes the assessment of linkage quality and accuracy, involve the areas of data science, data curation and statistics.

CIDACS has developed flexible, innovative and advanced algorithms that aid in the linkage of large-scale datasets with a high degree of accuracy (Pita et al., 2018; Barbosa et al., 2018).

To give researchers suitable information for judging the reliability of the resulting linked data for their purposes, all data preparation steps, including the deterministic or probabilistic linkage methods used, as well as linkage quality and potential bias information, are richly described in the corresponding metadata. These metadata contain descriptive aspects and statistics that allow the usability, transparency and reproducibility of linked datasets to be evaluated.
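The difference between deterministic and probabilistic linkage, and the kind of quality statistics that can accompany a linked dataset, may be illustrated with a minimal Python sketch. It is not the CIDACS algorithm described by Pita et al. (2018) or Barbosa et al. (2018). The field names, the weights, the threshold and the use of `difflib` string similarity in place of a full probabilistic model are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def deterministic_match(a, b, keys=("name", "birth_date", "municipality")):
    """Deterministic linkage: every key must agree exactly after normalization."""
    return all(normalize(a[k]) == normalize(b[k]) for k in keys)


def similarity_score(a, b, weights=None):
    """Simplified stand-in for probabilistic linkage: weighted string similarity.

    The weights are hypothetical; a real system would estimate them from the data.
    """
    weights = weights or {"name": 0.6, "birth_date": 0.3, "municipality": 0.1}
    return sum(
        w * SequenceMatcher(None, normalize(a[k]), normalize(b[k])).ratio()
        for k, w in weights.items()
    )


def link(records_a, records_b, threshold=0.9):
    """For each record in A, keep the best candidate in B if it reaches the threshold.

    One-to-one assignment is not enforced in this sketch.
    """
    links = []
    for i, a in enumerate(records_a):
        best_j, best_score = None, 0.0
        for j, b in enumerate(records_b):
            score = 1.0 if deterministic_match(a, b) else similarity_score(a, b)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            links.append((i, best_j, round(best_score, 3)))
    return links


def linkage_quality(links, true_pairs):
    """Precision and recall against a gold-standard set of true (i, j) pairs."""
    found = {(i, j) for i, j, _ in links}
    true_positives = len(found & true_pairs)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    return {"precision": precision, "recall": recall}


# Fictitious records, invented for this example only.
records_a = [
    {"name": "Maria Silva", "birth_date": "1990-01-01", "municipality": "Salvador"},
    {"name": "Joao Souza", "birth_date": "1985-05-20", "municipality": "Salvador"},
]
records_b = [
    {"name": "maria  silva", "birth_date": "1990-01-01", "municipality": "Salvador"},
    {"name": "Joao Sousa", "birth_date": "1985-05-20", "municipality": "Salvador"},
    {"name": "Ana Lima", "birth_date": "2001-12-12", "municipality": "Recife"},
]

links = link(records_a, records_b)
quality = linkage_quality(links, true_pairs={(0, 0), (1, 1)})
```

In this sketch the first pair is linked deterministically, while the second pair differs in the spelling of the surname and is linked only because its similarity score reaches the threshold. Precision and recall computed against a manually verified sample are one example of the linkage quality information that metadata could report.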

CIDACS’ Big Data platform operates with a hierarchical access policy, ensuring that only a specified number of individuals possess the highest level of access to all data elements.

Researchers are permitted to access only de-identified, coded data variables relevant to their proposed area of interest, after signing a term of responsibility with usage conditions designed to prevent further linkage or the re-identification of individuals. Datasets can be accessed and analyzed either in person or over a secured network (Virtual Private Network, VPN), in conformity with sound information security practices. Analysis and visualization software designed to optimize big data processing is available.
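One common way of producing de-identified, coded extracts is to replace direct identifiers with a keyed pseudonym and to release only the variables approved for a given project. The Python sketch below illustrates that general idea. It makes no claim about the mechanism CIDACS actually uses, which the text does not specify. The field names, the `DIRECT_IDENTIFIERS` set and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical list of fields that must never leave the linkage environment.
DIRECT_IDENTIFIERS = {"name", "mother_name", "national_id"}


def pseudonym(record, secret_key, id_fields=("national_id",)):
    """Derive a stable coded identifier with HMAC-SHA256.

    Without the secret key, the code cannot be recomputed from the identifier.
    """
    message = "|".join(str(record[f]) for f in id_fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def deidentify(record, secret_key, approved_variables):
    """Return an extract holding only the pseudonym and the approved variables."""
    released = {"pid": pseudonym(record, secret_key)}
    for variable in approved_variables:
        if variable in DIRECT_IDENTIFIERS:
            raise ValueError(f"{variable!r} is a direct identifier and cannot be released")
        released[variable] = record[variable]
    return released


# Fictitious record, invented for this example only.
record = {
    "name": "Maria Silva",
    "mother_name": "Ana Silva",
    "national_id": "000.000.000-00",
    "birth_year": 1990,
    "municipality": "Salvador",
    "outcome": "tuberculosis",
}

extract = deidentify(record, b"key-held-by-data-custodians", ["birth_year", "outcome"])
```

In such a design the secret key would stay with the small group holding the highest level of access, so that researchers working with the extract could neither reverse the code nor reproduce it in order to link further datasets.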

Currently, our main research projects focus on evaluating public policies to better understand their impact on health outcomes in low-income populations throughout Brazil. For instance, administrative databases containing data on public health services, integrated with information from social welfare programs, enable researchers to evaluate the possible effects of public policies on health outcomes, as it is possible to track over the years the health conditions of people registered in those programs. Administrative databases are therefore also particularly useful for reliable public policy evaluation over time.

Figure 2: Cohort containing health, education, socioeconomic and demographic information on more than 100 million Brazilians, derived from social program databases

Source: CIDACS, 2018.

At this time, approximately 20 studies are being conducted using this cohort, including the effects of social determinants, as well as of conditional cash transfers or housing programs for low-income populations, on health outcomes, such as tuberculosis, leprosy, cardiovascular disease, child mortality, low birth weight, premature births, neonatal mortality, maternal mortality, and mortality due to suicide and homicide.

Linking different databases for our purposes requires complex data management procedures focused on the treatment of sensitive data, privacy concerns and other potential confidentiality issues, as well as legal and ethical considerations. These procedures aim to establish proper and effective ways of accessing and, if possible, opening our linked anonymized datasets to outside researchers and public policymakers in accordance with open data recommendations.

To this end, later this year, rich metadata for linked anonymized datasets will be made findable, and we will publish clear rules regarding the process for accessing these datasets according to the FAIR Principles, which have been adopted as a standard for data management when issuing recommendations on implementing open science. These principles have been elaborated to enable fragmented and dissimilar data to become Findable, Accessible, Interoperable and Reusable, delineating the characteristics that contemporary data resources, tools, vocabularies and infrastructures should incorporate to aid in discovery and reuse by third parties (Wilkinson et al., 2016; European Commission, 2016; Go Fair, 2018).

Conclusions

CIDACS’ big data management experience seeks to prioritize the protection of individual and collective rights and interests, by balancing data protection with transparency in terms of the quality of the linked anonymized data provided for research. Our Center’s governance further verifies that certain data management decisions regarding the access, use and reuse of our linked anonymized datasets are aligned with legal, safety and ethical aspects to assure research integrity in an effort to answer the complex questions essential to making new discoveries and supporting policymaking decisions in public health.

By addressing the issues surrounding the treatment of administrative data containing personal information, as well as the use and reuse of linked anonymized datasets to support public health research and evidence-based decision-making to inform policy, CIDACS also intends to explore other ways of adhering to open data practices in the mold of the FAIR Principles, which distinguish between data and metadata in order to support a wide range of special circumstances.


Public health aims to promote and protect the health of people and populations by taking into account the social, economic and environmental aspects that shape the conditions under which people are born, grow, live, work and grow old (WHO, 2017). In this context, CIDACS’ experience demonstrates the feasibility of seeking ways of using administrative data for research purposes that may eventually adhere to the principles of open science.

We believe that big data platforms such as that of CIDACS hold the potential to create data ecosystems capable of generating knowledge and evidence to support scientific investigation and decisions made by policymakers, administrators and others throughout society in an effort to face the problems that impact the health of populations.

References

Administrative Data Research Network, Glossary. (2018). Retrieved from https://www.adrn.ac.uk/public-engagement/glossary/.

Barbosa, G., Alli, S., Araujo, B., Reis, S., Sena, S., Ichihara, M., Pescarini, J., Fiaccone, R., Amorim, L., Pita, R., Barreto, M., Smeeth, L. & Barreto, M.L. (2018). A Novel Search Based Record Linkage System for Huge Databases With Higher Accuracy and Scalability. Article submitted for publication.

Connelly, R., Playford, C.J., Gayle, V. & Dibben, C. (2016). The role of administrative data in the big data revolution in social science research. Social Science Research, (59), 1-12. DOI: 10.1016/j.ssresearch.2016.04.015

European Commission. (2016). H2020 Programme: guidelines on FAIR Data Management in Horizon 2020, v.3. Retrieved from http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf.

Foster Open Science. (2018). Open Science Taxonomy, Open Government Data. Retrieved from https://www.fosteropenscience.eu/foster-taxonomy/open-government-data.

Go Fair. (2018). Global Open, Findable, Accessible, Interoperable, Reusable. Retrieved from https://www.go-fair.org/.

Harron, K., Dibben, C., Boyd, J., Hjern, A., Azimaee, M., Barreto, M. & Goldstein, H. (2017). Challenges in administrative data linkage for research. Big Data & Society, July–December, 1–12. DOI: 10.1177/2053951717745678

Lee, L.M. (2017). Ethics and subsequent use of electronic health record data. J. Biomed. Inform., (71), 143-146. DOI: 10.1016/j.jbi.2017.05.022


Leonelli, S. (2017). The Epistemology of Data Use. Lecture at the Center for Philosophy of Science, University of Pittsburgh. Retrieved from https://www.youtube.com/watch?v=qc1aQep4DE8.

Open Government Partnership. (2018). About OGP. Retrieved from https://www.opengovpartnership.org/about/about-ogp.

Pita, R., Sena, S., Fiaccone, R., Amorim, L., Reis, S., Barreto, M.L., Denaxas, S. & Barreto, M. (2018). On the Accuracy and Scalability of Probabilistic Data Linkage Over the Brazilian 114 Million Cohort. IEEE Journal of Biomedical and Health Informatics, 22(2), 346-353. DOI: 10.1109/JBHI.2018.2796941.

Santos, P., Almeida, B. & Henning, P. (2018). Livro Verde - Ciência aberta e dados abertos: mapeamento e análise de políticas, infraestruturas e estratégias em perspectiva nacional e internacional. Rio de Janeiro: Fiocruz. Retrieved from https://www.arca.fiocruz.br/handle/icict/24117.

Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J., Groth, P., Goble, C., Grethe, J.S., Heringa, J., 't Hoen, P.A., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J. & Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. DOI: 10.1038/sdata.2016.18.

World Health Organization. (2018). About social determinants of health. Retrieved from http://www.who.int/social_determinants/sdh_definition/en/.

Wright, M. (2014). Using Administrative Data In Research: Learning from the Public Dialogues. Administrative Data Research Network. Retrieved from http://www.esrc.ac.uk/files/research/administrative-data-taskforce-adt/using-administrative-data-in-research-learning-from-the-public-dialogues-melanie-wright/.
