Medical record linkage between clinical
databases and the national perinatal
registry
Comparison between frozen and fresh embryo transfer
Master Thesis
A.R. Wong 6/24/2015
Student
Alexander Richard Wong
Student number 10668357
E-mail: alexander.r.wong@gmail.com
Mentor
Dr. Anita CJ Ravelli
Academic Medical Center - University of Amsterdam
Department of Medical Informatics
E-mail: a.c.ravelli@amc.uva.nl
Tutor
Prof. dr. Ameen Abu-Hanna
Academic Medical Center - University of Amsterdam
Department of Medical Informatics
E-mail: a.abu-hanna@amc.uva.nl
Location of scientific research project
Academic Medical Center - University of Amsterdam
Department of Medical Informatics
Meibergdreef 15
1105 AZ Amsterdam
The Netherlands
Period
November 2014 – June 2015
Table of Contents
Summary
Samenvatting
Chapter 1: Introduction, research questions and chapter organization
  Introduction
  Chapter organization
Chapter 2: Background
  2.1 Medical record linkage
  2.2 Medical record linkage between data from fertility clinics and the PRN registry
  2.3 Medical record linkage between the national cancer registry and the PRN registry
  2.4 The Netherlands Perinatal registry (PRN)
  2.5 Linkage and Privacy regulations in the Netherlands
Chapter 3: Medical record linkage between fertility clinics and the PRN registry
  3.1 Introduction
  3.2 Methods
  3.3 Results
  3.4 Discussion
Chapter 4: Medical record linkage between the Netherlands Cancer registry and the PRN registry
Chapter 5: Literature review on the difference in perinatal outcomes between fresh and frozen embryo transfer
  5.1 Introduction
  5.2 Methods
  5.3 Results
  5.4 Discussion
Chapter 6: Are fresh embryos at risk? A retrospective cohort study using linked assisted reproductive technology and perinatal data
  6.1 Introduction
  6.2 Material and Methods
  6.3 Results
  6.4 Discussion
Conclusion SRP
Abbreviations
Acknowledgements
References
Summary
Introduction
Medical research often requires more information than is available in a single electronic health record or medical registry. The objectives of this scientific research project (SRP) were 1) to investigate to what extent medical record linkage could be used to link data from fertility clinics on assisted reproductive technologies, and data on women diagnosed with breast cancer, to pregnancy and child outcome data from the Netherlands Perinatal Registry (PRN) when little common identifying information is available, and 2) to analyze on one linked database whether frozen embryo transfer results in better perinatal outcomes than fresh embryo transfer.
Methods
Linkage: Data from 5 of the 13 fertility clinics on assisted reproductive technology (ART) treatments which resulted in a pregnancy (n=10,129) from 1999 to 2011 were deterministically linked to the national PRN registry (n=2,548,977) from 1999 to 2012. Data from the Netherlands Cancer Registry (NKR) on women diagnosed with breast cancer (n=26,105) and born between 1959 and 1995 were deterministically linked to data on pregnancies from the Netherlands Perinatal Registry (n=2,668,584) between 1999 and 2013.
Analysis of linked ART-PRN data: A literature review was conducted and the linked database was analyzed to investigate whether frozen embryo transfer in the Netherlands resulted in better perinatal outcomes than fresh embryo transfer. The main outcome measures were (low) birth weight, small for gestational age (SGAp10), preterm birth (<37 weeks), perinatal mortality and admission to a neonatal intensive care unit (>24 hours).
Results
Linkage: Data from fertility clinics could be linked to the PRN registry based on the mother's date of birth, the four-digit zip code and the child's date of birth. When the child's date of birth was missing in the fertility clinic data, the treatment date was used to derive a timeframe in which the child was expected to be born, and this timeframe was compared with the child's date of birth in the PRN registry. In total, 75.9% of the records from fertility clinics were linked to at least 1 record from the PRN registry. The fertility clinic records which linked to the PRN registry did not differ in maternal and paternal age from the records which did not link, but included fewer patients with low socioeconomic status (SES), fewer in vitro fertilization treatments and more patients undergoing frozen embryo transfer. Two variables were used in deterministic record linkage between the NKR registry and the PRN registry: the mother's date of birth and the four-digit zip code. With this linkage key, we linked 9,508 breast cancer patients (36%) to 15,596 pregnancy records in the PRN registry.
Analysis of linked ART-PRN data: In our literature review, we found that children born after frozen embryo transfer had better perinatal outcomes than children born after fresh embryo transfer with regard to birth weight, rate of low birth weight, rate of small for gestational age and rate of preterm birth. In the database with linked data from fertility clinics and the PRN registry, children born after frozen embryo transfer were on average 91 grams heavier than children born after fresh embryo transfer. Furthermore, they had a lower rate of birth weight below 2500 g (0.47; 95% CI: 0.29-0.76) and of small for gestational age (0.63; 95% CI: 0.48-0.84). There were no significant differences in preterm birth, perinatal mortality or admission to a neonatal intensive care unit.
Conclusion
Medical record linkage can be used to link data from multiple sources even if only a few common, partially identifying variables are present. We identified three quarters of the patients who were pregnant after an assisted reproductive technology in the PRN registry. There is a possible selection bias on SES and treatment. We found that around one third of the women became pregnant after breast cancer treatment. External validation is needed because there were only two linking variables.
The use of the linked data from fertility clinics and the PRN registry confirms that perinatal outcomes after frozen embryo transfer are better than perinatal outcomes after fresh embryo transfer in the Netherlands.
Keywords
medical record linkage, assisted reproductive technology, perinatal outcomes, frozen embryo transfer
Samenvatting
Introduction
Medical research often needs more information than is available in a single electronic patient record or medical registry. The aims of this scientific research were 1) to investigate to what extent medical record linkage can be used to link data from fertility clinics on assisted reproductive technologies, and data on women diagnosed with breast cancer, to pregnancy and child outcomes from the Netherlands Perinatal Registry (PRN) when little common identifying information is available, and 2) to use one linked database to analyze whether frozen embryo transfer has better perinatal outcomes than fresh embryo transfer.
Methods
Linkage: Data from 5 of the 13 fertility clinics on assisted reproductive technologies (ART) that led to a pregnancy (n=10,129) from 1999 through 2011 were deterministically linked to the national PRN registry (n=2,548,977) from 1999 through 2012. Data from the Netherlands Cancer Registry (NKR) on women diagnosed with breast cancer and born between 1959 and 1995 were deterministically linked to data on pregnancies from the PRN registry (n=2,666,584) from 1999 through 2013.
Analysis of the linked ART-PRN data: A literature review was carried out and the linked database was analyzed to investigate whether frozen embryo transfer in the Netherlands leads to better perinatal outcomes than fresh embryo transfer. The main outcome measures were (low) birth weight, small for gestational age (SGAp10), preterm birth (<37 weeks), perinatal mortality and admission to a neonatal intensive care unit (>24 hours).
Results
Linkage: Data from fertility clinics could be linked to the PRN registry using the mother's date of birth, the four-digit zip code and the child's date of birth. When the child's date of birth was missing, the period in which the child was expected to be born, based on the treatment date in the fertility clinic data, was compared with the child's date of birth in the PRN registry. In total, 75.9% of the records from fertility clinics were linked to at least 1 record from the PRN registry. The linked treatments did not differ in the age of the mother or father, but included fewer patients with a low socioeconomic status (SES), fewer in vitro fertilization treatments and more patients who underwent frozen embryo transfer than the records that were not linked. Two variables were used during deterministic linkage of the NKR registry and the PRN registry: the mother's date of birth and the four digits of the zip code. With this linkage key, we linked 9,508 breast cancer patients (36%) to 15,596 pregnancies in the PRN registry.
Analysis of the linked ART-PRN data: In our literature review, children born after frozen embryo transfer had better perinatal outcomes than children born after fresh embryo transfer with regard to birth weight, number of children with low birth weight, number of children small for gestational age and number of children born preterm. In the database with linked data from fertility clinics and the PRN registry, children born after frozen embryo transfer were on average 91 grams heavier than children born after fresh embryo transfer. In addition, their weight was less often below 2500 grams (0.47; 95% CI: 0.29-0.76) and they were less often small for gestational age (0.63; 95% CI: 0.48-0.84). No significant difference was found in the number of children born preterm, perinatal mortality or admission to a neonatal intensive care unit.
Conclusion
Medical record linkage can be used to link data from different sources even if only a few common, partially identifying variables are present. We identified three quarters of the patients who became pregnant after an assisted reproductive technology in the PRN registry. There is a possible selection bias with regard to SES and treatment. We conclude that one third of the women became pregnant after breast cancer. External validation is needed because there were only two linking variables. The use of linked data from fertility clinics and the PRN registry confirms that perinatal outcomes after frozen embryo transfer are better than perinatal outcomes after fresh embryo transfer in the Netherlands.
Keywords: medical record linkage, assisted reproductive technologies, perinatal outcomes, frozen embryo transfer
Chapter 1: Introduction, research questions and chapter organization.
Introduction
Medical registries are useful data sources for medical research. However, these registries do not always contain all the information needed to answer a particular clinical question, so registries have to be linked together. In the Netherlands, privacy laws do not permit nationwide person-identifying information, such as the civil registry identification number (BSN), to be used to link registries for research purposes, as is possible in Scandinavian countries.(1) Therefore, a combination of other identifying variables is used to link registries together.
The objective of this scientific research project (SRP) was to investigate to what extent medical record linkage could be used to link data from multiple sources despite a low amount of identifying information, and to perform an epidemiological study on one of the resulting databases in order to find out whether frozen embryo transfer results in better perinatal outcomes than fresh embryo transfer.
Two studies were started based on clinical questions from the department of obstetrics and gynecology at the Academic Medical Center (AMC). For both pilots, perinatal information on children was necessary. The Netherlands Perinatal Registry (PRN) is a mother-child registry which contains pregnancy, delivery and perinatal outcomes of almost all pregnant women and children born in the Netherlands.(2) For both studies it was expected that the number of variables usable for medical record linkage would be limited. This research was started to see whether that number would still be sufficient to reliably link the data in both studies.
Medical record linkage between data from fertility clinics and PRN registry
There are differences in pregnancy outcomes between children conceived through the assisted reproductive technology protocols used by fertility clinics in the Netherlands and spontaneously conceived children.
It is not clear to what extent these differences in pregnancy outcomes are caused by differences in stimulation, laboratory protocols or maternal factors. While data from PRN contains pregnancy and perinatal outcomes together with the type of assisted reproductive technology used to start the pregnancy, this information is not complete. The department of obstetrics and gynecology at the AMC wants to create an anonymized database which combines the clinical data from fertility clinics and perinatal data from PRN. This new database would then be used to facilitate clinical research on assisted reproductive technology protocols and their perinatal outcomes. The fertility clinics agreed to have their clinical databases linked to PRN, to enable clinical research on the effects of their treatment (protocols) on perinatal outcomes. The use of PRN data was requested by the department of obstetrics and gynecology at the AMC with application number 12.43 and approved by PRN.
Medical record linkage between the national cancer registry and PRN registry
Clinicians want to know how many women in the Netherlands become pregnant after they have been diagnosed with breast cancer and have received treatment. They would like to use this information to inform their patients about the chance of becoming pregnant. The PRN does not contain information on whether a woman has been diagnosed with and treated for breast cancer, but the Netherlands Cancer Registry (NKR) does. The Netherlands Comprehensive Cancer Organization (IKNL), which manages the NKR, agreed to have their data on women diagnosed with breast cancer linked to the PRN registry to enable this type of clinical research. Furthermore, the study was to be considered a test of whether it was possible to link the data and of how reliable the linkage would be. The use of PRN data was requested by the department of obstetrics and gynecology at the AMC with application number 14.35 and approved by PRN.
Clinical research
A literature review was done to understand the differences between fresh and frozen embryo transfer and to investigate which perinatal outcomes could differ between children born after fresh or frozen embryo transfer. These outcomes were then used in a retrospective cohort study on the linked data from fertility clinics and the PRN registry, to investigate whether differences in birth weight depend on the use of fresh or frozen embryo transfer. Furthermore, we used perinatal outcomes found in the literature review as secondary outcomes.
In this SRP, the following research questions will be answered:
Medical record linkage between data from fertility clinics and PRN registry:
1. To what extent is it possible to link the clinical patient records from the fertility clinics to the Netherlands Perinatal Registry using medical record linkage?
2. What are the differences between records from fertility clinics which could be linked to the PRN and records which could not be linked to the PRN?
Medical record linkage between the national cancer registry and PRN registry:
3. To what extent is it possible to link data from the national cancer registration to the Netherlands Perinatal Registry using medical record linkage?
Clinical research:
4. What is known in the scientific literature about the differences in perinatal outcomes between frozen embryo transfer and fresh embryo transfer?
5. Are differences in birth weight dependent on the use of frozen or fresh embryos during embryo transfer after IVF or ICSI treatments?
Chapter organization
In the next chapter, background information is given on medical record linkage, assisted reproductive technologies, breast cancer, the registries and databases used and the regulations on data linkage in the Netherlands. Chapters 3 and 4 describe the two medical record linkage studies we performed. In chapter 5, a literature review is performed to assess what is known in the scientific literature about the differences in pregnancy and perinatal outcomes after fresh and frozen embryo transfer. In chapter 6, the linked data from the first pilot is used to analyze the differences in perinatal outcomes after fresh and frozen embryo transfer in 5 Dutch fertility clinics.
Chapter 2: Background
In this chapter, background information is given on medical record linkage. The chapter continues with background information on the clinical domains of assisted reproductive technologies, breast cancer and perinatal care, and on the registries used. The chapter concludes with the regulations in the Netherlands on data linkage.
2.1 Medical record linkage
A person's clinical data can be registered at different locations. These databases often lack a unique identification number with which records from the same person can be combined. During medical record linkage, partially identifying variables are grouped together to create a discriminating key which can be used to identify an individual across different data sources.(3) Examples of partially identifying variables are date of birth, gender and zip code.
The objective of medical record linkage (MRL) is to find the records in two data sources which correspond to the same individual, using personal variables which are stored in both sources. This is done by creating a new table in which all possible combinations (pairs) of the records of both datasets are stored. The number of pairs in this new table is the product of the number of records in the first and second dataset. Based on the partially identifying variables, a decision is made whether the two records in a pair belong to the same individual or not. When it is thought that they belong to the same individual, the pair is classified as a link, and otherwise as a non-link.(3, 4)
There are different methods of using the partially identifying variables to determine whether two records belong to the same individual. The two most used methods are deterministic record linkage (DRL) and probabilistic record linkage (PRL).(3-5) In DRL all variables used in the linking key are of the same importance, while in PRL different weights are assigned to the variables.
Two strategies are described for DRL: the standard DRL strategy and the n-1 match strategy. In the standard strategy a pair is classified as a link if the records have the same value for all the variables in the linking key. This strategy only performs well if the linking key itself is very discriminating and few errors are present in these variables. The deterministic n-1 match strategy can compensate for errors by allowing a single mismatch in the linking key.
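The two deterministic strategies can be illustrated with a minimal sketch. It assumes records are plain dictionaries, and the variable names in `LINKING_KEY` as well as the toy records are hypothetical and not taken from the datasets in this project. Every pair from the full cross product is compared, and a pair is a link when at most `allowed_mismatches` variables of the linking key disagree.

```python
from itertools import product

# Hypothetical linking key; the variable names are illustrative only.
LINKING_KEY = ("mother_dob", "zip4", "child_dob")


def count_agreements(rec_a, rec_b, key=LINKING_KEY):
    """Count linking variables on which both records hold the same, non-missing value."""
    return sum(
        1
        for var in key
        if rec_a.get(var) is not None and rec_a.get(var) == rec_b.get(var)
    )


def deterministic_links(dataset_a, dataset_b, key=LINKING_KEY, allowed_mismatches=0):
    """Classify every pair of the cross product as link or non-link.

    allowed_mismatches=0 is the standard strategy, 1 is the n-1 match strategy.
    Returns the index pairs (i, j) classified as links.
    """
    links = []
    for (i, rec_a), (j, rec_b) in product(enumerate(dataset_a), enumerate(dataset_b)):
        if count_agreements(rec_a, rec_b, key) >= len(key) - allowed_mismatches:
            links.append((i, j))
    return links


# Toy records (invented for illustration).
clinic = [
    {"mother_dob": "1980-03-14", "zip4": "1105", "child_dob": "2009-07-01"},
    {"mother_dob": "1978-11-02", "zip4": "3511", "child_dob": "2010-01-20"},
]
registry = [
    {"mother_dob": "1980-03-14", "zip4": "1105", "child_dob": "2009-07-01"},
    {"mother_dob": "1978-11-02", "zip4": "3512", "child_dob": "2010-01-20"},  # zip differs
    {"mother_dob": "1985-06-30", "zip4": "9711", "child_dob": "2011-05-05"},
]

exact = deterministic_links(clinic, registry)
n_minus_1 = deterministic_links(clinic, registry, allowed_mismatches=1)
```

In this toy example the second clinic record is only found by the n-1 match strategy, because its zip code differs from the registry record.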
Because all variables in DRL are considered to be of the same importance, the n-1 match strategy does not differentiate between the pairs that mismatch on one variable. Furthermore, by allowing one mismatch, the discriminating power of the linking key is lowered, which introduces errors, namely false links, in the linked dataset.(6)
In PRL, weights for agreement and disagreement are calculated for each variable. Each variable has a probability that it agrees when a record pair belongs to the same individual (m) and a probability that it agrees when a record pair belongs to different individuals (u).(5) Using maximum likelihood methods on these two probabilities, a weight can be assigned to each linking variable for the case that a record pair agrees on that variable and for the case that it disagrees.(5, 6) These weights are then used to calculate a total weight for each pair created during PRL. By estimating the prevalence of true matches, a threshold is established. Pairs with a total weight above this threshold are classified as links, while pairs with a total weight below this threshold are classified as non-links. Furthermore, it is possible to calculate weights for variables when they partially agree.(6)
Linking variables
The partially identifying variables used to detect whether records belong to the same individual are called linking variables. These linking variables have to be present in both datasets that are to be linked. Furthermore, it is preferable that the linking variables have a high discriminating power.(7) For example, a date of birth is more discriminating than the gender of a person, because the chance that two individuals have the same date of birth is much smaller than the chance that they have the same gender.
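The weighting in probabilistic record linkage can be sketched as follows. The sketch assumes the common Fellegi-Sunter formulation, in which m is the probability that a variable agrees for record pairs of the same individual and u the probability that it agrees for pairs of different individuals; the agreement weight is then log2(m/u) and the disagreement weight log2((1-m)/(1-u)). The m and u values, the variable names and the threshold are hypothetical, and this is not necessarily the exact procedure of the cited references.

```python
import math

# Hypothetical (m, u) probabilities per linking variable; illustrative values only.
M_U = {
    "mother_dob": (0.98, 0.0001),
    "zip4": (0.90, 0.001),
    "child_dob": (0.95, 0.0003),
}


def agreement_weight(m, u):
    """Weight added when a pair agrees on the variable (positive when m > u)."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Weight added when a pair disagrees on the variable (negative when m > u)."""
    return math.log2((1 - m) / (1 - u))


def total_weight(rec_a, rec_b, m_u=M_U):
    """Sum the per-variable weights; variables with a missing value contribute 0."""
    total = 0.0
    for var, (m, u) in m_u.items():
        a, b = rec_a.get(var), rec_b.get(var)
        if a is None or b is None:
            continue
        total += agreement_weight(m, u) if a == b else disagreement_weight(m, u)
    return total


def classify(rec_a, rec_b, threshold, m_u=M_U):
    """Classify a pair as link when its total weight exceeds the threshold."""
    return "link" if total_weight(rec_a, rec_b, m_u) > threshold else "non-link"


rec = {"mother_dob": "1980-03-14", "zip4": "1105", "child_dob": "2009-07-01"}
other = {"mother_dob": "1985-06-30", "zip4": "9711", "child_dob": "2011-05-05"}
same_person = classify(rec, dict(rec), threshold=10.0)   # hypothetical threshold
different_person = classify(rec, other, threshold=10.0)
```

Unlike the deterministic strategies, a disagreement on a highly discriminating variable lowers the total weight more than a disagreement on a weakly discriminating one.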
Date of birth and zip code are frequently used linking variables in MRL.(7)
Blocking
MRL can become unfeasible even for modern computers if the datasets to be linked are large, because the number of possible pairs is the product of the number of records in both datasets. The number of pairs being considered can be limited with a technique called blocking. During blocking the datasets are separated into subsets based on the value of the blocking variable. Each subset therefore only contains records that have the same value for the blocking variable. The MRL is then done between the subsets of the first and second dataset which have the same value for the blocking variable.(5) It is important that the blocking variables have a high reliability, meaning that they should have few errors or missing values. This is necessary because all pairs of records that do not have the same value for this variable are automatically discarded. To mitigate this problem, the MRL can be repeated with a block on a different variable, so that pairs which do not agree on the first blocking variable are also considered.(5)
Ties
When a single record from one dataset links to more than one record from the other dataset, these links are called ties. Often these ties include one true link and one false link, which occur because the chosen linking variables are not discriminating enough. When probabilistic record linkage is used, these ties are generally solved by choosing the pair with the highest total linking weight. For deterministic record linkage this is generally not possible. However, these ties can be validated during a validation study. When no validation study is done, it is possible to make an educated guess based on other variables in the records which are not in the linkage key.
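Blocking and the detection of ties can be sketched in a few lines. With the record counts from the summary (10,129 fertility clinic records and 2,548,977 PRN records) the full cross product would hold on the order of 2.6 × 10^10 pairs, which is why only pairs that share a blocking value are generated. The sketch assumes records are dictionaries; the blocking variable `mother_dob` and the toy records are hypothetical.

```python
from collections import defaultdict


def block_by(records, blocking_var):
    """Group record indices by blocking value; records with a missing value are dropped."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        value = rec.get(blocking_var)
        if value is not None:
            blocks[value].append(idx)
    return blocks


def candidate_pairs(dataset_a, dataset_b, blocking_var):
    """Yield only the index pairs whose records share the same blocking value."""
    blocks_a = block_by(dataset_a, blocking_var)
    blocks_b = block_by(dataset_b, blocking_var)
    for value in blocks_a.keys() & blocks_b.keys():
        for i in blocks_a[value]:
            for j in blocks_b[value]:
                yield (i, j)


def find_ties(links):
    """Return {index_a: [index_b, ...]} for records of A linked to more than one record of B."""
    by_a = defaultdict(list)
    for i, j in links:
        by_a[i].append(j)
    return {i: js for i, js in by_a.items() if len(js) > 1}


# Toy records (invented for illustration).
dataset_a = [
    {"mother_dob": "1980-03-14", "zip4": "1105"},
    {"mother_dob": "1978-11-02", "zip4": "3511"},
]
dataset_b = [
    {"mother_dob": "1980-03-14", "zip4": "1105"},
    {"mother_dob": "1980-03-14", "zip4": "1105"},  # second candidate for the same mother
    {"mother_dob": "1990-01-01", "zip4": "3511"},
]

pairs = sorted(candidate_pairs(dataset_a, dataset_b, "mother_dob"))
ties = find_ties(pairs)
```

Here blocking reduces six possible pairs to two, and both remaining pairs involve the same record of the first dataset, so they form a tie that would have to be resolved with additional information.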
Error types
Two types of error can occur during MRL: a pair can be classified as a link while in reality the records belong to different individuals (false link), and a pair can be classified as a non-link while in reality the records do belong to the same individual (false non-link). A false link can occur when the number of linking variables used during MRL is too small to discriminate between different individuals, or when the linking variables themselves are not discriminating enough because they can only assume a small number of different values.(3) A false non-link can occur when the data quality of one or both datasets is insufficient, due to missing values or errors in the linking variables.(3)
Validation of Medical record linkage
Results from MRL are based on the chosen linking variables. To investigate the rate of false links and false non-links in a linked dataset, a validation study can be done. During a validation study the true status of a pair of records (based on additional information) is compared to the status given by the MRL procedure.(8) A validation study consists of 3 steps: sample selection, data collection and data analysis.
Depending on the size of the data sources and the created linked dataset, it can be unfeasible to verify the true status of all created pairs. Therefore a sample is taken to be used in the validation study. A validation study can be conducted for both DRL and PRL. When PRL is used to link data sources, it is recommended to focus on records with weights around the established threshold.(9) For DRL it is possible to take a random sample of the links found.
Data collection is required because the data used in MRL only contains the partially identifying information on which the resulting dataset is linked. Therefore additional information is needed from both data sources to find the true status of a pair. For example, PRN does not register names, but for a limited number of records it is possible to request this information from the hospitals or midwifery practices that provided the perinatal information to PRN.
In data analysis this additional information from both sources is compared and the true status (true link, false link, true non-link or false non-link) is determined.(8)
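The data analysis step of a validation study can be sketched as a simple tally. The sketch assumes that each validated pair is available as a tuple of the MRL classification and the true status; the function name, the two summary rates and the sample counts are hypothetical and only illustrate the four possible outcomes named above.

```python
from collections import Counter


def validation_summary(sample):
    """Tally validated pairs given as (classified_as_link, truly_same_person) booleans."""
    counts = Counter()
    for classified_link, same_person in sample:
        if classified_link:
            counts["true_link" if same_person else "false_link"] += 1
        else:
            counts["false_non_link" if same_person else "true_non_link"] += 1
    links = counts["true_link"] + counts["false_link"]
    true_matches = counts["true_link"] + counts["false_non_link"]
    return {
        "counts": dict(counts),
        # Share of the pairs classified as link that are not the same person.
        "false_link_rate": counts["false_link"] / links if links else None,
        # Share of the pairs of the same person that were classified as non-link.
        "missed_match_rate": counts["false_non_link"] / true_matches if true_matches else None,
    }


# Invented validation sample: 8 true links, 1 false link, 1 false non-link, 2 true non-links.
sample = [(True, True)] * 8 + [(True, False)] + [(False, True)] + [(False, False)] * 2
summary = validation_summary(sample)
```

Note that a sample drawn only from the links found, as described for DRL, can estimate the false link rate but gives no information on false non-links.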
2.2 Medical record linkage between data from fertility clinics and the PRN registry
Assisted reproductive technology (ART)
Assisted reproductive technology (ART) is a generic term for medical and surgical treatments to start a pregnancy and is primarily used in cases of subfertility. Another name for ART is Medical Assisted Reproduction (MAR). Examples of ART which can be performed on women are in vitro fertilization (IVF), intra cytoplasmic sperm injection (ICSI) and intrauterine insemination (IUI). The use of ART to treat subfertility started in 1976, when the first pregnancy after IVF was reported. Since then more than 5 million pregnancies have been started with ART.(10) The use of ART is associated with an increased risk of multiple gestations, low birth weight and preterm birth.(10)
In vitro fertilization (IVF)
IVF is an example of ART. An IVF procedure starts by stimulating follicular growth in the ovaries. This can be achieved by administering exogenous follicle-stimulating hormone, which stimulates growth of immature ovarian follicles in the ovaries. When the ovarian follicles are mature, the ovulation phase can be triggered by administering human chorionic gonadotropin. The oocytes which are released during ovulation are retrieved by trans-vaginal follicle aspiration and are then combined with spermatozoa in a laboratory. By combining the oocytes with spermatozoa, the oocytes are fertilized. One or more embryos are then inserted into the uterus using a catheter.(11)
Intra cytoplasmic sperm injection (ICSI)
ICSI can be performed during an IVF procedure. Instead of fertilizing the oocytes by combining them with spermatozoa, a single sperm cell is injected into the cytoplasm of an oocyte. ICSI is primarily used when the indication to perform an ART is male subfertility.(12)
Frozen embryo transfer (FET)
The process of frozen embryo transfer (FET) starts in the same way as a normal IVF procedure. However, instead of transferring the fresh embryo back to the uterus in the same cycle, the embryo is cryopreserved.
This is often done when there is an excess of embryos after an IVF or ICSI procedure. There are different methods to cryopreserve embryos, and both slow and ultra-rapid freezing have proven to be safe and effective. Between 10 and 20 percent of embryos do not survive cryopreservation because of damage inflicted on the embryo during freezing and thawing. In contrast to cycles with fresh embryos, where the endometrium is primed by endogenously produced hormones, women who receive frozen embryos have to be primed with exogenous estrogen and progesterone.(11)
Scientific studies which compare the perinatal outcomes of children born after FET and after fresh embryo transfer report better perinatal outcomes after FET.(13) Children born after FET had a higher mean birth weight. Furthermore, the relative risks of low birth weight, small for gestational age and perinatal mortality were lower after FET.(14) It is not yet known why FET results in better perinatal outcomes than fresh embryo transfer. It is suggested that during fresh embryo transfer, ovarian stimulation results in a less favorable environment in the uterus, due to high concentrations of estrogen and progesterone which influence the early development of the embryo, while frozen embryo transfer takes place in a more natural environment in the uterus.(14)
IVF and ICSI data sources
There are 13 fertility clinics in the Netherlands which perform embryo transfers after IVF or ICSI. Data on ART performed at these fertility clinics is stored in electronic patient records. Data from these electronic patient records were queried and stored in either an Excel or an SPSS file. Each row in these files contains information on a single ART treatment performed on a woman who became pregnant after the treatment. A case number was assigned by the clinic, which referred to the full patient record at the clinic to allow validation or data correction when needed.
The information in the datasets included data on the mother, type and indication of the treatment and morphological data about the embryos. When available, it included date of birth, gender and birth weight of the child.
2.3 Medical record linkage between the national cancer registry and the PRN registry
Breast cancer
Breast cancer is the most common type of cancer found in Dutch women. In 2010, 1.58 per 1000 women were diagnosed with breast cancer.(15) When breast cancer is suspected, either because an abnormal lump is discovered in the breast or armpit or because of findings on a mammogram or breast MRI, it can be diagnosed by analyzing a sample obtained by a breast biopsy. There are different types of breast cancer. The most common type starts in the ducts of the breasts and is called ductal carcinoma.(16) Treatment options depend on the type and stage of the breast cancer. Early stages of breast cancer are generally treated by removal of the affected tissue, either by removing the whole breast or only the cancerous tissue. Furthermore, adjuvant therapy can be given in the form of endocrine therapy (when the cancer is hormone receptor positive) or chemotherapy. Later stages in which the carcinoma has metastasized are not curable. Life can be prolonged by systemic administration of chemotherapy or endocrine therapy and by localized radiation and surgery.(17) The mean survival of women with metastasized breast cancer is 2 years.(16)
The Netherlands Cancer Registry (NKR)
Since 1989, oncological data of all Dutch patients diagnosed with cancer has been registered in a national registry called the Netherlands Cancer Registry. The NKR is used in clinical and epidemiological research. Information which is registered in the NKR includes diagnosis, tumor morphology, information about the treatment used and information about the follow-up of the patient.(18) Data registered in the NKR originates from other registries such as the "Pathologisch-Anatomisch Landelijk Geautomatiseerd Archief" (PALGA), which is a national registry on histopathology in the Netherlands.
2.4 The Netherlands Perinatal registry (PRN)
In the Netherlands four professional groups are involved in perinatal care. Midwives, general practitioners, obstetricians and pediatricians all have their own registry in which they record data about the child and mother relevant to their profession.(9) Data from these 4 registries, LVR1 (midwives), LVR2 (obstetricians), LNR (pediatricians) and LVRh (general practitioners), have been linked together with medical record linkage into 1 registry called the Netherlands Perinatal Registry.(9) The procedure by which the data from these 4 registries have been linked in the PRN is described in the thesis "Record linkage to enhance data from perinatal registries" by medical informatics student Miranda Tromp. The PRN registry is used to assess outcomes of various perinatal care processes. In the PRN registry, information about the mother, the pregnancy and perinatal information on the child are recorded.
2.5 Linkage and Privacy regulations in the Netherlands
Privacy laws in the Netherlands prevent the use of the national identification number (BSN) to link data for research. Its use is strictly regulated by law.(19) It is therefore not possible to use this number to link data from multiple databases. Furthermore, access to the databases is regulated by different organizations. Access to the national perinatal registry is regulated by PRN, access to data from the NKR is regulated by IKNL and access to data about ART treatments is regulated by the individual fertility clinics. All parties use regulations to protect the privacy of the patients registered in their database. These regulations often require that the linked data cannot be traced back to an individual. This is achieved by anonymizing the data, that is, by transforming identifying information into less identifying information. For example, the date of birth can be transformed to an age, and zip codes can be removed or transformed to a social economic status.(20) Other regulations limit access to the data. This was the case for data from IKNL, which could only be accessed under supervision at their location; the linked data was not allowed to leave the building. Furthermore, privacy regulations can enforce the use of a trusted third party.
Trusted third parties
Medical databases and registries contain both identifying and medical (sensitive) information on individuals. Detailed identifying information is rarely needed to conduct clinical research, but it is needed in the medical record linkage procedure. Trusted third parties are companies which separate identifying from medical information to protect patient privacy. They only have access to the identifying variables which are used to link the data and are not involved in later research on the linked data.(21) To further protect privacy, the identifying variables are often standardized and encrypted at the data source.
The variables themselves are then still comparable during medical record linkage, but it is not possible for the trusted third party to link the identifying information back to an individual.(22)
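The idea of comparing encrypted identifiers can be illustrated with a minimal Python sketch. It is not the procedure of any particular trusted third party: the keyed hash (HMAC‐SHA256), the secret key, the function names and the example values are all assumptions made for illustration. Each data source standardizes its identifiers and replaces them with a token, so that equal identifiers still yield equal tokens while the party performing the linkage never sees the underlying values.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret shared by the data sources but not by the trusted third party.
SECRET_KEY = b"example-key-agreed-between-data-sources"


def standardize(date_of_birth: date, zip_code: str) -> str:
    """Bring the identifiers into one canonical form before they are encoded."""
    zip4 = zip_code.strip().replace(" ", "")[:4]  # keep the four digits only
    return f"{date_of_birth.isoformat()}|{zip4}"


def pseudonymize(date_of_birth: date, zip_code: str) -> str:
    """Return a keyed hash of the standardized identifiers (illustrative choice)."""
    message = standardize(date_of_birth, zip_code).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


# Two sources that register the same woman in a slightly different format
# produce the same token; a different date of birth produces a different one.
clinic_token = pseudonymize(date(1980, 5, 17), "1105 AZ")
registry_token = pseudonymize(date(1980, 5, 17), "1105")
other_token = pseudonymize(date(1980, 5, 18), "1105")
```

Because the tokens only support exact comparison, this kind of encoding fits deterministic linkage on exact matches; tolerances such as a date window would need a different design.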
Chapter 3: Medical record linkage between fertility clinics and the PRN registry
3.1 Introduction
Registration of perinatal outcomes after assisted reproductive technologies (ART) has been difficult in the Netherlands, because fertility clinics are unaware of these child outcomes or do not register them in their electronic health records. Therefore most comparisons between ART procedures are based only on the chance of achieving a pregnancy and not on perinatal outcomes. Perinatal outcomes of a pregnancy are registered by the four professional groups involved in perinatal care in the Netherlands, namely midwives, obstetricians, pediatricians/neonatologists and general practitioners. These four registries are linked by probabilistic record linkage into one national mother‐child registry called the Netherlands Perinatal Registry (PRN). Comparison between the PRN registry and the civil registry (Basisregistratie personen) has shown that between 92% (in 1999) and 99% (in 2011) of all children born in the Netherlands are registered in the PRN registry.(2) Researchers from the Dutch Assisted Reproductive Studies (DARTS) want to combine the clinical data on ART performed at fertility clinics with the fetal and neonatal data in PRN, in order to compare these outcomes between the different stimulation and laboratory procedures used by the fertility clinics. To enable clinical research on the ART used by fertility clinics and their perinatal outcomes, it was necessary to link the women and the children born from these pregnancies, as registered in PRN, to the ART provided by the fertility clinics. To combine these data it is necessary to identify which patient at the fertility clinic and which mother in the PRN registry are the same individual. However, the fertility clinics and the PRN registry have no common identification number on which this can be done.
The objective of this study was to investigate how well medical record linkage (MRL) could be used to combine the ART data with perinatal data from the PRN registry, by creating a discriminating key based on common partially identifying variables. At the start of this study it was expected that only a small number of these partially identifying variables could be used. The research questions for this study were:
1. To what extent is it possible to link the clinical patient records from the fertility clinics to the Netherlands Perinatal Registry using medical record linkage?
2. What are the differences between records from fertility clinics which could be linked to the PRN registry and records which could not be linked to the PRN registry?
3.2 Methods
Linkage permission and databases
Permission to use data from the PRN registry in a pilot study to investigate the possibility of medical record linkage between the different fertility clinics and perinatal data was obtained in 2011 (PRN 11.25). Furthermore, permission to link the perinatal data to all 13 fertility clinics in the Netherlands was obtained (PRN 12.43). All 13 fertility clinics in the Netherlands agreed with the project in 2011 and pledged to cooperate in providing their data on performed ART. However, after agreement had been obtained from both the fertility clinics and PRN, it took until 2014 to acquire the databases, and only 7 of the 13 medical ethics committees at the fertility clinics gave their approval before January 2015. The data on ART procedures which resulted in a pregnancy were queried at the fertility clinics and transported by courier to the Academic Medical Center (AMC), where the pilot took place. A separate area on the AMC network was created in which the medical record linkage took place and to which only the people performing the record linkage and their supervisor had access.
Data cleaning
The complete medical record linkage procedure took place in Statistical Analysis System (SAS) 9.3 and was based on the MRL procedure developed for the creation of the PRN registry.(9) The first step in linking the ART data with perinatal data was to prepare both the received ART and PRN data files.(3) The PRN data was provided with all pregnancies and children separated into different files based on the year of birth. For this study, data from the PRN registry from 1999 until 2013 was available. A selection was made of the necessary variables of the PRN registry. This selection included all possible linking variables that could be present in the ART datasets (e.g. date of birth, gender, birth weight) and perinatal data which could be used in different clinical research questions, along with a record identification number linking to the full record in the PRN registry, to ensure additional data could be added when necessary. The selected variables were standardized across all available years of the PRN registry and date variables were transformed into SAS date values. Furthermore, variables and values from the available ART data were standardized. After standardization the ART data was linked to itself using deterministic record linkage (DRL) on five differentiating variables to find administrative duplicates. These variables were chosen for their high differentiating power and low percentage of missing values.
Determination of linking variables and medical record linkage
The ART dataset was the base of the medical record linkage procedure and the objective was to link a record from the PRN registry to each record from the ART dataset. Common variables between the ART data and the PRN registry which could be used to make a unique identification key were investigated. Because there were only three common variables which could be used in the medical record linkage procedure, deterministic record linkage was performed between ART data and PRN data with a block on the mother’s date of birth. A flowchart of this procedure is shown in figure 3‐1. The variables used were the mother’s date of birth, the four digit zip code and the child’s date of birth. When the child’s date of birth was registered in the ART dataset, it was allowed to differ by up to two weeks from the child’s date of birth registered in the PRN registry. When the child’s date of birth was missing in the ART dataset, the child’s date of birth in the PRN registry was required to fall between 18 and 42 weeks after embryo transfer.
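The linkage rules above can be summarized in a short Python sketch. The study itself was carried out in SAS 9.3; the code below is only an illustration of the described logic, and all field names (`mother_dob`, `zip4`, `child_dob`, `transfer_date`) and example records are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

MAX_DOB_DIFFERENCE = timedelta(weeks=2)   # tolerance on the child's date of birth
MIN_GESTATION = timedelta(weeks=18)       # window used when the ART record has
MAX_GESTATION = timedelta(weeks=42)       # no child's date of birth


def child_dob_agrees(art, prn):
    """Apply the two-week rule, or the 18-42 week window after embryo transfer."""
    if prn["child_dob"] is None:
        return False
    if art["child_dob"] is not None:
        return abs(prn["child_dob"] - art["child_dob"]) <= MAX_DOB_DIFFERENCE
    if art["transfer_date"] is None:
        return False
    return MIN_GESTATION <= prn["child_dob"] - art["transfer_date"] <= MAX_GESTATION


def link(art_records, prn_records):
    """Deterministic linkage, blocked on the mother's date of birth."""
    blocks = defaultdict(list)
    for prn in prn_records:
        if prn["mother_dob"] is not None:
            blocks[prn["mother_dob"]].append(prn)

    links = []
    for art in art_records:
        if art["mother_dob"] is None or art["zip4"] is None:
            continue  # a deterministic key cannot be built from missing values
        for prn in blocks.get(art["mother_dob"], []):
            if art["zip4"] == prn["zip4"] and child_dob_agrees(art, prn):
                links.append((art["id"], prn["id"]))
    return links


# Hypothetical example records.
art_records = [
    {"id": "A1", "mother_dob": date(1978, 3, 2), "zip4": "1105",
     "child_dob": date(2009, 6, 10), "transfer_date": date(2008, 9, 20)},
    {"id": "A2", "mother_dob": date(1980, 7, 15), "zip4": "3511",
     "child_dob": None, "transfer_date": date(2009, 1, 5)},
]
prn_records = [
    {"id": "P1", "mother_dob": date(1978, 3, 2), "zip4": "1105", "child_dob": date(2009, 6, 17)},
    {"id": "P2", "mother_dob": date(1980, 7, 15), "zip4": "3511", "child_dob": date(2009, 9, 28)},
    {"id": "P3", "mother_dob": date(1980, 7, 15), "zip4": "3511", "child_dob": date(2011, 2, 1)},
    {"id": "P4", "mother_dob": date(1978, 3, 2), "zip4": "1011", "child_dob": date(2009, 6, 10)},
]
links = link(art_records, prn_records)
```

Blocking on the mother’s date of birth means that only PRN records sharing that date are compared with an ART record, which keeps the number of comparisons manageable for a registry with millions of records.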
Defining links and ties
Links were considered ties when more than one ART record linked to the same PRN record, or when more PRN records linked to the same ART record than expected from the pregnancy status in the PRN registry (singleton, twin or triplet). When gender and birth weight were available in the ART record, these values were used to find the true link. When no information for these two variables was available, or when they could not be used to distinguish between ties, the gestational age was calculated using the date of embryo transfer in the ART record and the child’s date of birth in the PRN record. This was compared to the gestational age registered in the PRN record, and the pair with the smallest difference between these two values was considered to be the true link. Furthermore, it was analyzed whether there was a logical explanation for these ties, by comparing the records on the child’s date of birth, gender, birth weight, pregnancy status (singleton, twin, triplet), child code and whether all the information in the PRN record belonged to one registry (LVR1 or LVR2). It was possible that different records still belonged to the same child, because the PRN registry itself is constructed with probabilistic record linkage and records which belong to the same child might not have been linked because the total linkage weight was below the threshold. When, based on the previously mentioned variables, it was estimated that the records belonged to the same child, they were combined into one record.
Anonymization
After the data was linked, the dataset was anonymized by converting the mother’s date of birth into a maternal age at the time of the ART procedure and at the time of delivery. The child’s date of birth was converted to a month and year of birth. Furthermore, the four digit zip code was converted to a social economic status (SES) using data from 2010 provided by the Sociaal Cultureel Planbureau. Only after anonymization was the data made available to other researchers.
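The anonymization step described above can be sketched in Python as follows. This is an illustration, not the SAS code used in the study; the field names, the example record and in particular the SES lookup table are hypothetical (the actual SES scores came from the Sociaal Cultureel Planbureau data of 2010).

```python
from datetime import date


def age_in_years(birth: date, at: date) -> int:
    """Completed years between a date of birth and a reference date."""
    had_birthday = (at.month, at.day) >= (birth.month, birth.day)
    return at.year - birth.year - (0 if had_birthday else 1)


# Hypothetical lookup from four digit zip code to an SES category.
SES_BY_ZIP4 = {"1105": "low", "3511": "high"}


def anonymize(record: dict) -> dict:
    """Replace identifying variables by less identifying derived variables."""
    return {
        "maternal_age_at_art": age_in_years(record["mother_dob"], record["art_date"]),
        "maternal_age_at_delivery": age_in_years(record["mother_dob"], record["child_dob"]),
        "child_birth_month": record["child_dob"].strftime("%Y-%m"),  # month and year only
        "ses": SES_BY_ZIP4.get(record["zip4"]),  # None when the zip code is unknown
    }


linked_record = {
    "mother_dob": date(1978, 3, 2),
    "art_date": date(2008, 9, 20),
    "child_dob": date(2009, 6, 17),
    "zip4": "1105",
}
anonymous_record = anonymize(linked_record)
```

The output contains none of the original linking variables, so the anonymized dataset can no longer be used to repeat or extend the linkage; any validation therefore has to start from the identifiable data.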
Validation
In this study we analyzed whether there were differences between ART records linked to a PRN record and ART records which were not linked during the MRL procedure. We performed Student’s t‐tests on the maternal and paternal age registered in the ART datasets, and chi‐square tests on low SES (<25th percentile) and on the percentage of linkage for the different types of ART treatment.
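As an illustration of the chi‐square comparison, the following Python sketch computes a Pearson chi‐square test for a 2×2 table, using the IVF counts reported in Table 3‐6 (3096 of 7687 linked and 1046 of 2443 non‐linked records). It is a sketch under assumptions: the analysis in this study was done in SAS, and whether a continuity correction was applied there is not stated, so none is used here.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom the statistic is a squared standard normal
    # variable, so the upper-tail probability equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


linked_ivf, linked_total = 3096, 7687
nonlinked_ivf, nonlinked_total = 1046, 2443

statistic, p_value = chi_square_2x2(
    linked_ivf, linked_total - linked_ivf,
    nonlinked_ivf, nonlinked_total - nonlinked_ivf,
)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

The function only uses the standard library, which keeps the arithmetic of the test visible; in practice a statistical package would normally be used.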
Figure 3‐1: Flowchart medical record linkage method between assisted reproductive technology data and data from the PRN registry.
3.3 Results
Data availability
In 2011, all 13 fertility clinics in the Netherlands agreed to provide data on their ART procedures which resulted in a pregnancy. However, in August 2014 data from only one clinic was available, and at the start of this scientific research project data from only two fertility clinics was available. By May 2015, five fertility clinics had provided a workable dataset with data on ART procedures which resulted in a pregnancy. All five fertility clinics used a different electronic health record system. Therefore the formats used to store the data differed between all provided datasets. Table 3‐1 lists all 13 fertility clinics in the Netherlands and whether they provided data or not. Furthermore, the number of treatments performed in 2010 which resulted in a pregnancy, according to the Dutch society of obstetrics and gynecology (NVOG), is included.

Table 3‐1: Fertility clinics in the Netherlands, available ART datasets and number of pregnancies in 2010

| Fertility clinic | Data available until May 30, 2015 and linked to PRN in this SRP project in 2015 | Treatments which resulted in a pregnancy in 2010 |
|---|---|---|
| AMC | yes | 210 |
| Catherina Ziekenhuis Eindhoven | yes | 199 |
| Erasmus MC | no | 574 |
| Isala Zwolle | no | 420 |
| Kinderwens centrum Leiderdorp | no | 226 |
| LUMC Leiden | no | 269 |
| MUMC | no | 121 |
| St. Elisabeth Tilburg | no | 346 |
| UMC Nijmegen | no | 471 |
| UMC Utrecht | no | 582 |
| UMCG Groningen | yes | 280 |
| Voorburg | yes | 213 |
| VUMC | yes | 673 |

Inclusion and exclusion of data
In total 4518 records from the available ART data were excluded. Fertility clinic E provided data from two separate electronic health record systems; during the query from one of these systems the data was corrupted and could not be used. Therefore 4516 records were excluded from the MRL procedure. Furthermore, one administrative duplicate and one record with no values for any potential linking variable were excluded. Table 3‐2 shows the number of records provided by the clinics and the number of records included and excluded.
Table 3‐2: Available records, duplicates and excluded records for each fertility clinic

| Fertility clinic | Records (n) | Administrative duplicates (n) | Records with all possible linking variables missing (n) | Corrupted data (n) | Included (n) |
|---|---|---|---|---|---|
| A | 1757 | 0 | 0 | 0 | 1757 |
| B | 2301 | 1 | 1 | 0 | 2299 |
| C | 2538 | 0 | 0 | 0 | 2538 |
| D | 1618 | 0 | 0 | 0 | 1618 |
| E | 6433 | 0 | 0 | 4516 | 1917 |

Quality of potential linking variables
Potential linking variables were: mother’s date of birth, mother’s zip code, child’s date of birth, child’s gender, child’s birth weight and number of children born. Furthermore, the date of procedure could be used to estimate the timeframe in which a child would be born. As shown in Table 3‐3, the mother’s date of birth and zip code were registered by all five fertility clinics and the PRN registry, which made them the strongest linkage variables available. All fertility clinics registered the date on which the ART treatment took place. However, this date was not registered in the PRN registry. At the same time, the child’s date of birth was not registered by all fertility clinics and, where it was registered, had high percentages of missing values, so it could not always be used as a linking variable. Therefore the child’s date of birth was used in combination with the ART procedure date. When the child’s date of birth was not known at the fertility clinic, the date of birth in the PRN registry had to be between 18 and 42 weeks after the ART procedure date.
Table 3‐3: Number and percentages missing values for potential linking variables

| Potential linking variables | A (n=1757) n | A % | B (n=2300) n | B % | C (n=2538) n | C % | D (n=1618) n | D % | E (n=1917) n | E % | PRN (n=2548977) n | PRN % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mother’s date of birth | 0 | 0% | 0 | 0% | 1 | 0.04% | 0 | 0% | 0 | 0.0% | 1717 | 0.07% |
| Mother’s zip code | 0 | 0% | 10 | 0.4% | 23 | 0.9% | 0 | 0% | 11 | 0.6% | 9515 | 0.37% |
| Date of procedure | 34 | 1.9% | 12 | 0.5% | 0 | 0% | 0 | 0% | 0 | 0.0% | 0 | 0.00% |
| Child’s date of birth | 882 | 50.2% | 80 | 3.5% | 511 | 20.1% | 1618 | 100% | 228 | 11.9% | 613 | 0.02% |
| Child’s gender | 1757 | 100% | 2300 | 100% | 2538 | 100% | 1618 | 100% | 176 | 9.2% | 1598 | 0.06% |
| Child’s birth weight | 1757 | 100% | 2300 | 100% | 2538 | 100% | 1618 | 100% | 35 | 1.8% | 1604 | 0.06% |
| Number of children | 882 | 50.2% | 760 | 33.0% | 0 | 0% | 1618 | 100% | 160 | 8.3% | 613 | 0.02% |

Table 3‐4: Number and percentage linked to PRN for each fertility clinic and treatment year

| Year ART | A n | A linked | A % | B n | B linked | B % | C n | C linked | C % | D n | D linked | D % | E n | E linked | E % | Total n | Total linked | Total % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1999 | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - | 39 | 22 | 56.4% | 0 | 0 | - | 39 | 22 | 56.4% |
| 2000 | 0 | 0 | - | 133 | 64 | 48.1% | 187 | 135 | 72.2% | 115 | 84 | 73.0% | 0 | 0 | - | 435 | 283 | 65.1% |
| 2001 | 0 | 0 | - | 182 | 103 | 56.6% | 183 | 135 | 73.8% | 122 | 94 | 77.0% | 0 | 0 | - | 487 | 332 | 68.2% |
| 2002 | 14 | 12 | 85.7% | 228 | 114 | 50.0% | 177 | 126 | 71.2% | 149 | 116 | 77.9% | 0 | 0 | - | 568 | 368 | 64.8% |
| 2003 | 22 | 14 | 63.6% | 193 | 97 | 50.3% | 258 | 195 | 75.6% | 91 | 74 | 81.3% | 0 | 0 | - | 564 | 380 | 67.4% |
| 2004 | 78 | 66 | 84.6% | 205 | 114 | 55.6% | 247 | 180 | 72.9% | 93 | 73 | 78.5% | 0 | 0 | - | 623 | 433 | 69.5% |
| 2005 | 194 | 153 | 78.9% | 220 | 119 | 54.1% | 269 | 193 | 71.7% | 143 | 116 | 81.1% | 0 | 0 | - | 826 | 581 | 70.3% |
| 2006 | 318 | 262 | 82.4% | 217 | 128 | 59.0% | 246 | 193 | 78.5% | 173 | 137 | 79.2% | 0 | 0 | - | 954 | 720 | 75.5% |
| 2007 | 313 | 281 | 89.8% | 180 | 122 | 67.8% | 284 | 228 | 80.3% | 163 | 128 | 78.5% | 2 | 2 | 100% | 942 | 761 | 80.8% |
| 2008 | 253 | 210 | 83.0% | 226 | 150 | 66.4% | 233 | 200 | 85.8% | 184 | 143 | 77.7% | 555 | 460 | 82.9% | 1451 | 1163 | 80.2% |
| 2009 | 310 | 275 | 88.7% | 242 | 151 | 62.4% | 243 | 219 | 90.1% | 204 | 172 | 84.3% | 691 | 578 | 83.6% | 1690 | 1395 | 82.5% |
| 2010 | 237 | 206 | 86.9% | 273 | 174 | 63.7% | 211 | 184 | 87.2% | 142 | 108 | 76.1% | 669 | 563 | 84.2% | 1532 | 1235 | 80.6% |
| 2011 | 18 | 14 | 77.8% | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - | 18 | 14 | 77.8% |
| Total | 1757 | 1493 | 85.0% | 2299 | 1336 | 58.1% | 2538 | 1988 | 78.3% | 1618 | 1267 | 78.3% | 1917 | 1603 | 83.6% | 10129 | 7687 | 75.9% |
Medical record linkage performance
We used deterministic record linkage between ART and PRN data on exact matches for the mother’s date of birth and four digit zip code. We allowed a difference of up to two weeks for the child’s date of birth when available, and when this date was unavailable in the ART data we allowed PRN records to link when the date of birth was between 18 and 42 weeks after the ART procedure date. Based on these linking variables, in total 75.9% of all ART treatments were linked to a record from the PRN registry. This percentage varied between the fertility clinics from 58.1% to 85.0%. The linkage percentage is higher in more recent years. Table 3‐4 shows an overview of all available ART data between 1999 and 2011, how many of these treatments were linked to at least one record from the PRN registry and what the percentage was for each year and clinic.
Validation
To study whether a possible selection bias was introduced between the linked and non‐linked ART records, we looked at three variables registered in the ART data: maternal age, paternal age and the social economic status (based on the zip code). Furthermore, the percentages of the different types of ART treatments were compared between linked and non‐linked records. We performed Student’s t‐tests on the mean maternal and paternal age between linked and non‐linked records. No significant difference was found for the maternal age (p=0.11) or the paternal age (p=0.43).

Table 3‐5: Difference in maternal and paternal age in the linked ART and non‐linked ART records

| | Linked (n=7687) n | Linked mean | Linked SD | Non‐linked (n=2443) n | Non‐linked mean | Non‐linked SD | p‐value |
|---|---|---|---|---|---|---|---|
| Maternal age | 7687 | 33.8 | 4.2 | 2443 | 33.7 | 4.1 | 0.11 |
| Paternal age | 5014 | 37.1 | 5.6 | 1881 | 36.8 | 5.5 | 0.43 |

With chi‐square analysis between linked and non‐linked ART records, we found a significant difference in the percentage of patients with a low social economic status (<25th percentile): there were 5.4% more patients with low social economic status in the non‐linked records than in the linked records. Furthermore, fertility clinics performed different treatments and we found a significant difference in the percentage of patients undergoing in vitro fertilization and frozen embryo transfer. In the non‐linked records there were 2.5% more patients undergoing IVF and 1.1% more patients undergoing frozen embryo transfer than in the linked records. We did not find a significant difference in patients undergoing intra cytoplasmic sperm injection.
Table 3‐6: Difference in low social economic score and patients undergoing in vitro fertilization, intra cytoplasmic sperm injection or frozen embryo transfer in the linked ART and non‐linked ART records

| | Linked (n=7687) n | Linked % | Non‐linked (n=2443) n | Non‐linked % | p‐value |
|---|---|---|---|---|---|
| Low SES (<25th percentile) | 1805 | 23.5% | 678 | 28.1% | <0.01 |
| IVF | 3096 | 40.3% | 1046 | 42.8% | 0.03 |
| ICSI | 3235 | 42.1% | 1056 | 43.2% | 0.32 |
| Frozen embryo transfer | 1052 | 13.7% | 275 | 11.3% | <0.01 |

Ties in the linked ART and PRN dataset
After medical record linkage, 77 ties were found. Of these ties, 61 involved PRN records which belonged to a singleton pregnancy and 16 belonged to pregnancies with multiple children. To investigate whether some records belonged to the same child but had not been linked when the PRN registry was created by probabilistic linkage of LVR1, LVR2 and LNR, the ties were compared on the child’s date of birth, gender, birth weight, pregnancy status (singleton, twin, triplet), child code and whether all the information in the PRN record belonged to one registry (LVR1 or LVR2). One tie was found in which all variables matched exactly and in which one record originated from LVR1 and the other from LVR2. This made it likely that these two records belonged to the same child.
3.4 Discussion
Main results
The results show that it is possible, to a large extent, to link data from fertility clinics to the PRN registry based on three variables. Of the 10,129 ART treatments which were used in the medical record linkage process, 7,687 were linked to at least one record in the PRN registry (75.9%). The linkage performance differed between the fertility clinics, ranging from 58.1% to 85.0%. Furthermore, the linkage performance increased in more recent years. The ART records which were linked were associated with a lower percentage of low social economic status and contained fewer IVF treatments and more frozen embryo transfers compared to the records not linked to the PRN registry.
Barriers
Collection of the ART data was complicated. Due to privacy procedures, data availability and corruption of data, workable data from only five fertility clinics was available at the end of May 2015. Furthermore, 4,516 records from one fertility clinic were corrupted, which was confirmed by that particular fertility clinic. Besides the problems with data collection, there were other barriers which complicated the linkage procedure. There was no proper documentation available on the data fields in the data provided by the fertility clinics. The absence of data dictionaries made it difficult to find which data fields contained comparable information. Furthermore, the only documentation available for the PRN registry consisted of its linkage reports, which describe how the registry was created by probabilistic record linkage between LVR1, LVR2, LNR and LVRh, and its original registration forms. Additionally, some data fields from the ART data appeared to have been free text fields and contained invalid values for that particular variable, or many values which all had the same meaning. In the future it would be beneficial for the quality of the ART data in the linked data to limit the type of data physicians can enter in a particular field.
Strengths and limitations
A strength of this study is that it was the first attempt to link large datasets from fertility clinics in the Netherlands to the PRN registry. Furthermore, we were able to use variables which had a high discriminating power. Of the 170,401 pregnancies registered in 2007, only 3,778 pregnancies shared their mother’s date of birth and four digit zip code with another pregnancy. Our study was limited by the missing values in the linkage variables for a number of records. Because we only had a small number of common identifying variables, we used deterministic record linkage. This method cannot account for missing values in the linkage key unless the n‐1 match method is used, which would increase the number of ties and false links dramatically.(6) Another limitation of our study is the time difference between the data from fertility clinics and the data registered in the PRN registry. While the mother’s date of birth does not change, the zip code can change during this timeframe if the patient moves to a different location. The PRN registry contains information on almost all pregnancies and children born in the Netherlands since 1999. However, we do not know whether the difference in time explains the 24.1% of the ART treatments we could not link to the PRN registry, or whether other factors, such as errors in the linking variables or termination of the pregnancy before the 18th week, contribute as well. Besides the 24.1% false non‐links we encountered, we found 77 ties. However, we did not perform a full validation study and therefore the number of false links could be higher.
Implications for clinical research
Our study shows that it is possible to link the data from fertility clinics to the PRN registry on just three partially identifying variables, but the medical record linkage procedure does introduce a selection bias.
Not all ART treatments were linked to the PRN registry, and the records which were linked differed significantly from the records which were not linked in the occurrence of low social economic status and in the number of in vitro fertilizations and frozen embryo transfers performed. When the linked database is used in clinical research, it is important to realize that the population in the database differs on these parameters from the true population.
Future research
Validation of the linked database could be performed to get a better grasp of the reasons why we encountered false non‐links. This could be done by verifying the assumption that people moved between ART treatment and delivery, by linking the ART data to the civil registry. In this registry earlier zip codes are registered in combination with the current zip code. Furthermore, the manner in which the ties were solved could be validated by contacting the fertility clinics and the hospitals or midwifery practices to get additional information on the patient and mother, such as a name. However, to perform such a validation, permission would have to be requested from the fertility clinics and their privacy officers, because the linked data was supposed to be anonymous.
Chapter 4: Medical record linkage between the Netherlands Cancer registry and the PRN registry
4.1 Introduction
Breast cancer is the most common type of cancer diagnosed in women in the Netherlands(15), but it is not known how many women in the Netherlands become pregnant after they have been diagnosed with breast cancer. This knowledge is important to properly inform breast cancer patients when they think about having children, and it can be used by clinicians to give patients a better prognosis during intake. Oncologists often discourage pregnancy after breast cancer, because of high recurrence rates in the first years after diagnosis and elevated estrogen levels during pregnancy. Observational studies have shown that women who become pregnant after breast cancer have a good prognosis, but researchers think these results are biased because healthier women are more likely to give birth.(23) In the Netherlands, there is currently no national database or registry which contains information on both breast cancer diagnosis and pregnancy or perinatal outcomes. However, there are two separate registries in the Netherlands which contain the necessary information. The Netherlands Cancer Registry (NKR) is a national registry which has registered the diagnosis, tumor morphology, treatment and follow‐up of all Dutch cancer patients since 1989(18) and is managed by the Netherlands Comprehensive Cancer Organization (IKNL). The Netherlands Perinatal Registry (PRN) is a national registry which has registered pregnancy, delivery and perinatal outcomes of almost all children born in the Netherlands since 1999.(2) The department of Obstetrics and Gynecology at the Academic Medical Center (AMC), the Netherlands Comprehensive Cancer Organization and PRN want to create a new database in which the women diagnosed with breast cancer are linked to the perinatal outcomes of the children born to these women. There is no common uniquely identifying variable between these registries on which their data could be linked.
Because it was expected that only a small number of linking variables could be used, a pilot study was conducted to evaluate whether data from NKR and PRN could be linked on common partially identifying variables using medical record linkage. This study answers the following research question: To what extent is it possible to link data from the national cancer registration to the Netherlands Perinatal Registry using medical record linkage?
4.2 Methods
In a meeting with both IKNL and PRN the project was discussed, and a formal application to request the use of the data was sent to both parties, after which the privacy officers of both organizations gave their approval for this project. All women diagnosed with breast cancer before the age of 45, born between 1959 and 1995 and registered in the NKR registry were included. For these women, only the first diagnosis of breast cancer was included; subsequent entries in the NKR registry were excluded. From the PRN registry, singletons and the first child of each multiple birth born between 1999 and 2013 were included. Common partially identifying variables were identified and these linking variables were selected from the PRN registry data. Furthermore, a record identification number linking to the full record in the PRN registry was added, as well as four additional variables which could be used in a validation study. These variables were: parity, congenital disorders, gestational age and the hospital or midwifery practice which provided the perinatal outcomes to the PRN registry. Medical record linkage was performed at one of the two registries. The included PRN records with the selected variables were brought to IKNL, because data from the NKR registry was not allowed to leave the premises. The included records from NKR and PRN were deterministically linked on the partially identifying variables and were blocked on the mother’s date of birth.
Results were evaluated by calculating how often perinatal records linked to multiple women diagnosed with breast cancer and how often multiple perinatal records linked to the same woman in the same year. After linkage, the linked database will be analyzed and, once permission has been obtained, validated. The medical record linkage procedure and the analysis were performed in SAS 9.3.
4.3 Results
Permission to start the project was received in January 2015. In total, 26,105 women diagnosed with breast cancer and registered in the NKR registry were included. Between 1999 and 2013, 2,718,861 children were born and registered in the PRN registry. These children were born from 2,668,584 pregnancies, and one perinatal record for each of these pregnancies was included. Partially identifying variables for records in the NKR registry were: name and initials, full zip code of the woman, date of birth of the woman and place of birth. Partially identifying variables for records in the PRN registry were: mother’s date of birth, first four digits of the zip code and ethnicity of the mother. The common partially identifying variables were the date of birth of the woman (the mother in the PRN registry) and the first four digits of the zip code. Table 1 shows how often the 26,105 women with NKR records linked to PRN records. Of these women diagnosed with breast cancer, 9508 (36%) were deterministically linked to 15,596 perinatal records by using exact matches on these two common partially identifying variables.
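The two evaluation measures described in the methods (perinatal records linking to multiple women, and multiple perinatal records linking to the same woman in the same year) can be computed as in the following Python sketch. The analysis in this study was performed in SAS 9.3; the function, the tuple layout and the example links below are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter, defaultdict


def tie_frequencies(links):
    """Summarize ambiguous links.

    `links` is an iterable of (woman_id, perinatal_id, birth_year) tuples.
    Returns two frequency tables:
      * number of different women per perinatal record,
      * number of perinatal records per woman within one birth year.
    """
    women_per_perinatal = defaultdict(set)
    perinatal_per_woman_year = defaultdict(set)
    for woman_id, perinatal_id, birth_year in links:
        women_per_perinatal[perinatal_id].add(woman_id)
        perinatal_per_woman_year[(woman_id, birth_year)].add(perinatal_id)

    women_distribution = Counter(len(w) for w in women_per_perinatal.values())
    same_year_distribution = Counter(len(p) for p in perinatal_per_woman_year.values())
    return women_distribution, same_year_distribution


# Hypothetical links: P3 links to two women, and W4 has two records in 2010.
example_links = [
    ("W1", "P1", 2003),
    ("W1", "P2", 2006),
    ("W2", "P3", 2005),
    ("W3", "P3", 2005),
    ("W4", "P4", 2010),
    ("W4", "P5", 2010),
]
women_distribution, same_year_distribution = tie_frequencies(example_links)
```

A perinatal record that links to more than one woman must contain at least one false link, while several records for one woman in different years can simply reflect successive pregnancies, which is why the second measure is restricted to the same year.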