Using survival data in gene mapping : using survival data in genetic linkage and family-based association analysis


Callegaro, A.

Citation

Callegaro, A. (2010, June 17). Using survival data in gene mapping : using survival data in genetic linkage and family-based association analysis. Retrieved from https://hdl.handle.net/1887/15696

Version: Corrected Publisher’s Version

License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden

Downloaded from: https://hdl.handle.net/1887/15696

Note: To cite this publication please use the final published version (if applicable).


CHAPTER 8 Summary

The largest part of this thesis (Chapters 2-5) is devoted to newly developed statistical methods for age-at-onset linkage analysis. We used frailty models in which random effects model the dependence between the outcomes of relatives that arises from sharing marker alleles Identical By Descent (IBD).

From the retrospective likelihood of the marker data conditional on the phenotypes, we derived score tests for genetic linkage analysis. The score statistics turn out to be classical Non-Parametric Linkage (NPL) statistics (Kruglyak et al., 1996) weighted by functions of the age at onset (or age at censoring) of the family members. These tests are based on allele sharing, can be applied to families ascertained through their phenotypes, and do not require specification of genetic models or penetrance functions. Further, they can incorporate both affected and unaffected family members: the approach uses the age at disease onset of affected members and the age at censoring of unaffected members. Finally, compared with the likelihood-ratio tests proposed in the literature (Commenges, 1994; Jonker et al., 2009; Li and Zhong, 2002; Pankratz et al., 2005), the derived score tests are computationally faster, locally most powerful, and robust. For all these reasons, the proposed weighted NPL statistics provide a practical solution for mapping genes for complex diseases with variable age at onset. A collection of compiled C++ programs (the Arthur package) that implements the proposed NPL statistics is available from our web site (http://www.msbi.nl/Genetics). The Arthur package uses Merlin (Abecasis et al., 2002) to compute the mean proportion of alleles shared IBD and the corresponding variance. We used these age-at-onset NPL methods to analyze linkage data from breast cancer families (Chapter 2 and Chapter 3), life-span (Chapter 3), and the time to the first of three events: hard coronary heart disease, diabetes, or death from any cause (Chapter 4), and to study human longevity (Chapter 5). As an illustration, in breast cancer families without any mutations in BRCA1 and BRCA2 (Oldenburg et al., 2008), using the age-at-onset information increased the evidence for linkage on chromosome 9 around 82 cM.
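The general shape of a weighted allele-sharing statistic can be sketched as follows. This is a minimal illustration for independent sibling pairs, not the Arthur implementation: the function names `pair_weight` and `weighted_npl` are hypothetical, and the weight used here (a product of martingale-type residuals, event indicator minus cumulative hazard) is an assumed example of "a function of the age at onset or censoring", not necessarily the weight derived in the thesis. The null mean 1/2 and variance 1/8 of the IBD proportion hold for sibling pairs with fully informative markers; in practice the estimated sharing and its variance would come from a program such as Merlin.

```python
import math


def pair_weight(event_1, cum_hazard_1, event_2, cum_hazard_2):
    """Assumed illustrative weight for one sibling pair.

    event_*      : 1 if the sibling is affected, 0 if censored.
    cum_hazard_* : population cumulative hazard at the age at onset
                   or age at censoring (supplied by the user).
    """
    return (event_1 - cum_hazard_1) * (event_2 - cum_hazard_2)


def weighted_npl(pi_hat, weights, pi_null=0.5, var_null=0.125):
    """Standardized weighted sum of excess IBD sharing over pairs.

    pi_hat   : estimated proportion of alleles shared IBD per pair.
    weights  : one weight per pair (e.g. from pair_weight).
    pi_null  : expected sharing under no linkage (1/2 for sibs).
    var_null : variance of sharing under no linkage (1/8 for sibs
               with fully informative markers).
    """
    if len(pi_hat) != len(weights):
        raise ValueError("pi_hat and weights must have the same length")
    numerator = sum(w * (p - pi_null) for w, p in zip(weights, pi_hat))
    variance = sum(w * w * var_null for w in weights)
    if variance == 0:
        raise ValueError("all weights are zero; statistic is undefined")
    return numerator / math.sqrt(variance)
```

With all weights equal, this reduces to an unweighted mean-sharing statistic; pairs whose weight is large in absolute value contribute more to the evidence for linkage.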

In the second part of the thesis we derived a new class of allele-sharing statistics that take into account the phenotypes of ungenotyped family members (family history). More specifically, we used the family history to optimize the weight given to the mean proportions of alleles shared IBD among relatives with known genotypes. We analyzed the symptomatic osteoarthritis GARP study (Meulenbelt et al., 2008), where taking the family history into account increased the LOD score in the vicinity of a known susceptibility locus (DIO2) from 3 to 3.6. Further, adjusting for family history moved the maximum of the LOD scores closer to the location of DIO2.

In the third and last part of the thesis we derived a new score statistic for family-based association analysis. The score statistic is a classical Family-Based Association Test (FBAT) statistic (Lake et al., 2000; Rabinowitz and Laird, 2000) with a new, simple and flexible weight function. To increase the power to detect association, we adjusted the statistic for the number of alleles shared IBD between relatives and for gene-covariate interaction. We analyzed the North American Rheumatoid Arthritis Consortium (NARAC) study data from GAW15 (Witte et al., 2007). Adjusting for the interaction with smoking and anti-CCP increased the significance of the association with the DR locus.
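For orientation, the classical FBAT-type statistic that the new weight function plugs into can be sketched as below. This is a minimal illustration for parent-offspring trios with fully genotyped parents and a biallelic marker; the function name `fbat_z`, the tuple layout, and the simple weight (trait minus a user-chosen offset) are assumptions made for the example. It does not include the thesis's new weight function, the IBD adjustment, or the gene-covariate interaction.

```python
import math


def fbat_z(trios, offset=0.0):
    """Classical FBAT-style Z statistic for parent-offspring trios.

    trios  : iterable of (father_g, mother_g, child_g, trait), with
             genotypes coded as 0, 1 or 2 copies of the tested allele.
    offset : value subtracted from the trait to form the weight
             (an assumed, simple choice of weight).
    """
    score = 0.0
    variance = 0.0
    for father_g, mother_g, child_g, trait in trios:
        weight = trait - offset
        # Mendelian expectation of the child's allele count.
        expected = (father_g + mother_g) / 2.0
        # Each heterozygous parent contributes transmission variance 1/4.
        var_child = 0.25 * ((father_g == 1) + (mother_g == 1))
        score += weight * (child_g - expected)
        variance += weight * weight * var_child
    if variance == 0:
        raise ValueError("no informative trios (no heterozygous parent)")
    return score / math.sqrt(variance)
```

Because the statistic conditions on the parental genotypes, only trios with at least one heterozygous parent contribute; the weight determines how much each informative trio counts.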

This thesis has studied different methods for dealing with heterogeneity of data in nonparametric linkage analysis and in family-based association analysis. All the proposed methods try to reduce the heterogeneity by weighting individuals according to their risk profiles. Specifically, we derived new weights to make proper use of age-at-onset information, family history, and genetic and environmental factors. To make the proposed methods available to the scientific community, we developed free and easy-to-use software, which is described in the appendix.
