
UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

Novel thyroid specific transcripts identified by SAGE: implication for congenital hypothyroidism

Moreno Navarro, J.C.

Publication date: 2003

Citation for published version (APA):

Moreno Navarro, J. C. (2003). Novel thyroid specific transcripts identified by SAGE: implication for congenital hypothyroidism.


CHAPTER 2

Cloning of tissue-specific genes using serial analysis of gene expression and a novel computational substraction approach.

José C. Moreno 1, Erwin Pauws 1, Antoine H. C. van Kampen 2, Marcela Jedlickova 1, Jan J. M. de Vijlder 1 and Carolyn Ris-Stalpers

1 Laboratory of Pediatric Endocrinology, Academic Medical Center, University of Amsterdam, The Netherlands.

2 Laboratory of Bioinformatics, Academic Medical Center, University of Amsterdam, The Netherlands.

Cloning of tissue-specific genes using SAGE and TPE

Abstract

A paradigm of molecular medicine is the identification of functionally specialised genes in the search for defects responsible for human diseases. To identify novel genes relevant for thyroid physiology, we applied Serial Analysis of Gene Expression (SAGE) and identified 4,260 tag sequences that did not match any known gene present in GenBank ("no-match" tags). These "no-match" tags represent still uncharacterised transcripts. Most of them are expected to correspond to housekeeping genes and only a small number to genes with a tissue-restricted pattern of expression. In order to pinpoint the best candidates for tissue-specificity out of a large series of tags, a computer-based approach was used. We compared the relative abundance of 80 "no-match" tags in our thyroid SAGE library to their expression level in 14 other SAGE libraries from 9 different human tissues. Based on the expression data, an algorithm named TPE (Tissue Preferential Expression) was developed to discriminate tags with specific expression in the thyroid. Four tags were then selected as preferentially expressed in the tissue of interest. Results were validated by RT-PCR and northern blotting on multiple tissue RNA samples. Finally, the screening of a thyroid cDNA library with EST sequences related to the selected tags allowed the isolation of 4 novel thyroid-specific cDNAs. We demonstrate that the computational subtraction of SAGE tags by the proposed TPE algorithm is a rapid and reliable way to expedite the cloning of tissue-specific genes through the combined use of SAGE and EST databases.

Introduction

Of the approximately 15,000 distinct transcripts present in a cell, only 1-2% differ between cells of closely related origins. If cells of more disparate origins are compared, the percentage of differentially expressed transcripts may rise to 3-5% [1]. This indicates that the great majority of expressed genes are in charge of functions shared by all cell types and that only a small subset of genes is involved in cell-type specific functions. Defects in widely expressed genes can be expected to result in unviable fetuses or complex clinical phenotypes. Defects in tissue-restricted genes have been found in a series of well-recognized phenotypes of human disease, usually involving only one organ. In order to unravel the molecular basis of diseases for which no responsible gene has yet been identified, it is essential to recognize the important subset of genes whose expression is specific to each cell type. A recent analysis of human transcriptomes reveals that the great majority of these tissue-specific genes remain unknown [2]. With respect to thyroid physiology and pathophysiology, multiple relevant genes have been identified in the past decade. There are, however, several forms of congenital thyroid disease for which the corresponding gene has not been identified [3-5].

The Serial Analysis of Gene Expression (SAGE) is a powerful method that allows the parallel and detailed analysis of all transcripts expressed within a tissue or cell type. A 10-bp sequence tag located at its 3' end characterizes each transcript [6]. Abundance of transcripts is determined as the number of times that the corresponding tag is scored within the library, providing a quantitative expression profile. The SAGE technique has been successfully used to gain insight into the pathogenesis of different diseases through the comparison of gene expression profiles between normal and pathological or stimulated tissues [7-14]. Although many SAGE libraries have been made, no novel gene has yet been cloned based on the SAGE technique. The main difficulty seems to be the handling of a huge amount of information, since the selection of "no-match" tags (tags not corresponding to known genes in GenBank and putatively representing novel genes) has to be made out of the thousands of tags contained in most SAGE libraries. Furthermore, this selection cannot be made solely on the basis of absolute abundance, since differentially expressed transcripts can show high, moderate or low levels of expression [1].

Pursuing our goal to identify novel thyroid-specific genes putatively involved in pathogenesis, we prepared a SAGE library of normal human thyroid tissue. This library consists of a total of 10,994 tags, representing 6,099 unique mRNA species and containing in total 4,260 "no-match" tags [15]. Expression data of 80 of these tags in 14 other SAGE libraries were used in the selection of putative tissue-specific tags by a mathematical algorithm. In this paper we demonstrate a rapid and easy way to pinpoint a subset of "no-match" tags of interest, which can be further pursued by cloning tissue-specific genes using only one tissue-specific SAGE library and public SAGE and EST databases.

Results

In 15 SAGE libraries from 10 normal human tissues, we determined the absolute abundance of 3 tags corresponding to thyroid-specific genes, 3 tags representing housekeeping genes and 80 "no-match" tags generated from a thyroid SAGE library. A total of 762,188 SAGE tags were analysed from 2 public databases using either the SAGEmap program [16] or direct count of abundance. While thyroid-specific genes (TG, TPO and TITF-1) scored abundantly in the thyroid library and were practically absent in the rest of the libraries, housekeeping genes (GAPD, B2M and MRPL27) showed expression in all tissues, and "no-match" tags presented with a wide range of intermediate expression. Table 1 shows the abundance of the 3 thyroid-specific tags, the 3 housekeeping tags and a representative group of "no-match" tags (7 out of the 80 analysed) covering the whole range of abundance.

To pinpoint tags putatively corresponding to novel thyroid-specific genes, we assigned a tissue preferential expression (TPE) value to each tag (Table 1). The TPE value is based on the number of tissues in which a tag is present (range of expression) and its expression level in the tissue of interest compared to the other tissues (preferential abundance). Scores for these two parameters were calculated for each individual tag (see Methods) and plotted against each other. TPE values were then obtained as the Euclidean distance between each dot representing a SAGE tag and the centre of the cluster of housekeeping genes (Fig. 1).

Thyroid-specific tags showed TPE values around 10, and TPE values for "no-match" tags ranged from 9.45 (NM52) to 0.30 (NM65). To determine which TPE level might represent a useful threshold for (strong) tissue-specificity, TPE values were also calculated for 15 tags corresponding to well-recognised muscle-specific genes with expression levels between 0.3 and 13.5‰ in 2 muscle libraries, an even broader range of expression than the one for our thyroidal tags (0.45-9.5‰). Despite large differences in expression levels, all muscle-specific tags had TPE values over 7 (Fig. 1A). Based on these data, TPE levels > 7 were considered indicative of tissue-specificity. Application of the same TPE threshold to the 80 thyroidal "no-match" tags led to the isolation of four (NM41, 52, 56 and 79), representing 5% of the original group, which were then selected for further downstream analysis. NM83, a tag with a TPE value of 5.8, was included as a control for the degree of tissue-specificity underlined by the TPE threshold of 7 (Fig. 1B).

[Table 1: abundance of the thyroid-specific, housekeeping and representative "no-match" tags in the SAGE libraries, with TPE values. Table not reproduced.]

Fig. 1A Muscle and housekeeping tags (x-axis: range of expression). Labelled points, TPE in brackets: MYH7 (11.6), MYH2 (11.5), MYL2 (11.5), CCR4 (10.8), CKM (10.7), NEB (10.7), AMPD1 (10.7), MB (10.5), ACTN2 (9.8), TNNC1 (9.5), DES (8.2), SLN (7.9); GAPD, MRPL27, B2M, NM63, NM65.

Fig. 1B Thyroid and housekeeping tags (x-axis: range of expression). Labelled points, TPE in brackets: TPO (10.9), TG (10.4), NM52 (9.4), TITF1 (9.3), NM41 (8.4), NM79 (7.5), NM56 (7.2); NM65, GAPD, B2M.

Fig. 1 Graphical mapping of tags according to their preferential abundance and range of expression in 10 different human tissues. Tissue preferential expression (TPE) values, calculated as the individual distance from each tag to a cluster of housekeeping genes (•), are given in brackets. The housekeeping cluster includes 5 genes expressed in all tissues under consideration. (A) Fifteen tags corresponding to muscle-specific genes show TPE values higher than 7 (ACTN2: actinin, alpha 2; AMPD1: adenosine monophosphate deaminase 1, isoform M; DES: desmin; NEB: nebulin; MYH2: myosin, heavy polypeptide 2; MYH7: myosin, heavy polypeptide 7; MYL2: myosin, light polypeptide 2; MYL3: myosin, light polypeptide 3; CCR4: chemokine receptor 4; CKM: creatine kinase, muscle; MB: myoglobin; SLN: sarcolipin; TNNC1: troponin C, slow; TNNC2: troponin C2, fast; TPM2: tropomyosin 2; GAPD: glyceraldehyde-3-phosphate dehydrogenase; B2M: beta-2-microglobulin; MRPL27: mitochondrial ribosomal protein L27; NM63 and NM65: genes corresponding to "no-match" tags nos. 63 and 65 in TH4, respectively). (B) Mapping and TPE values of 3 thyroid-specific tags, 4 putatively thyroid-specific "no-match" tags with TPE > 7 (o) and 1 tag with TPE < 7 (TG: thyroglobulin; TPO: thyroid peroxidase; TITF1: thyroid transcription factor 1; NM41, 52, 56, 79, 83: genes corresponding to the "no-match" tags nos. 41, 52, 56, 79 and 83 in our TH4 SAGE library).

Following this approach, 5 tags with a putative thyroid preferential expression were analysed for tissue-specificity, as determined by RT-PCR on 9 different tissues including the thyroid. Northern blotting and the screening of a thyroid cDNA library were also performed to investigate the tissue distribution of these novel genes and to isolate their corresponding cDNAs. Since the use of short SAGE tags (10 bp) in downstream applications such as PCR and hybridization studies is technically difficult, these tags were used to screen the human EST databases in search of the corresponding EST sequences (see Methods). Table 2 shows the EST sequences attributed to each "no-match" tag, as well as the tissue of origin of the selected ESTs. These EST sequences were then used to design standard PCR primers to perform RT-PCR and to obtain thyroid cDNA probes for northern blotting and the screening of the library.

A multiple tissue cDNA panel was first used to evaluate the tissue-specificity of the "no-match" tags nos. 41, 52, 56 and 79 together with NM83. As a control for the non-saturating conditions of the RT-PCR, GAPD was also amplified from the corresponding mastermixes, and all reactions were stopped within the logarithmic phase of amplification (Fig. 2). All four selected tags showed high expression in the thyroid and low or no expression in other tissues. NM41 showed preferential expression in thyroid with residual expression in kidney and lung. NM56 showed expression mainly in thyroid and much less in placenta. NM52 is expressed in thyroid, with minimal expression in lung, heart and liver. NM79 is mainly expressed in thyroid, and to a lower extent in kidney and heart. As expected, NM83 showed some degree of preferential expression in thyroid, but had a low overall expression in the rest of the tissues, confirming that the lower the TPE value is for a tag, the higher the chances for the corresponding gene to have a housekeeping-like pattern of expression. For further verification of the tissue-specificity of the selected tags, we performed northern blot analysis using the corresponding ESTs as probes. Fig. 3 shows a thyroid-specific band of around 6 kb corresponding to NM56, obtained after hybridization with a probe derived from the EST clone W60005 (IMAGE clone 338643).

Finally, the PCR probes for NM41, NM56 and NM83 were used to screen a cDNA library prepared from the thyroid tissue originally used to make our TH4 SAGE library. For each probe 40,000 plaques were screened, resulting in 9 positive clones for NM41, 7 positives for NM56 and 5 for NM83. All cDNA clones were sequenced and contained the SAGE tag at the expected position, flanking the most 3' CATG sequence. Table 2 shows the GenBank sequences related to insert sequences of positive clones. A 3 kb clone for NM56 gave a BLAST hit with "NADPH-binding sites" of different oxidases from pig (heavy chain subunit), mouse (gp91phox) and human (mitogenic oxidase). A perfect match with a human genomic clone on chromosome 15 was also found. BLAST hits for human clones in chromosomes 16 and 1p35 were found, respectively, for a 1.6 kb-insert clone corresponding to NM41 and for a 2 kb-insert clone from NM83, but without significant homologies to known genes.

Table 2 Linkage of "no-match" tags with Expressed Sequence Tags (EST) and GenBank sequences (BLAST hits) showing homology with positive clones obtained after the screening of a thyroid cDNA library with the respective ESTs.

No-match tag | Tag sequence | EST clone (Acc. no.) | Origin of EST library | BLAST hits (Acc. no.)
41 | ccagctgcct | AI375154 | lung | human chromosome 16 clone 165E7 (AC007011)
52 | (tgggatgta | AA632629 | thyroid | -
56 | ctgttgtgtg | W60005 | pancreas | mouse NADPH-dependent oxidase (MMU43384); pig NADPH-dependent oxidase (SSU02476); human NADPH-dependent oxidase (AF127763); human chromosome 15 clone (AC009700)
79 | ggaatgeetc | AI446209 | stomach | -
83 | oagtgaaaaa | AI023948 | parathyroid tumour | human chromosome 1p35 clone 462023 (HS462023)

[Fig. 2 gel image: panels labelled GAPD, NM-41, NM-56, NM-79 and NM-83; marked band sizes 344 bp, 298 bp and 220 bp.]

Fig. 2 RT-PCR of 5 "no-match" genes and GAPD on a panel of 9 human tissues. Four selected genes representing tags with TPE values > 7 (NM41, NM56, NM52 and NM79) show a thyroid-restricted pattern of expression. The NM83 gene (TPE = 5.8) has a certain preferential expression in the thyroid, but an overall (housekeeping-like) presence in all tissues tested. Amplification was performed in non-saturating conditions (PCR stopped within the exponential phase), reached at 24 cycles for the NM41, NM56, NM52, NM79 and GAPD genes and at 28 cycles for NM83. The first lane contains the size marker. The control lane contains a mixture of all tissues present in the cDNA panel.

[Fig. 3 blot image: lanes Pa, Am, Th, Ac, Te, Tm, Si, St; size markers 7.5 kb, 4.4 kb and 2.4 kb.]

Fig. 3 Northern blot of "no-match" 56 on a panel of 8 endocrinological tissues (Pa: pancreas; Am: adrenal medulla; Th: thyroid; Ac: adrenal cortex; Te: testis; Tm: thymus; Si: small intestine; St: stomach). Each lane contains 2 µg of PolyA+ RNA from the corresponding tissue (Clontech). The probe was a 316 bp PCR product amplified from thyroid cDNA using specific primers designed from the EST clone W60005 sequence.

During the completion of this study, the cloning of a novel thyroid-specific gene named ThOX2 (for thyroid oxidase) was reported [17]. The 3' end of the published mRNA sequence of ThOX2 (Acc. no. AF230496) is identical to the 3 kb partial clone that we obtained in the screening of our cDNA library, including the complete sequence of the W60005 EST clone and the location of our SAGE tag at the expected site. The mapping of ThOX2 to human chromosome 15q15 further corroborates our BLAST results from the NM56 positive clone.

Discussion

When the characterization of the complete expression profile of cells became possible through the serial analysis of gene expression, the identification of the subset of genes responsible for tissue-specificity became a major step in the comprehensive analysis of SAGE data. Classically, tissue-specificity has been approached in terms of the comparative abundance of transcripts among tissues. Three categories have been proposed for tissue-specific genes according to their moderate (>2 fold), strong (>5) or very strong (>10) preferential expression in a given cell type with respect to a range of others [18]. Important physiological roles should be expected for most of these genes, as illustrated recently by the selection of differentially expressed sequences using cDNA microarrays [19].

Following a fully computational strategy based on the SAGE technique, we identified 4 novel thyroid-specific cDNAs, showing that the cloning of tissue-specific genes is attainable with the construction of only one medium-sized SAGE library and the combined use of SAGE and EST databases. Our approach addresses the unsolved problem of how to select a limited number of SAGE tags to proceed with in the search for novel tissue-specific genes. We demonstrate that a computational algorithm designated TPE, which considers the presence and preferential abundance of tags in a limited number of SAGE libraries from normal human tissues, delineates a useful threshold for tissue-specificity. Application of a TPE cut-off level of 7 to 80 "no-match" tags originating from a thyroid SAGE library resulted in the selection of 4, whose patterns of expression were shown to be tissue-specific. More stringent application of the TPE algorithm (cut-off levels > 7) might be used depending on the specific aims of researchers.

The computational selection of tissue-specific tags reported in this paper takes a limited amount of time and is relatively cheap. The reliability of the method was validated both by northern blot and RT-PCR. Furthermore, its accuracy has recently been confirmed by the cloning of ThOX2, a very thyroid-specific gene whose mRNA sequence includes the complete EST sequence coupled to one of the 4 tags that we predicted to be thyroid-specific.

After the isolation of a reduced number of tissue-specific tags, we aimed at cloning their corresponding cDNAs by coupling the tags to EST sequences that were further used as probes for library screening. This linkage, performed using the SAGEmap program, proved reliable and convenient for our SAGE-based cloning strategy. For all 4 selected tags, we could find at least one 3'-oriented EST meeting the criteria for a correct tag-EST association. When no related ESTs are available, 3 methods named RAST-PCR [20], SAGE-lite [21], and GLGI [23] have been proposed to extend the 10-bp sequence of SAGE tags. Although the specificity of these techniques remains to be established, they may be useful once an initial selection of tissue-specific tags has been made.

The tissue from which the tag-related EST was generated appeared to be the thyroid in only one case (NM52). For a second tag (NM41), the finding of a related EST from lung corroborates the expression of the gene in that tissue, as we detected by RT-PCR. These results might indicate, on the one hand, that genes can be expressed and develop functional roles in more than one tissue, and on the other, that the chance of finding an EST hit for a certain tissue depends directly on how often that tissue was used to construct EST libraries. In our example, only 9 EST libraries from thyroid are currently available out of 306 present in EST databases. The fact that the tag NM56, corresponding to the ThOX2 gene, was correctly coupled to an EST originating from pancreatic tissue supports this line of reasoning and predicts the presence of an oxidase system in the pancreas.

A theoretical problem of the computational subtraction of tissue-specific tags by the proposed method might arise from the statistical chance that 2 genes share the same SAGE tag. If a tissue-specific gene shares its tag with a housekeeping gene, the tissue distribution of the latter might cause that tag to be discarded as non-tissue-specific. This inconvenience can be easily minimized by determining the number of EST clusters that the SAGEmap program couples to a certain tag. If more than 1 EST cluster is correctly linked to a tag, some caution is advised with respect to the real abundance of the tag in SAGE libraries, since it might result from the cumulative abundance of 2 or more genes.

TPE values guide the biologist in the selection of tags with a high chance of being specific for a tissue. The discrimination power of the TPE value increases along with the size of the SAGE libraries and the spectrum of tissues included in the study, and obviously decreases when low-abundance tags are analysed. This method should not be applied to tags with low scores in SAGE libraries, since estimations indicate that every unique gene might be represented on average by 1.6 tags in SAGE libraries [2]. In our study, we analyzed "no-match" tags present 5 to 18 times in our thyroid SAGE library (expression levels > 0.45‰) and, as stated, found in all cases one EST cluster coupled to each selected tag.
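As a quick check of the lower abundance bound quoted above, the per-mille expression level of a tag can be recomputed from its count and the library size. The sketch below is only an illustration: the function name `per_mille` is hypothetical, and the library size of 10,994 tags is the total reported for the thyroid library in the Introduction.

```python
def per_mille(tag_count, library_size):
    """Expression level of a tag as parts per thousand of all tags in a library."""
    return 1000.0 * tag_count / library_size


# Total number of tags reported for the thyroid SAGE library.
LIBRARY_SIZE = 10_994

# A tag scored 5 times corresponds to the lower bound of about 0.45 per mille.
print(round(per_mille(5, LIBRARY_SIZE), 2))  # 0.45
```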

A final consideration derived from our study regards the common way to quantitate relative expression levels of genes among tissues, using a so-called "housekeeping" gene to normalize data. As shown in Table 1, three classical housekeeping genes (GAPD, B2M and MRPL27) showed over a 10-fold variation in expression among 15 SAGE libraries from 10 different normal tissues, supporting the idea that quantitative expression of housekeeping genes is not only susceptible to variation by different factors [24], but also constitutively differs among tissues. For these two reasons, direct comparison of the absolute abundance of genes as determined through the SAGE technique should currently be considered the procedure of preference to accurately address this question.

In summary, we developed a fast and fully computational method to effectively pinpoint tissue-specific genes based on the presence and abundance of their corresponding SAGE tags in available libraries. It represents a novel example of the use of the SAGEmap program [16] applied to the study of tissue-specificity. The method was tested in muscle tissue and experimentally validated in the thyroid by RT-PCR and northern blotting. Application of this approach to 80 "no-match" tags from a thyroid SAGE library led to the cloning of 4 novel thyroid-specific cDNAs whose role in thyroid (patho)physiology is now open for research. The accuracy of this computational subtraction of tissue-specific tags is expected to increase further when the publicly available SAGE (and EST) libraries cover more human tissues and contain a larger number of tags.

Materials & Methods

Construction and analysis of a thyroid SAGE library

A SAGE library (TH4) was constructed from normal human thyroid tissue obtained by resection [15]. This library contained in total 4,260 "no-match" tags. In the present study we analyzed 80 "no-match" tags with moderate and high levels of expression (0.45-9.5‰).

Selection of putative tissue-specific tags

SAGE libraries included in the study were chosen out of the 62 available in 2 public databases (NCBI/CGAP SAGEmap site at www.ncbi.nlm.nih.gov/SAGE and the Rochester Muscle Database at www.gcrc.rochester.edu/SWindex.html) based on 2 criteria: they were all made from normal human tissues (brain, breast, colon, skeletal muscle, vascular endothelium, cerebellum, fibroblasts and ovary) and contained > 20,000 tags each. SAGE libraries made from cancer tissues were excluded since carcinogenic processes can induce important changes in gene expression. Detailed information about the SAGE libraries used in this study can be found at the respective websites.

Determination of Tissue Preferential Expression (TPE) values

Absolute abundance of tags was determined using the SAGEmap database for the libraries generated by the NCBI's Cancer Genome Anatomy Project, and by direct count from the 2 muscle libraries from the Rochester database. "No-match" tags were ordered by their tissue preferential expression (TPE) based on an algorithm that considers the range of expression and the preferential abundance of tags in a tissue of interest. The range of expression was defined as the number of tissues in which a particular tag is observed. The preferential abundance of a tag in a certain tissue was determined from the relative expression ratios calculated as:

\[ \mathrm{Ratio}_j(\mathrm{tag}_i) = \log\left[\frac{0.001 + M(\mathrm{tag}_i)}{0.001 + N_j(\mathrm{tag}_i)}\right] \]

where $M$ denotes the tag count in the tissue of interest and $N_j$ denotes the tag count in another tissue $j$. We added 0.001 to each tag count to prevent division by zero or taking the logarithm of zero. Since we used SAGE data obtained from 10 different tissues, we obtained 9 relative expression ratios. Subsequently, we calculated the average of these ratios, which defines the preferential tag abundance of a given tag $i$. When the average log(ratio) of preferential abundance is plotted against the range of expression (the number of tissues in which each tag is observed), a single dot represents the expression features of each tag in a coordinate system.

In order to define a measure for the Tissue Preferential Expression (TPE), the Euclidean distance was calculated between the dots corresponding to each thyroid tag and the center of a cluster of well-recognized housekeeping genes. This algorithm was applied to 15 tags corresponding to known muscle-specific genes with a wide range of expression (0.3-13.5‰) to delineate a working TPE threshold for tissue specificity. Since all muscle-specific tags showed TPE values > 7, thyroid "no-match" tags showing the same degree of preferential expression were selected as putative thyroid-specific tags and further analysed.
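The TPE calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base-10 logarithm, the use of raw tag counts, the centroid as the "center" of the housekeeping cluster, all function names, and every count in the usage example are hypothetical choices made for the sketch.

```python
import math

PSEUDOCOUNT = 0.001   # added to every tag count, as described in the text
TPE_THRESHOLD = 7.0   # working threshold derived from the muscle-specific tags


def preferential_abundance(count_interest, counts_other):
    """Average log ratio of the tissue of interest versus each other tissue.

    Assumption: base-10 logarithm (the text only says "log").
    """
    ratios = [
        math.log10((PSEUDOCOUNT + count_interest) / (PSEUDOCOUNT + n))
        for n in counts_other
    ]
    return sum(ratios) / len(ratios)


def range_of_expression(count_interest, counts_other):
    """Number of tissues in which the tag is observed at least once."""
    return sum(1 for c in [count_interest, *counts_other] if c > 0)


def tag_point(count_interest, counts_other):
    """Coordinates of a tag: (range of expression, preferential abundance)."""
    return (range_of_expression(count_interest, counts_other),
            preferential_abundance(count_interest, counts_other))


def centroid(points):
    """Assumed definition of the center of the housekeeping cluster."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def tpe(point, housekeeping_centre):
    """Euclidean distance between a tag and the housekeeping center."""
    return math.dist(point, housekeeping_centre)


# Hypothetical counts: one tissue of interest followed by 9 other tissues.
housekeeping = [
    tag_point(20, [18, 25, 22, 19, 21, 30, 17, 24, 20]),
    tag_point(40, [35, 50, 45, 38, 42, 60, 33, 48, 41]),
]
centre = centroid(housekeeping)
specific = tag_point(50, [0] * 9)     # seen only in the tissue of interest

print(tpe(specific, centre) > TPE_THRESHOLD)         # True
print(tpe(housekeeping[0], centre) > TPE_THRESHOLD)  # False
```

With these invented counts, the tag seen only in the tissue of interest lies far from the housekeeping center, while a tag present at similar levels everywhere stays close to it, which is the behaviour the threshold of 7 is meant to separate.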

Linkage of selected no-match tags to human 3'-oriented ESTs

The human EST database of the NCBI was screened with the 10-bp specific sequence of each selected tag using the SAGEmap program [16]. A variable number of EST clusters containing the tag sequence is then obtained. EST clusters are presented in groups (3' orientation, 5' orientation or unoriented). EST sequences were selected when 3 basic criteria were met: the EST belongs to a 3'-oriented cluster with a clear polyA signal, it contains the complete 10 bp tag in the proper orientation, and the tag is preceded by the most 3'-located CATG site within the EST sequence. When more than one sequence met these criteria, the longest EST and/or the ones originating from thyroid tissue were preferred. When a selected tag was found to correspond to 2 or more EST clusters, it was discarded because of the chance that its abundance would represent the total counts from 2 or more genes that might coincidentally share the same tag.
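The second and third criteria above (complete tag, immediately following the most 3'-located CATG) lend themselves to a simple programmatic check. The sketch below is an illustration only: the function names and the example EST sequence are hypothetical, and it assumes the EST is supplied in the sense orientation, reading 5' to 3'. It does not check cluster orientation or the polyA signal.

```python
ANCHOR = "catg"    # anchoring site that precedes every SAGE tag
TAG_LENGTH = 10    # tag length in bp, as used in this study


def sage_tag(sequence):
    """Return the 10 bases following the most 3'-located CATG, or None."""
    seq = sequence.lower()
    pos = seq.rfind(ANCHOR)          # last (most 3') occurrence of CATG
    if pos == -1:
        return None                  # no anchoring site in the sequence
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def est_matches_tag(est_sequence, tag):
    """True when the tag sits right after the most 3' CATG of the EST."""
    return sage_tag(est_sequence) == tag.lower()


# Hypothetical EST fragment (sense strand) ending in a polyA tail.
example_est = "ggcatgaaacccgggtttcatgccagctgcctaaataaagcaaaaaaaa"

print(sage_tag(example_est))                       # ccagctgcct
print(est_matches_tag(example_est, "CCAGCTGCCT"))  # True
```

A sequence in which the tag follows an upstream CATG, but not the last one, fails the check, which mirrors the requirement that the tag be preceded by the most 3'-located site.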

Reverse Transcriptase - Polymerase Chain Reaction (RT-PCR)

First strand cDNA was synthesized using 3 µg of mRNA from normal thyroid tissue and an oligo(dT) primer according to standard protocols. A Multiple Tissue cDNA Panel (Clontech) including 8 different tissues was also used as template. PCR amplification was performed using 0.5 ng of each cDNA and 2.5 units of AmpliTaq DNA polymerase (Perkin-Elmer) in a total reaction volume of 25 µl.

Primer pairs of 20-25 nucleotides in length were designed based on the previously selected EST sequences in order to amplify PCR products longer than 200 bp: NM41frw: 5'-acaatttccagatggctgctcctc-3'; NM41rev: 5'-tgcctactcagggcttccaagat-3'; NM52frw: 5'-caatactttaggaggccggggc-3'; NM52rev: 5'-cctgcaactctcctgtggcaat-3'; NM56frw: 5'-taccaccaccccaggtcaaagaca-3'; NM56rev: 5'-gaaacaacccaaacgtccatcaac-3'; NM79frw: 5'-actctgcaattggtccccggag-3'; NM79rev: 5'-atgagaggcattcccatgaaatatc-3'; NM83frw: 5'-gaacgctgggtaaagaagggaga-3'; NM83rev: 5'-cgcacagaatatattctaaggtgac-3'. To compare the relative abundance of transcripts among tissues in the panel, GAPD cDNA was amplified from the same PCR mastermix using specific primers: GAPDfrw: 5'-ctgagaacgggaagcttgtc-3'; GAPDrev: 5'-gtgctaagcagttggtggt-3'. Amplification was stopped at the exponential phase of the reaction: 3 min at 94 °C; 24 (GAPD, NM41, 52, 56 and 79) or 28 (NM83) cycles of 1 min at 94 °C, 1 min at 55 °C (GAPD) or 60 °C (NM primers) and 1 min at 72 °C; 10 min at 72 °C. Reactions were electrophoresed on a 2% agarose gel and visualized with the Eagle Eye II system (Stratagene).

Screening of a thyroid cDNA library

A 3'-directional ZAP Express cDNA library (Stratagene) was constructed from the same human thyroid sample used for the construction of our TH4 SAGE library. Probes consisted of 240-325 bp PCR products, amplified from human thyroid cDNA using primers designed from the selected ESTs. PCR products were random-prime labeled with 32P-αdATP. Standard hybridization conditions [24] were used for the screening of approximately 40,000 plaques per tag. After 2 rounds of screening and excision, positive clones were sequenced and checked for the presence of the corresponding human EST at the 3' part of the insert, including the SAGE tags at the expected position. Using the nBLAST program, insert sequences located upstream of the EST were used to search GenBank for related sequences.


Northern blot

The same EST sequences used as probes for library screening were hybridized to a blot of 8 human endocrine tissues, including the thyroid, containing 2 µg of mRNA per lane (Clontech). PCR products were labeled with 32P-αdATP and hybridization was performed following the manufacturer's instructions. Expression was analyzed with the PhosphorImager system (Molecular Dynamics).

Acknowledgments

The authors acknowledge the valuable resources generated by NCBI, the CGAP project and the University of Rochester with regard to SAGE databases. Researchers and institutions involved in the construction and availability of EST databases are also acknowledged.

JCM is supported by an ESPE Research Fellowship, sponsored by Novo Nordisk A/S. This project was funded by a grant from the Dr. Ludgardine Bouwman foundation.


References

1 Zhang, L., et al. (1997). Gene expression profiles in normal and cancer cells. Science 276: 1268-1272.

2 Velculescu, V. E., et al. (1999). Analysis of human transcriptomes. Nature Genet. 23: 387-388.

3 De Vijlder, J. J. M., and Vulsma, T. (2000). Hereditary metabolic disorders causing hypothyroidism. In Werner's and Ingbar's The Thyroid. A Fundamental and Clinical Text, Eighth Edition (L. Braverman and R. D. Utiger, Eds.), pp. 749-755. J. B. Lippincott Williams & Wilkins, Philadelphia.

4 Leseney, A.-M., et al. (1999). Biochemical characterization of a Ca2+/NAD(P)H-dependent H2O2 generator in human thyroid tissue. Biochimie 81: 373-380.

5 Rosenberg, I. N., and Goswami, A. (1979). Purification and characterization of a flavoprotein from bovine thyroid with iodotyrosine deiodinase activity. J. Biol. Chem. 254: 12318-12325.

6 Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. (1995). Serial analysis of gene expression. Science 270: 484-487.

7 Madden, S. L., Galella, E. A., Zhu, J., Bertelsen, A. H., and Beaudry, G. A. (1997). SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15: 1079-1085.

8 Hibi, K., et al. (1998). Serial analysis of gene expression in non-small cell lung cancer. Cancer Res. 58: 5690-5694.

9 Hashimoto, S., Suzuki, T., Dong, H. Y., Nagai, S., Yamazaki, N., and Matsushima, K. (1999). Serial analysis of gene expression in human monocyte-derived dendritic cells. Blood 94: 845-852.

10 Michiels, E. M. C., et al. (1999). Genes differentially expressed in medulloblastoma and fetal brain. Physiol. Genomics 1: 83-91.

11 Ryo, A., et al. (1999). Serial analysis of gene expression in HIV-1-infected cell lines. FEBS Lett. 462: 182-186.

12 De Waard, V., van den Berg, B. M. M., Veken, J., Schultz-Heienbrok, R., Pannekoek, H., and van Zonneveld, A. J. (1999). Serial analysis of gene expression to assess the endothelial cell response to an atherogenic stimulus. Gene 226: 1-8.


13 Welle, S., Bhatt, K., and Thornton, C. A. (1999). Inventory of the high-abundance mRNAs in skeletal muscle of normal men. Genome Res. 9: 506-513.

14 Nacht, M., et al. (1999). Combining serial analysis of gene expression and array technologies to identify genes differentially expressed in breast cancer. Cancer Res. 59: 5464-5470.

15 Pauws, E., Moreno, J. C., Tijssen, M., Baas, F., de Vijlder, J. J. M., and Ris-Stalpers, C. (2000). Serial analysis of gene expression as a tool to assess the human thyroid expression profile and to identify novel thyroidal genes. J. Clin. Endocrinol. Metab. 85: 1923-1927.

16 Lal, A., et al. (1999). A public database for gene expression in human cancers. Cancer Res. 59: 5403-5407.

17 De Deken, X., et al. (2000). Cloning of two human thyroid cDNAs encoding new members of the NADPH oxidase family. J. Biol. Chem. 275: 23227-23233.

18 Schmitt, A. O., et al. (1999). Exhaustive mining of EST libraries for genes differentially expressed in normal and tumor tissues. Nucleic Acids Res. 27: 4251-4260.

19 Pietu, G., et al. (1999). The Genexpress IMAGE knowledge base of the human muscle transcriptome: a resource of structural, functional and positional candidate genes for the muscle physiology and pathologies. Genome Res. 9: 1313-1320.

20 Van den Berg, A., van der Ley, J., and Poppema, S. (1999). Serial analysis of gene expression: rapid RT-PCR analysis of unknown SAGE tags. Nucleic Acids Res. Methods 27: e17.

21 Peters, D. G., Kassam, A. B., Yonas, H., O'Hare, E. H., Ferrel, R. E., and Brufsky, A. M. (1999). Comprehensive transcript analysis in small quantities of mRNA by SAGE-Lite. Nucleic Acids Res. 27: e39.

22 Chen, J.-J., Rowley, J. D., and Wang, S. M. (2000). Generation of longer cDNA fragments from serial analysis of gene expression tags for gene identification. Proc. Natl. Acad. Sci. USA 97: 349-353.

23 Thellin, O., et al. (1999). Housekeeping genes as internal standards: use and limits. J. Biotechnol. 75: 290-295.

24 Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982). Molecular cloning: A laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
