Contents lists available at ScienceDirect
Data in Brief
journal homepage: www.elsevier.com/locate/dib
Data Article
Aqueous humor proteome of primary open angle glaucoma: A combined dataset of mass spectrometry studies
W.H.G. Hubens a,b,∗, R.J.C. Mohren c, I. Liesenborghs a,d, L.M.T. Eijssen b,e, W.D. Ramdas f, C.A.B. Webers a, T.G.M.F. Gorgels a,∗
a University Eye Clinic Maastricht, Maastricht University Medical Center, Maastricht, the Netherlands
b Department of Mental Health and Neuroscience, Maastricht University, Maastricht, the Netherlands
c Maastricht MultiModal Molecular Imaging (M4I) Institute, Division of Imaging Mass Spectrometry, Maastricht University, Maastricht, the Netherlands
d Maastricht Centre of Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
e Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
f Department of Ophthalmology, Erasmus Medical Center, Rotterdam, the Netherlands
article info
Article history: Received 2 June 2020; Revised 4 September 2020; Accepted 15 September 2020; Available online 21 September 2020
Keywords: Primary open angle glaucoma; Aqueous humor; Human; Proteome; Liquid chromatography tandem mass spectrometry
abstract
Analysis of the proteins of the aqueous humor can help to elucidate the complex pathogenesis of primary open angle glaucoma. Thanks to advances in liquid chromatography tandem mass spectrometry (LC-MS/MS), it is now possible to identify hundreds of proteins in individual aqueous humor samples without the need to pool samples. We performed a systematic literature search to find publications that performed LC-MS/MS on aqueous humor samples of glaucoma patients and of non-glaucomatous controls. Of the seven eligible publications that we found, we obtained the raw data of three. These three studies included glaucoma patients who were clinically similar (i.e. undergoing glaucoma filtration surgery), which prompted us to reanalyse and combine their data. Raw data of each study were analysed separately with the then-current version of MaxQuant (version 1.6.11.0). Outcome files were exported to Microsoft Excel. Samples belonging to the same patient were averaged to obtain protein expression values per individual. We compared the overlap of identified proteins using the VLOOKUP function of Excel and publicly available Venn diagram software. For peptide sequences that can belong to multiple proteins (usually of the same protein family), we initially included all possibly identified proteins. This ensured that we would not miss a potential overlap between the studies due to differences in identified peptide counts. Next, for those peptides for which we compared multiple proteins, only one protein was included in our analysis, i.e. either the protein overlapping between studies or, in case of no overlap, the protein that had the highest identified peptide count. This yielded 639 unique proteins detected in aqueous humor of either glaucoma patients or non-glaucomatous controls. In our manuscript entitled “The aqueous humor proteome of primary open angle glaucoma: An extensive review” [1], we further analysed this dataset. The dataset was exported to Perseus (version 1.6.5.0). We removed contaminants and filtered for proteins detected with high confidence, i.e. in more than 70% of the samples of at least one study. This yielded 248 proteins, of which we compared the expression in glaucoma patients against control patients. Gene ontology enrichment analysis and pathway analysis were used to interpret the results. The unfiltered dataset reported in this data article and the approach reported here to reanalyse and combine raw data of different studies can be applied by other glaucoma researchers to gain more insight into the pathogenesis of glaucoma.
DOI of original article: 10.1016/j.exer.2020.108077
∗ Corresponding authors.
E-mail addresses: w.hubens@maastrichtuniversity.nl (W.H.G. Hubens), theo.gorgels@mumc.nl (T.G.M.F. Gorgels).
https://doi.org/10.1016/j.dib.2020.106327
2352-3409/© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
Specifications Table
Subject: Ophthalmology
Specific subject area: Aqueous humor proteome of primary open angle glaucoma
Type of data: Table
How data were acquired: Raw data were obtained from ProteomeXchange, a publicly available database, and reanalysed with the freely available MaxQuant software (Max Planck Institute, version 1.6.11.0). During our study, dataset PXD004928 was not yet publicly available and we obtained the raw data after contacting the authors. Microsoft Excel was used to combine the files. Subsequently, we imported the combined dataset into Perseus (Max Planck Institute, version 1.6.5.0) for filtering and statistical analysis.
Data format: Raw; Analysed; Filtered
Parameters for data collection: We performed a systematic literature search to find studies that investigate the proteome of aqueous humor from patients with glaucoma compared to non-glaucomatous controls. We considered only studies that included glaucoma patients without other ocular comorbidities. This meant that of the 9 proteomic studies we found, 7 were eligible for obtaining the raw data. We managed to obtain the raw data of three publications. They used similar glaucoma patients, i.e. patients undergoing glaucoma filtration surgery, which prompted us to reanalyse their raw data and combine the outcome for new statistical analysis.
Description of data collection: We reanalysed the raw data of three publications that investigated the aqueous humor proteome of primary open angle glaucoma patients compared to non-glaucomatous controls, using LC-MS/MS. We downloaded the raw data from the depositories and subsequently loaded them into the MaxQuant software program (v1.6.11.0) for analysis. Analysed data were exported to Microsoft Excel to average duplicates and to combine the different studies into 1 protein database. This database was imported into Perseus analysis software (v1.6.5.0) to filter for proteins with high detection confidence and for subsequent statistical analysis to compare glaucoma patients with controls.
Data source location: University Eye Clinic Maastricht, Maastricht, Netherlands
Data accessibility: Raw data were obtained from ProteomeXchange:
Dataset 1: “Human aqueous humor of Primary open angle glaucoma LC-MS/MS”; PXD007624; https://www.ebi.ac.uk/pride/archive/projects/PXD007624
Dataset 2: “Comparative shotgun proteomics of aqueous humor for cataract, glaucoma and pseudoexfoliation eye disorders”; PXD002623; https://www.ebi.ac.uk/pride/archive/projects/PXD002623
Dataset 3: “Comparative evaluation of the aqueous humor proteome of primary angle closure and primary open angle glaucomas and senile cataract eyes”; PXD004928; https://www.ebi.ac.uk/pride/archive/projects/PXD004928
Analysed data are included in this article.
Related research article: WHG Hubens, RJC Mohren, I Liesenborghs, LMT Eijssen, WD Ramdas, CAB Webers, TGMF Gorgels, The aqueous humor proteome of primary open angle glaucoma: an extensive review, Exp. Eye Res. 197 (2020) 108077, doi: 10.1016/j.exer.2020.108077
Value of the Data
• This dataset provides the list of proteins present in the aqueous humor of primary open angle glaucoma patients and cataract patients and facilitates extraction and quantification of disease-specific differences.
• This dataset is a rich resource for glaucoma researchers and pharmaceutical companies interested in unravelling the proteome of primary open angle glaucoma.
• The dataset facilitates pathway analysis to identify new glaucoma pathways that can be targeted in human or animal studies, with the aim of establishing new biomarkers or new interventions for primary open angle glaucoma.
• The approach detailed here to regroup, combine and reanalyse publicly available data may be useful for other studies on data in public databases.
1. Data Description
Fig. 1:
Fig. 1 is a flowchart that visualizes the workflow that we followed in our review on the aqueous humor proteome of primary open angle glaucoma patients. In short, a literature search was performed to find eligible studies. We subsequently tried to obtain the raw data related to these studies either via publicly available repositories or by attempting to contact the corresponding author. Three datasets were obtained (see data accessibility table for the respective links). Each dataset was reanalysed and processed, after which they were combined into 1 dataset for statistical analysis.
Fig. 1. Flowchart reporting the workflow to obtain a combined proteomic dataset of glaucomatous aqueous humor.
File 1:
File 1 is a description of the patient characteristics. The columns are self-explanatory. Humphrey visual field analyser test results (columns J and K) were not available for some patients, as indicated by “NA”. The samples highlighted in red were excluded from our combined analysis, because these patients were additionally diagnosed with pseudoexfoliation syndrome (PEX). The remaining controls and glaucoma patients were pooled to form a combined dataset, of which the average age, gender distribution and average eye pressure are provided in columns Q-S. Statistical analysis (column T) showed that these parameters were not significantly different between the two groups.
File 2 (general):
We reanalysed the three datasets with MaxQuant and exported the output files to Microsoft Excel (file 2). The data are named after the corresponding first authors. These files are considered as raw data, i.e. they are unprocessed and contain several redundant columns. The general layout is as follows: possible identified proteins (A), protein with most peptide reads (B), how many times a peptide was measured (C-E), the protein names (F), gene symbol (G), fasta header (H), peptide read per sample, molecular weight of the protein, peptide identification method, sequence coverage, uncorrected intensity, iBAQ corrected intensity, LFQ corrected intensity and MS/MS count.
For each file, the samples were annotated differently by the authors. An overview is given below:
File 2 dataset 1: Adav.
This dataset contains 5 control patients (CG065, CG070, CG072, CG075 and CG078) and 5 glaucoma patients (G009, G010, G016, G039 and G041). Aqueous humor of each patient was analysed in duplo (_A and _B).
File 2 dataset 2: Kaur.
This dataset contains 9 control patients and 9 glaucoma patients. The control group was denoted as Cat and the glaucoma group as POAG. It seems this study was performed in two batches. The first batch of 4 control and 4 POAG patients was annotated as “long” (Cat 1 UR long, Cat 2 long, Cat 3 long, Cat 4 long, POAG 1 long, POAG 2 long, POAG 3 long and POAG 4 long) and was performed in duplo (long 1 vs long 2). One sample was also analysed a third time (cat 1 UR), presumably to test a different protocol. The second batch of 5 control (New Cat 1–5) and 5 POAG (New POAG 1–5) patients was not measured in duplo.
File 2 dataset 3: Kliuchnikova.
This dataset contains 11 control patients (k10, k14, k18, k24, k32, k44, k52, k60, k62, k64, k8) and 7 glaucoma patients (g110, g114, g116, g12, g50, g54, g56). All patients were analysed in triplo (_1, _2 and _3).
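Because replicate measurements are encoded as suffixes on the sample labels (_A/_B for dataset 1, _1/_2/_3 for dataset 3), the columns belonging to one patient can be grouped programmatically before averaging. The following is a minimal sketch under that assumption; the helper names `split_label` and `group_replicates` are hypothetical, and the irregular labels of dataset 2 (Kaur) are not covered and would need manual mapping.

```python
import re
from collections import defaultdict

# Hypothetical helper pattern: "<patient>_<replicate>", where the replicate
# is a single letter (dataset 1) or a number (dataset 3).
_SUFFIX = re.compile(r"^(?P<patient>.+)_(?P<rep>[A-Za-z]|\d+)$")


def split_label(label):
    """Split a sample label into (patient_id, replicate_id).

    Labels without a replicate suffix are returned as (label, None).
    """
    match = _SUFFIX.match(label)
    if match is None:
        return label, None
    return match.group("patient"), match.group("rep")


def group_replicates(labels):
    """Map each patient ID to the list of sample labels measured for it."""
    groups = defaultdict(list)
    for label in labels:
        patient, _ = split_label(label)
        groups[patient].append(label)
    return dict(groups)


# Example labels following the naming described for datasets 1 and 3.
labels = ["CG065_A", "CG065_B", "G009_A", "G009_B", "k10_1", "k10_2", "k10_3"]
grouped = group_replicates(labels)
```

Here `grouped` maps each patient ID (for example "CG065" or "k10") to its replicate labels, which can then be used to look up the intensity columns to average.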
Processed datasets (file 3 and file 4):
Protein expressions of duplicate samples were averaged. The averaged intensity, iBAQ intensity and LFQ intensity for each dataset are provided in file 3. This file contains three tabs named “Adav_duplo removed”, “Kaur_duplo removed” and “Kliu_duplo removed”. Layout and sample coding are the same as for file 2. Using the VLOOKUP function of Microsoft Excel and Venn diagram software, all reported proteins across studies were matched into a single file (file 4). We present the proteins (A), majority protein UniProt ID (B), protein name (C), gene name (D), fasta header (E), in how many samples the protein is identified within each group and study (G-L), and the average LFQ expression in each study (N-P), and show that after normalization the average LFQ intensity was the same in each study (columns S-U). The normalized LFQ intensity per sample/study is reported (columns W-BP) and the raw LFQ intensities are presented in columns BR-DK. Raw intensities (DR-FK) and iBAQ normalized intensities (FP-HI) are also provided.
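The matching of proteins across studies was done with VLOOKUP and Venn diagram software. For readers who prefer a scripted equivalent, the sketch below shows the same idea with dictionaries and sets. It is not the procedure used to build file 4: the function names are hypothetical, the accessions in the example are illustrative UniProt-style identifiers rather than values taken from the dataset, and it assumes that alternative protein IDs in one cell are separated by ";" as in MaxQuant output.

```python
def split_ids(cell):
    """Split a 'majority protein IDs' cell on ';' into individual IDs."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def match_across_studies(studies):
    """Return a dict: protein ID -> sorted list of studies reporting it.

    `studies` maps a study name to a list of protein-ID cells, where one
    cell may list several candidate proteins for the same peptide evidence.
    """
    seen = {}
    for study, cells in studies.items():
        for cell in cells:
            for protein_id in split_ids(cell):
                seen.setdefault(protein_id, set()).add(study)
    return {pid: sorted(names) for pid, names in seen.items()}


# Illustrative input only; not taken from file 2 or file 4.
studies = {
    "Adav": ["P02768", "P01024;P0C0L4"],
    "Kaur": ["P02768", "P0C0L4"],
    "Kliuchnikova": ["P02768", "P02787"],
}
presence = match_across_studies(studies)
shared_by_all = sorted(
    pid for pid, names in presence.items() if len(names) == len(studies)
)
```

Splitting multi-ID cells before matching mirrors the reason given in the text for keeping all candidate proteins at first: an overlap is found as soon as any one of the candidates appears in another study.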
Filtered dataset (file 5):
For the purpose of our review [1], the dataset was further analysed in Perseus. We removed contaminants, filtered on proteins whose LFQ protein expression was detected in more than 70% of the samples in at least one study, log-transformed the LFQ intensities and performed multiple ANOVA to compare glaucoma and control patients. The outcome was again exported to Microsoft Excel (file 5). The filtered data file consists of the following columns: gene name (A), majority protein UniProt ID (B), protein name (C), mean expression in controls (D), in how many control samples the protein was detected (E), mean expression in glaucoma (F), in how many glaucoma samples the protein was detected (G), and difference in log-transformed protein expression between glaucoma and controls (H). The uncorrected p-value (I) and the FDR-corrected q-value (J) are reported. Columns L-BE are protein expression values of each individual sample.
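The detection filter can be illustrated with a short sketch. The filtering itself was performed in Perseus; the code below is only an assumed re-statement of the rule, with hypothetical function names. It treats an intensity of 0 as "not detected" and keeps a protein when at least one group (control or glaucoma) in at least one study reaches the threshold. Whether the boundary is "more than" or "at least" 70% is not fully specified in the text, so the comparison used here (at least) is an assumption.

```python
def detected_fraction(values):
    """Fraction of samples with a measured (non-zero) intensity."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > 0) / len(values)


def passes_filter(protein, threshold=0.7):
    """Return True if any group in any study reaches the detection threshold.

    `protein` maps study name -> group name -> list of LFQ intensities.
    """
    return any(
        detected_fraction(values) >= threshold
        for groups in protein.values()
        for values in groups.values()
    )


# Illustrative intensities only; not taken from the dataset.
kept = {
    "StudyA": {"control": [5.0, 0.0, 7.0, 6.0, 4.0], "glaucoma": [0.0, 0.0, 3.0]},
    "StudyB": {"control": [0.0, 0.0], "glaucoma": [0.0, 2.0]},
}
dropped = {
    "StudyA": {"control": [5.0, 0.0, 7.0, 0.0, 4.0], "glaucoma": [0.0, 0.0, 3.0]},
    "StudyB": {"control": [0.0, 0.0], "glaucoma": [0.0, 2.0]},
}
```

In this example `kept` passes because 4 of 5 control samples in StudyA are detected, while in `dropped` no group exceeds 3 of 5.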
2. Experimental Design, Materials, and Methods
As depicted in the flowchart (Fig. 1), we performed a systematic literature search to find studies that reported proteomics data from LC-MS/MS studies of glaucoma aqueous humor samples. Keywords used were “primary open angle glaucoma” and “aqueous humor”. We found 9 LC-MS/MS studies, of which 7 matched our criterion that other ocular diseases are absent [2–8]. We attempted to get access to the underlying raw data either via depositories or by contacting the corresponding authors. We managed to obtain the raw data of three publications [2–4] (PXD007624, PXD002623 and PXD004928).
For two of these publications, the patient characteristics were unfortunately not well defined. Upon contacting the corresponding authors, they kindly provided us with the missing information. We report the detailed patient characteristics in this manuscript (file 1). Since the inclusion and exclusion criteria were largely overlapping between the three studies, we decided to pool the controls and to pool the glaucoma patients for a combined analysis. Patients additionally diagnosed with pseudoexfoliation syndrome were excluded from this combined dataset. As seen from columns Q-T, the pooled groups of 25 controls and 21 glaucoma patients were comparable in terms of age, gender distribution and eye pressure.
The raw data of primary open angle glaucoma patients and controls were reanalyzed using MaxQuant software (Max Planck Institute; [9,10]). As the raw data varied greatly between the studies, we failed to normalize the data in a pooled reanalysis. Therefore, we decided to reanalyze each study separately. The following settings were used:
• Variable modification: Oxidation (M) and Acetylation (protein N-term)
• Fixed modification: Carbamidomethyl (C)
• Trypsin digestion
  ◦ Max missed cleavage: 2
• Label free quantification
  ◦ Minimum ratio count: 2
  ◦ Fast LFQ mode enabled
  ◦ Stabilize large LFQ ratios
  ◦ Min number of neighbours: 3; average number of neighbours: 6
• Peptide identification:
  ◦ “from and to”
  ◦ Advanced identification enabled (Second peptides, Match between runs)
Output files were exported to Microsoft Excel (file 2). Sample or run duplicates were combined to obtain protein expression values per individual (file 3). We did this according to the data processing recommendations of Bijlsma et al. [11]. This meant that samples were averaged if more than one sample had LFQ expression values. If only one of the duplicate samples had measured expression values, this sample was considered as the average. For proteins of which none of the replicates had expression values, the value was set to 0.
To combine the datasets, we extracted the list of majority protein IDs from each study. In case of multiple majority protein IDs matching to a peptide sequence, we separated them into different columns. This enabled us to check if at least one of the suggested proteins was reported in the other studies, ensuring the highest amount of overlap between the studies. We identified the overlap via two different methods, i.e. the VLOOKUP function of Microsoft Excel and a free Venn diagram software (VIB-Ugent; http://bioinformatics.psb.ugent.be/webtools/Venn/). After we established which proteins had overlapping detection between studies, we used the VLOOKUP function to copy the corresponding expression values of each study, creating our final combined dataset (file 4).
For combined analysis in the publication corresponding to this dataset [1], we used the LFQ intensities of the proteins. LFQ intensities varied greatly between studies (1000-fold difference) and needed normalization. This was achieved by dividing the LFQ intensity of a protein by the average LFQ intensity in the respective study and then multiplying by the average LFQ intensity across all studies. Researchers can apply other normalization methods on this dataset for intensity, iBAQ intensity and LFQ intensity. File 4 was subsequently imported into a free analysis software (Perseus 1.6.5.0; Max Planck Institute) [12]. Here we filtered for proteins that were not considered contaminants and were detected with a high confidence. This meant that within a study, proteins were detected in at least 70% of either the control patients or the glaucoma patients. Next, we performed a log-transformation on the normalized LFQ protein expression intensity data and statistically compared the expression of the glaucoma group and the control group using the built-in multiple comparison ANOVA with FDR-adjusted correction. The outcome was exported back to Microsoft Excel (file 5).
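The replicate-averaging rule and the study-wise normalization described above can be written down compactly. The sketch below is an illustration under stated assumptions, not the spreadsheet procedure that was actually used: function names are hypothetical, an intensity of 0 (or None) is taken to mean "not measured", triplicates with two measured values are assumed to be averaged over those two, and "average LFQ intensity across all studies" is interpreted as the mean of the per-study averages.

```python
def average_replicates(values):
    """Average the measured replicate intensities of one protein in one patient.

    Unmeasured replicates (0 or None) are ignored; if nothing was measured
    the result is 0.
    """
    measured = [v for v in values if v]
    if not measured:
        return 0.0
    return sum(measured) / len(measured)


def normalize_studies(studies):
    """Rescale each study so that its mean intensity equals the grand mean.

    `studies` maps a study name to a non-empty list of LFQ intensities.
    Each value is divided by its study mean and multiplied by the mean of
    the study means (an assumed reading of 'average across all studies').
    """
    study_means = {name: sum(vals) / len(vals) for name, vals in studies.items()}
    grand_mean = sum(study_means.values()) / len(study_means)
    return {
        name: [v / study_means[name] * grand_mean for v in vals]
        for name, vals in studies.items()
    }


# Illustrative values with a 1000-fold scale difference between studies.
raw = {"StudyA": [1.0, 3.0], "StudyB": [1000.0, 3000.0]}
normalized = normalize_studies(raw)
normalized_means = {
    name: sum(vals) / len(vals) for name, vals in normalized.items()
}
```

After this rescaling every study has the same mean intensity, which matches the check reported for file 4 (columns S-U). Other normalization schemes, such as median scaling, could be substituted in `normalize_studies` without changing the averaging step.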
3. Ethics Statement
The current study used data from three previously published datasets on human aqueous humor proteome and we did not have contact with any of the study participants. All studies declared that they adhered to the Declaration of Helsinki and performed the studies on participants that provided written informed consent.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.
Acknowledgements
The authors thank prof. dr. Ramanjit Sihota, dr. Inderjeet Kaur, prof. dr. Sergei Moshkovskii, dr. Anna Kliuchnikova and all collaborators of the included datasets, for making their data publicly available and providing additional information regarding their studies upon our request.
Supplementary Materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.dib.2020.106327.
References
[1] WHG Hubens, RJC Mohren, I Liesenborghs, LMT Eijssen, WD Ramdas, CAB Webers, TGMF Gorgels, The aqueous humor proteome of primary open angle glaucoma: an extensive review, Exp. Eye Res. 197 (2020) 108077, https://doi.org/10.1016/j.exer.2020.108077.
[2] SS Adav, J Wei, Y Terence, BC Ang, LW Yip, SK Sze, Proteomic analysis of aqueous humor from primary open angle glaucoma patients on drug treatment revealed altered complement activation cascade, J. Proteome. Res. 17 (7) (2018) 2499–2510.
[3] I Kaur, J Kaur, K Sooraj, S Goswami, R Saxena, VS Chauhan, R Sihota, Comparative evaluation of the aqueous humor proteome of primary angle closure and primary open angle glaucomas and age-related cataract eyes, Int. Ophthalmol. (2018) 1–36.
[4] AA Kliuchnikova, NI Samokhina, IY Ilina, DS Karpov, MA Pyatnitskiy, KG Kuznetsova, IY Toropygin, SA Kochergin, IB Alekseev, VG Zgoda, et al., Human aqueous humor proteome in cataract, glaucoma, and pseudoexfoliation syndrome, Proteomics 16 (13) (2016) 1938–1946.
[5] D Salamanca, JL Gomez-Chaparro, A Hidalgo, F Labella, Differential expression of proteome in aqueous humor in patients with and without glaucoma, Arch. Soc. Esp. Oftalmol. 93 (4) (2018) 160–168.
[6] Y Ji, X Rong, H Ye, K Zhang, Y Lu, Proteomic analysis of aqueous humor proteins associated with cataract development, Clin. Biochem. 48 (18) (2015) 1304–1309.
[7] MA Kaeslin, HE Killer, CA Fuhrer, N Zeleny, AR Huber, A Neutzner, Changes to the aqueous humor proteome during glaucoma, PLoS One 11 (10) (2016) e0165314.
[8] S Sharma, KE Bollinger, SK Kodeboyina, W Zhi, J Patton, S Bai, B Edwards, L Ulrich, D Bogorad, A Sharma, Proteomic alterations in aqueous humor from patients with primary open angle glaucoma, Invest. Ophthalmol. Vis. Sci. 59 (6) (2018) 2635–2643.
[9] J Cox, M Mann, MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification, Nat. Biotechnol. 26 (2008) 1367–1372.
[10] S Tyanova, T Temu, J Cox, The MaxQuant computational platform for mass spectrometry-based shotgun proteomics, Nat. Protocols 11 (2016) 2301–2319.
[11] S Bijlsma, I Bobeldijk, ER Verheij, R Ramaker, S Kochhar, IA Macdonald, B van Ommen, AK Smilde, Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation, Anal. Chem. 78 (2006) 567–574.
[12] S Tyanova, T Temu, P Sinitcyn, A Carlson, MY Hein, T Geiger, M Mann, J Cox, The Perseus computational platform