Discrete Wavelet Transform-based Multivariate Exploration of Tissue via Imaging Mass Spectrometry


Raf Van de Plas∗

Katholieke Universiteit Leuven Dept. of Electrical Eng. ESAT

SCD-SISTA (BIOI) Kasteelpark Arenberg 10 B-3001 Leuven (Heverlee)

Belgium

Bart De Moor

Katholieke Universiteit Leuven Dept. of Electrical Eng. ESAT

SCD-SISTA Kasteelpark Arenberg 10 B-3001 Leuven (Heverlee)

Belgium

Etienne Waelkens

Katholieke Universiteit Leuven Dept. of Mol. Cell Biology

Section Biochemistry O & N, Herestraat 49 - bus 901

B-3000 Leuven Belgium

This paper gives a short overview of work described in a technical report by the same authors [3], which is available upon request.

ABSTRACT

Mass spectral imaging (MSI) or imaging mass spectrometry is a developing technology that combines spatial information with traditional mass spectrometry. It enables researchers to study the spatial distribution of biomolecules such as proteins, peptides, and metabolites throughout organic tissue sections. MSI has particular merit in exploratory settings where there is no prior hypothesis of relevant target molecules. It is rapidly becoming a potent exploratory instrument for tissue biomarker studies.

MSI is a high-throughput technique that mines massive amounts of measurements from a single tissue section. As various parameters such as the covered tissue surface area, the spatial resolution, and the extent of the mass range grow, MSI data sets rapidly become very large, making analysis from a computational and memory standpoint increasingly difficult. In this paper we introduce the discrete wavelet transform (DWT) as a means of reducing the dimensionality of the data, while retaining a maximum amount of biochemical information. The DWT delivers a more compact description of each mass spectrum, expressed as wavelet coefficients. The efficacy of performing analyses directly in the DWT-reduced space is illustrated using unsupervised trend detection via principal component analysis (PCA) on the MSI measurement of a sagittal section of mouse brain.

Categories and Subject Descriptors

J.3 [Life and Medical Sciences]: Biology and genetics; G.1.2 [Numerical Analysis]: Approximation—Wavelets and fractals; I.5.2 [Pattern Recognition]: Design Methodology—Pattern Analysis; I.4.2 [Image Processing and Computer Vision]: Compression (Coding)—Approximate methods

∗Corresponding author: raf.vandeplas@esat.kuleuven.be

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SAC’08 March 16-20, 2008, Fortaleza, Ceará, Brazil

Keywords

bioinformatics, proteomics, imaging, mass spectrometry, discrete wavelet transform, principal component analysis

1. INTRODUCTION

MALDI-based imaging mass spectrometry (MSI) [2] preserves the link between a spatial tissue location and a biochemical characterisation of what was found there. It uses the molecular specificity and sensitivity of normal mass spectrometry to collect a spatial mapping of biomolecules (or rather their ions) from a tissue section. Meistermann et al. [1] give an example of its use in biomarker discovery. Fig. 1 shows an overview of an MSI experiment. A more thorough treatment is available from Stoeckli et al. [2] and Van de Plas et al. [4]. The result of an MSI experiment consists of a grid of measurement locations or 'pixels' covering the tissue section, with an individual mass spectrum connected to each pixel. The data structure can be considered as a three-mode array with two spatial modes (x and y) and one mass-over-charge mode (m/z). In this paper, principal component analysis (PCA) is used to decompose high-dimensional MSI data into a reduced set of uncorrelated biochemical tissue trends [4].
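The three-mode array described above can be illustrated with a small sketch. It is not taken from the paper: the function names (`unfold`, `fold`, `ion_image`) and the toy values are hypothetical, and plain nested lists stand in for whatever array type an actual implementation would use. Unfolding gives the pixel x m/z matrix on which a method such as PCA operates, and folding places one value per pixel back on the x-y grid to form an image.

```python
def unfold(cube):
    """Flatten cube[y][x][k] into one spectrum (row) per pixel, plus pixel coordinates."""
    rows, coords = [], []
    for y, line in enumerate(cube):
        for x, spectrum in enumerate(line):
            rows.append(list(spectrum))
            coords.append((x, y))
    return rows, coords


def fold(values, coords, width, height, fill=0.0):
    """Place one value per pixel back on the x-y grid to form an image."""
    image = [[fill] * width for _ in range(height)]
    for value, (x, y) in zip(values, coords):
        image[y][x] = value
    return image


def ion_image(cube, k):
    """Ion image for m/z bin k: the intensity of that bin at every pixel."""
    rows, coords = unfold(cube)
    height, width = len(cube), len(cube[0])
    return fold([row[k] for row in rows], coords, width, height)


# Toy cube (made-up values): 2 rows (y) x 3 columns (x) x 4 m/z bins.
cube = [[[x + 10 * y + 100 * k for k in range(4)] for x in range(3)]
        for y in range(2)]
rows, coords = unfold(cube)          # 6 pixel spectra of 4 bins each
image = ion_image(cube, 2)           # spatial map of the third m/z bin
```

The same `fold` step would turn a vector with one PCA value per pixel into a principal component image.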

The size of an MSI data set is primarily influenced by two parameters: the number of measurement locations or pixels and the number of scanned m/z-bins. The pixel number increases as the tissue surface area that needs to be covered grows larger or the spatial resolution becomes higher. The number of m/z-bins is proportional to the extent of the mass range and the granularity of the mass resolution. As this technology develops, both parameters get pushed upward and data sets appear that stretch the computational and memory resources generally available or challenge the scalability of the algorithms that operate on them. Due to its multiway nature, MSI has, more so than standard mass spectrometry, a need for strong dimensionality reduction methods with minimal loss of biochemical information.
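A rough calculation shows how quickly these two parameters drive up storage. The sketch below is an illustration only: the scanned area (10 mm x 6 mm), the pixel pitches, and the 8-byte value size are assumptions, and the helper names are hypothetical. The 6490 m/z bins match the spectrum length reported for the case study in Fig. 3.

```python
def grid_pixels(width_um, height_um, pitch_um):
    """Number of measurement locations on a rectangular grid (sizes in micrometres)."""
    return (width_um // pitch_um) * (height_um // pitch_um)


def dense_bytes(n_rows, n_cols, bytes_per_value=8):
    """Bytes needed to store a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value


MZ_BINS = 6490                                # samples per spectrum (Fig. 3)

# Hypothetical scan area of 10 mm x 6 mm at two assumed pixel pitches.
coarse = grid_pixels(10_000, 6_000, 200)      # 50 x 30 pixels
fine = grid_pixels(10_000, 6_000, 100)        # 100 x 60 pixels

data_coarse = dense_bytes(coarse, MZ_BINS)    # pixel x m/z matrix, coarse grid
data_fine = dense_bytes(fine, MZ_BINS)        # same area, half the pitch
cov_mz = dense_bytes(MZ_BINS, MZ_BINS)        # m/z x m/z covariance matrix
```

Under these assumptions, halving the pixel pitch quadruples the pixel count and therefore the data matrix, while the covariance matrix used by PCA grows with the square of the number of m/z bins regardless of the pixel count.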

2. METHODS

In response to the issues raised in section 1, this paper introduces the discrete wavelet transform (DWT) as a method for reducing high-dimensional MSI data into a lower-dimensional space with most of the relevant information left intact. The capability for running analyses directly in the reduced space is demonstrated using trend detection via PCA [4] with satisfactory results. The general method is explained in [3] and Fig. 4, while section 3 demonstrates the procedure on a mouse brain case study.
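The reduction step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it substitutes the Haar wavelet for the db5 wavelet used in the paper, because Haar needs no filter table, and the function names and the toy spectrum are hypothetical. Each level halves the number of coefficients. Keeping only the approximation coefficients gives the compact description, and inverting with zeroed detail coefficients gives a smoothed spectrum.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One orthonormal Haar DWT level: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:                       # pad odd lengths by repeating the last sample
        signal.append(signal[-1])
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_reduce(spectrum, levels=3):
    """Keep only the approximation coefficients after `levels` DWT levels."""
    approx = list(spectrum)
    for _level in range(levels):
        approx, _detail = haar_step(approx)   # detail coefficients are discarded
    return approx


def haar_expand(approx, levels=3, length=None):
    """Inverse transform with all detail coefficients set to zero."""
    signal = list(approx)
    for _level in range(levels):
        signal = [value / SQRT2 for value in signal for _copy in (0, 1)]
    return signal if length is None else signal[:length]


# Toy spectrum (made-up intensities) with one broad peak.
spectrum = [0, 0, 1, 3, 6, 8, 8, 6, 3, 1, 0, 0, 0, 0, 0, 0]
coefs = haar_reduce(spectrum, levels=2)       # 16 samples -> 4 coefficients
smooth = haar_expand(coefs, levels=2, length=len(spectrum))
```

With Haar, the reconstruction is a piecewise-constant version of the spectrum. A smoother wavelet such as db5 follows peak shapes more closely for the same number of coefficients, which is presumably why it suits mass spectra better.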


[Figure 1 panel labels: Tissue Slice Creation; Application of Slice to Target Plate; Application of Matrix Solution; Laser-based Ionization & Desorption; Mass Measurement for each gridpoint; Array of Raw Unprocessed MS; Peak Identification & Processing; Array of Peaklisted MS; Ion Image (selected m/z-window); Principal Component Image (multivariate PCA decomposition taking all m/z bins into account).]

Figure 1: Overview of an MSI experiment on spinal cord. (wet-lab) A tissue section is cut using a microtome, mounted on a target plate, and covered with an appropriate chemical matrix to enable ionization. (mass spec) Individual mass spectra are collected from the tissue area of interest, while their spatial relationships are retained. (in silico) The data is collected into a three-mode array for analysis.

[Figure 2 annotations: hippocampus; amygdalar region; corpus callosum; caudate putamen (striatum); lateral cerebellar nucleus; parasubiculum. Scale bar: 2 mm.]

Figure 2: (top) Picture of the sagittal mouse brain section imaged in section 3. (bottom) Ion image showing the presence of m/z 14148 primarily in the corpus callosum and the lateral cerebellar nucleus.

[Figure 3 panels: original spectrum s (6490 samples, intensity versus m/z); approximations a1, a2, a3 and details d1, d2, d3, with coefficient vectors ca1 and cd1 (3249 coefficients each), ca2 and cd2 (1629 each), ca3 and cd3 (819 each). Wavelet used: Daubechies nr. 5 (db5).]

Figure 3: A multiple-level DWT performed on a mass spectrum from the case study of section 3. The chosen wavelet is db5 (top-right) and the decomposition tree has a depth of three levels. The profile of the original mass spectrum s is well preserved in approximations a1 to a3, while the required coefficients describing the waveform have been reduced from 6490 to 819.
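The coefficient counts in Fig. 3 can be reproduced with a short calculation. The sketch assumes the common length rule for a DWT with border extension, floor((n + L - 1) / 2) coefficients per band for n input samples and a filter of L taps. That rule is an assumption about the implementation used, although it matches the numbers in the figure. The function names are hypothetical.

```python
def dwt_coef_len(n_samples, filter_len):
    """Coefficients per band after one DWT level with extension at the borders."""
    return (n_samples + filter_len - 1) // 2


def approximation_lengths(n_samples, filter_len, levels):
    """Length of the approximation coefficient vector after each level."""
    lengths = []
    for _level in range(levels):
        n_samples = dwt_coef_len(n_samples, filter_len)
        lengths.append(n_samples)
    return lengths


DB5_FILTER_LEN = 10    # a Daubechies wavelet of order N has 2N filter taps

lengths = approximation_lengths(6490, DB5_FILTER_LEN, levels=3)
print(lengths)         # [3249, 1629, 819]

# Entries in the feature x feature covariance matrix before and after reduction.
cov_ratio = 6490 ** 2 / lengths[-1] ** 2      # roughly 63 times fewer entries
```

The per-spectrum reduction is about a factor of eight, but because the covariance matrix scales with the square of the feature count, the matrix that PCA has to decompose shrinks by a much larger factor.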

[Figure 4 diagram. Direct path: PCA on mass spectral data (x, y, m/z) gives PC 1 + PC 2 + PC 3 + ...; requires SVD of a covariance matrix based on a standard pixel x m/z matrix. DWT path: DWT, then PCA on DWT coefficients (x, y, coef) gives PC 1 + PC 2 + PC 3 + ..., followed by IDWT; requires SVD of a covariance matrix based on a reduced pixel x coef matrix.]

Figure 4: Overview of the DWT-based PCA decomposition of MSI data. DWT is used to reduce the feature space, and PCA is performed in the reduced space.

[Figure 5 panels: first loading and score (91.7568%); second loading and score (3.1364%); third loading and score (2.0243%). Spatial panels are plotted over x and y, spectral panels over m/z.]

Figure 5: Non-DWT-based principal components. These are the three most important PCs found via direct application of PCA. They chemically delineate a number of known anatomical zones in the mouse brain (e.g. corpus callosum).

[Figure 6 panels: first loading and score (92.5669%); second loading and score (3.0301%); third loading and score (2.0356%). Spatial panels are plotted over x and y, spectral panels over m/z.]

Figure 6: DWT-based principal components. These are the three primary PCs found via the method of Fig. 4. Although PCA was applied in the lower-dimensional coefficients-space, it yields almost identical results (see Fig. 5).

3. CASE STUDY

To establish an empirical test case, this section applies both the DWT-based method and direct PCA to the MSI measurement of a mouse brain section. The goal is to assess, on a real example, how well the information in the data set is retained under a substantial reduction in dimensionality. The example also shows whether results from factor analysis methods such as PCA, when computed in the coefficient space, approximate or remain relevant to results obtained directly in the original m/z-space.
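One way to see why PCA in the coefficient space can stay close to PCA in the m/z-space: PCA scores and explained variances depend only on the inner products between the mean-centred pixel spectra, and an orthonormal wavelet transform preserves those inner products. Dropping the detail coefficients changes them only by the detail contribution. The sketch below checks this on made-up spectra using a single Haar level. Both the Haar wavelet (instead of db5) and the toy data are assumptions for illustration, and border handling in a real DWT may make the preservation only approximate.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One orthonormal Haar level for an even-length signal: (approx, detail)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def center(rows):
    """Subtract the mean spectrum (column means) from every pixel spectrum."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def gram(rows):
    """Pixel x pixel matrix of inner products between spectra."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def trace(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))


def max_abs_diff(m1, m2):
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


# Made-up pixel spectra: 4 pixels x 8 m/z bins.
pixels = [
    [0, 1, 4, 5, 2, 1, 0, 0],
    [0, 2, 8, 9, 3, 2, 1, 0],
    [1, 1, 2, 2, 6, 7, 1, 1],
    [0, 0, 1, 2, 9, 9, 2, 1],
]
centered = center(pixels)       # the DWT is linear, so centring commutes with it
steps = [haar_step(row) for row in centered]
approx = [a for a, _d in steps]
full = [a + d for a, d in steps]                 # all coefficients kept

g_mz = gram(centered)
full_error = max_abs_diff(g_mz, gram(full))      # zero up to rounding
retained = trace(gram(approx)) / trace(g_mz)     # share of total variance kept
```

Here `full_error` is zero up to rounding, and `retained` is the fraction of total variance that survives when only approximation coefficients are kept. When that fraction is close to one, the principal components found in the reduced space can be expected to resemble those found directly, which is consistent with the close agreement between Fig. 5 and Fig. 6.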

4. CONCLUSION

The excellent compression characteristics of the DWT provide a means of dimensionality reduction that retains the mass spectral information to a high degree while offering significant reductions in resource requirements. In this paper we have shown specifically that resource-hungry operations, such as biochemical trend detection via PCA, can be performed directly on the more compact description of the data, with meaningful biochemical results that deviate little from non-DWT-reduced results. The approach reduces calculation time, removes noise, and diminishes memory requirements, all of which can be useful in any MSI context. Additionally, it holds promise as a tool to make multivariate exploration feasible on very large MSI data sets that might otherwise surpass available resources.

Acknowledgements

We kindly acknowledge Dagmar Niemeyer and Sören-Oliver Deininger from Bruker Daltonics in Bremen, Germany.

RVDP is a research assistant of the IWT at the K.U.Leuven, Belgium. BDM is a full professor at the K.U.Leuven, Belgium. EW is a full professor at the K.U.Leuven, Belgium. Additionally, RVDP, BDM, and EW are affiliated with the Interfaculty Centre for Proteomics and Metabolomics, ProMeta at the K.U.Leuven. Research supported by Research Council KUL: GOA AMBioRICS, SymBioSys CoE EF/05/007, several PhD/postdoc & fellow grants; Flemish Government: - FWO: PhD/postdoc grants, projects G.0241.04, G.0499.04, G.0232.05, G.0318.05, G.0553.06, G.0302.07, research communities (ICCoS, ANMMM, MLDM); - IWT: PhD Grants, GBOU-McKnow-E, GBOU-ANA, TAD-BioScope-IT, Silicos; SBO-BioFrame; Belgian Federal Science Policy Office: IUAP P6/25 & P6/28; EU-RTD: ERNSI; FP6-NoE; FP6-IP, FP6-MC-EST, FP6-STREP, ProMeta, BioMacS.

5. REFERENCES

[1] H. Meistermann et al., Biomarker discovery by imaging mass spectrometry: transthyretin is a biomarker for gentamicin-induced nephrotoxicity in rat, Mol Cell Proteomics, 5:10, 2006, pp 1876–1886.

[2] M. Stoeckli et al., Imaging mass spectrometry: a new technology for the analysis of protein expression in mammalian tissues, Nat Med, 7:4, 2001, pp 493–496.

[3] R. Van de Plas et al., “Discrete Wavelet Transform-based Multivariate Exploration of Tissue via Imaging Mass Spectrometry,” Internal Report, ESAT-SISTA, K.U.Leuven (Leuven, Belgium), 2007.

[4] R. Van de Plas et al., “Prospective Exploration of Biochemical Tissue Composition via Imaging Mass Spectrometry Guided by Principal Component Analysis,” in Proceedings of the Pacific Symposium on Biocomputing 12, Maui, HI, 2007, pp. 458-469.