Title: Data analysis for mass spectrometry imaging : methods and applications

The handle http://hdl.handle.net/1887/45501 holds various files of this Leiden University dissertation

Author: Abdelmoula, Walid M.

Title: Data analysis for mass spectrometry imaging : methods and applications

Issue Date: 2017-01-18

Chapter 1. Introduction

1.1. Mass Spectrometry Imaging

Mass spectrometry imaging (MSI) is an ex vivo imaging technology used to record the spatial distribution of hundreds of unknown biomolecules directly from tissue samples [1]. Biomolecules are essential components of a living organism that control both the structural and functional behavior of every cell within every tissue [2]. Using the same MSI technology but different sample preparation set-ups, different biomolecules (such as proteins, metabolites, lipids, peptides, and pharmaceuticals) can be analyzed in an unlabeled manner and without any prior knowledge, making MSI a promising discovery tool [3–5]. MSI has already had a substantial impact in diverse biomedical applications. For example, MSI has been used to investigate the biomolecular changes associated with neurological disorders such as Parkinson’s disease [6,7], Alzheimer’s disease [8,9], seizure [10], and stroke [11]. Recent studies have shown the ability of MSI to reveal biomolecular intratumor heterogeneity and to identify detrimental tumor subpopulations that drive patient outcome [4,12]. In addition, MSI also plays a vital role in the drug discovery process, as it provides low-cost imaging of pharmaceuticals and their metabolites for drug formulation development [9,13].

A typical MSI experiment requires the tissue sample to be processed according to predefined steps before the MSI acquisition starts [14]. Following sectioning using a cryostat (fresh frozen tissue) or a microtome (formalin fixed paraffin embedded tissue), the thin tissue sections are mounted onto microscope slides and then introduced into the mass spectrometer.

The MSI data acquisition is essentially a raster scan of a tissue section’s surface, in which an ionization mechanism releases the underlying molecules at each position. The ionized molecules are separated according to their mass/charge ratio (m/z) and detected, which also measures their abundance. A wide array of ionization mechanisms has been utilized for MSI, coupled to an even wider variety of mass spectrometers. Biomedical MSI experiments are mostly performed using the ionization technique matrix assisted laser desorption/ionization (MALDI) [15,16], owing to its wide applicability to multiple molecular classes, commercially available instruments and software focused on biomedical applications, and dedicated training courses. Desorption electrospray ionization (DESI) is emerging as another ionization technique well suited to biomedical MSI [17]; while it is more limited in scope, detecting primarily lipids and some metabolites, the very simple sample preparation step (only tissue sectioning is required) means it has been translated to clinical applications arguably more so than MALDI [18]. In this thesis all experiments have been performed using MALDI coupled to a time-of-flight (TOF) mass spectrometer. A brief overview of sample preparation and data acquisition using a MALDI-TOF mass spectrometer is given below.

1.2. MSI Sample Preparation

Fresh frozen tissues and formalin fixed paraffin embedded tissues have been analyzed by MSI:

• Fresh-frozen tissues are obtained through rapid immersion of a fresh biological tissue in a fluid such as liquid isopentane or liquid nitrogen. The tissue blocks can be stored in -80 °C freezers or liquid nitrogen tissue storage facilities. Tissue sections are obtained using a cryostat, in which sectioning is typically performed in the temperature range -10 to -30 °C. Tissue sections cut from fresh frozen tissue are typically around 10-12 µm thick and are thaw mounted onto a cooled microscope slide. MSI of fresh frozen tissues can be used to analyze intact proteins, peptides, glycans, lipids, metabolites, neurotransmitters, and pharmaceuticals.

• Formalin fixation is by far the most common format in clinical tissue banks, owing to the superior histology and ease of storage. Formalin fixation preserves a tissue’s cellular morphology and prevents protein degradation by forming crosslinks between proteins [19]. The most widely used fixative for this procedure is 10% Neutral Buffered Formalin (an aqueous solution of approximately 4% formaldehyde). The fresh biological sample is immersed in this formaldehyde solution at room temperature for several hours. For long-term storage the fixed tissue is dehydrated and embedded in paraffin; the “formalin-fixed paraffin-embedded” (FFPE) tissues can then be stored at room temperature indefinitely [20], display superior morphology compared to fresh frozen tissues, and are less fragile. Formalin fixed tissue sections are obtained using a microtome, with a typical thickness of approximately 6 µm, and mounted onto the microscope slide. Because of the processing involved in generating FFPE tissues, protocols specific to FFPE tissues have had to be developed to make them tractable to MSI. These consist of deparaffination and antigen retrieval to (partially) reverse the cross-linked network. Even then, proteolytic digestion is necessary in order to analyze proteins (because the large proteins remain bound by the remaining formalin cross-linked network) [21]. The extensive washing steps used during the generation of FFPE tissues were thought to leach out metabolites; nevertheless, recent MALDI MSI datasets have demonstrated that FFPE tissue can contain metabolite signatures (from the components that were insoluble in the solvents used during the generation of FFPE tissues). N-linked glycans can also be efficiently analyzed from FFPE tissues by releasing them from their protein backbone using the glycosidase PNGase F; these functional groups are not themselves incorporated into the cross-linked network and so, once released from the protein, can be efficiently analyzed by MALDI MSI.

1.3. MALDI-MSI

MALDI is currently the most common technique in mass spectrometry imaging [1,22].

MALDI-MSI can be used to simultaneously record the spatial distribution of hundreds of biomolecules directly from a tissue section, as shown in Figure 1.1. MALDI is historically considered a soft ionization technique in that it imparts a low degree of analyte fragmentation [15,16]. In MALDI an analyte is mixed with a matrix solution, in which the matrix is a small organic acid that has a high absorbance at the laser wavelength; matrices have been selected almost entirely empirically (with one notable exception, [15]). Upon evaporation of the solvent, analyte molecules are either incorporated into or in close contact with the resulting matrix crystals. Upon irradiation with a short-pulse UV laser, e.g. a frequency-tripled Nd:YAG laser operating at 355 nm with a pulse duration <3 ns, the matrix is highly energized, resulting in an almost explosive phase change in which matrix molecules, matrix clusters, and the entrained analyte molecules move into the gas phase. A fraction of the matrix and analyte molecules are charged ions that can be analyzed in the mass spectrometer.

Figure 1.1: Schematic workflow of a typical MALDI-MSI acquisition experiment (modified and reprinted with permission from [22]). A tissue section is mounted onto a conductive glass slide and then covered with a matrix that is chosen based on the molecular class to be analyzed. Irradiation of the matrix coated tissue with a focused UV laser beam leads to the generation of molecular ions, which can then be analyzed in a mass spectrometer. A mass spectrum is generated at each scanned point, and eventually the spatial distribution of each m/z feature can be visualized, forming a so-called hyperspectral image or image datacube to be used for further bio-computational analysis.

In a MALDI-MSI experiment the matrix solution is deposited onto the tissue, so that the same matrix solution also extracts the analytes from the tissue section [23].

In order to retain the spatial information of the analytes within the tissue section, the matrix solution should be applied homogeneously (on the length scale of the MALDI MSI experiment). The spatial distribution of the detected molecules across the scanned tissue can then be reconstructed, providing a so-called hyperspectral image or imaging datacube, as shown in Figure 1.1. A number of crucial factors characterize the MALDI-MSI process: matrix selection and application, the mass spectrometer, and resolution (mass, spatial, and depth).

1.3.1. Matrix Selection and Application

The matrix is of paramount importance in determining which molecular class may be analyzed and with which sensitivity [23–25]. Fortunately, the selection of MALDI matrices for mass spectrometry is well established, and the findings empirically determined for MALDI are equally applicable to MALDI-MSI, with only a few caveats concerning matrix crystal size. Table 1.1 shows an overview of some common MALDI matrices for different molecular classes. For example, the matrix sinapinic acid is highly effective for MALDI MSI of intact proteins, whereas the matrix α-cyano-4-hydroxycinnamic acid is better for peptides [26]. To avoid any potential artifact that may arise from the matrix application, the matrix should be applied homogeneously across the tissue sample [27]. While manual deposition using an airbrush or nebulizer is highly cost effective, it is highly variable and depends entirely on the expertise and patience of the analyst. Instead, automated spray-based matrix application systems are now the methods of choice (e.g. SunChrom SunCollect II, HTX TM-Sprayer, or the open source solution iMatrixSpray). For all matrix deposition systems there is a fine balance between efficient extraction of biomolecules from the tissue and lateral diffusion of the analytes in an overly wet matrix coating. Sequential coatings and rehydrations are used to extract the analytes while limiting lateral diffusion.

Table 1.1: Some common MALDI matrices for different molecular classes

Matrix                            Analyte
Sinapinic acid                    proteins
α-cyano-4-hydroxycinnamic acid    peptides and proteins
Trihydroxyacetophenone            carbohydrates
Dithranol                         lipids
3-Hydroxypicolinic acid           oligonucleotides
9-aminoacridine                   lipids and metabolites

Figure 1.2: Schematic diagram showing the separation of ions according to the m/z ratio in a MALDI time-of-flight (TOF) mass analyzer. The ions are accelerated towards a detector to measure their relative abundance.

1.3.2. MALDI-MSI Mass Spectrometer

Once the molecular ions have been generated, the mass analyzer separates them according to their mass-to-charge ratio (m/z), as illustrated in Figure 1.2. There are two common types of mass spectrometer used for MALDI-MSI, namely time-of-flight (TOF) instruments and those based on Fourier transform mass analysis. The latest commercial TOF systems (Figure 1.2), such as the Bruker Daltonics RapiFlex, offer high sensitivity and high acquisition speed but moderate mass resolution (10-50k). Fourier transform instruments, such as the Fourier transform ion cyclotron resonance (FTICR) mass spectrometer, offer high sensitivity and ultra-high mass resolution (>1M at m/z 400) but slower scan speed, e.g. 1 pixel per second with an FTICR versus up to 50 pixels per second with the RapiFlex. Ideally the choice of mass spectrometer depends on the application of interest: if high spatial resolution is required, a high speed system that can quickly analyze the large number of pixels is preferable; conversely, if the analysis leads to very rich MSI datasets that contain many isobaric ions (ions with the same nominal mass but different elemental composition), a high mass resolution instrument is preferred.
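As a reminder of why flight time encodes m/z, the idealized relation for a linear TOF analyzer can be sketched as follows. This is a textbook simplification and not the model of any particular instrument; the symbols U (accelerating potential), L (field-free drift length), e (elementary charge), v (ion velocity) and t (flight time) are introduced here for illustration only.

```latex
z e U = \tfrac{1}{2} m v^{2}
\quad\Longrightarrow\quad
t = \frac{L}{v} = L \sqrt{\frac{m}{2 z e U}}
\quad\Longrightarrow\quad
\frac{m}{z} = 2 e U \left( \frac{t}{L} \right)^{2}
```

Taking mass resolution in the usual sense of R = m/Δm, the figures quoted above correspond at m/z 400 to a peak width Δm of roughly 0.04 to 0.008 for R = 10k to 50k, and of about 0.0004 for R = 1M.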

The acquisition time of a MALDI-MSI experiment and the resulting data load depend on the spatial resolution, the mass resolution (partially), the area of the scanned samples and the number of samples. The spatial resolution, analysis area and number of samples determine the number of observations in the MALDI MSI dataset (i.e. the number of pixel associated mass spectra). The mass resolution determines the number of variables, which is now routinely reduced via automated feature identification and extraction from each pixel associated spectrum. Depending on the mass spectrometer utilized, the MSI data may first need to be aligned prior to data reduction, to ensure all pixel associated spectra are calibrated identically (i.e. the same analyte is detected at the same m/z ratio); for analyses of tissue sections from different animals or patients, which were recorded at different times, the different patient/animal datasets should be aligned to each other prior to any statistical analysis. High mass accuracy aids MSI data analysis by i) ensuring alignment corrections are only minor, and ii) enabling the assignment of identities to the mass spectrometry peaks.
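To make the dependence of acquisition time on spatial resolution and scan area concrete, the following minimal sketch counts raster positions and converts them to hours. The 10 mm x 10 mm section and the pixel sizes are hypothetical; the two acquisition speeds (1 and 50 pixels per second) are the ones quoted in the text, and the function names are illustrative.

```python
def pixel_count(width_mm, height_mm, pixel_um):
    """Number of raster positions for a rectangular scan area."""
    pixels_x = round(width_mm * 1000 / pixel_um)
    pixels_y = round(height_mm * 1000 / pixel_um)
    return pixels_x * pixels_y


def acquisition_hours(n_pixels, pixels_per_second):
    """Pure acquisition time, ignoring stage movement and other overheads."""
    return n_pixels / pixels_per_second / 3600


# Hypothetical 10 mm x 10 mm tissue section at three pixel sizes.
for pixel_um in (100, 50, 20):
    n = pixel_count(10, 10, pixel_um)
    slow = acquisition_hours(n, 1)    # 1 pixel/s (FTICR figure from the text)
    fast = acquisition_hours(n, 50)   # 50 pixels/s (RapiFlex figure from the text)
    print(f"{pixel_um:>3} um: {n:>7} pixels, {slow:7.2f} h vs {fast:5.2f} h")
```

Halving the pixel size quadruples the number of spectra, which is why high spatial resolution studies favor the faster analyzer.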

1.4. Data Analysis

The reduced data (after peak identification and extraction) generated from an MSI experiment is essentially a volumetric imaging datacube that represents the spatial distribution of hundreds of detected biomolecules [28]. The analysis and interpretation of such high-dimensional datacubes is a challenging task. MALDI-MSI data analysis ranges from the manual examination of individual ion images, e.g. for examining the distribution of a drug; to univariate tests to identify biomarkers that differentiate disease states (diagnostic biomarker) [29] or patient outcome (prognostic biomarker) [4]; to the generation of multivariate classifiers that can classify tissues according to the designed test [30], e.g. differential diagnosis. Of particular relevance to this thesis is multivariate analysis of MSI data, in which multivariate methods are used to demarcate, or segment, the tissue into regions characterized by distinct molecular mass spectrometry profiles [3,31]. To date these multivariate tools have mainly been used for molecular histology, i.e. to identify regions of tissue with different molecular signatures. However, they may also aid registration of MSI datasets, and thus better integrate MSI with the wider molecular imaging field. For example, a simple affine registration process to align an MSI datacube (a 3D dataset comprising individual images of many molecules) to its associated histological image (a 2D image) is not straightforward because of the one-to-many mapping issue that arises when aligning 2D and 3D images to each other.
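The datacube described above can be pictured as a three-dimensional array. The sketch below uses NumPy with random numbers standing in for real peak intensities; the array sizes, the m/z range and the target m/z are all hypothetical, chosen only to show how an ion image, a pixel spectrum and the observations-by-variables matrix relate to one another.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reduced dataset: 40 x 60 pixels, 200 extracted peaks.
n_rows, n_cols, n_peaks = 40, 60, 200
mz_values = np.linspace(400.0, 1000.0, n_peaks)    # m/z of each extracted peak
datacube = rng.random((n_rows, n_cols, n_peaks))   # placeholder intensities

# Ion image: spatial distribution of the peak closest to a target m/z.
target_mz = 760.5
peak_index = int(np.argmin(np.abs(mz_values - target_mz)))
ion_image = datacube[:, :, peak_index]             # shape (40, 60)

# Pixel spectrum: all peak intensities at one raster position.
pixel_spectrum = datacube[10, 25, :]               # shape (200,)

# Observations-by-variables matrix, the usual input to multivariate methods.
data_matrix = datacube.reshape(n_rows * n_cols, n_peaks)
```

Each row of `data_matrix` is one pixel spectrum, so the spatial layout is recovered by reshaping back to `(n_rows, n_cols)`.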

Multivariate analysis methods are commonly used to represent high-dimensional data with a small number of components, and methods such as principal component analysis (PCA) [32], probabilistic latent semantic analysis (PLSA) [33], non-negative matrix factorization (NNMF) [34], and k-means clustering [35] are widely used in analyzing MSI datasets [3,30,31,36]. Nevertheless, these methods are all linear and thus highlight the major variance in the data at the expense of more subtle changes. In MSI this means that they first demarcate the gross anatomical or morphological structure of the tissue; the molecular differences within seemingly morphologically identical regions of tissue, i.e. the added value of MSI, may contribute only a minor component. This in turn leads to potential difficulties in selecting the appropriate number of components with which the multivariate analysis represents the MSI data, or in other words the number of distinct molecular sub-regions within the tissue(s). In PCA this concerns selecting the number of principal components, or amount of variance, to retain; for NNMF, PLSA, and k-means the number of components is pre-specified.
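The segmentation idea described above can be illustrated with a generic PCA followed by k-means, written here in plain NumPy. This is a minimal sketch on synthetic data and is not the pipeline used in the thesis or in the cited studies; the datacube dimensions, the noise level, the number of retained components and the number of clusters are all assumptions, which mirrors the component-selection difficulty noted in the text.

```python
import numpy as np


def pca_scores(data_matrix, n_components):
    """Project mean-centred rows onto the leading principal components."""
    centred = data_matrix - data_matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centres = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(dists)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


# Synthetic datacube: left and right halves have different mean spectra.
rng = np.random.default_rng(0)
n_rows, n_cols, n_peaks = 20, 30, 50
datacube = np.empty((n_rows, n_cols, n_peaks))
datacube[:, :15] = rng.random(n_peaks)
datacube[:, 15:] = rng.random(n_peaks)
datacube += 0.05 * rng.standard_normal(datacube.shape)

# The number of components (3) and clusters (2) must be chosen by the user.
scores = pca_scores(datacube.reshape(-1, n_peaks), n_components=3)
segmentation = kmeans(scores, k=2).reshape(n_rows, n_cols)
```

The resulting `segmentation` is a label image with the same spatial layout as the tissue raster, which is what is meant by a molecular histology map.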

A recent and promising non-linear dimensionality reduction technique called t-distributed stochastic neighbour embedding (t-SNE) has been shown to be superior to the above linear multivariate data analysis (dimensionality reduction) techniques [37,38]. The hallmark of t-SNE is its ability to preserve both local (minor) and global (major) characteristics of the high dimensional data in a single representation map. A major part of this thesis concerns an investigation of the use of t-SNE for analyzing MSI datasets for better visualizations, automated registration and enhanced molecular histology capabilities.
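For readers unfamiliar with t-SNE, its standard formulation can be summarized as below. The notation is introduced here for illustration and is not taken from this chapter: x_i is the spectrum of pixel i, y_i is its low-dimensional embedding, n is the number of pixels, and σ_i is a per-point bandwidth set through the user-chosen perplexity.

```latex
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^{2} / 2\sigma_i^{2}\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^{2} / 2\sigma_i^{2}\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}}

\mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The embedding is found by minimizing the Kullback-Leibler divergence with respect to the y_i. The heavy-tailed Student-t kernel in q_ij is what allows dissimilar spectra to be placed far apart in the map without crowding similar ones together.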

1.5. Multi-modal Data Integration: MSI Data and Histology

MSI represents an important bridge between mass spectrometry based omics and biology/pathology, because it enables the molecular information to be analyzed in the correct context of the tissue’s histoanatomy/histopathology. Despite the high capability of MSI to provide chemically rich information, its spatial resolution is insufficient to reveal the microscopic structures used in established histopathological examinations. Conversely, light microscopy based histology provides high-resolution anatomical structures but lacks chemical specificity. Multi-modal data integration between MSI and histology allows the measured molecular information to be linked to its anatomical entities and correlated with histopathological features, enabling the interpretation of molecular imaging data within the correct histopathological context of the tissue [22,39]. A seamless integration of MSI and histology enables the molecular signatures of distinct cell types to be extracted; it is this high cellular specificity of MSI that enables it to be used to identify novel biomarkers.

If performed without excessive laser fluence, MALDI MSI does not cause significant loss of histological features; this allows the same tissue sample to be histologically stained and a high resolution optical image to be obtained after the MALDI MSI experiment [5]. The same is true for DESI-based MSI [17] and for secondary-ion mass spectrometry (SIMS) performed within the so-called static limit (<1% of the surface analyzed by the primary ion beam) [40]. To enable histology of the same tissue section, the experiments utilize transparent microscope slides: normal glass slides for DESI, or conductive (ITO coated) glass slides for MALDI MSI using instruments that require a defined electric field in the ion source. Adjacent tissue sections can also be used for the histological analysis, for example if the tissue was mounted onto a non-transparent slide; in this case, however, local differences in the histological make-up of the tissues and deformations that may arise during tissue mounting must be taken into account. Different histological stains have been successfully integrated with MSI, including Nissl staining for neuroscience research, hematoxylin and eosin (H&E) for pathological investigations, and immunohistochemistry for imaging the distributions of specific proteins.

The spatial correspondence between the high resolution histological microscope image and the lower resolution MSI data is mostly established manually, through the selection of fiducial markers to align the datasets. Commercially, the registration of MSI data to histology has been trademarked by a single vendor (Bruker Daltonics), meaning that the full power of the different capabilities of different mass spectrometry technologies has not been harnessed. A generic, automated registration tool would better harness the capabilities of modern mass spectrometry for MSI. The information contained in MSI and histology datasets is highly complementary, but the difference in the dimensionality of the datasets makes it challenging to automatically determine their spatial correspondences and has thus hampered their automatic registration. Within this thesis we demonstrate how t-SNE based dimensionality reduction of the MSI data enables their automated registration.

1.6. Image Registration

Figure 1.3: Basic framework of parametric-based registration (adopted from [45]).

Image registration is an indispensable tool for multi-modal data integration, especially in medical and biomedical applications [41–44]. Image registration is a process that aims to spatially align two or more images (2D/3D) of the same object that have been acquired differently (different viewpoints, different imaging modalities, and/or different time points). The registration process typically involves two images: a fixed (reference) image, $I_f$, and a moving image, $I_m$, that is to be aligned to the fixed image. The moving image may need to be translated, rotated, scaled and geometrically deformed, either linearly or non-linearly, to become spatially aligned with the fixed image. Mathematically (see Equation 1.1), image registration is an optimization process that searches for the optimum transformation parameters $\hat{\mu}$ that minimize a cost function $C$ (similarity metric) with respect to a transformation model $T_\mu$. This iterative parametric registration model, presented in [45] and shown schematically in Figure 1.3, is adopted in this thesis and implemented using the open source registration toolbox elastix [46].

$$\hat{\mu} = \arg\min_{\mu} \, C\left[ I_f, T_\mu(I_m) \right] \qquad (1.1)$$
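The optimization in Equation 1.1 can be illustrated with the simplest possible case: a pure integer translation, a mean squared difference cost, and an exhaustive search in place of an iterative optimizer. This is a toy sketch under those assumptions and does not reproduce elastix; the function names and the synthetic images are hypothetical.

```python
import numpy as np


def mean_squared_difference(fixed, moved):
    """Cost C: lower values mean the two images agree better."""
    return float(np.mean((fixed - moved) ** 2))


def translate(image, shift):
    """Circularly shift an image by (rows, cols); stands in for T_mu."""
    return np.roll(image, shift, axis=(0, 1))


def register_translation(fixed, moving, max_shift=5):
    """Exhaustively search integer shifts for the lowest cost."""
    best_shift, best_cost = None, np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = mean_squared_difference(fixed, translate(moving, (dy, dx)))
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift, best_cost


rng = np.random.default_rng(0)
fixed = rng.random((64, 64))
moving = translate(fixed, (3, -2))      # misaligned copy of the fixed image
mu_hat, cost = register_translation(fixed, moving)
print(mu_hat, cost)
```

Here the parameter vector µ has only two entries, so a grid search is feasible; with affine or B-spline models the parameter space is far too large, which is why gradient-based optimizers are used in practice.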

The success of the registration process depends on a number of parameters that need to be selected carefully based upon the application, including the transformation model, cost function, image sampler, and optimization method. For example, linear transformation models (such as affine or rigid) can only capture global deformations (rotation, scaling, translation, and shearing), whereas non-linear deformation models (such as B-Splines [47,48]) can capture local deformations (e.g., local compression or tears) [49,50]. Figure 1.4 shows the effect of different transformations. A similarity metric of mean squared difference, which assumes similar image intensities, is generally only suitable for mono-modal registration [51], whereas the statistical measure of mutual information, which is independent of image intensity distributions, is more suitable for multi-modal registration [52,53]. In the following paragraphs we briefly discuss some important registration aspects; for an extensive overview we refer the interested reader to [54,55].

Figure 1.4: Different transformation models applied to the moving image (b) to spatially align it with the fixed image (a). Panels: (a) fixed, (b) moving, (c) translation, (d) rigid, (e) affine, (f) B-spline. The linear transformation models (c-e) capture global deformations, whereas the B-spline transformation (f) can model local deformations (figure reprinted from the user manual of the publicly available registration toolbox elastix [56]).

Non-linear image registration (also known as elastic registration) is generally a challenging problem because different parts of an image need to be deformed differently, and this requires a rigorous mathematical model to be defined based on the image characteristics [54]. The first step of a non-linear image registration problem is to apply a linear registration model to roughly align the images, correcting for global deformations and placing them in the same coordinate space. Then, a non-linear deformation model is used to fine-tune the alignment. The B-Splines transformation model, see Equation 1.2, is among the most efficient and widely used non-linear transformation models because B-Splines are compactly supported, and thus computationally efficient, and are inherently smooth owing to their differentiability [46,49,57].

$$T_\mu(x) = x + \sum_{x_k \in N_x} p_k \, \beta^3\!\left( \frac{x - x_k}{\sigma} \right), \qquad (1.2)$$

where $x_k$ are the control points, $\sigma$ is the uniform spacing between the B-Spline control points, $\beta^3$ is the cubic B-Spline polynomial, $p_k$ are the B-Spline coefficients, and $N_x$ is the set of all control points within the B-Spline’s compact support at data point $x$. Cubic splines are the most commonly applied because of their salient property of minimal curvature (i.e. they are the smoothest); for more information the interested reader is referred to [57]. B-Splines are characterized by local support, which means that the transformation of a given point can be computed using only its neighboring control points. Therefore, B-Splines are suitable for modeling local deformations while assuring fast computations.
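Equation 1.2 and the local support property can be checked numerically in one dimension. The sketch below uses the standard piecewise form of the cubic B-spline basis; the grid spacing, the control-point positions and the single non-zero coefficient are hypothetical values chosen for illustration.

```python
import numpy as np


def cubic_bspline(u):
    """Cubic B-spline basis beta^3(u), non-zero only for |u| < 2."""
    a = np.abs(np.asarray(u, dtype=float))
    return np.where(
        a < 1, 2.0 / 3.0 - a**2 + 0.5 * a**3,
        np.where(a < 2, (2.0 - a) ** 3 / 6.0, 0.0),
    )


def bspline_transform(x, control_points, coefficients, spacing):
    """T(x) = x + sum_k p_k * beta^3((x - x_k) / spacing), in 1D."""
    x = np.asarray(x, dtype=float)
    weights = cubic_bspline((x[:, None] - control_points[None, :]) / spacing)
    return x + weights @ coefficients


spacing = 10.0
control_points = np.arange(-20.0, 121.0, spacing)   # grid padded beyond [0, 100]
coefficients = np.zeros_like(control_points)
coefficients[control_points == 50.0] = 4.0           # move a single control point

x = np.linspace(0.0, 100.0, 11)
displacement = bspline_transform(x, control_points, coefficients, spacing) - x
print(np.round(displacement, 3))
```

Only points within two grid spacings of the displaced control point move, which is the compact support that keeps B-spline registration local and cheap to evaluate.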

A crucial step in non-linear registration using the B-Splines transformation is defining the spacing between the B-Spline control points. The control points are used to capture the local deformation in an image, and thus the spacing between those points determines how accurately the algorithm can correct for local deformations [57]. Nevertheless, a single grid with uniform control point spacing is sometimes not sufficient to correctly model the non-linear deformations. This is the case, for example, in a mouse brain histological image, which is a structure-rich environment with highly variable, multi-scale anatomical structures. This can be tackled using multiple grids with different control point spacings, implemented in a multi-resolution scheme [58]. In this scheme, a small control point spacing is defined at the finest level and is increased (usually by a multiple of two) at coarser levels to correct for large deformations.

Multi-resolution image registration aims to reduce the registration complexity and helps the optimization converge quickly. This scheme relies mainly on a so-called scale space or image pyramid, in which the original image is converted into a series of down-sampled and smoothed images, typically using Gaussian kernels [51]. The optimizer starts at the coarsest level, focusing on aligning large structures, and then uses the resulting transformation parameters as the initialization for the next, finer level to refine the registration, and so on until it reaches the finest level and corrects for the smallest deformations. This approach has been found to be less likely to become trapped in a local optimum [48,51].
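The image pyramid described above can be sketched in a few lines of NumPy. The smoothing width, the down-sampling factor of two, the number of levels and the edge padding are illustrative assumptions, not the settings used in this thesis.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian truncated at three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    return kernel / kernel.sum()


def gaussian_smooth(image, sigma):
    """Separable Gaussian smoothing with edge padding (output keeps its size)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def image_pyramid(image, n_levels, sigma=1.0):
    """Return levels ordered coarsest first, finest (original) last."""
    levels = [np.asarray(image, dtype=float)]
    for _ in range(n_levels - 1):
        smoothed = gaussian_smooth(levels[-1], sigma)
        levels.append(smoothed[::2, ::2])    # down-sample by a factor of two
    return levels[::-1]


rng = np.random.default_rng(0)
pyramid = image_pyramid(rng.random((64, 64)), n_levels=3)
shapes = [level.shape for level in pyramid]
print(shapes)
```

A multi-resolution registration would loop over `pyramid` from the first (coarsest) to the last (finest) level, passing the parameters found at one level on as the starting point for the next.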

A cost function is used to measure the similarity between the fixed image and the transformed moving image. Several choices are presented in the literature [59], including the sum of squared differences (SSD), kappa statistics, and mutual information (MI). The SSD measure is typically used for mono-modal images with equal intensity distributions, whereas the kappa statistic [60] is recommended for registering binary images. The MI measure is well suited to multi-modal image registration [52,53,61,62], as it performs independently of the intensity distribution. MI originated in information theory as a measure of statistical dependence between two datasets, and it was introduced for image registration by Viola et al. [61] and Maes et al. [53]. The MI metric is computed from the marginal and joint entropies of the two images, and the best alignment is given by the transformation that maximizes MI, which corresponds to a low joint entropy.
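The contrast between SSD and MI can be made concrete with a histogram-based MI estimate. This is a minimal sketch, not the estimator used by elastix; the bin count and the synthetic images (a random image and its contrast-inverted copy, standing in for two modalities) are assumptions.

```python
import numpy as np


def mutual_information(image_a, image_b, bins=32):
    """MI = H(A) + H(B) - H(A, B), estimated from a joint histogram (in bits)."""
    joint, _, _ = np.histogram2d(image_a.ravel(), image_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    return entropy(p_a) + entropy(p_b) - entropy(p_ab.ravel())


rng = np.random.default_rng(0)
fixed = rng.random((64, 64))
inverted = 1.0 - fixed                    # same structure, reversed contrast
shifted = np.roll(inverted, 7, axis=1)    # misaligned version of the same image

mi_aligned = mutual_information(fixed, inverted)
mi_misaligned = mutual_information(fixed, shifted)
ssd_aligned = float(np.sum((fixed - inverted) ** 2))
print(mi_aligned, mi_misaligned, ssd_aligned)
```

Although the two images are perfectly aligned, their SSD is large because the intensities are reversed, whereas MI is high when aligned and drops once the images are shifted apart. This is the behavior that makes MI suitable for pairing MSI-derived images with stained histology.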

To avoid any potential unrealistic deformations associated with a non-linear registration process, such as texture distortions or folding, a regularization term should be added to the cost function to penalize such deformations. Common examples of regularization are the incompressibility constraint [63], rigidity penalty [64], elastic energy [65], and curvature constraint [66].

Non-linear registration is a computationally intensive process that can be significantly improved by a proper selection of the optimizer and of the number of pixels used for the cost function calculations. Among a number of popular optimizers [56,67], the adaptive stochastic gradient descent technique has proven efficient, mainly due to its good convergence properties and its compatibility with random sampling strategies, which make it faster still [56]. Considering all image pixels during the cost function derivative calculations would drastically slow down the registration and would make it impractical for many real-life applications. Instead, Thévenaz and co-workers [68] have shown that a subset of image pixels may suffice for the optimization to converge. Stochastic subsampling strategies, in which a new random subset of pixels is chosen at every optimization iteration, were found to be more robust than deterministic subsampling, which is usually biased and does not guarantee convergence to the correct solution [56]. A main drawback of the deterministic subsampling strategy is that the same subset of pixels is used in all optimization iterations, which degrades the registration accuracy, particularly as the down-sampling factor increases, as it does in the multi-resolution registration scheme.
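The idea of stochastic subsampling can be sketched by estimating a cost from a fresh random subset of pixels at each iteration. The example below uses a mean squared difference cost on synthetic images; the image size, the sample size and the number of mock iterations are hypothetical, and no actual optimizer is run.

```python
import numpy as np


def sampled_cost(fixed, moved, n_samples, rng):
    """Mean squared difference estimated from a fresh random pixel subset."""
    idx = rng.choice(fixed.size, size=n_samples, replace=False)
    diff = fixed.ravel()[idx] - moved.ravel()[idx]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
fixed = rng.random((256, 256))
moved = rng.random((256, 256))

# Reference value using every pixel (65,536 of them).
full_cost = float(np.mean((fixed - moved) ** 2))

# A new subset of 4,096 pixels is drawn at every (mock) optimizer iteration.
estimates = [sampled_cost(fixed, moved, 4096, rng) for _ in range(5)]
print(full_cost, estimates)
```

Each estimate uses about 6% of the pixels yet stays close to the full cost, and because the subset changes every iteration the sampling errors tend to average out over the course of the optimization rather than biasing it in a fixed direction.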

1.7. Aligning MSI Datasets to a Standardized Template

Preclinical and clinical research requires the comparison of cohorts of animals and patients respectively, to ensure any reported molecular changes are detectable within the biological variation of an animal/patient cohort. It is thus imperative to carefully define the anatomy and histological regions that will be compared, as variation in anatomy or histology will lead to additional sources of measurement variance that undermine the ability to find biomarkers.

For the analysis of rodent brains, existing atlases (the Allen Brain Atlas for the mouse brain [69] and The Rat Brain in Stereotaxic Coordinates by Paxinos [70]) provide reference atlases that may be used to automate the data analysis once the MSI datasets are registered to them. A number of recent studies demonstrate the significant potential of MSI in uncovering biomolecular changes associated with neurologic disorders in rodent models of Parkinson’s disease [6,7], Alzheimer’s disease [8,9], seizure [10], and stroke [11].

The registration of MSI datasets to a rodent brain atlas requires: i) determination of the location of the tissue section within the atlas’s stereotaxic coordinates, and ii) corrections for the global and local deformations that will exist between the reference atlas and the experimental data (MSI and histology). This thesis tackles these challenges by developing an automatic registration pipeline for the analysis of mouse brain coronal tissue sections using the Allen Brain Atlas as the reference atlas. The Allen Brain Atlas was chosen because it also represents a unique resource of multi-modal imaging of the mouse brain [69], as shown in Figure 1.5. The Allen Brain Atlas portal (http://www.brain-map.org/) provides a publicly accessible resource of multi-modal brain imaging data, all available in a common coordinate space, which includes magnetic resonance imaging data, high-resolution histological images, high resolution neuronal connectivity, and gene expression data for more than 20,000 genes. Accordingly, once registered, the gene expression data may also be used to aid the interpretation of the MSI data.

1.8. Thesis Outline

The main focus of this thesis is to develop methods to automatically align (register) and analyze MSI datasets. There are four main objectives, namely: 1) developing automatic registration methods to integrate MSI data and histology, 2) developing an automatic pipeline that can analyze large cohorts of MSI datasets, 3) speeding up the MSI acquisition process required in some applications by targeting only those histological regions that will be used in a subsequent statistical analysis (necessary to increase the sample throughput for high mass resolution MALDI MSI), and 4) developing t-SNE based MSI molecular histology and assessing its clinical impact in oncology. Each of these developments is presented in a separate chapter and a brief summary is given below.

Figure 1.5: The Allen Brain Atlas (ABA) [69] provides a publicly available resource of multi-modal imaging data of the mouse brain, all mapped to the same coordinate space.

It is now well established that MALDI MSI is able to uncover biomolecular changes in rodent brain models of neurologic disorders. Large scale preclinical investigations require localizing and comparing such disease-specific biomolecular signatures from multiple animals. The number of animals required and the high dimensionality of the MSI data make a manual analysis of the data from such large scale studies impracticable and prone to error, primarily because of the multidisciplinary expertise needed to perform the analysis correctly. The statistical analysis of the MSI data would be greatly simplified if all MALDI MSI datasets, from the different animals, could be placed onto a single anatomical atlas, so that molecular signatures from specific regions could be readily compared. This registration needs to correct for the local and global differences in brain anatomy between the reference atlas and the tissue sections from the animal cohort, and to provide the anatomical annotations needed to localize the MSI changes. Chapter 2 addresses these challenges by presenting an automatic registration pipeline that aligns mouse brain MSI datasets to the standardized coordinate space of the Allen Brain Atlas.
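To illustrate why a common atlas simplifies the statistical analysis, the sketch below assumes that an MSI dataset has already been registered, so that every pixel carries an atlas region label. The array layout, the label values, and the function name `region_mean_spectra` are assumptions made for illustration only. Once each animal is summarized this way, spectra from the same anatomical region can be compared directly across animals.

```python
import numpy as np


def region_mean_spectra(cube, labels):
    """Mean spectrum per atlas region.

    cube:   (rows, cols, n_mz) array of intensities.
    labels: (rows, cols) integer array of atlas region ids, 0 = background.
    Returns a dict {region_id: mean spectrum of shape (n_mz,)}.
    """
    spectra = cube.reshape(-1, cube.shape[-1])  # one row per pixel
    flat = labels.ravel()
    return {
        int(region): spectra[flat == region].mean(axis=0)
        for region in np.unique(flat)
        if region != 0
    }


# Synthetic 4 x 4 pixel dataset with 3 m/z channels and two labelled regions.
cube = np.zeros((4, 4, 3))
cube[:2, :, 0] = 2.0   # channel 0 is present only in the upper half
cube[2:, :, 1] = 6.0   # channel 1 is present only in the lower half

labels = np.zeros((4, 4), dtype=int)
labels[:2, :] = 1      # region 1: upper half
labels[2:, 1:] = 2     # region 2: lower half, first column left as background

means = region_mean_spectra(cube, labels)
```

Here `means[1]` and `means[2]` are the average spectra of the two regions. Repeating this for every animal yields one row per animal and region, which is the form most statistical tests expect.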

The high dimensionality of MSI data means that there may be multiple spatial correspondences between individual MS images and the higher resolution histological image. This multiplicity of solutions has hampered the automation of the co-registration process. Currently there is no automatic algorithm that is able to explicitly register mass spectrometry imaging data spanning different ionization techniques or mass analyzers, and so the full potential of modern mass spectrometry has not been exploited for MSI. In Chapter 3, we present a fully automated generic approach for registering mass spectrometry imaging data to histology and demonstrate its capabilities for multiple mass analyzers, multiple ionization sources, and multiple tissue types.

Based on the developments presented in Chapters 2 and 3, two pre-clinical studies were analyzed and are presented in Chapters 6 and 7. Chapter 6 presents the region-specific statistical analysis of a MALDI-MSI cohort of 32 mouse brains, spanning three different molecular classes, which aimed at uncovering biomolecular changes associated with cortical spreading depression in a mouse model of migraine. Chapter 7 presents another pre-clinical application, in which SIMS-MSI was used to investigate lipid accumulation in the brain of a transgenic mouse lacking multifunctional protein 2 (Mfp2). MFP2 is a protein responsible for the formation of bile acids and the degradation of pristanic acid and very-long-chain fatty acids (VLCFA). Mice that lack Mfp2 suffer from severe neuromotor dysfunction and die before the age of 6 months. It is, therefore, of interest to investigate and anatomically localize the biomolecules associated with that pathology. SIMS-MSI data from this mouse model were aligned to the Allen Brain Atlas, which was then used to determine the precise anatomical localization of the lipid deposition associated with that pathology and to compare it with gene expression data in the Allen Brain Atlas public resource.

Higher spatial/mass resolution leads to increased data size and longer data acquisition times. For clinical/preclinical applications, which analyze large series of patient/animal tissues, keeping the data load and acquisition time practicable is a challenge. We address this challenge by targeting only certain anatomical regions of interest for analysis by MSI; to this end, a new concept termed histology-guided mass spectrometry imaging is presented in Chapter 4. It is an automated image registration pipeline that registers an annotated, high-resolution histological image of an adjacent tissue section to a lower resolution optical image of the MSI-prepared (matrix coated) tissue section. Subsequently, the annotated borders are propagated to the low-resolution image, enabling the exclusive analysis of the annotated regions by MSI.
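The propagation step can be sketched as follows. This is an illustrative example under stated assumptions, not the registration pipeline of Chapter 4: the transform between the two images is assumed to be already known and purely affine, the annotation is a simple polygon, and the names `apply_affine` and `polygon_mask` are hypothetical. The resulting mask marks the pixels of the low-resolution image that would be selected for acquisition.

```python
import numpy as np


def apply_affine(points, matrix):
    """Map (N, 2) x/y points with a 2x3 affine matrix [A | t]."""
    points = np.asarray(points, dtype=float)
    return points @ matrix[:, :2].T + matrix[:, 2]


def polygon_mask(vertices, shape):
    """Boolean (rows, cols) mask of pixel centres inside a polygon (even-odd rule)."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    x, y = cols + 0.5, rows + 0.5  # coordinates of the pixel centres
    inside = np.zeros(shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does a ray cast to the right from the pixel centre cross this edge?
        crosses = (y0 > y) != (y1 > y)
        x_at_y = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (x < x_at_y)
    return inside


# Hypothetical annotation drawn on the high-resolution histology (x, y in pixels).
histology_border = np.array([[40, 40], [120, 40], [120, 120], [40, 120]], dtype=float)

# Assumed transform: histology is 4x finer than the optical image, plus a small shift.
histology_to_optical = np.array([[0.25, 0.0, 2.0],
                                 [0.0, 0.25, 1.0]])

optical_border = apply_affine(histology_border, histology_to_optical)
acquisition_mask = polygon_mask(optical_border, shape=(40, 50))
```

In this example the annotated square maps to a 20 x 20 pixel block of the 40 x 50 optical image, so only 400 of the 2000 pixels would need to be measured.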

The application of multivariate analysis methods to MALDI MSI data of tumors has been shown to reveal tumor subpopulations with distinct molecular signatures [3,12,71]. Recently it was demonstrated that some of these subpopulations could be statistically associated with patient outcome [4]. In that work the number of subpopulations was varied manually and the clinical impact investigated. To better exploit this molecular histology capability for oncology research, the number of subpopulations needs to be determined automatically from the available MSI data. The challenge has been to automatically identify those tumor subpopulations that drive patient outcome within highly complex datasets (hyper-dimensional data, intratumor heterogeneity, patient variation). In Chapter 5, we report a data-driven bioinformatics pipeline that non-linearly maps the hyper-dimensional MALDI-MSI data into a three-dimensional space, followed by spatial clustering and statistical analysis, to identify molecularly distinct, clinically relevant tumor subpopulations that drive patient outcome.
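The first two stages of such a pipeline can be sketched with scikit-learn. The example is a minimal illustration under several assumptions and is not the method of Chapter 5: the data are synthetic, k-means stands in for the spatial clustering step, and the number of clusters is fixed by hand, whereas the thesis aims to determine it automatically.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for MSI data: 120 pixels x 50 m/z channels,
# drawn from two groups with different mean intensities.
group_a = rng.normal(loc=0.0, scale=1.0, size=(60, 50))
group_b = rng.normal(loc=4.0, scale=1.0, size=(60, 50))
spectra = np.vstack([group_a, group_b])

# Non-linear mapping of every pixel spectrum into a three-dimensional space.
embedding = TSNE(
    n_components=3, perplexity=20, init="random", random_state=0
).fit_transform(spectra)

# Clustering in the embedded space (k-means with k = 2 is an assumption here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
```

Each pixel now has a three-dimensional coordinate and a cluster label. Reshaping `labels` back to the image grid would give a segmentation map whose clusters can then be tested for association with clinical variables.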

This thesis concludes with a summary and discussion in Chapter 8.
