
Automation of identification and transformation of fluorescence images in CLEM data post-processing

Academic year: 2021



THESIS

submitted in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE

in

PHYSICS

specialised in Soft and Biological Matter

Author:

Nick Oikonomeas-Koppasis, MSc

Student ID:

S1996215

Supervisor:

Dr. Sander van Kasteren

2nd corrector:

Prof.dr.ir. John van Noort

EC:

36


Automation of identification and transformation of fluorescence images in CLEM

data post-processing.

Nick Oikonomeas-Koppasis, MSc
Gorlaeus Laboratories, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands

March 18, 2019

Abstract

We introduce an open-source, freely available algorithm for the automatic correlation of fluorescence and electron microscopy images in Correlative Light-Electron Microscopy (CLEM). With a minimal amount of manual input, two custom modules automatically identify the fluorescent nuclei and the watershed nucleus of a TEM image, and from these calculate the affine transformation necessary for accurate correlation. The process can be performed on a group of pre-processed images and allows for an


1. Introduction

In 1796, Edward Jenner introduced the first vaccine, using cowpox to immunize the population against smallpox. This discovery marks the beginning of a series of discoveries, continued in the early 20th century, that have almost eradicated diseases such as measles, mumps and rubella. The immune system engages in an endless evolutionary race against pathogens, where adaptability is the key to survival. The first line of defence is the innate immune system, which is tasked, among other functions, with the phagocytosis of pathogens and the activation of the adaptive immune system, when needed, through a process known as antigen presentation. Antigen presentation is one of the keys to developing effective vaccines as well as immunotherapy1.

Microscopy techniques developed in the past decade have opened the possibility of a deeper understanding of both the structural and functional aspects of biological processes2, such as the degradation of pathogens and antigen presentation. There is, however, a discrepancy between our ability to collect data and our ability to handle it. No technique shows more evidence of this than Correlative Light-Electron Microscopy (CLEM), a technique that aims to match the high structural information provided by electron microscopy with the functional information provided by light microscopy3,4.

When the extremely high resolution of electron microscopy (EM) is complemented with the functional information from light microscopy (LM), it is possible to study the intracellular processes that take place during an infection. With the development of a protocol1 that allows the acquisition of LM and EM images of the same sample, it is possible to acquire and correlate the two image modalities of the same cells and study the immune system's response to different pathogens.

The correlation of the two image modalities requires extensive manual input, in which reference points must first be recognised, after which the LM image needs to be transformed to match and be overlayed5,6. There are available solutions implemented in platforms such as Icy7 that require, however, a similar amount of manual input and are difficult to streamline. The difficulties in correlating the microscopy images have forced researchers to choose between utilizing only a fraction of their data or spending valuable time performing the task manually.

In addition to the correlation difficulties, the size of files acquired through electron microscopy can often result in slow processing and force a compromise between quality and quantity. The computing power required to approach the problem can be a hurdle for research groups. Our aim was not only to offer an automatic solution to correlation but to achieve this in a way that can be performed without high-end equipment. As explained in the method section, handling the electron micrograph in tiles allows the handling of large stitches without the need for a large amount of RAM or a processor cluster.

Finally, it remains fundamentally important that all post-processing for correlation avoids biasing the data. Commercial software that aims for aesthetically pleasing results often integrates automatic processes that are not ideal for handling scientific data-sets. By working solely with geometrical transformations of the fluorescence images we avoid any biasing and allow the researcher to correctly perform the necessary analysis on the output.

In order to devise a solution to the multiple difficulties of this task, we compartmentalized the problem and tackled each of its components separately. In this way, a series of open source algorithms were created that can be utilized individually as stand-alone scripts, allowing researchers to customize their approach according to their needs.

It is fundamentally important to mention that this technique is in the very early stages of development. A series of improvements and optimizations is required before this algorithm performs at a satisfactory success rate from start to finish. The discussion section contains a series of optimizations that are underway.

2. Sample Preparation

2.1 Mammalian and bacterial cell culturing conditions and infection experiment

D1 dendritic cells were cultured at 37°C 5% CO2 in R1 medium8,9 consisting of IMDM complete

medium (Sigma Aldrich), containing 10% heat-inactivated fetal calf serum (FCS), 2 mM GlutaMAX, 10 mM HEPES pH 7.3, 1 mM pyruvate, penicillin 100 l.U./mL and streptomycin 50 μg/mL, 50 μM 2-Mercaptoethanol (Life Technologies) and 1X non-essential amino acids (Gibco), supplemented with conditioned medium produced by mGM-CSF-transfected NIH/3T3 fibroblasts corresponding to 10 ng/mL of mGM-CSF as final concentration9.

E. coli B834(DE3) expressing GFP2 and S. typhimurium SL1344 expressing DsRed10 were inoculated from a

solid culture and grown overnight at 37 °C in Lysogeny Broth (LB) medium, supplemented with 100 µg/mL ampicillin. The following day, cultures were diluted to an OD600 of ~0.2 in LB medium and grown at 37 °C until an

OD600 of ~0.8. OD600 values were measured to determine the correct infection volume, assuming that an OD600 of

1 corresponds to 1x109 bacteria/mL for E. coli or 1.6x109 bacteria/mL for S. typhimurium5.

After replacing the R1 medium for minimal cell medium (IMDM, supplemented with 10% FCS), a mixture of E. coli and S. typhimurium was added to D1 dendritic cells at an MOI of 50. Cells and bacteria were co-incubated for 45 minutes at 37°C 5% CO2 and subsequently washed three times with minimal cell medium to remove

non-internalized bacteria. The cells were then incubated in the presence of 1 µM BMV109 cathepsin probe3 for 1


2.2 Preparation of cryo-sections on TEM grids.

Samples were prepared for cryo-sectioning according to the Tokuyasu method as described elsewhere11,12,13. Briefly, D1 dendritic cells infected with E. coli and S. typhimurium and treated with BMV109

cathepsin probe were fixed for 2 hours at room temperature in freshly prepared 2% paraformaldehyde (Electron Microscopy Sciences) and 0.2% glutaraldehyde (Electron Microscopy Sciences) in 0.1 M phosphate buffer pH 7.2. Fixed cells were embedded in 12% gelatin (bloom 250, Dera NV) and cut with a razor blade into 0.5 mm3

cubes. The sample blocks were infiltrated in phosphate buffer containing 2.3 M sucrose for 3 h. Sucrose-infiltrated sample blocks were mounted on aluminum pins and plunged in liquid nitrogen. The frozen samples were stored under liquid nitrogen.

Ultrathin cell sections were obtained according to the Tokuyasu method as described elsewhere13,14.

Briefly, the frozen sample was mounted in a cryo-ultramicrotome (Leica). The sample was trimmed to yield a square block with a front face of about 300 x 250 μm (Diatome trimming tool). Using a diamond knife (Diatome) and an antistatic device (Leica), a ribbon of 70 nm thick sections was produced, which was retrieved from the cryo-chamber with a droplet of 1.15 M sucrose containing 1% methylcellulose. The obtained sections were transferred to a Ti specimen grid previously coated with formvar and carbon.

Sections were labelled as follows: thawed cryo-sections on an EM grid were left for 30 minutes on the surface of 2% gelatin in phosphate buffer at 37 °C. Subsequently, grids were incubated on drops of PBS containing 20 mM glycine. Sections were then labelled with DAPI (2 µg/ml), and additionally washed with PBS and water.

2.3 Microscopy.

Labeled cell sections on grids were transferred to microscopy slides, covered with 5 µL 30% glycerol and sealed with a cover slip and pressure sensitive tape. The resulting slides were imaged by CLEM as described previously14. Samples were imaged with a Leica TCS SP8 confocal microscope (63x oil lens, N.A.=1.4). A

maximum intensity projection of confocal z-stacks was used, as the sample ribbon and underlying formvar film are never in a perfect plane.

3. Automatic Correlation

Correlation of the two image modalities requires points of reference in both images. The nuclei are an ideal structure to serve this purpose, as they have a well-defined shape in both modalities and make it easy to check whether the correlation is correct. The noise for the fluorescent nuclear DAPI stain is negligible and a good segmentation of the individual nuclei is possible. In addition, the ordering of nuclei in both modalities has a one-to-one mapping, allowing the formation of a map (shown in Figure 1) which serves as a visual guide for correlation, though it plays no role in the automatic process.

After segmentation, a tiling is performed which allows the correlation of individual cells. Tiles in the LM image are cropped according to the centroid location of each segmented nucleus; the coordinates are then transformed according to the real pixel size ratio of the two modalities. This allows tiles to be cropped on the EM stitch, using rastering, that are a rough estimate of the corresponding nuclei in the LM image. The tiles are then binned in order to create a computationally manageable data set. A watershed mask of the binned EM tiles is extracted manually and automatically compared with all the segmented masks of the nuclei from the LM image in order to find the best possible match. Once this is achieved, a series of transformations is performed on the LM mask based on calculations that quantify rescaling, translational and rotational differences. Finally, all transformations are applied to all the channels of the multicolour LM image and are saved for the final overlay. A more detailed description of each step in the process of automatic correlation is given below, along with a more in-depth explanation of the choices that were made along the way.

3.1 Segmentation of Nuclei

In order to achieve automatic recognition and correlation it is important to acquire a map of recognised and segmented nuclei in the DAPI channel. Several scientific software packages can achieve this. We chose to build a pipeline for segmentation in Cell Profiler15, since one of our ultimate goals is to implement our algorithms in an open-access, easy-to-use platform. This also opens the possibility for further optimization of the process by the community. Plenty of literature and help is available online for building a pipeline in Cell Profiler to recognise structures in a fluorescence image, but it requires tweaking the parameters according to the acquired data. There is no set recipe that can be applied efficiently to all data, and it is therefore strongly recommended that researchers accustom themselves with the software, as a good segmentation is the cornerstone of this approach. The complete pipeline is described in Supplementary Material I and is continuously being optimized to fit the needs of the method that follows. Once the nuclei are segmented, all identified objects are cropped, centered and saved in the form of binary masks. Each mask should ideally represent one nucleus, though this is always subject to data acquisition, pre-processing16,17,18 and segmentation19,20,21. The same cropping is performed on all the channels of the multi-color LM image, which will later serve for the correlation process. It becomes increasingly difficult to accurately correlate larger fields, due to the non-linear mapping between the LM and EM images resulting from sample processing and image acquisition. However, if inter-cellular information is required over larger areas, it should be possible to combine correlated areas afterwards with a small loss of continuity in correlated areas or a loss in the accuracy of correlation in structures away from the centre of transformation.
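The actual segmentation is performed in Cell Profiler; purely as an illustration of the segment, crop and center step, a minimal scikit-image sketch on a synthetic two-nucleus image (not the thesis pipeline; all sizes and the half-tile width are arbitrary example values) could look like:

```python
import numpy as np
from skimage.filters import gaussian, threshold_otsu
from skimage.measure import label, regionprops

# Synthetic "DAPI" image: two bright blobs on a dark background.
img = np.zeros((200, 200))
img[40:80, 40:80] = 1.0
img[120:170, 110:160] = 1.0
img = gaussian(img, sigma=2)          # mild blur, as in the pipeline

# Threshold and label connected components (the "nuclei").
binary = img > threshold_otsu(img)
labels = label(binary)

# Crop each nucleus into a centered binary mask.
masks = []
for prop in regionprops(labels):
    cy, cx = map(int, prop.centroid)
    half = 50                          # half the tile size (arbitrary)
    tile = (labels[max(cy - half, 0):cy + half,
                   max(cx - half, 0):cx + half] == prop.label)
    masks.append(tile.astype(np.uint8))

print(len(masks))  # -> 2
```

Each mask keeps only the pixels of its own label, so touching structures that were separated by the labelling stay separated in the crops.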


Figure 1 outlines the process after segmentation. A map (a) is generated, presented here as a visual aid rather than for its functionality in our analysis, and each nucleus that was segmented is isolated, cropped and centered (b). The collection of masks obtained from the fluorescent channel is compared with manually generated masks obtained from the TEM image (c) (explained below) and, after correct matching, produces the final correlation (d).

For completeness, it has to be noted that areas that appear intensely black in the EM image correspond to folds that occur during the cutting of the samples. Folds that form on the nucleus of a cell contain stained material and will also appear in the LM image, thus being segmented as part of the nucleus. The number and size of folds can be reduced by more careful cutting of the samples, a process, however, which remains relatively difficult and is prone to human error. When watersheding the nuclei in the EM image this has to be taken into account, and folds that are part of the nucleus should also be watersheded.

3.2 Pre-processing of EM images

Some manual steps need to be taken before the algorithms for automatic correlation can be initiated. There are two distinct cases, based on the type of data that needs to be analysed. In case only a few cells are imaged, the cells must be cropped so that every tile contains a whole cell. For a magnification of 11000X it was found that the optimal approach is cropping tiles of 6000X6000 pixels with the nucleus of the cell at the center of the tile. Once the tiles are cropped, they need to be binned eight times across rows and columns, creating a final size of 750 by 750 pixels. This is critical to reduce the computational requirements and speed up the process.
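The 8x8 binning step can be sketched in a few lines of NumPy (a block-averaging sketch; the thesis does not specify the exact binning implementation used):

```python
import numpy as np

def bin_image(img, factor=8):
    """Bin an image by averaging factor x factor blocks (8x8 here),
    reducing e.g. a 6000x6000 EM tile to 750x750."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor   # trim to a multiple of factor
    return img[:h, :w].reshape(h // factor, factor,
                               w // factor, factor).mean(axis=(1, 3))

tile = np.random.rand(6000, 6000).astype(np.float32)
binned = bin_image(tile)
print(binned.shape)  # -> (750, 750)
```

Block averaging preserves mean intensity while reducing the pixel count by a factor of 64, which is what makes the subsequent mask comparison computationally cheap.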

The next step requires the acquisition of a mask of the nucleus from the binned image. There are various applets freely available that can assist with this process. Pixel Annotation Tool22 was chosen as the interface

allows for quick handling of multiple images. After the mask of the nucleus is saved no more manual input is required.

An approach is presented that tackles the case of a greater number of cells. To achieve this, it is necessary to locate two cells that can be identified in both the fluorescence and electron micrograph. This is required in order to calculate a shift between the two images. An algorithm was written that takes as input the location of the two cells and the real size per pixel, and uses the location of the segmented nuclei in the LM image to automatically crop, save and bin all cells in the EM image. This tiling process requires rastering23 of the EM image, for which guidelines are offered in Supplementary Material II. Once the image is tiled and binned, the same masking process must be performed, as described above.
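The stretch-and-shift mapping described above can be sketched as follows; the pixel sizes and reference coordinates are invented example values, not the thesis data:

```python
# Hedged sketch: lm_px and em_px (nm per pixel) and the reference
# nucleus coordinates are made-up example numbers.
lm_px, em_px = 90.0, 1.0             # real pixel size in LM and EM
stretch = lm_px / em_px              # LM -> EM coordinate scale factor

# The same nucleus located in both modalities (example coordinates).
ref_lm = (120.0, 85.0)               # centroid in LM pixels
ref_em = (11340.0, 8010.0)           # same nucleus in EM pixels

# Shift between the stretched LM frame and the EM frame.
shift = (ref_em[0] - ref_lm[0] * stretch,
         ref_em[1] - ref_lm[1] * stretch)

def lm_to_em(x, y):
    """Map an LM centroid onto EM pixel coordinates for tile cropping."""
    return (x * stretch + shift[0], y * stretch + shift[1])

print(lm_to_em(120.0, 85.0))  # -> (11340.0, 8010.0), the reference itself
```

Every segmented LM centroid can then be passed through `lm_to_em` to define the centre of the corresponding 6000X6000 EM tile.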


Flow Chart 1 shows all the steps, following image acquisition, to achieve automatic correlation. Figure (a) in the flow chart shows the overlay of all recognised segmented nuclei found in a large area of the LM image (one opening of a TEM grid, ~180μm2), followed by (b) an example of the mask of an individual tile of the EM image, extracted with the Pixel Annotation Tool. (c) The final result of the automatic correlation performed on the same tile.

3.3 Selection

Once all the EM image masks have been acquired, the selection process can begin. Having followed the steps mentioned above, the resulting masks acquired from segmentation and manual masking should have comparable sizes. In our case, after rescaling of the LM image and binning of the EM image, the difference was 2%. The small difference in pixel size between the images corresponds to the relative size of the structures (i.e. the nucleus) within the two modalities. This allows a comparison of their characteristics, as described below, eliminating the enormous (4 orders of magnitude) difference in size.

In order to predict the best possible match, the rescaled masks are compared, looping through all the segmented masks and introducing a series of filters24. The segmented masks that pass all the introduced filters are then scored according to their similarity to the mask of the nucleus in the EM image25,26,27. The mask with the lowest score is the most similar to the tile we are trying to correlate. A mathematically more analytical explanation of the filters and the scoring process can be found in Supplementary Material IV.
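As a toy illustration of the filter-then-score loop, using only the area characteristic and invented numbers (the full set of characteristics and the scoring formula are given in Supplementary Material IV):

```python
# Sketch of filtering then scoring segmented LM masks against one
# watersheded EM mask; the areas and the 30% tolerance follow the text,
# the concrete numbers are invented examples.
em_area = 1000.0   # area (px) of the manually masked EM nucleus
lm_areas = {"mask_a": 980.0, "mask_b": 400.0, "mask_c": 1210.0}

candidates = {}
for name, area in lm_areas.items():
    deviation = abs(area - em_area) / em_area
    if deviation > 0.30:      # filter: reject masks deviating > 30%
        continue
    candidates[name] = deviation   # score: lower = closer match

best = min(candidates, key=candidates.get)
print(best)  # -> mask_a
```

`mask_b` is rejected by the filter before any scoring, which is exactly what makes a well-ordered filter chain computationally cheap.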

3.4 Affine Transformations

An affine transformation on a plane is a linear mapping function T that preserves co-linearity and ratios of distances. Given an augmented matrix M, called the transformation matrix, it is possible to remap an image with a specified rotation, rescaling, shearing and translation (see Supplementary Material III). For a 2D image, an M32 matrix is required to perform such a transformation. When correlating fluorescence to electron micrographs, such a transformation is necessary to correct for the pixel size difference and small variations in angle and, following the tiling approach we have taken, to apply a translational correction. The latter arises from the non-linear mapping between the two modalities, which results in EM tiles where the nucleus does not appear in the centre of the image.

The first problem encountered in such correlations is the non-linear mapping between the images. The non-linearity can be attributed to the perspective of imaging and the flexibility of the sample, which can change slightly between imaging modalities. It goes beyond the scope of this paper28 to discuss this non-linearity in depth. However, a linear approximation is allowed if, within the relative scale of interest, the non-linear effect is negligible. In practice, a linear approximation is possible if it is performed at the scale of a single cell.


Following the selection process, the most similar segmented mask is selected to undergo a series of transformations26 that are recorded and later performed on the fluorescence channels of the same tile. To calculate the required transformations, an algorithm compares the masks25,27 and measures differences in angle, size, area, perimeter and moment of inertia. Currently, an effective way to quantify the required shearing has not been implemented. It is within the future aims of this method, as it will allow a more accurate correlation away from the center of transformation, which is subject to a bigger rotation and thus more prone to error.

We follow the order of rescaling, translation and finally rotation, according to the method described in Supplementary Material V. We found that repeating this cycle twice gives the best results, as subsequent iterations offer negligible improvement. Figure 2 (above) shows three examples of correlated nuclei. It has to be noted that the correlation accuracy decreases with distance from the centre of the nucleus, as the linear approximation becomes less accurate.
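Deriving individual transformation components from mask properties can be illustrated with scikit-image's `regionprops` on two synthetic elliptical masks (a sketch, not the thesis code: here the rescaling factor is taken as the square root of the area ratio and the rotation as the orientation difference):

```python
import numpy as np
from skimage.measure import label, regionprops

# Two synthetic masks standing in for a segmented LM nucleus and a
# watersheded EM nucleus; sizes and angles are invented.
yy, xx = np.mgrid[:200, :200]

def ellipse_mask(a, b, theta):
    """Ellipse centered at (100, 100), semi-axes a (cols) and b (rows),
    rotated by theta radians."""
    y, x = yy - 100, xx - 100
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return ((xr / a) ** 2 + (yr / b) ** 2 <= 1).astype(int)

lm_mask = ellipse_mask(30, 60, 0.0)   # major axis along rows
em_mask = ellipse_mask(24, 48, 0.3)   # smaller and rotated by 0.3 rad

lm = regionprops(label(lm_mask))[0]
em = regionprops(label(em_mask))[0]

scale = np.sqrt(em.area / lm.area)        # isotropic rescaling factor
dtheta = em.orientation - lm.orientation  # rotation correction (rad)
print(round(float(scale), 2), round(abs(float(dtheta)), 2))
```

The recovered factor is close to the true 0.8 scaling and 0.3 rad rotation; in the actual method such per-component corrections are applied serially and the cycle repeated.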

4. Discussion

The method for performing a semi-automatic correlation described in this report is subject to multiple improvements. It is essential to note that the primary aim of this project was to test whether a fully automatic correlation would be possible. We have shown that it is, and now have a starting point upon which we can develop, improve, and optimize our approach. Using the tool we have developed, we are ready to outline a few of those improvements before solving them.

Firstly, the segmentation process can be greatly improved15,29 if the pipeline is optimized. The difficulty faced, in contrast to other researchers that use segmentation to analyse their data, was the necessity of maintaining a decent outline of the shape, as otherwise the matching process fails29. The preprocessing of the images plays the most crucial role in achieving optimal results20,21,30. Noise does not play a major role, as it is relatively easy to filter out before segmenting30.

Secondly, the manual watersheding of the EM image is subject to large deviations between applications. The problem of segmentation in electron microscopy is not new and has seen multiple approaches with partial success31. It remains the main bottleneck of this analysis. We attempted to approach the problem with various machine learning algorithms, both scientific software and custom, with limited to no success. Here, the size of the files presents an issue, as they require a cluster for training or large amounts of RAM, which were not available. It is noted that the use of .hdf files is becoming increasingly common, which will allow for a much smoother handling of large datasets. The solutions offered at the moment are not optimal for performing the tasks required for an automatic correlation.


In the specific case of the algorithm, there is a series of improvements that could be implemented to achieve optimal results in the future. Currently, the loop for matching iterates through all the possible segmented masks. Though this already gives decent results (50-60% accurate recognition), it is not optimal for widespread use. A logical solution is to utilize a “neighborhood search” technique, in which two different processes occur. Firstly, the similarity matching of a nucleus25,26,27 is cross-checked with the matching factors of its neighboring nuclei, reducing the probability of a false match. Secondly, once the first nucleus is recognised correctly, its relative position is recorded and used as an anchor for subsequent matches. With this approach, in large stitches where hundreds of cells are present, a grid is slowly built, serving as both a map and a matching table.

The final aim of this project was to introduce an effective way to quantify localization information from the correlated images. In order to efficiently extract data and introduce statistical significance, a higher success rate must first be achieved, which is not yet satisfactory. With an accurate segmentation, it will be possible to collect the characteristic values of the masks25,26,27 (area, perimeter etc.) and use this data as a training set for a neural network to perform the automatic correlation.

The analysis of the functional information provided in LM images, in conjunction with the ultra-structural information in EM images, can offer an explanation of antigen presentation pathways. In order to achieve that, it is important to observe the pattern of the process. This observation becomes easier with the collection and analysis of large data-sets in a time-efficient manner, allowing researchers to reconstruct the antigen presentation process from a collection of snapshots. The rarity of certain events, such as bacterial division within the cell or fusion of the lysosomes with pathogen-containing phagosomes, creates an even bigger need for more data to obtain a full understanding of the process.

5. Conclusion

At the beginning of this project the main aim was to devise a fully automatic CLEM correlation algorithm. The foundation of such an algorithm was laid down, upon which we can now develop a fully functional, user-friendly application that will serve to improve post-processing and handling of datasets. We have also opened the door for quantification and an analytical approach to this method. With great caution, we hypothesize that this could be the first step towards a fully automatic correlation; one which ideally skips all manual steps and creates a complete workflow, taking the two separate images as input and delivering one correctly correlated image as the output.

Through a design that is cost-efficient with respect to computing power, we have managed to achieve this using a personal computer rather than a cluster, making it available to everyone. We aim to have the final algorithm wrapped in a Cell Profiler module, making it user-friendly for those who are not familiar with coding. We do suggest, however, not treating the process as a black box, but familiarizing oneself with the concepts described in the supplementary material as well as, if possible, the source code. We stress that this is still a work in progress and subject to bugs, errors and development.

Supplementary Material

I. Pipeline for segmentation

Cell Profiler15 was used to perform segmentation of the nuclei29 in the DAPI channel. Before describing the pipeline, it is noted that this is not the optimal approach; steps have been taken for improvement and have already been met with increased success, therefore this section is outdated. For the sake of completeness, the pipeline used for this dataset will still be described.

A series of pre-processing16,21,30 steps was employed before segmentation. Initially, the edges of the image were cropped to exclude the grid from the analysis, as it can affect further steps. Then a Gaussian blur16 was applied, which corresponds to convolving the image with a Gaussian function. This is done to prevent over-segmentation; however, it can result in clumping (segmenting nuclei close to each other as one). The sigma value for the blur is chosen empirically and depends on the acquired data.

An opening of the image was applied to reduce the clumping, and then the IdentifyPrimaryObjects module19,20,21 provided by Cell Profiler was used to segment the nuclei in the image. An adaptive threshold strategy with tile sizes 1/10 of the original size was chosen, in which the image is automatically tiled and the threshold of each tile is calculated. Otsu Thresholding30 was selected to separate foreground and background classes of pixels, which in turn is turned into a binary image of segmented nuclei by minimizing the variance within each class. This is not the optimal method; however, it is a general approach when there is limited knowledge of the characteristics of the image. To improve the segmentation, three-class thresholding was chosen, available only for Otsu Thresholding, in which a third class of intensity is calculated. In this case, the class was added to the foreground (nuclei) class. Finally, to de-clump nuclei that reside close to each other, the shape option was chosen. Based on this pipeline it remained the optimal choice. As mentioned above, this method is now outdated and has been heavily optimized for a more accurate segmentation.
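Three-class Otsu thresholding with the middle class merged into the foreground can be reproduced outside Cell Profiler with scikit-image's `threshold_multiotsu` (an illustrative sketch on synthetic intensities, not the actual pipeline):

```python
import numpy as np
from skimage.filters import threshold_multiotsu

# Synthetic image with three intensity populations: background (~0.1),
# a dim intermediate class (~0.5) and bright nuclei (~0.9).
rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(0.1, 0.02, 3000),
                      rng.normal(0.5, 0.02, 500),
                      rng.normal(0.9, 0.02, 1500)]).reshape(50, 100)

# Two thresholds split the image into three Otsu classes.
t_low, t_high = threshold_multiotsu(img, classes=3)

# As in the pipeline, the middle class is assigned to the foreground.
foreground = img > t_low
print(int(foreground.sum()))
```

Keeping everything above the lower threshold merges the intermediate and bright classes (here roughly 2000 pixels) into one nuclei mask.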

II. GDAL and BigTiff handling

One of the biggest difficulties faced in handling the data obtained is the size of the stitch of the EM image. At 11000X magnification a full grid window (~180μm2) can be as big as 15GB, making it inaccessible to most software. The method followed for automatic correlation required the generation of tiles from the original image, at the relative size of a single cell, cropped according to a map obtained from the segmentation in the LM image.


To achieve this, two sets of input were required: a csv file generated from Cell Profiler15, containing the x and y location of the centroid of each segmented nucleus, and the coordinates of a nucleus located in both the EM image and the LM image. The latter is required to calculate the shift from the origin. Using the real pixel size, a stretch factor is calculated.

The distance from the origin of the located nucleus is recorded, and its coordinates are multiplied by the stretch factor obtained. Then, all the x and y coordinates obtained from segmentation are multiplied and corrected by subtracting the distance calculated from the located nucleus. The new coordinates are saved in a .txt file, in a form suitable to be read by an automatic tiling algorithm that uses the GDAL library23.

In order to use the algorithm, the stitch has to be read as a raster dataset. The algorithm reads the coordinates in the .txt file and automatically crops tiles centered at the given coordinates, with a size of 6000X6000 pixels, as well as a version of each tile binned 8X8 times. From the latter, the binary masks required for matching are extracted in a computationally cheap way. The module used is GDAL's gdal_translate, which allows resampling and rescaling.

This treatment stems from necessity, not convenience. One of the main improvements set by this project is to omit the need for a separate script that performs this operation. It is possible that with the extensive use of .hdf5 files this will become possible in the near future. At the time of writing, all data was obtained in .tiff format and this was the most efficient approach found.

III. Affine Transformations

As described in the main text, an affine transformation is a linear mapping that preserves co-linearity. What follows is an extensive explanation of the components of such a transformation, as well as a mathematical description.

The available transformations include translation along the x and y axes, rotation around a center, scaling and shearing28. An image is represented as a matrix of intensities; therefore, any transformation performed on the matrix shifts the intensities accordingly. The transformation is of the form:

x' = M32 x    (III.1)

where x' is the transformed vector map, x is the original vector map and M32 is the augmented transformation matrix. The matrix is of the form:

M32 = [ a  b  tx ]
      [ c  d  ty ]

where a, b, c and d contain the scaling, rotation and shearing information, and tx and ty are the x and y translations.

Each of the transformations can be separately represented by individual matrices:

Scale = [ Sx  0   0 ]    Rotation = [ cosθ  −sinθ  0 ]    Shear = [ 1   hx  0 ]
        [ 0   Sy  0 ]               [ sinθ   cosθ  0 ]            [ hy  1   0 ]

where Sx and Sy represent the scaling factors, θ is the counterclockwise angle of rotation, and hx and hy are the shearing factors28.

The augmented matrix contains 6 elements and therefore requires 3 pairs of corresponding x and y coordinates in order to be calculated. Software such as the EC-CLEM4 plug-in for Icy requires the manual input of three corresponding points in the two image modalities to proceed with correlation. The method of this paper instead uses the characteristics of the masks to calculate individual components of the transformation and perform them in a serial manner. Any transformation performed corresponds to a matrix multiplication of the transformation matrix and the image matrix, and can be saved and reapplied to all fluorescent channels.
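The serial application of the individual matrices corresponds to a single matrix product; a NumPy sketch with example parameter values (invented, not measured ones), using homogeneous 3x3 matrices so that M32 is simply the first two rows of the product:

```python
import numpy as np

# Example parameters: scale, rotation angle, and x/y translation.
s, theta, tx, ty = 1.1, np.deg2rad(5.0), 12.0, -7.0

S = np.array([[s, 0, 0], [0, s, 0], [0, 0, 1.0]])       # rescale
T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])     # translate
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1.0]])                             # rotate

# Rescaling, then translation, then rotation: the rightmost matrix
# acts first on the coordinates.
M = R @ T @ S
M32 = M[:2, :]                        # the augmented 2x3 matrix

point = np.array([100.0, 50.0, 1.0])  # a pixel in homogeneous form
mapped = M32 @ point
print(mapped)
```

Because the composite M32 is a plain array, it can be stored once and reapplied identically to every fluorescent channel of the tile.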

IV. Scoring and matching

Following segmentation of the nuclei in both images, the next step required the automatic selection and matching of the EM image nucleus mask to the fluorescence one. To achieve this, a series of metrics is used to compare characteristics of each segmented nucleus with the watersheded mask22 of a nucleus in the EM image.

Two main conditions have to be satisfied for the process to be considered successful, namely correct matching and broad applicability. In order to satisfy these two conditions, a series of filters was applied that excluded masks exceeding certain limits (30% deviation of a characteristic above or below the value for the EM mask)24,25,26. Currently, without any optimization of the process, 40% correct matching was achieved.

The following characteristics were used to filter and then calculate a matching factor between masks:

1. Area
2. Perimeter
3. Bounding rectangle angle
4. Bounding ellipse eccentricity
5. Pixel distribution (according to distance)
6. Bounding ellipse major axis
7. Bounding ellipse minor axis
8. Moment of inertia

The order in which the filters are applied affects the computational efficiency of the algorithm: an order that quickly filters out the majority of the nuclei results in much faster matching, as a candidate is excluded early in the calculation and the algorithm proceeds to the next. Filters are applied only for the area, perimeter, bounding rectangle angle, eccentricity and pixel distribution according to distance. The matching factor is calculated by taking the absolute difference of each characteristic and dividing by the characteristic itself to reduce bias, as shown in Eq. IV.1 (subscript f denotes the LM characteristic and e the EM characteristic):

\[
\text{Matching factor} = \prod_{i=1}^{8}
\frac{\left|\text{Characteristic}_{f,i} - \text{Characteristic}_{e,i}\right|}{\text{Characteristic}_{e,i}}
\tag{IV.1}
\]

where all eight characteristics are multiplied to yield the final matching factor. All calculated factors are then collected and sorted in ascending order, with the lowest representing the closest match.
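A minimal sketch of this scoring in plain Python; the characteristic names follow the list above, and the function names are hypothetical:

```python
# The eight characteristics used for scoring; names follow the list above.
CHARACTERISTICS = [
    "area", "perimeter", "rect_angle", "eccentricity",
    "pixel_distribution", "major_axis", "minor_axis", "moment_of_inertia",
]

def matching_factor(lm, em):
    """Product of normalized absolute differences (Eq. IV.1).

    lm, em: dicts mapping characteristic name -> value for the
    fluorescence (f) and EM (e) masks respectively.
    """
    factor = 1.0
    for name in CHARACTERISTICS:
        factor *= abs(lm[name] - em[name]) / em[name]
    return factor

def best_match(lm_candidates, em):
    """The candidate with the smallest factor is the closest match."""
    return min(lm_candidates, key=lambda lm: matching_factor(lm, em))
```

Note that a perfect agreement on any single characteristic zeroes the whole product, which is a direct consequence of the multiplicative form of Eq. IV.1.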

All characteristics except the pixel distribution are measured using the measure.regionprops25,26,27 module of the skimage Python library. A contour search is performed in the binary mask of the EM image, which allows the collection of all the above properties. Currently, this is done on individual masked nuclei, but it can be performed in groups, which would offer a more accurate analysis. A brief explanation of each characteristic is offered here, mostly to justify the choices that were made and the order in which they were applied.
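As a self-contained illustration of what such region properties capture, the sketch below computes the area, centroid and an eccentricity estimate of a binary mask directly from its image moments (plain NumPy; skimage.measure.regionprops returns these quantities and more, so this is only a conceptual stand-in):

```python
import numpy as np

def mask_characteristics(mask):
    """Area, centroid and eccentricity of a binary mask from image moments."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cx, cy = xs.mean(), ys.mean()
    # Second central moments (covariance of the pixel coordinates).
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    # Eigenvalues of the covariance matrix give the squared semi-axes
    # (up to a constant); eccentricity = sqrt(1 - l2/l1).
    common = np.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    l1 = (mu20 + mu02) / 2 + common
    l2 = (mu20 + mu02) / 2 - common
    ecc = np.sqrt(1 - l2 / l1) if l1 > 0 else 0.0
    return {"area": area, "centroid": (cy, cx), "eccentricity": ecc}

# Example: an elongated, axis-aligned rectangle is clearly eccentric.
mask = np.zeros((20, 30), dtype=bool)
mask[5:10, 4:26] = True          # 5 rows x 22 columns
props = mask_characteristics(mask)
```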

The area characteristic refers to the number of non-zero pixels in the mask. Previous scripts have ensured that all segmented masks are of similar size, so that the comparison is accurate. A 30% margin is allowed; if it is exceeded, the loop breaks. This filters out almost half the masks and is chosen as the first filter because it is computationally cheap. Similarly, the perimeter of the contour is used as the next filtering metric. Both these characteristics are straightforward and require no further explanation.
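The serial filtering with early exit can be sketched as follows (plain Python; the 30% margin and the filtered characteristics follow the text, while the helper names and example values are hypothetical):

```python
# Cheap filters first, so most candidates are rejected early.
FILTERED = ["area", "perimeter", "rect_angle", "eccentricity", "pixel_distribution"]
MARGIN = 0.30  # 30% deviation allowed on either bound

def within_margin(value_f, value_e, margin=MARGIN):
    """True if the LM value deviates from the EM value by at most `margin`."""
    return abs(value_f - value_e) <= margin * abs(value_e)

def passes_filters(lm, em):
    """lm, em: dicts of characteristic name -> value."""
    for name in FILTERED:
        if not within_margin(lm[name], em[name]):
            return False   # early exit: skip the more expensive remaining tests
    return True
```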

The bounding rectangle angle is calculated by fitting a minimum-area rectangle that contains the whole contour and calculating its angle with the x-axis. Similarly, the bounding ellipse eccentricity is obtained by fitting the minimum-area ellipse around the contour and calculating its eccentricity. The combination of these two characteristics ensures that nuclei are selected with the correct ratio of x- and y-axis lengths at the correct orientation. A 30% margin is allowed on both filters. Only a fraction of the masks (of the order of tens) pass through this filter, so tests that are more computationally expensive can now be applied.

The pixel distribution according to distance was hard-coded for the purpose of selection. This characteristic computes, for each pixel belonging to the mask, its vector from the origin (the top-left corner of the image) and compares the resulting distributions. First, the centroid of the contour is calculated and the LM image mask is translated so that the centres of both masks coincide. The sum of all distances is then calculated and used, not as a filter, but as a contributing component of the final selection factor.
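A minimal sketch of this metric, assuming both inputs are binary masks (NumPy; the function name is hypothetical):

```python
import numpy as np

def distance_sum(mask, align_to=None):
    """Sum of pixel distances from the image origin (top-left corner),
    after optionally translating the mask centroid onto that of `align_to`."""
    coords = np.argwhere(mask).astype(float)   # (row, col) per mask pixel
    if align_to is not None:
        ref = np.argwhere(align_to).astype(float)
        # Translate so the two centroids coincide before measuring.
        coords += ref.mean(axis=0) - coords.mean(axis=0)
    return np.linalg.norm(coords, axis=1).sum()
```

After centroid alignment, a translated copy of the same shape yields the same sum, so the metric responds to shape differences rather than position.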

Finally, all nuclei that have successfully passed the applied filters are scored. The scoring factor calculated by Eq. IV.1 contains all the filtering characteristics and, in addition, the minor and major axes of the bounding ellipse and the moment of inertia. All values are normalized so that they carry the same weight in the assessment of mask similarity. The moment of inertia is calculated under the assumption that all non-zero pixels have the same weight, according to:

\[
I = \sum_{i} m_i r_i^{2}
\tag{IV.2}
\]

where I is the moment of inertia, equal to the sum over the individual elements (pixels) i of their mass m_i multiplied by the squared distance r_i^2 from the centroid of the contour. The moment of inertia grows with the square of the distance, making it more sensitive to contributions near the edge of the mask and hence an ideal metric.
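Under the equal-weight assumption (m_i = 1 for every mask pixel), Eq. IV.2 reduces to a sum of squared pixel distances from the centroid; a minimal NumPy sketch:

```python
import numpy as np

def moment_of_inertia(mask):
    """Sum of squared pixel distances from the mask centroid (m_i = 1)."""
    coords = np.argwhere(mask).astype(float)   # (row, col) per mask pixel
    r2 = ((coords - coords.mean(axis=0)) ** 2).sum(axis=1)
    return r2.sum()
```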

V. Transformation Loop

As mentioned in the main text, once correct selection is achieved, the transformation of the masks to perform the correlation can begin. The sequence followed is rescaling, translation and rotation, based on the comparison of the two masks. All transformations are individually collected and applied to the multicolour LM image.

The order of the process arose naturally from the method used to calculate the required transformation. The mask is first rescaled to the original size of the tiles (6000×6000 pixels). The centroids of the contours are calculated in both modalities, and a translation is applied to the LM mask according to the difference. At this point, if overlaid, the two masks should have the same centre. Hence, the bounding rectangle angle is calculated and the LM mask is rotated according to the difference. This marks the end of the first loop of transformations.
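The translation and rotation steps of this loop can be derived from the two masks as sketched below (NumPy only; the orientation here is estimated from image moments rather than from the bounding rectangle, and the function names are hypothetical):

```python
import numpy as np

def _centroid(mask):
    """Centroid of a binary mask as (row, col)."""
    return np.argwhere(mask).mean(axis=0)

def _orientation(mask):
    """Orientation of the principal axis from second central moments."""
    coords = np.argwhere(mask).astype(float) - _centroid(mask)
    mu02 = (coords[:, 1] ** 2).mean()            # x (column) variance
    mu20 = (coords[:, 0] ** 2).mean()            # y (row) variance
    mu11 = (coords[:, 0] * coords[:, 1]).mean()
    return 0.5 * np.arctan2(2 * mu11, mu02 - mu20)

def plan_alignment(lm_mask, em_mask):
    """Translation and rotation needed to bring the LM mask onto the EM mask."""
    translation = _centroid(em_mask) - _centroid(lm_mask)
    rotation = _orientation(em_mask) - _orientation(lm_mask)
    return {"translation": translation, "rotation": rotation}
```

In the actual pipeline each of these parameters is recorded so the same sequence can later be replayed on every LM channel.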

The loop is repeated with a slight difference in the first two steps. A new rescaling factor is calculated based on the area of the mask rather than on the pixel size of the image. This results in images of different sizes (depending on the scaling factor); to ensure that subsequent transformations are performed correctly, the mask of the LM image has to be padded or cropped. The image is first padded, then cropped to the correct size starting from the top-left corner. Finally, the last two steps of the transformation loop are repeated.
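The pad-then-crop step can be sketched as follows (NumPy; the default target is the tile size mentioned above):

```python
import numpy as np

def pad_crop(mask, target=6000):
    """Pad with zeros to at least `target` per side, then crop from the
    top-left corner so the output is exactly target x target."""
    pad_r = max(0, target - mask.shape[0])
    pad_c = max(0, target - mask.shape[1])
    padded = np.pad(mask, ((0, pad_r), (0, pad_c)))  # zero padding
    return padded[:target, :target]
```

Padding and cropping both anchor at the top-left corner, so the mask content keeps its pixel coordinates regardless of the rescaled size.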

All the transformations are collected and looped over the individual channels of the multicolour LM image. The transformed channels are saved individually in .tiff format. Although automatic overlay is possible at the end, the results are not optimal, and commercial software was used to produce aesthetically pleasing results. However, no further transformation was required; it was merely a matter of overlaying the two images with a lightened background.

The correctly selected nuclei were transformed accurately with a frequency of 80% (20/25). The lack of shearing created small discrepancies, especially in large cells: the further from the centroid, the less accurate the correlation (a non-linearity problem). Further development of this algorithm aims to correct for that.


REFERENCES

[1] Chen, G., Bodogai, M., Tamehiro, N., Shen, C. & Dou, J. Editorial Cancer Immunotherapy : Theory and Application. J. Immunol. Res. 3–5 (2018).

[2] Ando, T., Bhamidimarri, S. P. & Brending, N. The 2018 correlative microscopy techniques roadmap. J. Phys. D Appl. Phys 51, (2018).

[3] De Boer, P., Hoogenboom, J. P. & Giepmans, B. N. G. Correlated light and electron microscopy: Ultrastructure lights up! Nat. Methods 12, 503–513 (2015).

[4] Howes, S. C., Koning, R. I. & Koster, A. J. Correlative microscopy for structural microbiology. Curr. Opin. Microbiol. 43, 132–138 (2018).

[5] Anderson, K. L., Page, C., Swift, M. F., Hanein, D. & Volkmann, N. Marker-free method for accurate alignment between correlated light, cryo-light, and electron cryo-microscopy data using sample support features. J. Struct. Biol. 201, 46–51 (2018).

[6] Sosinsky, G. E., Giepmans, B. N. G., Deerinck, T. J., Gaietta, G. M. & Ellisman, M. H. Markers for Correlated Light and Electron Microscopy. Methods Cell Biol. 79, 575–591 (2007).

[7] Paul-Gilloteaux, P. et al. eC-CLEM: flexible multidimensional registration software for correlative microscopies. Nat. Methods 14, 102 (2017).

[8] Fuertes Marraco, S. A.; Grosjean, F.; Duval, A.; Rosa, M.; Lavanchy, C.; Ashok, D.; Haller, S.; Otten, L. A.; Steiner, Q. G.; Descombes, P.; et al. Novel Murine Dendritic Cell Lines: A Powerful Auxiliary Tool for Dendritic Cell Research. Front. Immunol. 2012, 3 (NOV), 1–25. https://doi.org/10.3389/fimmu.2012.00331.

[9] Mortellaro, A.; Urbano, M.; Citterio, S.; Foti, M.; Granucci, F.; Ricciardi-Castagnoli, P. Generation of Murine Growth Factor-Dependent Long-Term Dendritic Cell Lines to Investigate Host–Parasite Interactions. In Macrophages and Dendritic Cells; Reiner, N. E., Ed.; Humana Press, 2009; Vol. 531, pp 17–27. https://doi.org/10.1007/978-1-59745-396-7.

[10] Kuijl, C.; Pilli, M.; Alahari, S. K.; Janssen, H.; Khoo, P.-S.; Ervin, K. E.; Calero, M.; Jonnalagadda, S.; Scheller, R. H.; Neefjes, J.; et al. Rac and Rab GTPases Dual Effector Nischarin Regulates Vesicle Maturation to Facilitate Survival of Intracellular Bacteria. EMBO J. 2013, 32 (5), 713–727. https://doi.org/10.1038/emboj.2013.10.

[11] Tokuyasu, K. T. A Technique for Ultracryotomy of Cell Suspensions and Tissues. J. Cell Biol. 1973, 57 (2), 551–565. https://doi.org/10.1083/jcb.57.2.551.

[12] van Elsland, D. M.; Pujals, S.; Bakkum, T.; Bos, E.; Oikonomeas-Koppasis, N.; Berlin, I.; Neefjes, J.; Meijer, A. H.; Koster, A. J.; Albertazzi, L.; et al. Ultrastructural Imaging of Salmonella–Host Interactions Using Super-Resolution Correlative Light-Electron Microscopy of Bioorthogonal Pathogens. ChemBioChem 2018, 19.

[13] Möbius, W. & Posthuma, G. Sugar and ice: Immunoelectron microscopy using cryosections according to the Tokuyasu method. Tissue Cell (2018). doi:10.1016/J.TICE.2018.08.010

[14] van Elsland, D. M.; Bos, E.; de Boer, W.; Overkleeft, H. S.; Koster, A. J.; van Kasteren, S. I. Detection of Bioorthogonal Groups by Correlative Light and Electron Microscopy Allows Imaging of Degraded Bacteria in Phagocytes. Chem. Sci. 2016, 7 (1), 752–758. https://doi.org/10.1039/C5SC02905H.

[15] Lamprecht, M. R., Sabatini, D. M. & Carpenter, A. E. CellProfiler: free, versatile software for automated biological image analysis. Biotechniques 42, 71–75 (2007).

[16] Gedraite, E. S. & Hadad, M. Investigation on the Effect of a Gaussian Blur in Image Filtering and Segmentation. 14–16 (2011).

[17] Chen, S. Chaotic spread spectrum watermarking for remote sensing images. J. Electron. Imaging 13, 220 (2004).

[18] Kapur JN, Sahoo PK, Wong AKC (1985) "A new method of gray level picture thresholding using the entropy of the histogram." Computer Vision, Graphics and Image Processing, 29, 273-285.

[19] Ridler, T. W. & Calvard, S. Picture Thresholding Using an Iterative Selection Method. IEEE Transacations Syst. Man, Cybern. SMC-8, 630–632 (1978).

[20] Padmanabhan, K., Eddy, W. F. & Crowley, J. C. A novel algorithm for optimal image thresholding of biological data. J. Neurosci. Methods 193, 380–384 (2010).

[21] Sezgin M, Sankur B (2004) "Survey over image thresholding techniques and quantitative performance evaluation." Journal of Electronic Imaging, 13(1), 146-165

[22] A. Breheret, Pixel Annotation Tool, https://github.com/abreheret/PixelAnnotationTool, 2017

[23] GDAL/OGR contributors (2018). GDAL/OGR Geospatial Data Abstraction software Library. Open Source Geospatial Foundation. URL http://gdal.org

[24] Lorensen, W. E. & Cline, H. E. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. SIGGRAPH Comput. Graph. 21, 163–169 (1987).

[25] Wilhelm Burger, Mark Burge. Principles of Digital Image Processing: Core Algorithms. Springer-Verlag, London, 2009.

[26] B. Jähne. Digital Image Processing. Springer-Verlag, Berlin-Heidelberg, 6. edition, 2005.

[27] T. H. Reiss. Recognizing Planar Objects Using Invariant Image Features, from Lecture notes in computer science, p. 676. Springer, Berlin, 1993.

[28] Niemann, H. Linear and nonlinear mapping of patterns. Pattern Recognit. 12, 83–87 (1980).

[29] Du, X. & Dua, S. Segmentation of fluorescence microscopy cell images using unsupervised mining. Open Med. Inform. J. 4, 41–9 (2010).

[30] D. Liu and J. Yu, "Otsu Method and K-means," 2009 Ninth International Conference on Hybrid Intelligent Systems, Shenyang, 2009, pp. 344-349.

[31] Nagata, N. et al. Semi-automatic organelle detection on transmission electron microscopic images. Sci. Rep. 5, 1–9 (2015).

Acknowledgements

To Thomas Bakkum, MSc, for his immeasurable help throughout this project, providing his extensive assistance with all sample preparation, microscopy image acquisition and writing of this report.

To Laura Quarto, for assisting with the creation of the graphs for this report. To the BioSyn Group for the opportunity.
