Super-Resolution Methods for Image Analysis in Forensic Science


Student name: Floor Dussel

Student ID: 12152404

Supervisor: Arjan Mieremet

Examiner: Zeno Geradts

Master Forensic Science

Number of words: 8000

Date: 13-01-2020


Contents

Abstract
1. Introduction
1.1. Background
1.2. SISR vs. MISR
1.3. Neural networks
1.4. Super-Resolution in Forensic Science
2. Results
3. Risks for use in Forensic Science
3.1. Validation of methods
3.2. Learning-based neural networks
3.3. Bias
4. Possibilities for use in Forensic Science
4.1. Validation of methods
4.2. Tactical information or evidence in court
4.3. Clusters
4.4. Methods specific to applications
5. Discussion
6. Conclusion
7. References
Appendices
Appendix I – Search strategy


Abstract

Information obtained from digital images or videos can be very valuable in the field of forensic science. Most of the time, however, the images or videos retrieved in a case are of low resolution. Super-resolution (SR) techniques can be used to obtain a high-resolution version of such low-resolution images. When only one image is used as input, this is called Single-Image Super-Resolution (SISR); when more than one image (or a sequence of video frames) is used, it is referred to as Multiple-Image Super-Resolution (MISR). These methods can be applied manually or by means of a neural network. In this literature review, the possibilities and the risks of using SR methods in forensic science are evaluated. An overview is given of several SR methods developed in the past five years, and each method is assigned to a cluster based on whether it uses SISR or MISR and whether it uses a neural network. It is concluded that SR methods should only be applied in the tactical stage of an investigation (to identify suspects), and that if an image is used as evidence in court, the original image should be presented. When an SR method is used in the tactical stage and multiple images are available, MISR without a neural network is advised, because MISR makes the fewest assumptions and a manually performed conversion can be checked and controlled, as opposed to a conversion made by a trained network. More research is necessary before SR techniques can be used in forensic science; the SR methods analysed in this review are not yet ready for forensic use.

1. Introduction

1.1. Background

In the field of forensic science, information obtained from digital images or videos can prove very valuable. These images or videos can, for example, come from surveillance cameras or from footage shot by bystanders. Such footage can be used to gain tactical information, but it can also be used as evidence in casework. For example, surveillance camera footage recorded at the time of an incident can be used by the police to identify a suspect, but might also serve as evidence in court after the investigation is completed (Kamenicky et al., 2016; Villena et al., 2018).

However, the digital images retrieved in a case will more often than not be of low resolution. The resolution of a digital image can be classified in different ways (Yang & Huang, 2010). In this review, "resolution" refers to spatial resolution, which describes the density of the pixels in an image and is measured in pixels per unit area (Yang & Huang, 2010). A low-resolution image can result from physical limitations of the camera or from processing after the image is recorded. Physical limitations include the limited resolution of the imaging sensor and the loss of image detail caused by the optics (lens blur, lens aberrations, aperture diffraction and optical blurring due to motion). Reducing these limitations increases the cost of the cameras. For surveillance cameras, the speed of the camera and the available hardware storage are additional limitations (Yang & Huang, 2010). An image can also be of low resolution because of post-processing, for example compression to save storage space. For smartphones and tablets, this can make up a larger part of the problem than the physical limitations (Verolme & Mieremet, 2017).

To obtain more information from such low-resolution (LR) images, a conversion to high-resolution (HR) images is desirable. Super-resolution (SR) techniques have been developed for this purpose. They can be applied to LR images resulting from both physical limitations and post-processing. Several publications describe new SR methods, each taking a different approach to the SR problem; most of these methods have been developed in the field of computer vision and image processing (Wang, Chen, & Hoi, 2019). SR can be used for various applications, including the enhancement of HDTV, surveillance video, medical images, satellite images and web-based images (Chikate, Gangamwar, Jawade, Jogawe, & Gujar, 2016; Shi et al., 2016; Yang et al., 2017).

For most of these applications, the aim is to obtain a visually more pleasing version of the original (LR) image. Therefore, the quality of the obtained HR image is in some cases assessed by human assessors. When the aim is a perceptually high-quality image, human assessors can give a more reliable assessment than a quantitative method (Wang et al., 2019). In some articles, however, the Peak Signal-to-Noise Ratio (PSNR) is calculated for the proposed method. The PSNR is a quantitative measure of reconstruction quality that compares the obtained HR image with the ground truth image (sometimes also referred to as the "real" HR image). It is defined using the maximum possible pixel value and the mean squared error (MSE) (Wang et al., 2019).
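As a minimal sketch of this definition, assuming 8-bit greyscale images (maximum pixel value 255) and NumPy, the PSNR in decibels is 10 · log10(MAX² / MSE). The function name `psnr` is illustrative and not taken from any of the reviewed articles; a higher value means the HR estimate is closer to the ground truth, pixel by pixel.

```python
import numpy as np


def psnr(estimate: np.ndarray, ground_truth: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(max_value**2 / MSE).

    Assumes both images have the same shape and pixel range [0, max_value].
    """
    if estimate.shape != ground_truth.shape:
        raise ValueError("estimate and ground truth must have the same shape")
    # Mean squared error over all pixels; cast to float to avoid uint8 overflow.
    diff = estimate.astype(np.float64) - ground_truth.astype(np.float64)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * float(np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    truth = np.full((4, 4), 100, dtype=np.uint8)
    estimate = truth.copy()
    estimate[0, 0] = 110  # one pixel off by 10 grey levels -> MSE = 100 / 16 = 6.25
    print(f"PSNR = {psnr(estimate, truth):.2f} dB")
```

Because the ground truth is required, PSNR can only be computed on test images whose HR version is known, never on the LR image from an actual case.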

This distinction between methods of quality assessment shows that different methods can, and sometimes should, be used for different goals. The same applies to images and videos retrieved from different sources: an image retrieved from a security camera will have undergone different post-processing steps than an image retrieved from a smartphone, for example. It therefore seems plausible that the SR method producing the HR image closest to the ground truth for security camera footage differs from the one that does so for smartphone images.

The following sections review a few distinctions between SR methods, in order to formulate advice on the possible application of SR in forensic science.

1.2. SISR vs. MISR

Two groups of SR techniques can be distinguished. One uses one LR image to predict the HR image and is called Single-Image Super-Resolution (SISR). The other uses multiple LR images from the same scene to compose the HR image and is referred to as Multiple-Image Super-Resolution (MISR).

SISR is performed using, for example, interpolation or machine learning techniques such as a neural network (Villena et al., 2018). When multiple images of the same scene are available, MISR can be used. This can be the case when multiple cameras with almost the same point of view have recorded the same scene, or when a video is available. The frames of a video sequence can contain many slightly shifted or rotated LR images of a given object (Villena et al., 2018). Each of these LR images can potentially contribute new information, but only when the images are not shifted by integer pixel units. If they are shifted by integer units, they contain the same information, which therefore cannot be used to construct an HR image (Park, Park, & Kang, 2003). Park et al. (2003) give a schematic overview of the MISR method and the use of sub-pixel shifts, which is reproduced in Figure 1. Sub-pixel shifts between the different images of a scene can be used to obtain more information from the LR images; the motion between the images is then used to form the HR image. This can be motion within the scene or motion of the camera relative to the scene (Chikate et al., 2016; Yang & Huang, 2010).
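The difference between integer and sub-pixel shifts can be illustrated with a deliberately simplified one-dimensional sketch. It assumes that each LR "frame" is obtained by keeping every fourth sample of an HR signal, starting at a known offset, with no blur, noise or registration error; real MISR has to estimate the shifts and cope with all three. The names `sample_lr` and `shift_and_add` and the factor of 4 are hypothetical choices for this illustration only.

```python
import numpy as np

FACTOR = 4  # hypothetical ratio between the HR and LR sampling grids


def sample_lr(hr: np.ndarray, offset: int, factor: int = FACTOR) -> np.ndarray:
    """Simulate one LR frame: keep every `factor`-th HR sample starting at `offset`.

    An offset that is not a multiple of `factor` is a sub-pixel shift on the LR grid.
    """
    return hr[offset::factor]


def shift_and_add(frames: dict, length: int, factor: int = FACTOR) -> np.ndarray:
    """Place each LR frame's samples back on the HR grid at its (known) offset.

    HR positions that no frame covers stay NaN: that information is missing.
    """
    hr = np.full(length, np.nan)
    for offset, frame in frames.items():
        hr[offset::factor] = frame
    return hr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr_signal = rng.random(32)

    # A shift of exactly one LR pixel (4 HR samples): the same samples, re-indexed.
    a = sample_lr(hr_signal, 0)
    b = sample_lr(hr_signal, FACTOR)
    print(np.array_equal(a[1:], b))  # True: no new information

    # Sub-pixel shifts (offsets 0, 1, 2, 3) each contribute samples no other frame has.
    frames = {k: sample_lr(hr_signal, k) for k in range(FACTOR)}
    print(np.allclose(shift_and_add(frames, len(hr_signal)), hr_signal))  # True

    # Only integer-shifted frames: 3 of every 4 HR positions remain unknown.
    print(np.isnan(shift_and_add({0: a}, len(hr_signal))).mean())  # 0.75
```

In this idealised case the sub-pixel-shifted frames recover the HR signal exactly; with blur, noise and estimated rather than known shifts, the reconstruction is only approximate.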


Figure 1: ‘1. Basic premise for super resolution.’ from Park et al. (2003). The figure shows the different possible inputs for MISR and illustrates how sub-pixel shifts make it possible to retrieve more information for the HR image, and why integer shifts do not.

SISR can be preferred over MISR because of its higher efficiency (Yang et al., 2019). On the other hand, an HR image made using MISR is based on more data, which means that less data is "guessed" or estimated from the input images than when SISR is performed. In SISR, lost frequency components cannot be recovered, whereas MISR can (partly) recover them (Yang & Huang, 2010). This could be a reason to prefer MISR over SISR.

1.3. Neural networks

There are multiple methods to obtain an HR image from one or more LR images, among which up-sampling methods such as nearest-neighbour, bilinear and bicubic interpolation, and learning-based up-sampling methods such as transposed convolution layers and sub-pixel layers. These methods can be applied manually, but the conversion can also be made by a network that is trained to convert images from LR to HR. A learning-based neural network is developed using a training set of LR images and their known HR versions. These LR-HR image pairs are made for training and testing purposes by decreasing the resolution of an image to obtain an LR version. The original image is then considered the HR image of the pair, also referred to as the ground truth. Using these image pairs, the network is trained with a loss function. First, the network produces an HR image from the input LR image. Then, the loss function is used to evaluate the performance of the network by comparing the HR image the network produced with the ground truth HR image of the pair. The network is then optimized by adjusting its parameters to minimize the loss function (Wang et al., 2019).
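The non-learned up-sampling operations named above can be sketched in a few lines of NumPy, assuming a single-channel (greyscale) image stored as a 2D float array and an integer scale factor. The `pixel_shuffle` function shows only the rearrangement step of a sub-pixel layer: in a learned network, a convolution first produces the `scale**2` channels, and that convolution is what training adjusts. Bicubic interpolation and transposed convolution are omitted for brevity, and all function names are illustrative.

```python
import numpy as np


def nearest_upsample(img: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour: copy each LR pixel into a scale x scale block."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)


def _linear_along(img: np.ndarray, scale: int, axis: int) -> np.ndarray:
    """1D linear interpolation along one axis, using pixel-centre alignment."""
    n = img.shape[axis]
    # Position of each output pixel centre in input-pixel coordinates.
    coords = (np.arange(n * scale) + 0.5) / scale - 0.5
    # np.interp clamps coordinates outside [0, n - 1] to the edge values.
    return np.apply_along_axis(lambda v: np.interp(coords, np.arange(n), v), axis, img)


def bilinear_upsample(img: np.ndarray, scale: int) -> np.ndarray:
    """Bilinear: linear interpolation along rows, then along columns."""
    return _linear_along(_linear_along(img.astype(float), scale, axis=1), scale, axis=0)


def pixel_shuffle(x: np.ndarray, scale: int) -> np.ndarray:
    """Rearrange (C * scale**2, H, W) feature maps into a (C, H * scale, W * scale) image.

    Output pixel [c, h * scale + i, w * scale + j] comes from channel c * scale**2 + i * scale + j.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (scale * scale)
    return (
        x.reshape(c, scale, scale, h, w)
        .transpose(0, 3, 1, 4, 2)
        .reshape(c, h * scale, w * scale)
    )


if __name__ == "__main__":
    lr = np.array([[0.0, 10.0], [20.0, 30.0]])
    print(nearest_upsample(lr, 2))
    print(bilinear_upsample(lr, 2))
    print(pixel_shuffle(np.arange(4.0).reshape(4, 1, 1), 2))
```

None of these operations adds information that is not already in the LR pixels: interpolation only spreads existing values, which is one reason it makes few assumptions compared with a trained network.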

Even though there are many types of these learning-based methods in the field of image processing, they are not yet applied in the forensic field. In this review, a distinction is made between methods that do not use a network and methods that use a learning-based neural network. When reviewing the literature, it was observed that the SR methods could be placed in either of these two groups. Learning-based neural networks were also chosen as one of the criteria for comparison, since they are of special concern for forensic science. Although this depends on the specific method and application, there are risks in using neural networks, especially networks trained on a particular database (see section 3.2).


1.4. Super-Resolution in Forensic Science

With the popularity of TV shows such as CSI, the expectations of courts and the public regarding the capabilities of current image processing techniques in forensic science have risen. In these shows, an image is quickly run through a program, which then improves the image quality so much that, for example, the characters on a number plate can suddenly be read perfectly. Unfortunately, this is not how real forensic work proceeds. There are limits to improving image quality, and care must be taken not to alter the information in an image. In (potential) criminal cases, the resolution of images recovered from a camera is often low because of physical limitations or post-processing, as explained in section 1.1. SR techniques might therefore help to extract more information from these LR images. Even though much research is being done on SR methods in the field of image processing (Wang et al., 2019; Yang et al., 2019), SR methods are not yet applied in real forensic casework. This might not happen soon, or at all, since applying an SR method alters the information in an image. In forensic science, accuracy is of the highest importance. Even when an image is more visually satisfying after processing, the unprocessed image can be preferred in a forensic setting, since it is the more accurate representation of the available information (Verolme & Mieremet, 2017). If a technique such as SR is nonetheless applied to enhance the resolution of an image, the one that makes the fewest assumptions should be chosen.

As explained above, images can serve multiple goals in a forensic context. Consider a crime whose perpetrator is caught on a surveillance camera: the footage can be used at several steps of the investigation. First, it will play a role in finding the perpetrator, when information from one or more frames of the video is used to search for a suspect. This is what we call tactical information. Later, the images might be used as evidence against a suspect, for example to connect a person to the crime or the crime scene using CCTV footage.

Few articles have been published on SR in forensic science, and none has yet reported implementing SR in regular forensic casework. In general, MISR is advised for surveillance footage. It can, for example, be used to enhance details in a view of a scene or a car, or to enhance facial features. A few authors have proposed their own SR method for forensic science (Ghazali, Zamani, Abdullah, & Jameson, 2012; Shao, Chao, Luo, & Lin, 2017; Villena et al., 2018). Advice has also been given on the possible use of SR in forensic science, with the warning that new details may appear that cannot be explained by the original image (Verolme & Mieremet, 2017).

Unlike image enhancement aimed at a more visually pleasing image for HDTV, validating a method for use in forensic science is quite hard. For HDTV, the result can be compared with the original image or video to check whether it is the desired result. This cannot be done in forensics. Firstly, an investigator can never be allowed to have a "desired" outcome. If that were the case, the investigator would be biased and subjective, and a suspect could never have a fair trial. Since every person has the right to a fair trial, as stated in Article 6 of the European Convention on Human Rights (ECHR, 2019), this should never be allowed to happen. Secondly, in forensic science it is much harder to validate a method and measure its error rate, since in casework the ground truth is not known. The error rate of a method should therefore be approximated using mock case examples made for this purpose, but it might be difficult to create such test examples and to assess how closely they resemble real cases.


This review aims to advise on whether it would be beneficial for forensic science to implement an SR method and, if so, which kind and at which stage of an investigation. This leads to the following research question: Which Super-Resolution methods have been developed in the past five years, and would it be beneficial to use these techniques in image analysis in Forensic Science?

In this literature review, several super-resolution techniques developed in the past five years are compared and evaluated on a set of criteria, with the aim of advising on their possible use in the forensic field. To do this, each method is assigned to a cluster. The criteria used to distinguish the clusters, the assignment of methods to clusters and an overview of the results are given in section 2. The risks of implementing super-resolution techniques in forensic science and the possibilities for doing so are discussed in sections 3 and 4, respectively. Finally, sections 5 and 6 give a discussion and conclusion on whether and how to implement super-resolution techniques in forensic science.


2. Results

To be able to give a recommendation for forensic science, each reviewed method was assigned to a cluster. The methods were divided based on two characteristics: whether they perform SISR or MISR, and whether the conversion is performed without a neural network (manually) or with one.

The following clusters were formed:

- Cluster 1: SISR, neural network (learning-based)
- Cluster 2: MISR, neural network (learning-based)
- Cluster 3: SISR, no neural network
- Cluster 4: MISR, no neural network

Articles in which the authors describe their new SR method were obtained by searching online and by reading review articles (see Appendix I). The most relevant articles were selected. This led to a total of 21 methods, which were analysed and assigned to one of the four clusters (Table 1).

Table 1. Overview of different SR methods in the literature of the past 5 years. The methods are described based on their application to video or still images, their input in the form of multiple images (MISR) or a single image (SISR), their use of a neural network, and additional information. Each method is assigned to a cluster (1-4). The table is ordered chronologically by year and, within each year, alphabetically by the last name of the first author.

Year | Authors | Video sequence or still images | MISR/SISR | Neural network | Additional information | Cluster
2016 | Chikate et al. | Image sequence from MRI scan | MISR | Yes* | Dictionary-based learning. Performed on biomedical images. | 2
2016 | Goklani, Shravya, & Jignesh | Still images | SISR | No | Dictionary trained online, using only information from the input image; trained specifically for each LR image. No training time or memory space is needed, unlike a learned dictionary. | 3
2016 | Kamenicky et al. | Video sequence | MISR | No | Software for forensic image and video analysis using various algorithms, among which SR. | 4
2016 | Kim, Lee, & Lee (2016a) | Still images | SISR | Yes | Very deep network. Residual learning and extremely high learning rates are used to optimize a very deep network quickly. | 1
2016 | Kim, Lee, & Lee (2016b) | Still images | SISR | Yes | Deeply-recursive convolutional network. | 1
2016 | Liu et al. | Still images | SISR | Yes | Deep network. A combination of sparse coding and a deep network. | 1
2016 | Shi et al. | Video sequence and still images | SISR | Yes | Convolutional neural network. Sub-pixel convolution layer that learns the upscaling operation for image and video super-resolution. | 1
2017 | Caballero et al. | Video sequence | MISR | Yes | Spatio-temporal networks. Real-time video SR. Motion compensation. | 2
2017 | Cruz, Mehta, Katkovnik, & Egiazarian | Still images | SISR | No | Self-similarity based approach. Collaborative filtering of patch groups in a 1D similarity domain, coupled with an iterative back-projection framework. | 3
2017 | Han, Zhao, & Wang | Still images | SISR | Yes* | Dictionary learning is used. Specifically developed for noisy images. HR patches constructed via a distance penalty weight model. | 1
2017 | Ledig et al. | Still images | SISR | Yes | A generative adversarial network. | 1
2017 | Lim, Son, Kim, Nah, & Lee | Still images | SISR | Yes | Enhanced deep SR network. Residual scaling techniques are employed to train large models. A single-scale and a multi-scale model have been developed; the multi-scale model further reduces model size and training time. | 1
2017 | Romano, Isidoro, & Milanfar | Still images | SISR | Yes | Learning-based framework; pre-learned filters. Sharpener applied to the HR images used for learning. Emphasis on speed of the method. | 1
2017 | Shao et al. | Video sequence | MISR | No | Manual feature registration, projection onto convex sets. Application to surveillance footage. | 4
2017 | Tai, Yang, & Liu | Still images | SISR | Yes | Deep recursive residual network. Very deep convolutional neural network model that strives for deep yet concise networks. | 1
2017 | Yang et al. | Still images | SISR | Yes | Deep edge guided recurrent residual network. | 1
2018 | Jingxuan, Jian, Yonghui, & Rong | Still images | SISR | Yes | Convolutional neural network. A residual structure is added to the network so that the entire network converges better. | 1
2018 | Villena et al. | Video sequence | MISR | No | Variational Bayesian approach with super-Gaussian priors. Tested on synthetic video sequences and a real case. | 4
2018 | Yang, Xue, & Wang | Still images | SISR | Yes | Learning-based algorithm based on morphological component analysis that applies dictionary learning. | 1
2018 | Zhang, Tian, Kong, Zhong, & Fu | Still images | SISR | Yes | Residual dense network. In residual dense blocks, the dense connections between layers allow full use of local layers. | 1
2019 | Xu, Ma, & Sun | Still images | SISR | Yes | Dual convolutional neural network. LR images generated from HR images by applying blurring, down-sampling, Bayer sampling and noise. | 1

*This method was developed using a learning-based dictionary instead of a learning-based neural network. Since the same risks and possibilities for forensic science were considered to apply to both kinds of learning-based methods, this method is here treated as a learning-based neural network method.

The number of methods assigned to each cluster was counted and summarised in a new table (Table 2). For clarity, a graph of these results is given in Appendix II.

Table 2. Overview of the number of methods in each cluster. The second to fifth columns show cluster 1 (SISR with neural network (NN)), cluster 2 (MISR with NN), cluster 3 (SISR without NN) and cluster 4 (MISR without NN). The last column gives the total number of methods reviewed from each year, and the last row gives the total number of methods assigned to each cluster.

Year | Cluster 1 (SISR + NN) | Cluster 2 (MISR + NN) | Cluster 3 (SISR) | Cluster 4 (MISR) | Clusters 1-4 (total)
2016 | 4 | 1 | 1 | 1 | 7
2017 | 6 | 1 | 1 | 1 | 9
2018 | 3 | - | - | 1 | 4
2019 | 1 | - | - | - | 1
2016-2019 (total) | 14 | 2 | 2 | 3 | 21
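The counts in Table 2 follow directly from the cluster column of Table 1, and a short Python sketch (standard library only) reproduces them as a consistency check between the two tables. The (year, first author, cluster) tuples are transcribed from Table 1; for Cruz et al. (2017), an SISR method without a neural network, cluster 3 is assumed, in line with that classification. The names `TABLE_1` and `tally` are illustrative.

```python
from collections import Counter

# (year, first author, cluster) for each of the 21 methods in Table 1.
TABLE_1 = [
    (2016, "Chikate", 2), (2016, "Goklani", 3), (2016, "Kamenicky", 4),
    (2016, "Kim (2016a)", 1), (2016, "Kim (2016b)", 1), (2016, "Liu", 1), (2016, "Shi", 1),
    (2017, "Caballero", 2), (2017, "Cruz", 3), (2017, "Han", 1), (2017, "Ledig", 1),
    (2017, "Lim", 1), (2017, "Romano", 1), (2017, "Shao", 4), (2017, "Tai", 1),
    (2017, "Yang", 1),
    (2018, "Jingxuan", 1), (2018, "Villena", 4), (2018, "Yang", 1), (2018, "Zhang", 1),
    (2019, "Xu", 1),
]


def tally(rows):
    """Count methods per cluster, per year and in total."""
    per_year = {}
    for year, _author, cluster in rows:
        per_year.setdefault(year, Counter())[cluster] += 1
    totals = Counter(cluster for _year, _author, cluster in rows)
    return per_year, totals


if __name__ == "__main__":
    per_year, totals = tally(TABLE_1)
    for year in sorted(per_year):
        counts = per_year[year]
        print(year, [counts[c] for c in range(1, 5)], sum(counts.values()))
    print("total", [totals[c] for c in range(1, 5)], sum(totals.values()))
```

Each printed row lists the counts for clusters 1 to 4 followed by the yearly total, matching the rows of Table 2 (a dash in the table corresponds to a count of 0).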

Before the literature was reviewed, one expectation was that MISR would become more prominent over the years. This was expected because the use of sub-pixel shifts seemed a very useful development for enhancing details. MISR is said to lead to a more reliable result, because it enhances details that originate from different frames instead of details estimated from only one image. It was expected that reliability would be important and that MISR would therefore be prominently used. When the methods were analysed, however, the opposite was seen. The higher presence of SISR in the table may be explained by the fact that these methods can be applied faster than MISR, since less data is processed and therefore less computing power is needed. Speed matters more in applications like HDTV, where these conversions must be fast and sometimes even happen in real time.

It was observed that SISR often goes hand in hand with neural networks: fourteen methods are assigned to cluster 1 and only two to cluster 3 (see Table 2 and Appendix II). On reflection, it seems logical that SISR and neural networks mostly go together. Since there is only a limited amount of information to work with, predictions must be made. A learning-based model is then a good choice, since it can build a model based on training images that the developers deem similar to the real data. This database then provides a basis for the predictions.

Another expectation was that the SR methods would be more suitable for use in forensic science. However, the reviewed articles show that many assumptions were made in all of them and that most of them were developed with a very specific goal. The emphasis in development was therefore often on speed rather than on accuracy, which is where forensic science would want it to be. Three methods are placed in cluster 4, all of which are applied to surveillance footage for possible use in forensic science. This makes sense, since the combination of MISR and no neural network is also considered the most reliable in this review. Therefore, if an SR method were applied in forensic science, one that fits in this cluster would be advised. However, some points of discussion were found when reviewing these methods developed for forensic science (see section 5). If a method is to be used, it would be best to develop one that resembles the methods in cluster 4 but is developed by the investigators' own lab, so that the forensic scientists know exactly which assumptions are made and can keep these in mind when reporting on the case.


3. Risks for use in Forensic Science

3.1. Validation of methods

The authors of the articles validated their methods either with quantitative measures such as PSNR or by visual comparison. For forensic science, the most objective analysis should be used. Unfortunately, the PSNR values of different methods cannot be compared in this review, since not all methods were tested on the same set of images. PSNR values are nevertheless worth mentioning, since they are a quantitative way to assess the quality of the transformation to HR images and can be used to compare methods when the same images are used to test them. When several methods are tested for use in forensic science, it would be advisable to do this with known sets of test images, preferably with as many different images as possible, to try to validate the methods. The risk of comparing PSNR values measured on different datasets is that a judgement is made about the quality of the conversion, and thus about the method, that does not hold for the images the method will actually be applied to.
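A minimal sketch of such a comparison, assuming NumPy and greyscale images scaled to [0, 1]: every method receives exactly the same LR inputs, created from the same HR reference images, and is scored by its mean PSNR over the whole set. The synthetic sinusoidal test images, the block-averaging degradation and the two interpolation baselines are placeholders chosen for illustration; a forensic validation would substitute a curated set of realistic images and the candidate SR methods.

```python
import numpy as np


def psnr(est: np.ndarray, gt: np.ndarray, max_value: float = 1.0) -> float:
    mse = float(np.mean((est - gt) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * float(np.log10(max_value ** 2 / mse))


def downsample(hr: np.ndarray, factor: int) -> np.ndarray:
    """Assumed degradation for this sketch: average non-overlapping factor x factor blocks."""
    h, w = (hr.shape[0] // factor) * factor, (hr.shape[1] // factor) * factor
    return hr[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def nearest(lr: np.ndarray, s: int) -> np.ndarray:
    return np.repeat(np.repeat(lr, s, axis=0), s, axis=1)


def bilinear(lr: np.ndarray, s: int) -> np.ndarray:
    def along(a, axis):
        n = a.shape[axis]
        coords = (np.arange(n * s) + 0.5) / s - 0.5
        return np.apply_along_axis(lambda v: np.interp(coords, np.arange(n), v), axis, a)
    return along(along(lr, 1), 0)


def make_test_set(n_images: int = 5, size: int = 32, seed: int = 0) -> list:
    """Synthetic smooth HR images in [0, 1]; a fixed seed keeps the set reproducible."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:size, 0:size] / size
    images = []
    for _ in range(n_images):
        fx, fy = rng.uniform(0.5, 3.0, size=2)
        images.append(0.5 + 0.25 * np.sin(2 * np.pi * fx * x) + 0.25 * np.cos(2 * np.pi * fy * y))
    return images


def benchmark(methods: dict, hr_images: list, factor: int = 2) -> dict:
    """Mean PSNR per method, with every method evaluated on identical LR inputs."""
    lr_images = [downsample(hr, factor) for hr in hr_images]
    return {
        name: float(np.mean([psnr(upsample(lr, factor), hr)
                             for lr, hr in zip(lr_images, hr_images)]))
        for name, upsample in methods.items()
    }


if __name__ == "__main__":
    scores = benchmark({"nearest": nearest, "bilinear": bilinear}, make_test_set())
    for name, score in scores.items():
        print(f"{name}: {score:.2f} dB")
```

Only scores produced by the same `benchmark` call, on the same test set, are comparable with each other; a PSNR reported in another article on other images is not.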

3.2. Learning-based neural networks

The learning-based approach relies heavily on the similarity between the training set and the test set used to develop the network. There is no set number of images that can serve as a threshold for deciding whether enough training images were used to make the method valid for generic images (Sun & Shum, 2015). This makes it hard to validate these methods for application in forensic science.

In every SR method, a dataset of images is used to test the performance of the method. In learning-based neural networks, a dataset is also used in the training phase of the network. In this training dataset, LR images are made from HR images, so that for each image a pair is obtained consisting of the "real" HR image (the ground truth) and the LR image. The method can be applied to the LR image to produce an HR image, whose quality can then be assessed by comparison with the ground truth. To assess the quality of a method, this is a perfectly logical approach, since a "real" HR image is needed to compare the result of processing the LR image with. However, the LR image is generated by a process known to the developers of the method, and the degradation filters used are not always realistic. Sometimes the dataset is even adjusted to ensure that the images labelled as HR (the "real" HR images; this labelling can itself be called subjective) do not include images with noticeable noise and blur (Xu et al., 2019). Therefore, it cannot be said with confidence that the method also performs well in real situations.
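The pair-generation step described above can be sketched as follows, assuming NumPy, greyscale images in [0, 1], and a fixed pipeline of Gaussian blur, decimation and additive Gaussian noise. This is one common kind of synthetic degradation, not the pipeline of any specific reviewed article; the point is that every training pair is produced by the same known process, which real surveillance or smartphone images need not follow. All function names and parameter values are illustrative.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with reflected borders (output has the input's shape)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def make_training_pair(hr: np.ndarray, factor: int = 2, sigma: float = 1.0,
                       noise_std: float = 0.01, rng=None):
    """Return (lr, hr): the LR image is derived from the HR 'ground truth' by a fixed,
    known degradation: blur -> keep every factor-th pixel -> add noise -> clip."""
    rng = np.random.default_rng(0) if rng is None else rng
    lr = gaussian_blur(hr.astype(float), sigma)[::factor, ::factor]
    lr = np.clip(lr + rng.normal(0.0, noise_std, lr.shape), 0.0, 1.0)
    return lr, hr


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    hr_images = [rng.random((32, 32)) for _ in range(3)]
    pairs = [make_training_pair(hr, rng=rng) for hr in hr_images]
    print([lr.shape for lr, _ in pairs])  # [(16, 16), (16, 16), (16, 16)]
```

A network trained only on pairs made this way learns to undo this particular blur, sampling and noise; compression artefacts or motion blur in case footage fall outside what it has seen.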

When a network is trained on a set of LR and HR images and a loss function is used to evaluate the result, the network adapts to reduce this loss. Since it is not really known what is adjusted in the network to give better results, this can be considered a risk in forensic science.
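To make this concrete, here is a toy version of loss-driven training, assuming NumPy and one-dimensional signals: a linear model predicts each HR sample from three neighbouring LR samples, with one small filter per sub-pixel position, and plain gradient descent adjusts the six weights to reduce the mean squared error on LR-HR pairs generated by pairwise averaging. The model size, degradation and step size are hypothetical choices. With six weights the result can still be inspected directly; deep SR networks adjust far more parameters in the same loss-driven way, which is why it is hard to say what they have learned.

```python
import numpy as np

FACTOR = 2  # HR samples per LR sample
WIDTH = 3   # LR neighbourhood used to predict each HR sample (hypothetical)


def degrade(hr: np.ndarray) -> np.ndarray:
    """Known degradation used to build training pairs: average each pair of HR samples."""
    return hr.reshape(-1, FACTOR).mean(axis=1)


def lr_windows(lr: np.ndarray) -> np.ndarray:
    """(n_lr, WIDTH) matrix of each LR sample's neighbourhood, edge-padded."""
    padded = np.pad(lr, WIDTH // 2, mode="edge")
    return np.stack([padded[i:i + len(lr)] for i in range(WIDTH)], axis=1)


def predict(weights: np.ndarray, lr: np.ndarray) -> np.ndarray:
    """Apply one WIDTH-tap filter per sub-pixel phase and interleave onto the HR grid."""
    return (lr_windows(lr) @ weights.T).reshape(-1)


def mse_loss(weights: np.ndarray, hr_signals: list) -> float:
    return float(np.mean([np.mean((predict(weights, degrade(hr)) - hr) ** 2)
                          for hr in hr_signals]))


def train(hr_signals: list, steps: int = 300, step_size: float = 0.5) -> np.ndarray:
    """Gradient descent on the MSE loss between predictions and ground-truth HR signals."""
    weights = np.zeros((FACTOR, WIDTH))
    for _ in range(steps):
        grad = np.zeros_like(weights)
        for hr in hr_signals:
            windows = lr_windows(degrade(hr))
            err = (windows @ weights.T) - hr.reshape(-1, FACTOR)  # (n_lr, FACTOR)
            grad += 2.0 * err.T @ windows / hr.size  # d(MSE)/d(weights) for this signal
        weights -= step_size * grad / len(hr_signals)
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.arange(64) / 64
    signals = [0.5 + 0.4 * np.sin(2 * np.pi * f * t + p)
               for f, p in zip(rng.uniform(1, 4, 8), rng.uniform(0, 2 * np.pi, 8))]
    w = train(signals)
    print("loss before:", mse_loss(np.zeros((FACTOR, WIDTH)), signals))
    print("loss after: ", mse_loss(w, signals))
    print("learned filters per phase:\n", w.round(3))
```

The loss only measures agreement with the training pairs; nothing in the procedure checks whether the learned filters are appropriate for images degraded in a different way.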

3.3. Bias

Seen from a forensic point of view, there is another very important aspect to discuss regarding the quality assessment of the methods discussed, namely bias. An LR image for training and testing is generated from an HR image by the same people who develop the algorithm that creates an HR image from this LR image. It is then not surprising that most methods are reported to create a good HR image that resembles the ground truth better than other methods applied to the same dataset. These other methods, against which the new method's performance is measured, are often more general


will not give good results for every real image, since the image processing appears to be "hardcoded" for this case.

Therefore, any method considered for use in forensic science should be tested with a new, known dataset of images (or with multiple datasets for different goals: one with images resembling CCTV footage, one with faces, etc.). This database could include LR images for which an HR version is available, as in the approach described above, but the conversion from the HR image to the test LR image should then be done in a realistic way, or at least not in the same way for each image. Another option is to take pictures of the same scene with different cameras that produce images of different resolution. The LR image could then be processed to obtain a higher resolution, after which the information in the obtained HR image can be compared with the HR image from the other camera to assess the quality of the conversion.
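The first option, a test set in which the degradation is not the same for every image, can be sketched as below, assuming NumPy and greyscale images in [0, 1]. Each HR reference image gets its own randomly drawn down-sampling factor, blur width and noise level, and these parameters are stored with the case so that results can later be broken down per degradation. The parameter ranges, the `TestCase` record and the function names are hypothetical; realistic ranges would have to be derived from the cameras and processing chains encountered in casework.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class TestCase:
    hr: np.ndarray      # reference (ground-truth) image
    lr: np.ndarray      # degraded image given to the SR method
    factor: int         # down-sampling factor
    blur_sigma: float   # Gaussian blur width, in HR pixels
    noise_std: float    # standard deviation of additive noise


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with reflected borders."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def make_varied_test_set(hr_images: list, seed: int = 0) -> list:
    """Degrade every HR image differently and record how it was degraded."""
    rng = np.random.default_rng(seed)
    cases = []
    for hr in hr_images:
        factor = int(rng.choice([2, 3, 4]))
        sigma = float(rng.uniform(0.5, 2.0))
        noise = float(rng.uniform(0.0, 0.05))
        lr = gaussian_blur(hr.astype(float), sigma)[::factor, ::factor]
        lr = np.clip(lr + rng.normal(0.0, noise, lr.shape), 0.0, 1.0)
        cases.append(TestCase(hr, lr, factor, sigma, noise))
    return cases


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    cases = make_varied_test_set([rng.random((48, 48)) for _ in range(5)])
    for case in cases:
        print(case.factor, round(case.blur_sigma, 2), round(case.noise_std, 3), case.lr.shape)
```

Because the degradation varies, a method that has merely learned to invert one fixed pipeline would be expected to show it here as uneven scores across the recorded parameters.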


4. Possibilities for use in Forensic Science

4.1. Validation of methods

As mentioned before, a visual assessment of the quality of the result is sometimes made (Wang et al., 2019). This qualitative assessment can be very informative when the goal is to obtain a visually more pleasing version of the original. When the goal is an HR version of the original that is as close to the ground truth as possible, however, it is better to use a quantitative approach that aims for an objective analysis of the result. The PSNR value can be calculated to quantitatively assess the construction of the HR image. When the maximum possible pixel value is fixed, the PSNR depends only on the pixel-level MSE between the images, which considers only the differences between pixel values at the same positions rather than human visual perception (Wang et al., 2019). This makes it a better measure for comparing methods objectively in the literature and a possible way of assessing image quality in forensic science, since one of the goals in forensic science is to make assessments as objective as possible.
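The remark that PSNR ignores human visual perception can be illustrated with a small NumPy sketch on a synthetic 8-bit image. Three different distortions (a uniform brightness shift of 5 grey levels, a random ±5 pattern, and a single 16 × 16 patch changed by 20 grey levels, standing in for a localised "new detail") all have an MSE of exactly 25 and therefore the same PSNR, although a viewer would likely judge them very differently. The image and distortions are invented for illustration.

```python
import numpy as np


def psnr(est: np.ndarray, gt: np.ndarray, max_value: float = 255.0) -> float:
    mse = float(np.mean((est.astype(float) - gt.astype(float)) ** 2))
    return float("inf") if mse == 0.0 else 10.0 * float(np.log10(max_value ** 2 / mse))


rng = np.random.default_rng(0)
truth = rng.integers(50, 200, size=(64, 64)).astype(float)

# A: every pixel 5 grey levels brighter (MSE = 5**2 = 25).
brighter = truth + 5.0

# B: each pixel randomly +5 or -5 (MSE = 25), visible as fine noise.
noisy = truth + rng.choice([-5.0, 5.0], size=truth.shape)

# C: one 16x16 patch raised by 20 grey levels, the rest untouched
#    (MSE = 256 * 20**2 / 4096 = 25): a localised change such as an added detail.
patched = truth.copy()
patched[:16, :16] += 20.0

for name, img in [("brighter", brighter), ("noisy", noisy), ("patched", patched)]:
    print(f"{name}: PSNR = {psnr(img, truth):.2f} dB")
```

All three lines report the same PSNR. For forensic use this cuts both ways: the measure is objective, but a single PSNR value does not reveal whether an error is spread evenly or concentrated in one region of the image.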

4.2. Tactical information or evidence in court

Even though many risks are mentioned in this review and caution in implementing SR methods is highly advised, there are still possibilities for applying SR methods in forensic science. The advice here is to do this in the tactical stage and not for evidence in court. If images need to be put forward as evidence in court, the original images should be presented as evidence in the case.

Since SR methods are not yet validated for use in casework, this is the safest option. As stated before, accuracy is very important in forensic science. However, the predictions and estimations made with SR techniques cannot be said to have a validated, high accuracy. This could mean that an innocent person is convicted for something he or she did not do (false positive), or that a guilty person is set free for something he or she did do (false negative). The false positive rate of any technique used to analyse evidence in forensic science has to be minimised, since the worst-case scenario in forensic science is not to set a guilty person free but to convict an innocent person. Since this false positive rate cannot be assessed for SR techniques, it would be advisable to use these techniques, if at all, only in the tactical stage of an investigation, so that no evidential value is attributed to their results. When the image or video is later presented in court as evidence, the original footage should be used. Since SR techniques would only be applied to footage of poor quality, this means that other evidence will most likely be needed for a conviction in such a case.
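If SR-assisted analyses were ever to be evaluated, the false positive and false negative rates discussed above could be estimated on staged (mock) cases in which the ground truth is known by construction, as suggested in section 1.4. The sketch below shows only the bookkeeping, assuming each mock case yields a yes/no outcome (for instance, whether a plate read from an SR-enhanced image points to the correct vehicle). The `MockCase` record and the five example outcomes are made up to show the arithmetic and are not results.

```python
from dataclasses import dataclass


@dataclass
class MockCase:
    flagged: bool    # did the analysis indicate a match (e.g. this car, this person)?
    is_match: bool   # ground truth, known because the case was staged


def error_rates(cases: list) -> tuple:
    """Return (false positive rate, false negative rate).

    FPR = FP / (FP + TN): share of true non-matches wrongly flagged.
    FNR = FN / (FN + TP): share of true matches that were missed.
    """
    fp = sum(c.flagged and not c.is_match for c in cases)
    tn = sum(not c.flagged and not c.is_match for c in cases)
    fn = sum(not c.flagged and c.is_match for c in cases)
    tp = sum(c.flagged and c.is_match for c in cases)
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    fnr = fn / (fn + tp) if fn + tp else float("nan")
    return fpr, fnr


if __name__ == "__main__":
    # Made-up outcomes purely to illustrate the arithmetic.
    cases = [MockCase(True, True), MockCase(True, False), MockCase(False, False),
             MockCase(False, False), MockCase(False, True)]
    fpr, fnr = error_rates(cases)
    print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")  # FPR = 0.33, FNR = 0.50
```

A meaningful estimate would need many staged cases that resemble real ones, which is exactly the difficulty noted earlier in this review.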

As an example, imagine the following hypothetical hit-and-run case: a car hit a person on a pedestrian crossing and then drove off. Another person was walking on the sidewalk next to the road. After the incident, and before attending to the victim, he took a picture of the car, which was already driving off. In the picture the car is visible in the distance and the license plate is also visible, but unfortunately unreadable. The resolution is high enough, however, that after application of an SR technique a license plate number can be read.

In this example, the picture taken by the bystander could be used both as tactical information to find a possible suspect and as evidence in court to show that this car was seen near the scene of the crime, in an attempt to tie a suspect to the crime scene. To keep the number of false positives in the criminal justice system as low as possible, the higher-resolution version can be used to obtain the license plate number, which can then be used to find the owner of the car. However, this does not prove that the owner is the perpetrator. Perhaps a car with a license plate resembling his committed the hit-and-run. Other evidence is necessary to tie this car to this crime, such as blood, DNA or fibres of the victim on the outside of the car. If such traces are found, the car can be considered connected to the scene. In court, the evidence connecting the victim or crime scene to the car would, and should, weigh much more than the image that led to that evidence. When the case is brought to court, the original image should be used as evidence, since the validity of the predicted license plate might be assumed or confirmed through the other evidence but does not stand on its own.

4.3. Clusters

As explained in the previous section, it would be neither wise nor fair to use images that have been altered with an SR method as evidence in court. In the tactical stage of the investigation, however, such images can be used. If SR methods are implemented in the tactical stage to recover or estimate more details in an LR image, the advice is to choose MISR over SISR. This is because, as explained above, in MISR details can be uncovered from multiple images rather than estimated on the basis of a single image alone. This is not always possible, since multiple images will not always be available. If MISR cannot be used, the SISR method with the fewest assumptions should be chosen.
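Why multiple images can uncover details that a single image can only estimate is illustrated by an idealised shift-and-add sketch. It assumes LR frames that are pure decimations of one HR scene at exactly known sub-pixel offsets, with no blur, noise or registration error; real MISR methods must estimate the motion between frames. The function names are illustrative, not taken from any reviewed method.

```python
import numpy as np


def simulate_lr_frames(hr, factor, offsets):
    """Decimate the HR scene at different offsets (in HR pixels), i.e. LR
    frames shifted by fractions 1/factor of an LR pixel."""
    return [hr[dy::factor, dx::factor] for dy, dx in offsets]


def shift_and_add(frames, offsets, factor):
    """Place each LR frame's pixels on the HR grid at its known offset and
    average where several frames land on the same HR pixel."""
    h, w = frames[0].shape
    acc = np.zeros((h * factor, w * factor))
    count = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, offsets):
        acc[dy::factor, dx::factor] += frame
        count[dy::factor, dx::factor] += 1
    filled = count > 0  # HR pixels actually observed in at least one frame
    acc[filled] /= count[filled]
    return acc, filled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.integers(0, 256, size=(8, 8)).astype(float)
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    frames = simulate_lr_frames(hr, 2, offsets)
    recon, filled = shift_and_add(frames, offsets, 2)
    print(filled.mean(), np.allclose(recon, hr))  # 1.0 True: every HR pixel observed
    _, filled_single = shift_and_add(frames[:1], offsets[:1], 2)
    print(filled_single.mean())  # 0.25: one frame leaves 3/4 of HR pixels to be guessed
```

With four suitably shifted frames every HR pixel is measured; with one frame, three quarters of the HR grid has to be filled in by assumptions, which is the situation an SISR method is always in.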

It would be best not to use neural networks in the conversion from LR to HR images. This advice is based on the bias that training databases can introduce into networks and on the fact that the inner workings of a network are not always understood. A learning-based network is trained: it learns from its input and from the score of its output compared to the ground truth, as measured by a loss function. This can be seen as a black box: it is not known exactly what happens inside; only the input and output are observed. This is not ideal for forensic science, where it should be known what happens to the data, to ensure that the data are not altered and that the information obtained from them is still trustworthy.

The advice to use MISR and no learning-based neural network, when possible, means that a method from cluster 4 would be the best choice for forensic science. With such a method, less bias can be expected, because no network is trained on a specific database. The use of MISR gives more reliability to the new information retrieved using SR, which could translate into a lower false positive rate, something very much desired in forensic science. If a method from cluster 4 is to be used, it would be best to develop one within the researcher’s own institute instead of using one of the methods analysed here, for reasons mentioned in the discussion and so that all assumptions are known and can be taken into account when evaluating the result.

4.4. Methods specific to applications

The possibilities for image processing in the field of forensic science are very broad. As stated before, two causes of an LR image can be distinguished, namely the physical limitations of the camera and post-processing of the image. For post-processing, the applied algorithm might be known, and it might therefore be possible to reverse the applied processing. For physical limitations, however, a reversal will not be possible, but a prediction of a higher-resolution version can be made. This also means that for post-processing a network can be trained more specifically, since the post-processing steps can be imitated to form pairs of LR-HR images for training purposes.
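Imitating a known post-processing step to form LR-HR training pairs can be sketched as follows. The degradation here (block-average downsampling followed by coarse quantization) is a hypothetical stand-in for whatever processing is actually known in a given case; a real pipeline would replace `known_degradation` with the documented processing of the camera or storage system.

```python
import numpy as np


def known_degradation(hr: np.ndarray, factor: int = 2, step: float = 16.0) -> np.ndarray:
    """Hypothetical 'known' post-processing: block-average downsampling
    followed by coarse quantization (a crude stand-in for lossy compression)."""
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile exactly
    blocks = hr[:h, :w].astype(np.float64).reshape(h // factor, factor, w // factor, factor)
    lr = blocks.mean(axis=(1, 3))  # average each factor x factor block
    return np.round(lr / step) * step  # quantize to multiples of `step`


def make_training_pairs(hr_images, factor: int = 2, step: float = 16.0):
    """Imitate the known processing on HR images to obtain (LR, HR) pairs."""
    return [(known_degradation(hr, factor, step), hr) for hr in hr_images]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    hr_images = [rng.integers(0, 256, size=(32, 32)).astype(np.float64) for _ in range(3)]
    pairs = make_training_pairs(hr_images, factor=4)
    print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)  # 3 (8, 8) (32, 32)
```

The closer the simulated degradation matches the real processing, the more relevant such pairs are for training; if the assumed processing is wrong, the network learns to undo something that never happened.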

Therefore, a different type of method might be more appropriate for different applications. For example, for CCTV the compression scheme used to save storage space could be known, which means it is known how the image was processed. In that case, a method that reverses the effects of a known compression step would be extremely useful. For other recovered images, however, it might not be possible to know what processing steps have been performed on them, for example in a case involving the sharing of illicit pictures. Then, a method that relies on many assumptions about the processing performed on the image will not give a result close to the original image, because those assumptions might not represent the way the images were processed in reality.

A possible solution to this problem is thus to develop methods for specific applications. For example, when images are saved on a certain medium or sent to a server for storage, as with the CCTV camera from the example above, they will sometimes be compressed. Therefore, it might be good advice to use a specific SR method for a specific purpose in forensic science. Footage could be recorded and stored in multiple locations (with and without compression), making it possible to test which method produces, from the compressed data, the HR image that is most similar to the original. It could be argued that it would be easier to contact the company responsible for the post-processing, such as the one that manufactured and installed the CCTV camera in this example, and ask about its compression methods. This is of course an option, and if the protocol can be obtained from the company, this would be easiest. However, many of these companies are not forthcoming about their methods, which means a method would still need to be developed. If such a method could be programmed or trained specifically for a certain type of image, it could lead to more suitable methods and thus more accurate estimations of the ground truth.
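The comparison described above, with an uncompressed copy serving as ground truth for the compressed copy, can be sketched as a small ranking harness. Everything in it is illustrative: the toy "compression" is simple floor quantization, the candidate methods are hypothetical, and for brevity the resolution is left unchanged, whereas real candidates would also upscale.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))


def rank_methods(reference, degraded, methods):
    """Apply each candidate to the degraded copy and rank by MSE against the
    uncompressed reference copy (lowest MSE, i.e. most similar, first)."""
    scores = {name: mse(reference, method(degraded)) for name, method in methods.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    reference = np.arange(256, dtype=np.float64).reshape(16, 16)  # uncompressed copy
    degraded = np.floor(reference / 16) * 16  # toy 'compression': floor quantization
    candidates = {
        "no_processing": lambda x: x,
        # Knowing the scheme, re-centre each quantization bin on its mean (+7.5).
        "floor_bias_correction": lambda x: x + 7.5,
    }
    for name, score in rank_methods(reference, degraded, candidates):
        print(name, score)  # floor_bias_correction 21.25, then no_processing 77.5
```

The toy result mirrors the argument in the text: a method that exploits knowledge of the actual processing step outperforms one that ignores it, and the uncompressed recording makes this measurable.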


5. Discussion

When comparing multiple super-resolution (SR) methods published in the past five years, it was seen that many methods were developed for use with only one image as input (SISR). However, it is argued that with multiple images as input (MISR) a more reliable enhancement of the details in the images can be achieved. This review advises against the use of learning-based neural networks in forensic science, since these have to be trained and their performance is highly dependent on the set of training images used. Therefore, the advice is to use a method from cluster 4 in forensics. If no sequence of images is available to perform MISR, SISR could also be used in the tactical phase. However, since the results of such methods involve even more assumptions and estimations, they should be interpreted with even more care. No evidential value should be attributed to the results of either type of SR method.

All three methods from cluster 4 were developed to be applied to surveillance footage (see Table 1). This is not surprising, since this cluster is the most advisable for use in forensic science and MISR can be performed when video frames are available. Both Shao et al. (2017) and Villena et al. (2018) focus on faces in surveillance footage, while Kamenicky et al. (2016) show an example of the use of their program to obtain higher-quality footage of license plates. These methods, however, do not seem ready for use in forensic science yet. In Villena et al. (2018), a face detection algorithm is used to detect a face for MISR application, and the investigator then still has to mark the best frame. In Shao et al. (2017), the investigator has to annotate the faces for comparison manually. This means that the face detection algorithm would also need to be validated, and that manual annotation of faces increases the subjectivity of the method. As stated above, this cluster would be a good place to start looking for a method to use in forensic science. However, it would be best to develop a new method within the institute that will use it, so that the institute knows what assumptions are made and can consider them when reporting on the use of the method.

One of the reasons why it is hard to train a network for forensic science is that its performance cannot be assessed using real case examples for testing and obtaining error rates, because in casework the truth is not known. This applies not only to neural networks, but also to the testing and validation of other methods. This brings us to a point of critique on the reporting of one of the articles in cluster 4. As stated before, methods should be tested with footage that resembles real case scenarios but for which the ground truth is known. However, Villena et al. (2018) describe a pilot study that seems to do the opposite. In this study, participants are shown the enhanced and the original footage from a real case, after which they are asked to identify the person in the footage from a list of suspects. It is stated that the study was done to understand the influence of digital media on human perception and cognition. Even though it is not stated that the study was done to validate the enhanced footage, and thus the method, the conclusion drawn from the study does seem to reflect that. It is written that “The subjects in Group 1 were able to identify the perpetrators more accurately from the MISR images. […] The results were thus overwhelmingly in favour of the MISR enhanced images” (Villena et al., 2018, p. 46). However, this implies an assumption that the suspect indicated by the subjects is indeed the perpetrator. Since this is a real case in which no one was convicted, this cannot be assumed. Earlier in their article, it is even stated that the prosecution office deemed the evidence insufficient to unambiguously identify the perpetrators, which means it cannot be stated that a suspect and a perpetrator are the same person (innocent until proven guilty).
In their conclusion they do return to this point and repeat that the quality of the images in this case was insufficient to unambiguously identify the perpetrators. Their article opens with multiple examples showing the performance of their technique and its PSNR values. Therefore, this comment is by no means meant to discredit their SR method. However, the pilot study does not seem to serve a clear goal and might confuse the reader, since it looks like an attempt to validate their method. This example illustrates how difficult it can be to set up studies that test the performance of a method for forensic science, and how careful one must be with words such as ‘identify’ in the forensic context.

In this review, only a subset of the SR methods developed in the past five years was analysed. Although some articles from different fields were reviewed, almost all of them were developed for similar areas of research, and several were even published in the same journal. This must be kept in mind when viewing the results, since they might not be representative of all research into SR techniques. Since many SR methods are developed in the image processing field with the aim of producing an HR image that is visually pleasing, they are developed differently than they would be if the goal were to use them in forensics.

Finally, most of the SR techniques from the image processing field are applied to colour images, while some surveillance footage is still recorded in greyscale. Before applying the same methods to greyscale images as to RGB images, it should be checked whether this is possible or whether conversions need to be made.
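Two conversions are commonly considered when colour-oriented methods meet greyscale footage; which one, if either, is valid depends on the specific SR method and has to be checked. The sketch below assumes images held as NumPy arrays: replicating a single greyscale channel so that a method expecting three channels accepts the input, and reducing RGB to a single luma channel with the ITU-R BT.601 weights, a choice often used when SR results are evaluated on luminance only. The function names are illustrative.

```python
import numpy as np

# ITU-R BT.601 luma weights (they sum to 1.0).
BT601_WEIGHTS = np.array([0.299, 0.587, 0.114])


def rgb_to_luma(rgb: np.ndarray) -> np.ndarray:
    """(H, W, 3) RGB array -> (H, W) luma array."""
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    return rgb.astype(np.float64) @ BT601_WEIGHTS


def grey_to_rgb(grey: np.ndarray) -> np.ndarray:
    """(H, W) greyscale array -> (H, W, 3) with three identical channels.
    This only changes the array layout; it adds no colour information."""
    if grey.ndim != 2:
        raise ValueError("expected an (H, W) greyscale array")
    return np.repeat(grey[..., np.newaxis], 3, axis=-1)


if __name__ == "__main__":
    frame = np.array([[10.0, 200.0], [0.0, 255.0]])
    rgb = grey_to_rgb(frame)
    print(rgb.shape)  # (2, 2, 3)
    print(np.allclose(rgb_to_luma(rgb), frame))  # True: equal channels map back unchanged
```

A method trained on natural colour images may still behave differently on replicated greyscale input, since the three channels are perfectly correlated; the conversion makes the input acceptable, not necessarily representative.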


6. Conclusion

In this literature review, several SR methods developed in the past five years were analysed. It might be beneficial to use SR techniques in forensic science, but only if care is taken when drawing conclusions from the resulting image and no evidential value is attributed to this image. From the comparison of these SR methods and their possibilities and risks for forensic science, the following advice was formed. If an image processing technique such as SR is used in forensic science, it should only be applied in the tactical stage to point to a suspect. If the image in question is later used as evidence in a case, the original image should be used for analyses and comparisons. When an image of poor quality is used as evidence, additional evidence will need to be presented to the court. If super-resolution is used in the tactical phase, the recommendation is to apply a method that uses MISR without neural networks, since this involves the fewest assumptions (MISR rather than SISR) and no neural networks (which means more is known about the changes made to the image).

The methods shown in this review that use MISR without neural networks are not yet ready for use in forensic science. More research in this area is necessary before a method can be used in forensic science in practice.


7. References

Caballero, J., Ledig, C., Aitken, A., Acosta, A., Totz, J., Wang, Z., & Shi, W. (2017). Real-time video super-resolution with spatio-temporal networks and motion compensation. Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 4778–4787. https://doi.org/10.1109/CVPR.2017.304

Chikate, A., Gangamwar, S., Jawade, S., Jogawe, T., & Gujar, A. (2016). Dictionary Learning Based Super-Resolution Reconstruction of Biomedical Images. 6(5), 4939–4943. https://doi.org/10.4010/2016.1220

Cruz, C., Mehta, R., Katkovnik, V., & Egiazarian, K. O. (2017). Single image super-resolution based on wiener filter in similarity domain. IEEE Transactions on Image Processing, 27(3), 1376–1389. https://doi.org/10.1109/TIP.2017.2779265

ECHR (2019). Guide on Article 6 of the European Convention on Human Rights (right to a fair trial). European Court of Human Rights (ECHR). https://doi.org/10.1007/978-3-642-60274-0_24

Ghazali, N. N. A. N., Zamani, N. A., Abdullah, S. N. H. S., & Jameson, J. (2012). Super resolution combination methods for CCTV forensic interpretation. Proceedings of the 12th IEEE International Conference on Intelligent Systems Design and Applications (ISDA), 853–858. https://doi.org/10.1109/ISDA.2012.6416649

Goklani, H. S., Shravya, S., & Jignesh, N. S. (2016). Image Super-Resolution using Single Image Semi Coupled Dictionary Learning. International Journal of Image Processing (IJIP), 10(3), 135–144.

Han, Y., Zhao, Y., & Wang, Q. (2017). Dictionary learning based noisy image super-resolution via distance penalty weight model. PLoS ONE, 12(7). https://doi.org/10.1371/journal.pone.0182165

Jingxuan, H., Jian, Z., Yonghui, Z., & Rong, W. (2018). Image Super-resolution Reconstruction Algorithm Based on Convolutional Neural Network. 2018 IEEE International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), 267–271. https://doi.org/10.1109/AUTEEE.2018.8720786

Kamenicky, J., Bartos, M., Flusser, J., Mahdian, B., Kotera, J., Novozamsky, A., … Horinek, J. (2016). PIZZARO: Forensic analysis and restoration of image and video data. Forensic Science International, 264, 153–166. https://doi.org/10.1016/j.forsciint.2016.04.027

Kim, J., Lee, J. K., & Lee, K. M. (2016a). Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1646–1654. https://doi.org/10.1109/CVPR.2016.182

Kim, J., Lee, J. K., & Lee, K. M. (2016b). Deeply-recursive convolutional network for image super-resolution. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1637–1645. https://doi.org/10.1109/CVPR.2016.181

Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., … Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network.

Lim, B., Son, S., Kim, H., Nah, S., & Lee, K. M. (2017). Enhanced Deep Residual Networks for Single Image Super-Resolution. Proceedings of the IEEE Computer Society


Liu, D., Wang, Z., Wen, B., Yang, J., Han, W., & Huang, T. S. (2016). Robust Single Image Super-Resolution via Deep Networks with Sparse Prior. IEEE Transactions on Image Processing, 25(7), 3194–3207. https://doi.org/10.1109/TIP.2016.2564643

Park, S. C., Park, M. K., & Kang, M. G. (2003). Super-resolution Image Reconstruction: A Technical Overview. IEEE Signal Processing Magazine, 21–36.

Romano, Y., Isidoro, J., & Milanfar, P. (2017). RAISR: Rapid and Accurate Image Super Resolution. IEEE Transactions on Computational Imaging, 3(1), 110–125. https://doi.org/10.1109/TCI.2016.2629284

Shao, J., Chao, F., Luo, M., & Lin, J. C. (2017). A Super-resolution Reconstruction Algorithm for Surveillance Video. Journal of Forensic Science and Medicine, 3, 26–30. https://doi.org/10.4103/jfsm.jfsm

Shi, W., Caballero, J., Huszar, F., Totz, J., Aitken, A. P., Bishop, R., … Wang, Z. (2016). Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1874–1883. https://doi.org/10.1109/CVPR.2016.207

Sun, J., & Shum, H. Y. (2015). Image super-resolution using gradient profile prior. U.S. Patent No. 9,064,476. Washington, DC: U.S. Patent and Trademark Office.

Tai, Y., Yang, J., & Liu, X. (2017). Image super-resolution via deep recursive residual network.

Verolme, E., & Mieremet, A. (2017). Application of forensic image analysis in accident investigations. Forensic Science International, 278, 137–147. https://doi.org/10.1016/j.forsciint.2017.06.039

Villena, S., Vega, M., Mateos, J., Rosenberg, D., Murtagh, F., Molina, R., & Katsaggelos, A. K. (2018). Image super-resolution for outdoor digital forensics. Usability and legal aspects. Computers in Industry, 98, 34–47. https://doi.org/10.1016/j.compind.2018.02.004

Wang, Z., Chen, J., & Hoi, S. C. H. (2019). Deep Learning for Image Super-resolution: A Survey. IEEE, 1–23. Retrieved from http://arxiv.org/abs/1902.06068

Xu, X., Ma, Y., & Sun, W. (2019). Towards Real Scene Super-Resolution with Raw Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1723–1731. Retrieved from http://arxiv.org/abs/1905.12156

Yang, J., & Huang, T. (2010). Image super-resolution: Historical overview and future challenges. In P. Milanfar (Ed.), Super-resolution imaging (pp. 20–34). Boca Raton: CRC Press, Taylor & Francis Group.

Yang, Weiguo, Xue, B., & Wang, C. (2018). Image Super Resolution Reconstruction Based MCA and PCA Dimension Reduction. Advances in Molecular Imaging, 08, 1–13. https://doi.org/10.4236/ami.2018.81001

Yang, Wenhan, Feng, J., Yang, J., Zhao, F., Liu, J., Guo, Z., & Yan, S. (2017). Deep Edge Guided Recurrent Residual Learning for Image Super-Resolution. IEEE Transactions on Image Processing, 26(12), 5895–5907. https://doi.org/10.1109/TIP.2017.2750403

Yang, Wenming, Zhang, X., Tian, Y., Wang, W., Xue, J.-H., & Liao, Q. (2019). Deep Learning for Single Image Super-Resolution: A Brief Review. IEEE Transactions on Multimedia,


Zhang, Y., Tian, Y., Kong, Y., Zhong, B., & Fu, Y. (2018). Residual Dense Network for Image Super-Resolution. Proceedings of the IEEE Computer Society Conference on Computer


Appendices

Appendix I – Search strategy

At the start of September, I began searching for articles on Google Scholar using search terms including ‘Super-resolution forensic science’, ‘Super-resolution techniques image processing forensic science’, ‘Super resolution image enhancement’ and other synonyms. An additional filter was set to select articles from the past 5 years. Several articles were obtained, from which I started by reading a few overviews. The articles most relevant to forensic science were prioritized, such as ones about CCTV footage or super-resolution used for facial recognition. Since few articles were found about super-resolution methods in general or for the field of forensic science, the search for methods was expanded to the last 10 instead of 5 years. Also, a few overviews of SR methods from 2003 and 2012 were read. These articles were read over the next few months.

In October, I searched again to see whether there was more to find regarding super-resolution for forensic science. Using the search term ‘Super Resolution forensic science’ (about 12,700 results (0.08 sec)), a few new articles were found. During a meeting with my supervisor, we agreed that my research would stay within the past 5 years. After this meeting, he sent me four articles as a starting point for my search for new literature. Among these were two review articles from 2019, which provided a good starting point since they referenced quite recent research. From this point on, only research published since 2016 was used in the results. For the introduction, however, a few of the older articles were used.

In November and December, most of the time for this review was spent reading the acquired articles and writing some sections of the literature review. A few new articles were obtained from the references of the review articles, and one new article was obtained by searching Google Scholar using the search term ‘super-resolution image reconstruction filetype:pdf’ (about 2,700 results (0.03 sec)) and the filter ‘since 2016’.

At the end of December and the start of January, most of the time spent on this literature review went into writing. However, one additional search for articles was performed in Google Scholar using the terms ‘super-resolution image reconstruction filetype:pdf forensic science’ (about 172 results (0.11 sec)) and ‘super-resolution image restoration’ (about 34,200 results (0.06 sec)), with the filter ‘since 2016’.


Appendix II – Graph with overview of methods per cluster per year

Figure 2. Overview of the number of methods in each cluster per year and for the total of all years that are analysed in this review. The clusters shown are: cluster 1 (SISR with neural network (NN)), cluster 2 (MISR with NN), cluster 3 (SISR without NN) and cluster 4 (MISR without NN).

[Figure 2: bar chart; x-axis: year (2016, 2017, 2018, 2019 and 2016-2019 total); y-axis: number of methods (0 to 25).]
