CHAPTER 4. CONTENT AWARE SMART CROPPING

4.4 Results and Evaluation

4.4.1 Results Summary - Saliency maps

Figure 4.13 compares some sample saliency maps for all the approaches described in Section 4.3 and Section 4.2.2 with respect to DSS. The MODIFIED MSRCNN SAL achieves the best saliency maps in comparison to the other techniques.

Figure 4.13: Summary of the resulting Saliency maps

4.4.2 User study

A user study is conducted using Amazon SageMaker Ground Truth [4], a labeling service for building high-quality training datasets for machine learning models. The dataset used for evaluation consists of 500 images from various categories of the OLX marketplace, in order to scale the application to all categories. The cropping results using saliency maps from DSS and from all the Mask R-CNN-based saliency detection approaches are the input for the user study, for a total of 2500 images. Every image is assessed by 5 users, and each user rates the image on a scale of 0 to 4. Once an image has been evaluated by 5 users, it is removed from the evaluation list. Figure 4.14 is a screenshot from Amazon SageMaker Ground Truth for evaluating our crop bounding box.
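The rating protocol above (five ratings per image, each on a 0-4 scale, with an image leaving the evaluation list once fully rated) can be sketched as follows. The function name and the sample ratings are illustrative only, not part of the study:

```python
import statistics

def aggregate_ratings(ratings_per_image, required_raters=5):
    """Keep only images rated by the required number of users and
    return the mean opinion score (0-4 scale) per image."""
    complete = {img: r for img, r in ratings_per_image.items()
                if len(r) >= required_raters}
    return {img: statistics.mean(r[:required_raters])
            for img, r in complete.items()}

# Hypothetical ratings: one image fully evaluated, one still pending.
ratings = {
    "img_001.jpg": [3, 4, 3, 2, 4],   # evaluated by all 5 users
    "img_002.jpg": [1, 2],            # still in the evaluation list
}
scores = aggregate_ratings(ratings)   # only img_001.jpg gets a mean score
```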

Figure 4.14: Screenshot from Amazon Sagemaker Ground Truth to evaluate the crop bounding box

4.4.3 Results

Since the user study is still in progress, only 466 of the 2500 images have so far been evaluated by all 5 users. We report our results over this subset of the dataset.

The users rate each image on a scale of 0 to 4, with 0 corresponding to the worst crop and 4 to the best. For the analysis of the results, any image with a score greater than 2 is considered a successful crop, and images with a score less than or equal to 2 are considered unsuccessful. Table 4.1 shows, per approach, the number of images evaluated, the percentage of successful and unsuccessful crops, and the average user score.
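The success criterion (mean user score strictly greater than 2) and the per-approach statistics reported in Table 4.1 can be computed along these lines. This is a minimal sketch with made-up scores, not the actual evaluation code:

```python
def summarize(scores, threshold=2):
    """Compute success/failure percentages and the mean opinion score
    for one approach. A crop counts as successful only if its mean
    user score is strictly greater than the threshold."""
    n = len(scores)
    successful = sum(1 for s in scores if s > threshold)
    return {
        "evaluated": n,
        "successful_pct": round(100 * successful / n, 1),
        "unsuccessful_pct": round(100 * (n - successful) / n, 1),
        "mean_score": round(sum(scores) / n, 2),
    }

# Hypothetical mean scores for five images of one approach.
# Note that a score of exactly 2.0 is counted as unsuccessful.
stats = summarize([3.2, 1.8, 2.6, 4.0, 2.0])
```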

Approach               Evaluated  Successful  Successful (%)  Unsuccessful  Unsuccessful (%)  Mean score
DSS                          102          52           50.09            50             49.91        2.12
MRCNN+SAL                    100          64           64.0             36             36.0         2.57
MRCNN SAL                     86          54           62.7             32             37.3         2.46
MSRCNN SAL                    88          58           65.9             30             34.1         2.61
MODIFIED MSRCNN SAL           90          72           80.0             18             20.0         3.04

Table 4.1: User study results

It is clear from Table 4.1 that all the approaches based on Mask R-CNN outperform the DSS network in terms of cropping success based on user opinion. Cropping using the saliency maps from the DSS network results in a success rate of 50.09%. Merging the Mask R-CNN saliency output with the saliency map from the DSS network increases the number of successful crops by 14%. This is a significant improvement; however, the technique is computation intensive since it requires outputs from two networks.

The Mask R-CNN trained with the saliency dataset also increases the number of successful crops, by about 13%. The advantage in this case is that it is computationally efficient compared to the previous technique of merging the outputs of the Mask R-CNN and DSS networks.

In order to obtain a robust crop quality metric, we adopt the Mask Scoring R-CNN approach and train Mask Scoring R-CNN with the saliency dataset, which increases the number of successful crops by about 16% compared to the DSS network.

The Mask Scoring R-CNN with the DSS backbone has the highest success rate and improves the cropping success rate by 30% compared to the DSS network. The techniques based on Mask R-CNN outperform the DSS network because Mask R-CNN contains a region proposal network stage, in which the entire feature map at different scales (the outputs of the FPN) is scanned to detect objects in the image.

Figure 4.15: Mean user opinion score per approach

Figure 4.15 shows the mean user opinion score per approach using Mask R-CNN and DSS networks. The mean user opinion score for crops using the saliency maps from the DSS network is 2.12. The MODIFIED MSRCNN SAL has a significantly higher mean user opinion score of 3.04, improving the average crop quality by 42.7% compared to the DSS baseline.

Mask Scores to evaluate crop quality

To avoid serving bad crops, unsuccessful crops need to be detected and filtered out. To evaluate the quality of the crop, the MaskIoU score from Mask Scoring R-CNN is used. Figure 4.16b shows the 2-D histogram of the MaskIoU score from MSRCNN SAL versus the mean user opinion score. The brighter the bin in the histogram, the more correlated the data. It is clear that for user opinion scores greater than 3, the mask quality score is also above 0.9; thus the mask quality is good.
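A 2-D histogram of this kind can be computed with NumPy along the following lines. The paired scores below are made up for illustration; the bin edges are an assumption, not the thesis configuration:

```python
import numpy as np

# Hypothetical paired samples: MaskIoU score and mean user
# opinion score for the same set of cropped images.
mask_scores = np.array([0.92, 0.95, 0.88, 0.55, 0.40, 0.97])
user_scores = np.array([3.4, 3.8, 3.0, 1.6, 1.2, 4.0])

# Bin both axes; high-count (bright) bins indicate where the two
# scores co-occur most often, i.e. where they are correlated.
hist, iou_edges, score_edges = np.histogram2d(
    mask_scores, user_scores,
    bins=[np.linspace(0, 1, 6),   # 5 bins over the MaskIoU axis
          np.linspace(0, 4, 5)],  # 4 bins over the user-score axis
)
```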

(a) Density plot of Mask score values for successful and unsuccessful crops

(b) 2-D histogram of MaskIoU score vs. mean user opinion score

Figure 4.16: Correlation of Mask scores from MSRCNN SAL with mean user opinion score

Figure 4.16a shows the density distribution of mask scores over successful and unsuccessful crops based on the user study. The x-axis of the graph represents the mask scores from MSRCNN SAL. The y-axis shows density values, normalized so that the total area under the curve integrates to 1.

The receiver operating characteristic curve is shown in Figure 4.17a and the precision-recall curve in Figure 4.17b. Since the requirement is to reduce the false-positive rate, we select a threshold of 0.8, which yields a low false-positive rate while not greatly reducing the true-positive rate. The threshold value of 0.8 for the MaskIoU score rejects 73.4% of the negative crops and accepts 70.6% of the positive crops, along with 26.6% of the negative crops, as shown in the confusion matrix in Figure 4.18.
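The filtering decision at a fixed MaskIoU threshold, and the confusion-matrix rates it induces against the user-study ground truth, can be sketched as follows. The scores and labels below are illustrative, not the study data:

```python
def confusion_at_threshold(mask_scores, is_successful, threshold=0.8):
    """Accept a crop when its MaskIoU score reaches the threshold;
    compare the decision against the user-study ground truth
    (True = successful crop)."""
    tp = fp = tn = fn = 0
    for score, positive in zip(mask_scores, is_successful):
        accept = score >= threshold
        if accept and positive:
            tp += 1
        elif accept and not positive:
            fp += 1
        elif not accept and not positive:
            tn += 1
        else:
            fn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of positive crops kept
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of negative crops let through
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "tpr": tpr, "fpr": fpr}

# Hypothetical MaskIoU scores and user-study outcomes for five crops.
cm = confusion_at_threshold(
    mask_scores=[0.95, 0.85, 0.70, 0.90, 0.50],
    is_successful=[True, True, True, False, False],
)
```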

(a) ROC curve

(b) Precision-recall curve; AP is the average precision

Figure 4.17: Selection of threshold for the metric

Figure 4.18: Confusion matrix for the metric - MaskIoU score from MSRCNN SAL

The DSS network is used as the backbone of the Mask R-CNN network in MODIFIED MSRCNN SAL. Figure 4.19b shows the 2-D histogram of the mask score from MODIFIED MSRCNN SAL versus the mean user opinion score. As mentioned above, the brighter the bin in the histogram, the more correlated the data. It is clear that for user opinion scores greater than 3, the mask quality score is also above 0.8.

(a) Density plot of Mask score values for successful and unsuccessful crops

(b) 2-D histogram of MaskIoU score vs. mean user opinion score

Figure 4.19: Correlation of Mask Scores from MODIFIED MSRCNN SAL with mean user opinion score

Figure 4.19a shows the density distribution of mask scores over successful and unsuccessful crops based on the user study. The x-axis of the graph represents the mask scores from MODIFIED MSRCNN SAL. The y-axis shows density values, normalized so that the total area under the curve integrates to 1.

The receiver operating characteristic curve is shown in Figure 4.20a and the precision-recall curve in Figure 4.20b. Since the requirement is to reduce the false-positive rate, we again select a threshold of 0.8, which yields a low false-positive rate while not greatly reducing the true-positive rate. The threshold value of 0.8 for the MaskIoU score rejects 82.4% of the negative crops and accepts 69.86% of the positive crops, along with 17.6% of the negative crops, as shown in the confusion matrix in Figure 4.21. Thus, we can conclude that the MaskIoU score from MODIFIED MSRCNN SAL is the most robust metric compared to the metrics discussed in Appendix A, and also compared to the MaskIoU score from MSRCNN SAL.

(a) ROC curve

(b) Precision-recall curve; AP is the average precision

Figure 4.20: Selection of threshold for the metric

Figure 4.21: Confusion matrix for the metric - MaskIoU score from MODIFIED MSRCNN SAL

Chapter 5

Conclusions

This thesis proposes a smart cropping application for images uploaded by users on online marketplaces. The baseline saliency detection network (DSS) was used to obtain a saliency map, smart cropping was implemented by applying post-processing techniques to the saliency map, and the results were analyzed. The success rate of the baseline implementation is 50.1% based on the user study. Cropping mostly failed when the original image was of low quality, when the object in the image was incomplete, when the original image contained text or thin edges, or when the image showed land, houses, apartments, garages, etc. Based on these observations and the need to evaluate the saliency map before using the application in production, a few hypotheses were proposed to design metrics for the evaluation of saliency maps. A novel approach of using Mask R-CNN for saliency detection is proposed in order to improve the success rate of cropping and to obtain a crop quality score. The DSS network is used as the backbone of the Mask R-CNN network for saliency detection, and it shows a significant improvement in detecting the salient regions of the image. The success rate of cropping using the modified Mask R-CNN is 80% based on the user study.

5.1 Contribution

Thumbnail images on marketplace websites make it possible to fit multiple ad postings on a single page. The traditional method of compressing the image to create the thumbnail reduces the resolution of the image, and the object being sold may not be apparent to the user. In this direction, we develop a smart cropping application based on saliency detection that re-composes the image in order to increase the visibility of the salient object.
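A saliency-based crop of this kind can be sketched as a bounding-box extraction over a thresholded saliency map. The threshold value, the margin handling, and the function name are assumptions for illustration, not the exact post-processing used in this work:

```python
import numpy as np

def crop_from_saliency(image, saliency, threshold=0.5, margin=0):
    """Binarize the saliency map and crop the image to the bounding
    box of the salient region, optionally padded by a pixel margin."""
    mask = saliency >= threshold
    if not mask.any():
        return image  # nothing salient detected: keep the full image
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing salient pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing salient pixels
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + 1 + margin, image.shape[0])
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + 1 + margin, image.shape[1])
    return image[top:bottom, left:right]

# Toy example: a 6x6 image with a salient 2x4 block in the middle.
sal = np.zeros((6, 6))
sal[2:4, 1:5] = 0.9
img = np.arange(36).reshape(6, 6)
crop = crop_from_saliency(img, sal)  # crops to the salient block
```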

The baseline saliency detection network (DSS) is implemented to obtain the saliency map of the image. The image is post-processed to crop around the salient region based on the saliency map obtained. The results are manually evaluated as successful and unsuccessful crops, and the corresponding original images and saliency maps are analyzed to detect the anomalies which result in unsuccessful crops. Based on this analysis, different hypotheses are proposed to avoid the negative results, and the corresponding quality metrics are validated. Among the metrics proposed, the metric 'Average salient pixel' performs comparatively better than the rest of the metrics discussed. However, this metric does not perform well if the saliency detection network detects only certain regions of the image with high confidence. Thus, with the need to design and validate a robust metric and to improve saliency detection across all categories of images, Mask R-CNN is re-purposed for saliency detection.

The smart cropping application should have a high success rate in order to be used in production. The quality of the crop depends on the quality of the saliency map. In order to improve the quality of the saliency map, the instance segmentation framework Mask R-CNN is re-purposed for saliency detection. We propose three different approaches with Mask R-CNN re-purposed for saliency detection. The resulting saliency map and the corresponding crop for every approach are presented. The results are also subject to a user study using Amazon SageMaker Ground Truth. Based on the user study, cropping has improved by 42.7% using Mask R-CNN for saliency detection, compared to cropping based on the saliency map from the DSS network.