
Crop quality metrics

A.3 Metric Evaluation

In this section, we validate the hypotheses proposed in the previous section by analysing the metric that corresponds to each hypothesis, in order to identify the most useful crop quality assessment metric. The input image quality is assessed using the Laplacian variance for sharpness and Neural Image Assessment (NIMA [14]) for the overall aesthetics of the image. The area of the salient regions and the average of all the salient pixels are computed with simple image processing operations in OpenCV.

A.3.1 Data for evaluation

The dataset for metric evaluation consists of 1925 images from the fashion category of the OLX website. The choice of the fashion category is application specific. These images are cropped using the smart cropping application, and the results are manually evaluated as successful or unsuccessful crops. In total, 1763 crops are evaluated as successful and 158 as unsuccessful. The success rate per sub-category is shown in Figure A.2.

Figure A.2: Success rate of cropping per sub-category of the Fashion category from OLX.

A.3.2 Laplacian variance/ Sharpness Score of the image

The Laplacian [35] is a widely used operator for edge detection: it highlights regions of rapid intensity change by measuring the second derivative of the image. A single channel of the image is convolved with a 3x3 Laplacian kernel, as shown in Figure A.3, and the variance of the response is taken as the Laplacian variance. A high variance means that the image contains many strong edges, so the higher the Laplacian variance, the sharper the image is considered to be.
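The computation described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the kernel values below are the common 4-neighbour Laplacian and are an assumption (the kernel in Figure A.3 may differ), border pixels are simply skipped, and the function name `laplacian_variance` is hypothetical. With OpenCV, the same quantity is commonly obtained as `cv2.Laplacian(gray, cv2.CV_64F).var()`, which handles image borders differently.

```python
from statistics import pvariance

# Assumed 4-neighbour Laplacian kernel; the kernel in Figure A.3 may differ.
LAPLACIAN_KERNEL = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]


def laplacian_variance(gray):
    """Variance of the 3x3 Laplacian response of a single-channel image.

    `gray` is a list of rows of pixel intensities. Border pixels are
    skipped, so the image must be at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            response = sum(
                LAPLACIAN_KERNEL[dy + 1][dx + 1] * gray[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            responses.append(response)
    return pvariance(responses)


# Toy images: a flat grey patch and a checkerboard full of edges.
flat = [[128] * 5 for _ in range(5)]
checkerboard = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
print(laplacian_variance(flat))  # zero: a flat image has no edges
print(laplacian_variance(flat) < laplacian_variance(checkerboard))  # True
```

The flat patch gives a zero response everywhere, whereas the checkerboard alternates between large positive and negative responses, which is what drives the variance up for sharp images.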

APPENDIX A. CROP QUALITY METRICS

Figure A.3: Laplacian kernel

The Laplacian variance of every image in the evaluation dataset is recorded and compared against the manual quality label given to that image. The density distributions of the Laplacian variance for successful and unsuccessful crops are shown in Figure A.4.

Figure A.4: Density plot of Laplacian variance values for successful and unsuccessful crops.

The x-axis of the graph represents the Laplacian variance and the y-axis represents the probability density, scaled so that the total area under each curve integrates to 1.
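The normalisation of the y-axis can be illustrated with a density histogram. The sketch below is not the plotting code behind the figure (the plots may well use a smoothed kernel density estimate); it uses a plain histogram as a stand-in, with made-up values and a hypothetical helper `density_histogram`, to show why the area sums to 1.

```python
def density_histogram(values, bins, lo, hi):
    """Return per-bin densities and the bin width.

    Values are assumed to lie in [lo, hi]. Each density is
    count / (n * width), so the sum of density * width over all bins is 1.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    n = len(values)
    return [c / (n * width) for c in counts], width


scores = [12.0, 35.5, 40.2, 88.0, 150.3, 151.0, 420.7]  # made-up values
densities, width = density_histogram(scores, bins=5, lo=0.0, hi=500.0)
area = sum(d * width for d in densities)
print(round(area, 6))  # 1.0
```

Because each class is normalised separately, the two curves can be compared in shape even though the dataset contains far more successful than unsuccessful crops.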

The precision-recall curve is shown in Figure A.5. Since the requirement is to reduce the false positive rate, we select a threshold of 100, which gives a low false positive rate while largely preserving the true positive rate.

The confusion matrix for the threshold value of 100 is shown in Figure A.6.
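A minimal sketch of how a confusion matrix and the corresponding precision and recall can be derived from a threshold is given below. It assumes that a crop is accepted when its metric value is greater than or equal to the threshold and that a successful crop is the positive class; the function names and the sample values are hypothetical.

```python
def confusion_counts(scores, is_successful, threshold):
    """Count TP, FP, FN, TN when crops scoring >= threshold are accepted."""
    tp = fp = fn = tn = 0
    for score, good in zip(scores, is_successful):
        accepted = score >= threshold
        if accepted and good:
            tp += 1
        elif accepted and not good:
            fp += 1  # unsuccessful crop let through
        elif good:
            fn += 1  # successful crop rejected
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision and recall, with simple conventions for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up scores and labels, purely for illustration.
scores = [40, 90, 120, 150, 300, 80, 110]
labels = [True, True, True, True, True, False, False]
tp, fp, fn, tn = confusion_counts(scores, labels, threshold=100)
print(tp, fp, fn, tn)                # 3 1 2 1
print(precision_recall(tp, fp, fn))  # (0.75, 0.6)
```

Sweeping the threshold over the observed metric values and recording the precision and recall at each step traces out a precision-recall curve.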

The distributions of successful and unsuccessful crops overlap almost completely, including their peaks, indicating that no Laplacian variance value can serve as a threshold that separates the positive results from the negative ones. This analysis shows that the crop quality does not depend on the sharpness of the image.

Figure A.5: Precision-recall curve, AP is the average precision

Figure A.6: Confusion matrix of metric Laplacian variance with threshold value of 100

A.3.3 Aesthetics score/ NIMA

Neural Image Assessment (NIMA) [14] is a deep convolutional neural network that predicts an aesthetic score for an image. NIMA is trained on the histogram of user ratings for each image rather than on a single rating. For any given image, NIMA outputs a distribution over ratings on a scale of 1-10, assigning a probability to each of the possible scores. For our evaluation, we use a pretrained NIMA endpoint that was trained for 20 epochs.

The image quality is said to be high if the NIMA score is high.
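A rating distribution can be reduced to a single score by taking its mean, which is the usual way a NIMA score is summarised. The sketch below assumes this convention; the function name and the example distribution are hypothetical, and any rescaling of the score applied before thresholding is not reproduced here.

```python
def nima_mean_score(probabilities):
    """Expected rating under a distribution over the ratings 1..10."""
    if len(probabilities) != 10:
        raise ValueError("expected one probability per rating from 1 to 10")
    total = sum(probabilities)  # renormalise in case of rounding drift
    weighted = sum(
        rating * p for rating, p in enumerate(probabilities, start=1)
    )
    return weighted / total


# Made-up distribution, peaked around a rating of 6.
distribution = [0.0, 0.0, 0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05, 0.0]
print(round(nima_mean_score(distribution), 2))  # 6.0
```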

The density distributions of NIMA scores for successful and unsuccessful crops are shown in Figure A.7. The x-axis represents the NIMA score and the y-axis represents the probability density, scaled so that the total area under each curve integrates to 1.

The precision-recall curve is shown in Figure A.8. Since the requirement is to reduce the false positive rate, as discussed above, we select a threshold of 0.6, which gives a low false positive rate while largely preserving the true positive rate.

The confusion matrix for the threshold value of 0.6 is shown in Figure A.9.

The distributions of successful and unsuccessful crops are similar to those obtained for the Laplacian variance, indicating that the crop quality depends on neither the sharpness nor the aesthetics of the image. Another major drawback of using NIMA scores is that NIMA is a computationally intensive model that would have to run on top of the smart cropping model based on DSS.

Figure A.7: Density plot of NIMA values for successful and unsuccessful crops

Figure A.8: Precision-recall curve, AP is the average precision

Figure A.9: Confusion matrix of metric NIMA score with threshold value of 0.6

A.3.4 Ratio of Salient area to total image area

The area of the salient region is calculated by converting the saliency map to a binary image and counting the number of bright pixels. The ratio of this area to the total image area is then compared against the manual image quality label. Based on observations of the saliency maps of successful crops, the quality of a saliency map is assumed to increase with this ratio. The density distributions of the ratio for successful and unsuccessful crops are shown in Figure A.10. The x-axis represents the ratio of salient area to total image area and the y-axis represents the probability density, scaled so that the total area under each curve integrates to 1.
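The ratio can be sketched as follows. This is a plain-Python illustration rather than the OpenCV code used for the evaluation; the binarisation cut-off of 127 and the name `salient_area_ratio` are assumptions, since the text does not state how the saliency map is binarised.

```python
def salient_area_ratio(saliency_map, binarize_at=127):
    """Fraction of pixels that are bright after binarising the saliency map.

    `saliency_map` is a list of rows of 0..255 values; `binarize_at` is an
    assumed cut-off, not a value taken from the text.
    """
    total = sum(len(row) for row in saliency_map)
    bright = sum(
        1 for row in saliency_map for value in row if value > binarize_at
    )
    return bright / total


# Toy 4x4 saliency map with a small bright object in the centre.
saliency_map = [
    [0, 0, 0, 0],
    [0, 200, 255, 0],
    [0, 230, 90, 0],
    [0, 0, 0, 0],
]
ratio = salient_area_ratio(saliency_map)
print(ratio)         # 0.1875 (3 of 16 pixels)
print(ratio >= 0.3)  # False: below the 0.3 threshold used in this section
```

The toy map also hints at the weakness of this metric: a correctly detected but small object yields a low ratio in exactly the same way as a poor saliency map does.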

Figure A.10: Density plot of ratios of salient area to total image area for successful and unsuccessful crops

The precision-recall curve is shown in Figure A.11. Since the requirement is to reduce the false positive rate, as discussed above, we select a threshold of 0.3, which gives a low false positive rate while largely preserving the true positive rate.

Figure A.11: Precision-recall curve, AP is the average precision

The confusion matrix in Figure A.12 shows that, for the threshold of 0.3, this metric outperforms the Laplacian variance and NIMA. However, the metric gives about 30% false negatives, which will reduce the performance of the application. This metric also cannot distinguish between a bad saliency map and an image whose salient area is very small compared to the image size, and it will reject both. Examples are shown in Figure A.13.

Figure A.12: Confusion matrix of metric ratio of salient area with threshold value of 0.3

Figure A.13: Examples of saliency maps that will be rejected based on the metric, salient area ratio


A.3.5 Average of all salient pixels

Each pixel value in the saliency map represents the probability of that pixel being salient. To find the average of the salient pixels, the saliency map is first converted into a grayscale image to obtain a single channel. Only the pixels with a value greater than a particular threshold (we use 10) are included in the average. The maximum value of this metric is 255, which corresponds to a perfectly white pixel. The higher the average value, the more confident the saliency map is.
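As a minimal sketch of this computation, assuming the saliency map is already available as a single-channel grid of 0..255 values: the function name is hypothetical, and returning 0.0 when no pixel exceeds the threshold is an assumed convention that the text does not specify.

```python
def average_salient_pixel(saliency_map, min_value=10):
    """Mean of the pixels strictly greater than `min_value` (0..255 scale)."""
    salient = [v for row in saliency_map for v in row if v > min_value]
    if not salient:
        return 0.0  # assumed convention when nothing is salient
    return sum(salient) / len(salient)


# Toy saliency maps: one confident, one hesitant about the same region.
confident = [[0, 250, 255], [5, 240, 0]]
hesitant = [[0, 120, 200], [5, 90, 0]]
print(round(average_salient_pixel(confident), 1))  # 248.3
print(round(average_salient_pixel(hesitant), 1))   # 136.7
```

Excluding the near-black pixels keeps the average independent of how much of the image the object occupies, so the metric reflects the confidence of the detected region rather than its size.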

The density distributions of the average salient pixel value for successful and unsuccessful crops are shown in Figure A.14. The x-axis represents the average salient pixel value and the y-axis represents the probability density, scaled so that the total area under each curve integrates to 1.

Figure A.14: Density plot of average of salient pixel values for successful and unsuccessful crops

The precision-recall curve is shown in Figure A.15. Since the requirement is to reduce the false positive rate, as discussed above, we select a threshold of 210, which gives a low false positive rate while largely preserving the true positive rate.

Figure A.15: Precision-recall curve, AP is the average precision

From the density distribution in Figure A.14, we can see that this metric provides a better distinction between positive and negative saliency maps. As shown in the confusion matrix in Figure A.16, the threshold of 210 results in 25% of successful crops and about 65% of unsuccessful crops being rejected. This metric does not perform well in cases where only some regions of an object are detected with high confidence, which leads to a bad crop, as shown in Figure A.17.

Figure A.16: Confusion matrix of metric average salient pixel with threshold value of 210

Figure A.17: Examples of saliency maps of unsuccessful crops that the metric average salient pixel value will not reject