5.2 Future Work

The Mask Scoring R-CNN framework with the modified backbone for saliency detection uses ResNet-101 for feature extraction. Inference with this network takes about 4 seconds of CPU time.

An exploratory analysis of alternative CNNs as the backbone feature-extraction network for the Mask Scoring R-CNN could make the model more computationally efficient.
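
As a minimal sketch of such a backbone swap, the listing below builds a detection model with an interchangeable ResNet feature extractor. It assumes a PyTorch/torchvision environment (following torchvision 0.12-style signatures, which vary across versions); Mask Scoring R-CNN itself is not shipped with torchvision, so a plain Mask R-CNN stands in, and the function name build_saliency_model is illustrative.

import torch
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

def build_saliency_model(backbone_name="resnet50", num_classes=2):
    # num_classes = 2: background plus a single "salient object" class.
    # Swapping "resnet50" for "resnet18" (or back to "resnet101") changes
    # only the feature extractor; the detection heads stay the same.
    backbone = resnet_fpn_backbone(backbone_name, pretrained=True)
    return MaskRCNN(backbone, num_classes=num_classes)

model = build_saliency_model("resnet50")
model.eval()
with torch.no_grad():
    # Dummy 3-channel input; a real run would pass the listing photo.
    prediction = model([torch.rand(3, 512, 512)])

Timing this forward pass on a CPU for each candidate backbone would quantify the trade-off between inference time and saliency quality.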

The success rate of smart cropping for Marketplace images depends on the category of the object in the image. Category-aware cropping, using the category information supplied by the user at upload time, could help remove the negative results observed for real-estate categories such as houses, apartments, land for sale, and garages. The modified Mask Scoring R-CNN architecture can be extended to predict the underlying category of the image.
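
Until such a category head is trained, the same effect could be approximated by gating the cropping step on the seller-supplied category. A hypothetical sketch follows; the category names and the smart_crop_fn callable are illustrative assumptions, not part of our implementation.

# Categories for which saliency-based cropping was observed to hurt.
CROP_EXCLUDED_CATEGORIES = {
    "real_estate", "house", "apartment", "land_for_sale", "garage",
}

def maybe_smart_crop(image, seller_category, smart_crop_fn):
    """Apply smart cropping only when the seller-supplied category is
    not on the exclusion list; otherwise return the image unchanged."""
    if seller_category.lower() in CROP_EXCLUDED_CATEGORIES:
        return image
    return smart_crop_fn(image)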

Background removal also enhances the visibility of the object of interest on Marketplace websites, and our saliency-detection approach can be further tailored for this purpose. Images that are not professionally captured can be enhanced with techniques such as background removal, color and exposure enhancement, and super-resolution, used in combination with smart cropping, to improve the buyer experience.
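
To illustrate how the predicted saliency mask could drive background removal, here is a minimal sketch assuming a binary H x W mask from the modified Mask Scoring R-CNN; the white fill value is an arbitrary choice.

import numpy as np

def remove_background(image, mask, fill_value=255):
    """image: HxWx3 uint8 array; mask: HxW saliency mask (0/1 or bool).
    Pixels outside the salient region are replaced with fill_value."""
    salient = mask.astype(bool)
    out = np.full_like(image, fill_value)  # plain white canvas
    out[salient] = image[salient]
    return out

The same mask could also feed softer variants, such as blurring the background instead of replacing it.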


Appendix A