Deep learning and hyperspectral imaging for unmanned aerial vehicles

Dijkstra, Klaas

DOI: 10.33612/diss.131754011

Document Version: Publisher's PDF, also known as Version of record

Publication date: 2020

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Dijkstra, K. (2020). Deep learning and hyperspectral imaging for unmanned aerial vehicles: Combining convolutional neural networks with traditional computer vision paradigms. University of Groningen. https://doi.org/10.33612/diss.131754011

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal.


Chapter 6

Discussion and conclusion

In the first part of this chapter the answers to the research questions are discussed. The second part discusses the usefulness of combining deep learning with traditional computer vision paradigms. The final part discusses the envisioned path for the future, based on the conclusions of this dissertation.

The main research question addressed in this dissertation is: “How can deep learning be utilized to mitigate the limitations imposed by small aerial platforms employing hyperspectral imaging technology?” It should be clear that it is virtually impossible to overcome all limitations imposed by small aerial platforms when using hyperspectral imaging devices. Many of the limitations result from the underlying physics or from theoretical limits. However, in this dissertation, solutions that mitigate several of these limitations have been demonstrated and discussed.

6.1 Research questions

A methodology for selecting the optimal spectral bands from a hyperspectral cube for distinguishing two potato diseases on leaves has been proposed in Chapter 2. A per-pixel classification of potato leaves was performed using several classifiers, and it was demonstrated, using feature selection and extraction, that a subset of three bands still provides useful results. This showed how the laboratory setup of a 28-band imaging system could be reduced to a three-band system while retaining sufficient accuracy. The Liquid Crystal Tunable Filter (LCTF) used for collecting the hyperspectral images is difficult to use on an Unmanned Aerial Vehicle (UAV) because of its temporal instability (each hyperspectral plane is taken at a different time). However, when reduced to a three-band camera system, it is suitable for use on a UAV, for example by using three separate cameras.

Prior to this research, distinguishing Alternaria and ozone damage was difficult because of the visual similarities between the two (Turkensteen et al., 2010). This research contributes by providing a machine-learning-based methodology for distinguishing Alternaria and ozone damage on potato plant leaves in laboratory conditions using hyperspectral imaging. Other studies used commodity multi-spectral cameras or regular RGB cameras for crop-health monitoring on UAVs (Mohanty et al., 2016; Rebetez et al., 2016). These cameras are limited with respect to spectral resolution, and the spectral bands that are captured are intended for measuring the chlorophyll content of vegetation (Berra et al., 2017). This research contributes by demonstrating various methods for selecting important subsets of spectral bands from a hyperspectral image cube. This gives information on which bands need to be captured for detecting specific types of damage like Alternaria and ozone. This methodology provides an answer to the questions: “How can machine learning be used to automatically select important spectral bands?” and “What is the subsequent effect of using fewer bands on the performance of the posed problem?”
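To make the band-selection idea concrete, the sketch below implements greedy forward selection with a cross-validated per-pixel classifier. It is a minimal illustration using scikit-learn and synthetic data; the classifier, data shapes and scores are assumptions for the example, not the actual pipeline of Chapter 2.

```python
# Minimal sketch of greedy forward band selection, assuming per-pixel
# spectra X of shape (n_pixels, 28) and labels y for the healthy,
# Alternaria and ozone classes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def select_bands(X, y, n_bands=3):
    """Greedily add the band that most improves cross-validated accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_bands):
        best_score, best_band = max(
            (cross_val_score(LogisticRegression(max_iter=1000),
                             X[:, selected + [band]], y, cv=3).mean(), band)
            for band in remaining
        )
        selected.append(best_band)
        remaining.remove(best_band)
        print(f"added band {best_band}, CV accuracy {best_score:.3f}")
    return selected

# Synthetic stand-in for a labeled 28-band hyperspectral data set.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 28)), rng.integers(0, 3, size=500)
print(select_bands(X, y))
```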

The usage of a 16-band light-weight hyperspectral camera system with a Multispectral Color Filter Array (MCFA) (Geelen et al., 2015), mounted on a UAV, has been proposed in this dissertation. The hyperspectral images produced by this system suffer from crosstalk, and their spatial resolution is reduced sixteen-fold to accommodate the increase in spectral resolution (Keren and Osadchy, 1999). Several methods for demosaicking images have been proposed in the literature: using edge information, neural networks, inpainting and linear models (Monno et al., 2012; Wang, 2014; Wang et al., 2017; Aggarwal and Majumdar, 2014). This research contributes by proposing a deep-learning-based method for demosaicking a 4 × 4 image mosaic while simultaneously reducing crosstalk between spectral bands. A custom Convolutional Neural Network (CNN) has been designed specifically to reduce crosstalk and to increase the spatial resolution of the images created by this type of hyperspectral camera system. Usually crosstalk (Hirakawa, 2008) is considered detrimental. However, this research contributes by observing that the spatial and spectral correlations in the hyperspectral information actually benefit the upscaling process. This way of using deep learning as a signal processing method demonstrates an answer to the question: “How can the quality and resolution of hyperspectral images be improved using deep learning?”
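One plausible way to encode this prior knowledge in a network is sketched below: the known 4 × 4 mosaic fixes the number of bands (16) and the rearrangement and upscaling factors, while 1 × 1 convolutions that mix the bands (the learned crosstalk correction) and the sub-pixel upscaling are trained end-to-end. The layer choices are illustrative assumptions, not the exact architecture from Chapter 3.

```python
# Minimal PyTorch sketch: mosaic geometry sets the layout, convolutions
# learn crosstalk correction and upscaling.
import torch
import torch.nn as nn

class MosaicDemosaicNet(nn.Module):
    def __init__(self, mosaic=4):
        super().__init__()
        bands = mosaic * mosaic                   # 16 bands in a 4x4 mosaic
        self.to_cube = nn.PixelUnshuffle(mosaic)  # raw mosaic -> low-res 16-channel cube
        self.crosstalk = nn.Sequential(           # 1x1 convs mix bands per pixel
            nn.Conv2d(bands, bands, kernel_size=1), nn.ReLU(),
            nn.Conv2d(bands, bands, kernel_size=1),
        )
        self.upscale = nn.Sequential(             # learned sub-pixel upscaling
            nn.Conv2d(bands, bands * mosaic * mosaic, kernel_size=3, padding=1),
            nn.PixelShuffle(mosaic),              # -> 16 bands at full sensor resolution
        )

    def forward(self, raw):                       # raw: (N, 1, H, W) mosaic image
        return self.upscale(self.crosstalk(self.to_cube(raw)))

net = MosaicDemosaicNet()
print(net(torch.randn(1, 1, 64, 64)).shape)       # torch.Size([1, 16, 64, 64])
```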

A notoriously difficult challenge in computer vision is to detect objects that are connected in the image as separate objects. In this dissertation two versions of the deep learning architecture CentroidNet have been introduced specifically for this task. The architecture was tested on aerial images of potato crops that were collected using a low-cost commodity UAV. This research contributes by showing that our approach achieves a higher F1 score for counting potato crops compared to other one-stage object detection models in the literature (Redmon and Farhadi, 2017; Lin et al., 2017c). It was found that the CentroidNet architecture is particularly suitable for counting small and connected objects as individuals by estimating their centroids. This provides an answer to the question: “How can deep learning be used to solve a challenging image-processing task using images produced by low-cost commodity UAVs?”

The trinity of technologies has provided an interesting framework for focusing the research on mitigating several inherent limitations of UAVs and hyperspectral cameras by using deep learning. However, the results of this research are not only relevant within this framework of thought. Spectral-band selection can be applied in any hyperspectral application to add efficiency for small platforms. Similarly, it has been shown that CentroidNet is applicable in a broader range of applications like cell-nuclei counting and bacterial-colony counting. CentroidNet shows increased or on-par performance compared to state-of-the-art instance segmentation methods (Redmon and Farhadi, 2018; He et al., 2017). Furthermore, the research in this dissertation contributes to a more fundamental insight into the relation between computer vision and deep learning, which is discussed in the next section.

6.2 Computer vision and deep learning

The framework of this research is the trinity of the technologies mentioned in the main title of this dissertation and discussed in the previous section. The subtitle relates to the insight into the relation between computer vision and deep learning gained during this research. During experimentation within this application framework, the particular usefulness of combining these two technologies became clear. This section elaborates on this conclusion.

Nowadays many deep learning architectures exist and the number is still growing. Most of these have been designed to solve specific technical shortcomings of predecessor CNN models. For example, vanishing gradients are addressed by ResNet (Szegedy et al., 2017), spatial resolution problems are mitigated by U-Net (Ronneberger et al., 2015) and computational complexity is decreased by Xception (Chollet, 2017). These architectures are general, complex and applicable in many areas. This makes them very widespread and extremely successful regardless of the application. The question arises whether, for some applications, simpler, less computationally complex models could suffice. This could potentially reduce the run time, the amount of data needed and even the ecological footprint of deep learning for specific applications.

General architectures can learn to model highly complex functions, and generalization methods like weight decay are used to forget details during training, which in turn reduces the complexity of the model and makes it perform better on specific applications. If the model of a task is partly known and can be represented with a custom CNN architecture, then this CNN is more suited to the task it was designed for. This method of incorporating prior knowledge results in a simpler model which is less prone to over-fitting. In this dissertation a CNN for crosstalk correction and upscaling was designed for images produced by a specific hyperspectral camera. Prior knowledge about the structure and size of the MCFA, or mosaic, of the hyperspectral camera is used to set the hyperparameters of the underlying CNN model. This includes the convolution filter sizes, amounts and strides. In traditional computer vision a solution would be engineered for a specific application and most parameters would be set manually. In the proposed approach a solution is designed in terms of specific convolutions and its parameters are trained end-to-end using deep learning. This successful combination of computer vision and deep learning to reduce model complexity for a certain task has been discussed in Chapter 3 of this dissertation.

Deep learning has endeavored to remove the need for manual feature design and aims to learn solutions to practical problems in an end-to-end fashion, without the need for traditional computer vision (LeCun et al., 1998; Krizhevsky et al., 2012). In this dissertation the usefulness of combining computer vision and deep learning has been shown by means of CentroidNet. The input image is preprocessed using a CNN so it can be easily processed by traditional computer vision algorithms to segment object instances. Where, in the traditional tandem, computer vision is used to preprocess image information (Bougharriou et al., 2017; Suleiman and Sze, 2014), in the CentroidNet algorithm traditional computer vision is used to postprocess information. This could be generalized to situations where, in fully-engineered computer vision solutions, certain elements can be replaced with CNNs. With this method, difficult parts of the vision pipeline can be trained directly from data to improve the overall system performance. Difficult parts in this context are elements which are hard to parameterize and which can be represented by CNNs. With this hybrid approach, prior knowledge of the application is retained in the computer vision parts. This combination of computer vision and deep learning was successfully demonstrated in Chapter 4 and Chapter 5 of this dissertation.

At the core of CentroidNet is the notion of trained 2D vectors. For each pixel in the image the CNN predicts two sets of vectors. One vector points to the nearest centroid and the other vector points to the nearest border of the object with the nearest centroid. Each vector can be considered a vote, and by aggregating votes the location of the centroid and the delineation of the object are determined. This can be viewed as a variant of majority voting, similar to a Hough transform or an ensemble of detectors. This approach has been shown to be successful in other situations (Mukhopadhyay and Chaudhuri, 2015; Viola and Jones, 2001) and also contributes to the working of CentroidNet. In the ablation studies of CentroidNetV2 it was found that the loss function that was designed to better reflect the nature of the outputs of CentroidNet consistently achieved better results in predicting object instances. This shows that, when designing hybrid CNN models like CentroidNet, in which the nature of the output of the algorithm changes, other parts of the training process should be redesigned accordingly. This was discussed in Chapter 5.
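The voting itself is the traditional computer-vision part and can be sketched compactly. Assuming the CNN has already predicted, for every pixel, an offset (dy, dx) to the nearest centroid, the postprocessing accumulates one vote per pixel and keeps local maxima with sufficient support; the neighborhood size and vote threshold below are illustrative values, not CentroidNet's actual settings.

```python
# Minimal sketch of centroid majority voting on predicted offset vectors.
import numpy as np
from scipy.ndimage import maximum_filter

def vote_centroids(vectors, min_votes=10):
    """vectors: (H, W, 2) array of per-pixel offsets to the nearest centroid."""
    h, w, _ = vectors.shape
    votes = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.mgrid[0:h, 0:w]
    ty = np.clip(ys + np.round(vectors[..., 0]).astype(int), 0, h - 1)
    tx = np.clip(xs + np.round(vectors[..., 1]).astype(int), 0, w - 1)
    np.add.at(votes, (ty.ravel(), tx.ravel()), 1)   # each pixel casts one vote
    # Local maxima of the voting space with enough support become centroids.
    peaks = (votes == maximum_filter(votes, size=5)) & (votes >= min_votes)
    return np.argwhere(peaks), votes

# Synthetic check: every pixel of a 32x32 image votes for centroid (16, 16).
grid = np.stack(np.meshgrid(np.arange(32), np.arange(32), indexing="ij"), -1)
centroids, _ = vote_centroids((16 - grid).astype(float))
print(centroids)  # [[16 16]]
```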

In the work discussed in this dissertation, sometimes a relatively small number of images was used, or very deep neural network architectures with many parameters were used. It is important that the number of samples and the number of parameters in the deep learning network are balanced. Each architecture that has been used in this dissertation performs a pixel-to-pixel mapping of the input image to the output. In some cases a traditional sliding window is used (like in Chapter 2) and in other cases a fully convolutional neural network is used to provide this mapping (Chapters 3, 4 and 5). This means that a single image logically consists of a large number of samples from the perspective of the deep learning architecture. The number of samples from one image then depends on the size of the footprint of the model and the size of the input image. This is probably the reason why no overtraining was observed in the demosaicking experiments discussed in Chapter 3. In those experiments the footprint was small and the images were large.
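A back-of-the-envelope calculation makes this concrete: a model with an f × f footprint sees roughly (H − f + 1) × (W − f + 1) distinct samples in one H × W image. The sizes below are illustrative, not the actual experimental settings.

```python
# Number of valid sliding-window positions, i.e. logical samples per image.
def samples_per_image(h, w, f):
    return (h - f + 1) * (w - f + 1)

print(samples_per_image(2048, 2048, 7))   # small footprint, large image: 4,169,764
print(samples_per_image(256, 256, 129))   # large footprint, small image: 16,384
```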

When taking into account the breakthrough research and the multitude of new applications of deep learning for many classic vision tasks, it seems that the field of computer vision has mostly been superseded by deep learning. However, the experiments in this dissertation have shown that designing custom deep learning algorithms for specific computer vision tasks, and combining both technologies, clearly provides an interesting direction for future research into deep learning.

6.3 Future work

Research into other challenges posed by the trinity of deep learning, hyperspectral imaging and UAVs can be performed in the future. In one direction, research could primarily focus on elements where all three technologies need to be combined. Alternatively, the definition of these three areas could be broadened further. For example, instead of deep learning, a broader range of machine learning and artificial intelligence concepts could be included. Additionally, the notion of hyperspectral imaging can be expanded to encompass multiple cameras to provide an even richer view of the electromagnetic spectrum to the deep learning approaches. For example, hyperspectral color cameras, short wave infrared cameras and thermal cameras could be combined. Along this line of thought the image sources can be expanded in multiple dimensions, including 3D imaging and video. In future research, UAVs could be a synonym for “small platforms”, both in weight and computing power. This could include the small deployment platforms used in edge computing like the Jetson series, Coral or Movidius. The main research question can be restated to encompass this broader context: “How can artificial intelligence be utilized to mitigate the limitations imposed by small platforms that collect and process images from sources with high spatial, spectral and temporal dimensionality?”

In the next part of this section, future research directions for the specific research topics discussed in this dissertation are proposed. In the concluding part, possible future research into the combination of computer vision and deep learning is discussed.

From the 28-dimensional hyperspectral cube, three channels were identified for distinguishing between Alternaria and ozone damage on potato-plant leaves in laboratory conditions. Future research could focus on testing this approach on potato plants in agricultural fields using UAVs. Additionally, using a similar methodology, research could focus on identifying other crop diseases like Phytophthora. Deep learning could then also be used to exploit key morphological features of the brownish lesions specific to certain crop diseases.

Filter mosaics are applied in many ways by sensor manufacturers. This includes regular Red Green Blue (RGB) Bayer filters and hyperspectral MCFA sensors like the one used in this research, but per-line-filter approaches are also available. Recent camera sensors even use a mosaic of polarization filters combined with RGB filters. Future research could focus on crosstalk correction and upscaling within this family of mosaic filter patterns by using the approaches discussed in Chapter 3. Additionally, research could focus on using deep learning for upscaling beyond the native resolution of the imaging sensor using Hyperspectral Single Image Super Resolution (HSISR).

CentroidNet is a hybrid CNN for instance segmentation that is particularly suitable for counting objects in images. Future research could focus on replacing other parts of the algorithm with a CNN. For example, the voting space might be postprocessed with an additional deep learning module to provide clearer centroids which are easier to locate with traditional computer vision methods. This would reduce the number of hyperparameters while still retaining the core design of centroid and border majority voting. Future research could investigate how, in other traditional computer vision methods, parts can be replaced with deep learning to form new hybrid algorithms.

This dissertation started with the classic tandem of feature extraction feeding machine learning models. Subsequently the idea of designing custom CNN architectures for specific applications was discussed. The final part of this work focused more on preprocessing images with CNNs and then using deterministic computer vision algorithms to do the final processing. As mentioned earlier, this can be generalized to replacing parts of computer vision programs by CNNs, which consequently allows these parts to be trained from data. This line of thought can be extended into a future where (parts of) traditional software algorithms can be replaced by their trainable counterparts. This does not necessarily have to be limited to redefining software parts in terms of convolutions: when a software program is differentiable and there is enough training data available, it can likely be trained. This would instigate a paradigm shift in how scientific software is developed. Future research could focus on how to develop software that is differentiable in a general sense so it can be trained with gradient descent methods. This is already an active field of research called Differentiable Programming (∂P). In a ∂P system, graphs are directly represented by the source code of the software program and can sometimes be compiled and optimized directly. An introduction to ∂P is given by Innes et al. (2019). In recent work, Riba et al. (2019) provide a differentiable computer vision library. These examples indicate that, in the future, the defining differences between the fields of computer vision, deep learning and, ultimately, software and algorithm development will probably fade as an increasing amount of software is developed to be trainable with gradient descent methods.
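As a toy illustration of this idea, the sketch below replaces a hand-tuned constant in a classic vision step (binarization) with a trainable parameter and a differentiable surrogate, and fits it with gradient descent. It is a minimal PyTorch example under assumed toy data, not code from Innes et al. (2019) or Riba et al. (2019).

```python
# Minimal differentiable-programming sketch: a threshold that used to be
# hand-tuned becomes a parameter trained by gradient descent.
import torch

threshold = torch.nn.Parameter(torch.tensor(0.1))  # was a hand-tuned constant
opt = torch.optim.SGD([threshold], lr=0.5)

def soft_binarize(img, t, sharpness=50.0):
    # Differentiable stand-in for the hard test (img > t): a steep sigmoid.
    return torch.sigmoid(sharpness * (img - t))

img = torch.rand(64, 64)
target = (img > 0.6).float()                       # supervision from a known-good output
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(soft_binarize(img, threshold), target)
    loss.backward()
    opt.step()
print(float(threshold))                            # converges near 0.6
```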


Bibliography

Aggarwal H.K. and Majumdar A., Single-sensor multi-spectral image demosaicing algorithm using learned interpolation weights. International Geoscience and Remote Sensing Symposium (IGARSS) (IEEE 2014), (pp. 2011–2014).

Al-Waisy A.S., Qahwaji R., Ipson S. and Al-Fahdawi S., A multimodal deep learning framework using local feature representations for face recognition. Machine Vision and Applications, (2017) pp. 1–20.

Badrinarayanan V., Kendall A. and Cipolla R., SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. Transactions on pattern analysis and machine intelligence 39(12), (2017) pp. 2481–2495.

Bai M. and Urtasun R., Deep watershed transform for instance segmentation. Conference on Computer Vision and Pattern Recognition (2017), (pp. 2858–2866).

Ballard D.H., Generalizing the hough transform to detect arbitrary shapes. Pattern recognition 13(2), (1981) pp. 111–122.

Bayer B.E., Color Imaging Array. U.S. patent 3,971,065 (1976), (p. 10).

Baygin M., Karakose M., Sarimaden A. and Akin E., An Image Processing based Object Counting Approach for Machine Vision Application. Conference on Advances and Innovations in Engineering (2018), (pp. 966–970).

Behmann J., Mahlein A.K., Paulus S., Dupuis J., Kuhlmann H., Oerke E.C. and Plümer L., Generation and application of hyperspectral 3D plant models: methods and challenges. Machine Vision and Applications 27(5), (2016) pp. 611–624.


Berra E.F., Gaulton R. and Barr S., Commercial Off-the-Shelf Digital Cameras on Unmanned Aerial Vehicles for Multitemporal Monitoring of Vegetation Reflectance and NDVI. Transactions on Geoscience and Remote Sensing, volume 55 (2017), (pp. 4878–4886).

Bishop C.M., Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, Berlin, Heidelberg 2006).

Bougharriou S., Hamdaoui F. and Mtibaa A., Linear SVM classifier based HOG car detection. 2017 18th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA) (IEEE 2017), (pp. 241–245).

Chang C.C. and Lin C.J., LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, (2011) pp. 1–27.

Cheema G.S. and Anand S., Automatic detection and recognition of individuals in patterned species. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Springer 2017), (pp. 27–38).

Chen B. and Miao X., Distribution line pole detection and counting based on YOLO using UAV inspection line video. Journal of Electrical Engineering & Technology, (2019) pp. 1–8.

Chen H., Qi X., Yu L. and Heng P.A., Dcan: deep contour-aware networks for accurate gland segmentation. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (2016a), (pp. 2487–2496).

Chen L.C., Papandreou G., Kokkinos I., Murphy K. and Yuille A.L., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. International Conference on Learning Representations , (2016b) pp. 834–848.

Chen L.C., Zhu Y., Papandreou G., Schroff F. and Adam H., Encoder-decoder with atrous separable convolution for semantic image segmentation. European Conference on Computer Vision (2018), (pp. 801–818).


Chollet F., Xception: Deep learning with depthwise separable convolutions. Conference on Computer Vision and Pattern Recognition (2017), (pp. 1800–1807).

Cohen J.P., Boucher G., Glastonbury C.A., Lo H.Z. and Bengio Y., Count-ception: Counting by Fully Convolutional Redundant Counting. International Conference on Computer Vision (2017), (pp. 18–26).

Cortes C. and Vapnik V., Support-vector networks. Machine learning 20(3), (1995) pp. 273–297.

Couprie C., Farabet C., Najman L. and Lecun Y., Convolutional nets and watershed cuts for real-time semantic labeling of RGBD videos. Journal of Machine Learning Research 15(1), (2014) pp. 3489–3511.

Dai F., Liu H., Ma Y., Cao J., Zhao Q. and Zhang Y., Dense scale network for crowd counting. arXiv preprint arXiv:1906.09707 (2019).

Dalal N. and Triggs B., Histograms of oriented gradients for human detection. Conference on Computer Vision and Pattern Recognition (2005), (pp. 886–893).

de Boer J., Barbany M.J., Dijkstra M.R., Dijkstra K. and van De Loosdrecht J., Twirre V2: Evolution of an architecture for automated mini-UAVs using interchangeable commodity components. International Micro Air Vehicle Conference and Competition (2015).

Degraux K., Cambareri V., Jacques L., Geelen B., Blanch C. and Lafruit G., Generalized inpainting method for hyperspectral image acquisition. Proceedings - International Conference on Image Processing, ICIP, volume 2015 (2015), (pp. 315–319).

Deng L., Li J., Huang J.T., Yao K., Yu D., Seide F., Seltzer M., Zweig G., He X., Williams J. et al., Recent advances in deep learning for speech research at microsoft. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE 2013), (pp. 8604–8608).

Dijkstra K., van de Loosdrecht J., Schomaker L.R. and Wiering M.A., CentroidNet: A deep neural network for joint object localization and counting. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (2018a), (pp. 585–601).


Dijkstra K., van de Loosdrecht J., Schomaker L.R. and Wiering M.A., Hyperspectral demosaicking and crosstalk correction using deep learning. Machine Vision and Applications 30(1), (2018b) pp. 1–21.

Dijkstra K., van de Loosdrecht J., Schomaker L.R.B. and Wiering M.A., Hyper-spectral frequency selection for the classification of vegetation diseases. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2017).

Dong C., Loy C.C., He K. and Tang X., Image Super-Resolution Using Deep Convolutional Networks. Transactions on Pattern Analysis and Machine Intelligence 38(2), (2016) pp. 295–307.

Eichhardt I., Chetverikov D. and Jankó Z., Image-guided ToF depth upsampling: a survey. Machine Vision and Applications 28(3-4), (2017) pp. 267–282.

Elsken T., Metzen J.H. and Hutter F., Neural Architecture Search: A Survey. Journal of Machine Learning Research 20(55), (2019) pp. 1–21.

Erhan D., Bengio Y., Courville A., Manzagol P.A., Vincent P. and Bengio S., Why Does Unsupervised Pre-training Help Deep Learning? Journal of Machine Learning Research 11, (2010) pp. 625–660.

Ferrari A., Lombardi S. and Signoroni A., Bacterial colony counting with Convolutional Neural Networks. Conference of the IEEE Engineering in Medicine and Biology Society (2015), (pp. 7458–7461).

Fukushima K., Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36(4), (1980) pp. 193–202.

Galteri L., Seidenari L., Bertini M. and Del Bimbo A., Deep Generative Adversarial Compression Artifact Removal. International Conference on Computer Vision, (2017) pp. 4826–4835.

Geelen B., Blanch C., Gonzalez P., Tack N. and Lambrechts A., A tiny VIS-NIR snapshot multispectral camera. Advanced Fabrication Technologies for Micro/Nano Optics and Photonics, volume 9374 (International Society for Optics and Photonics 2015).


Gharbi M., Chaurasia G., Paris S. and Durand F., Deep joint demosaicking and denoising. ACM Transactions on Graphics 35(6), (2016) pp. 1–12.

Goodfellow I., Bengio Y. and Courville A., Deep Learning (MIT Press 2016). URL: http://www.deeplearningbook.org

Goyal P., Dollár P., Girshick R., Noordhuis P., Wesolowski L., Kyrola A., Tulloch A., Jia Y. and He K., Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677 (2017) .

Grigorescu S., Trasnea B., Cocias T. and Macesanu G., A survey of deep learning techniques for autonomous driving. Journal of Field Robotics 37(3), (2020) pp. 362–386.

Hallermann N. and Morgenthal G., Unmanned aerial vehicles (UAV) for the assessment of existing structures. IABSE Symposium Report, volume 101 (International Association for Bridge and Structural Engineering 2013), (pp. 1–8).

He K., Gkioxari G., Dollar P. and Girshick R., Mask R-CNN. Conference on Computer Vision and Pattern Recognition (2017), (pp. 2980–2988).

Hirakawa K., Cross-talk explained. Proceedings - International Conference on Image Processing, ICIP (IEEE 2008), (pp. 677–680).

Hopfield J.J., Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, volume 79 (1982), (pp. 2554–2558).

Hsieh M.R., Lin Y.L. and Hsu W.H., Drone-Based Object Counting by Spatially Regularized Regional Proposal Network. Conference on Computer Vision and Pattern Recognition (2017), (pp. 4165–4173).

Hubel D.H. and Wiesel T.N., Receptive fields of single neurones in the cat’s striate cortex. The Journal of physiology 148(3), (1959) pp. 574–591.

Innes M., Edelman A., Fischer K., Rackauckas C., Saba E., Shah V.B. and Tebbutt W., A differentiable programming system to bridge machine learning and scientific computing. arXiv preprint arXiv:1907.07587 (2019).


Jetley S., Sapienza M., Golodetz S. and Torr P.H., Straight to shapes: Real-time detection of encoded shapes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), (pp. 6550–6559).

Jia Y., Shelhamer E., Donahue J., Karayev S., Long J., Girshick R., Guadarrama S. and Darrell T., Caffe: Convolutional architecture for fast feature embedding. International Conference on Multimedia (ACM 2014), (pp. 675–678).

Kainz P., Urschler M., Schulter S., Wohlhart P. and Lepetit V., You Should Use Regression to Detect Cells. International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, Cham 2015), (pp. 276–283).

Karras T., Laine S. and Aila T., A Style-Based Generator Architecture for Generative Adversarial Networks. Conference on Computer Vision and Pattern Recognition (2019), (pp. 4401–4410).

Kass M., Witkin A. and Terzopoulos D., Snakes: Active contour models. International Journal of Computer Vision 1(4), (1988) pp. 321–331.

Keren D. and Osadchy M., Restoring subsampled color images. Machine Vision and Applications 11(4), (1999) pp. 197–202.

Khan A.U.M., Torelli A., Wolf I. and Gretz N., AutoCellSeg: Robust automatic colony forming unit (CFU)/cell analysis using adaptive image segmentation and easy-to-use post-editing techniques. Nature Scientific Reports 8(1), (2018) p. 7302.

Krizhevsky A., Sutskever I. and Hinton G.E., ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems (2012), (pp. 1097–1105).

LeCun Y., Bengio Y. and Hinton G., Deep learning. Nature 521(7553), (2015) p. 436.

LeCun Y., Bottou L., Bengio Y. and Haffner P., Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), (1998) pp. 2278–2324.


Lee H., Grosse R., Ranganath R. and Ng A.Y., Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. International Conference on Machine Learning (ACM 2009), (pp. 609–616).

Li Y., Zhang X. and Chen D., CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes. Conference on Computer Vision and Pattern Recognition (2018), (pp. 1092–1100).

Liang X., Lin L., Wei Y., Shen X., Yang J. and Yan S., Proposal-free network for instance-level object segmentation. IEEE transactions on pattern analysis and machine intelligence 40(12), (2017) pp. 2978–2991.

Lighthill J., Artificial intelligence: A general survey (1973).

Lin H.W., Tegmark M. and Rolnick D., Why Does Deep and Cheap Learning Work So Well? Journal of Statistical Physics 168(6), (2017a) pp. 1223–1247.

Lin T.Y., Dollár P., Girshick R., He K., Hariharan B. and Belongie S., Feature pyramid networks for object detection. Conference on Computer Vision and Pattern Recognition (2017b), (pp. 2117–2125).

Lin T.Y., Goyal P., Girshick R., He K. and Dollár P., Focal loss for dense object detection. Proceedings of the IEEE international conference on computer vision (2017c), (pp. 2980–2988).

Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.Y. and Berg A.C., SSD: Single shot multibox detector. European Conference on Computer Vision (Springer 2016), (pp. 21–37).

Long J., Shelhamer E. and Darrell T., Fully Convolutional Networks for Semantic Segmentation. Conference on Computer Vision and Pattern Recognition (2015), (pp. 3431–3440).

Lowe D.G., Three-dimensional object recognition from single two-dimensional images. Artificial intelligence 31(3), (1987) pp. 355–395.

Lowe D.G., Object recognition from local scale-invariant features. International Conference on Computer Vision, volume 99 (1999), (pp. 1150–1157).


Luebke D., Harris M., Govindaraju N., Lefohn A., Houston M., Owens J., Segal M., Papakipos M. and Buck I., GPGPU: general-purpose computation on graphics hardware. Conference on Supercomputing (ACM 2006), (p. 208).

Marr D., Vision: A computational investigation into the human representation and processing of visual information (MIT Press 1982).

Mazur M., Six ways drones are revolutionizing agriculture. MIT Technology Review.

Mihoubi S., Losson O., Mathon B. and Macaire L., Multispectral demosaicing using intensity-based spectral correlation. International Conference on Image Processing, Theory, Tools and Applications (IEEE 2015), (pp. 461–466).

Milletari F., Ahmadi S.A., Kroll C., Plate A., Rozanski V., Maiostre J., Levin J., Dietrich O., Ertl-Wagner B., Bötzel K. and Navab N., Hough-CNN: Deep Learning for Segmentation of Deep Brain Regions in MRI and Ultrasound. Computer Vision and Image Understanding, (2017) pp. 92–102.

Minsky M. and Papert S., Perceptrons (MIT Press 1969).

Mohanty S.P., Hughes D.P. and Salathé M., Using deep learning for image-based plant disease detection. Frontiers in Plant Science 7, (2016) p. 1419.

Monno Y., Tanaka M. and Okutomi M., Multispectral demosaicking using guided filter. Digital Photography VIII, volume 8299 (SPIE 2012), (pp. 204–210).

Mukhopadhyay P. and Chaudhuri B.B., A survey of Hough Transform. Pattern Recognition 84(3), (2015) pp. 993–1010.

Mustaniemi J., Kannala J. and Heikkilä J., Parallax correction via disparity estimation in a multi-aperture camera. Machine Vision and Applications 27(8), (2016) pp. 1313–1323.

New York Times, New navy device learns by doing. New York Times, (1958) p. 23.


Nguyen H.T., Wistuba M. and Schmidt-Thieme L., Personalized tag recommendation for images using deep transfer learning. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Springer 2017), (pp. 705–720).

Özlü A., TensorFlow Object Counting API (2018). URL: https://github.com/ahmetozlu/tensorflow_object_counting_api

O’Mahony N., Campbell S., Carvalho A., Harapanahalli S., Hernandez G.V., Krpalkova L., Riordan D. and Walsh J., Deep learning vs. traditional computer vision. Science and Information Conference (Springer 2019), (pp. 128–144).

Paredes J.A., Gonzalez J., Saito C. and Flores A., Multispectral imaging system with UAV integration capabilities for crop analysis. International Symposium of Geoscience and Remote Sensing (IEEE 2017), (pp. 1–4).

Pathak A.R., Pandey M. and Rautaray S., Application of Deep Learning for Object Detection. Procedia Computer Science 132, (2018) pp. 1706–1717.

Peng J., Hon B.Y.C. and Kong D., A structural low rank regularization method for single image super-resolution. Machine Vision and Applications 26(7-8), (2015) pp. 991–1005.

Pietikäinen M., Hadid A., Zhao G. and Ahonen T., Computer vision using local binary patterns, volume 40 (Springer Science & Business Media 2011).

Pullanagari R.R., Kereszturi G. and Yule I.J., Mapping of macro and micro nutrients of mixed pastures using airborne AisaFENIX hyperspectral imagery. Journal of Photogrammetry and Remote Sensing 117, (2016) pp. 1–10.

Radoglou-Grammatikis P., Sarigiannidis P., Lagkas T. and Moscholios I., A compilation of UAV applications for precision agriculture. Computer Networks 172, (2020) p. 107148.

Rahman M.A. and Wang Y., Optimizing intersection-over-union in deep neural networks for image segmentation. International Symposium on Visual Computing (2016), (pp. 234–244).


Rasmussen J., Ntakos G., Nielsen J., Svensgaard J., Poulsen R.N. and Christensen S., Are vegetation indices derived from consumer-grade cameras mounted on UAVs sufficiently reliable for assessing experimental plots? European Journal of Agronomy 74, (2016) pp. 75–92.

Rebetez J., Satizábal H.F., Mota M., Noll D., Büchi L., Wendling M., Cannelle B., Perez-Uribe A. and Burgos S., Augmenting a convolutional neural network with local histograms - a case study in crop classification from high-resolution UAV imagery. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2016).

Redmon J. and Farhadi A., YOLO9000: better, faster, stronger. Conference on Computer Vision and Pattern Recognition (2017), (pp. 7263–7271).

Redmon J. and Farhadi A., YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).

Ren M. and Zemel R.S., End-to-end instance segmentation with recurrent attention. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), (pp. 6656–6664).

Ren P., Fang W. and Djahel S., A novel YOLO-based real-time people counting approach. International Smart Cities Conference (ISC2) (IEEE 2017), (pp. 1–2).

Ren S., He K., Girshick R. and Sun J., Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Advances in Neural Information Processing Systems (2015), (pp. 1137–1149).

Riba E., Mishkin D., Ponsa D., Rublee E. and Bradski G., Kornia: an open source differentiable computer vision library for PyTorch (2019).

Ronneberger O., Fischer P. and Brox T., U-net: Convolutional networks for biomedical image segmentation. Conference on Medical Image Computing and Computer-Assisted Intervention (2015), (pp. 234–241).

Rosenblatt F., The Perceptron: A Probabilistic Model for Information Storage and Organization in The Brain. Psychological Review 65(6), (1958) pp. 386–408.


Sauget V. and Hubert M., Application note for CMS camera and CMS sensor users: Post-processing method for crosstalk reduction in multispectral data and images. Advanced Fabrication Technologies for Micro/Nano Optics and Photonics, (2016) pp. 1–8.

Schmidhuber J., Deep learning in neural networks: An overview. Neural networks 61, (2015) pp. 85–117.

Schmidt U., Weigert M., Broaddus C. and Myers G., Cell detection with star-convex polygons. International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer 2018), (pp. 265–273).

Shelhamer E., Long J. and Darrell T., Fully Convolutional Networks for Semantic Segmentation. Transactions on Pattern Analysis and Machine Intelligence 39(4), (2017) pp. 640–651.

Shi J. and Malik J., Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, volume 22 (2000), (pp. 888–905).

Simic Milas A., Romanko M., Reil P., Abeysinghe T. and Marambe A., The importance of leaf area index in mapping chlorophyll content of corn under different agricultural treatments using UAV images. International Journal of Remote Sensing 39(15-16), (2018) pp. 5415–5431.

Srivastava N., Hinton G., Krizhevsky A., Sutskever I. and Salakhutdinov R., Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15, (2014) pp. 1929–1958.

Stahl T., Pintea S.L. and Van Gemert J.C., Divide and Count: Generic Object Counting by Image Divisions. IEEE Transactions on Image Processing 28, (2019) pp. 1035–1044.

Statt N., The AI boom is happening all over the world, and it's accelerating quickly. The Verge.

Suleiman A. and Sze V., Energy-efficient HOG-based object detection at 1080HD 60 fps with multi-scale support. 2014 IEEE Workshop on Signal Processing Systems (SiPS) (IEEE 2014), (pp. 1–6).


Szegedy C., Ioffe S., Vanhoucke V. and Alemi A.A., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Conference on Artificial Intelligence (2017), (pp. 4278–4284).

Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V. and Rabinovich A., Going deeper with convolutions. Conference on Computer Vision and Pattern Recognition (2015), (pp. 1–9).

Thomas J. and Gausman H., Leaf reflectance vs. leaf chlorophyll and carotenoid concentrations for eight crops. Agronomy journal 69(5), (1977) pp. 799–802.

Turkensteen L.J., Spoelder J. and Mulder A., Will the real Alternaria stand up please: Experiences with Alternaria-like diseases on potatoes during the 2009 growing season in The Netherlands. PPO-Special Report no. 14, Proc. 12th EuroBlight Workshop, France, (2010) pp. 165–170.

van Beers F., Lindstrom A., Okafor E. and Wiering M.A., Deep Neural Networks with Intersection over Union Loss for Binary Image Segmentation. Conference on Pattern Recognition Applications and Methods (2019), (pp. 438–445).

van de Loosdrecht J., Dijkstra K., Postma J.H., Keuning W. and Bruin D., Twirre: Architecture for autonomous mini-UAVs using interchangeable commodity components. IMAV 2014: International Micro Air Vehicle Conference and Competition (2014), (pp. 26–33).

Venables W.N. and Ripley B.D., Modern Applied Statistics with S, Fourth edition (Springer 2002).

Viola P. and Jones M., Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition, volume 1 (IEEE 2001), (pp. 511–518).

Wan J., Luo W., Wu B., Chan A.B. and Liu W., Residual regression with semantic prior for crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), (pp. 4036–4045).

Wang D., Yu G., Zhou X. and Wang C., Image demosaicking for Bayer-patterned CFA images using improved linear interpolation. 2017 Seventh International Conference on Information Science and Technology (ICIST) (2017), (pp. 464–469).

Wang T., Celik K. and Somani A.K., Characterization of mountain drainage patterns for GPS-denied UAS navigation augmentation. Machine Vision and Applications 27(1), (2016) pp. 87–101.

Wang Y.Q., A multilayer neural network for image demosaicking. International Conference on Image Processing (IEEE 2014), (pp. 1852–1856).

Wang Z., Bovik A.C., Sheikh H.R. and Simoncelli E.P., Image quality assessment: From error visibility to structural similarity. Transactions on Image Processing 13(4), (2004) pp. 600–612.

Wilkinson S., Mills G., Illidge R. and Davies W.J., How is ozone pollution reducing our food supply? Journal of Experimental Botany 63(2), (2012) pp. 527–536.

Wu Z., Shen C. and Hengel A.v.d., Bridging category-level and instance-level semantic image segmentation. arXiv preprint arXiv:1605.06885 (2016).

Xie W., Noble J.A. and Zisserman A., Microscopy cell counting and detection with fully convolutional regression networks. Computer methods in biomechanics and biomedical engineering: Imaging & Visualization 6(3), (2018) pp. 283–292.
