Prof. Luc Van Gool, TRACE

(1)

high hopes deep learning

Prof. Luc Van Gool, TRACE (Toyota Research on Automated Cars in Europe)

(2)

The times they are a changin… and fast

• In recent years, a tsunami of deep learning has taken hold of much of computer vision (and of signal processing in general).

• The Convolutional Neural Networks (CNNs) at its core bear similarities to what the brain does.

• In this talk: CNNs as a recent step in a wider process, and (probably wrong) predictions of things to come.

• Object class recognition will be taken as a case in point.

(3)

Object class recognition

• In recent years, there has been a shift away from the recognition of specific objects towards the recognition of instances of object classes.

• This is quite a feat: on top of the challenges posed by specific objects, there is the additional variability among the instances of the same object class.

(4)

Object class recognition

Challenges (figure panels): illumination, object pose, clutter, viewpoint, occlusions

(5)

Object class recognition

Challenges (figure panels): illumination, object pose, clutter, viewpoint, occlusions, intra-class variation

(6)

Object class recognition

Intra-class and inter-class variation

The difference between classes can be as small as that between instances of the same class… yet the distinction needs to be made.

(7)

Object class recognition

• A tale of letting data speak

• Recognition pipelines require

- feature design

- feature selection

- a classifier

(8)

Object class recognition

• In the very early, AI-inspired expert systems

- feature design - manually

- feature selection - manually

- a classifier - manually

e.g. early decision trees

(9)

Object class recognition

• In somewhat later systems, inspired by pattern recognition (PR)

- feature design - manually

- feature selection - manually

- a classifier - automatically

e.g. Support Vector Machines

(10)

Object class recognition

• In several systems from about 7 years ago

- feature design - manually

- feature selection - automatically

- a classifier - automatically

e.g. Adaboost, Random Forests
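How boosting ends up selecting features automatically can be sketched in a few lines. In the version below, each AdaBoost round picks the single decision stump (one feature, one threshold) with the lowest weighted error, so features that are never picked are effectively discarded. This is a minimal illustration, not the pipeline from the talk: the function names, the toy data, and the use of decision stumps as weak learners are all assumptions.

```python
import math

def train_adaboost(X, y, rounds):
    """X: list of feature vectors, y: labels in {-1, +1}.
    Returns a list of stumps (alpha, feature index, threshold, polarity)."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                               # uniform sample weights
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(d):                          # candidate feature
            for t in sorted({x[j] for x in X}):     # candidate threshold
                for pol in (1, -1):                 # which side is positive
                    preds = [pol if x[j] >= t else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, j, t, pol, preds)
        err, j, t, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)     # vote of this stump
        stumps.append((alpha, j, t, pol))
        # Re-weight: misclassified samples gain weight, then normalise.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps

def predict(stumps, x):
    """Weighted vote of all stumps."""
    score = sum(a * (pol if x[j] >= t else -pol) for a, j, t, pol in stumps)
    return 1 if score >= 0 else -1

def selected_features(stumps):
    """Indices of the features that boosting actually used."""
    return sorted({j for _, j, _, _ in stumps})

# Toy data (assumed): feature 0 is informative, feature 1 is noise,
# feature 2 is constant.
X = [[0.1, 5, 1], [0.2, 1, 1], [0.3, 4, 1],
     [0.7, 2, 1], [0.8, 5, 1], [0.9, 1, 1]]
y = [-1, -1, -1, 1, 1, 1]

stumps = train_adaboost(X, y, rounds=3)
print(selected_features(stumps))
```

On this toy set only the informative feature is ever chosen; the noisy and constant features drop out without any manual selection step.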

(11)

Object class recognition

• In recent deep learning systems

- feature design - automatically

- feature selection - automatically

- a classifier - automatically

e.g. Convolutional Neural Nets

(12)

Object class recognition

Side-by-side: an Adaboost pipeline using hand-designed features, and the same Adaboost algorithm trained on features automatically extracted for the class 'pedestrian'.

(13)

Object class recognition

(14)

Object class recognition

(15)

Excellent for more than recognition

This system beats the medical experts…

(16)

1 layer of a CNN

[Figure: set of convolution filters, then non-linearity ('ReLU'), then pooling]

(17)

1 layer of a CNN

[Figure: set of convolution filters]

(18)

(19)

1 layer of a CNN

[Figure: set of convolution filters, then non-linearity ('ReLU')]

(20)

(21)

1 layer of a CNN

[Figure: set of convolution filters, then non-linearity ('ReLU'), then pooling]

(22)

(23)

1 layer of a CNN

[Figure: set of convolution filters, then non-linearity ('ReLU'), then pooling]

Such a CNN layer shares several characteristics with the neuronal 6-layer architecture omnipresent in the brain, but also lacks some.
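The three stages of one CNN layer (a set of convolution filters, a ReLU non-linearity, pooling) can be written out in plain Python. This is a minimal single-channel sketch: the filters are fixed by hand here, whereas a real CNN learns them, and the choices of 'valid' borders, stride 1 and 2x2 max pooling are assumptions.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, i.e. what deep learning libraries
    usually call convolution: slide the kernel and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Non-linearity: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill
    a complete window are dropped."""
    h = len(fmap) // size * size
    w = len(fmap[0]) // size * size
    return [[max(fmap[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, w, size)]
            for r in range(0, h, size)]

def cnn_layer(image, kernels):
    """One layer: every filter in the set yields one pooled feature map."""
    return [max_pool(relu(conv2d(image, k))) for k in kernels]

# Toy input (assumed): a 5x5 image with a dark-to-bright vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
dark_to_bright = [[-1, 0, 1]] * 3   # responds to this edge
bright_to_dark = [[1, 0, -1]] * 3   # responds to the opposite edge

feature_maps = cnn_layer(image, [dark_to_bright, bright_to_dark])
print(feature_maps)
```

The first filter fires on the edge; the response of the second is negative before the ReLU and is therefore suppressed to zero.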

(24)

The good - easy, generic, and top performance in many areas

Deep learning lets the data speak, and has millions of parameters to do it

The good, the bad, and the ugly

(25)

The bad – lots of training data needed

CNNs are very data hungry (possibly needing more data than is available) and require annotations!

This can be mitigated via

1) Pre-training on other data, then refining on the directly relevant examples

2) Semi-supervised or unsupervised techniques

3) GANs (generative adversarial networks)

Nonetheless, this remains a serious hurdle.

The good, the bad, and the ugly

(26)

The ugly – all in all… a black box

The good, the bad, and the ugly

No grand theory of deep learning yet

What, e.g., about the 'right to explanation'? (For now this mainly concerns automatically made decisions about individual users, as part of data protection… but it may also come to affect other areas such as automated driving.)

(27)

To be expected… and already happening

• Stronger extensions towards video

• Move towards less supervision

• Doing away with the silo approach to tasks and modalities

• Adding intra-layer connections

• Adding feedback

• Adding mid-level, configuration-sensitive representations (symmetry, layout, gestalts, …)

(28)

To be expected…

A joint network for semantic segmentation, instance labeling, and depth extraction, on 1 GPU @ 21 fps.

Better than individual, task-specific CNNs!

The tasks share parts of the computation, which makes the whole faster overall.

More such combinations are to be expected.
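Why sharing computation makes a joint network faster can be illustrated with a toy structure: one expensive shared trunk whose output feeds several cheap task heads. Everything below is hypothetical (the `Encoder` stand-in, the three heads, the toy input) and it only illustrates the saving in computation, not the accuracy gain mentioned above.

```python
class Encoder:
    """Stand-in for an expensive shared trunk; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return [sum(row) / len(row) for row in image]  # toy features: row means

# Toy task heads (hypothetical), each cheap compared with the trunk.
def segmentation_head(features):
    return [1 if f > 0.5 else 0 for f in features]

def instance_head(features):
    return [i for i, f in enumerate(features) if f > 0.5]

def depth_head(features):
    return [1.0 - f for f in features]

HEADS = {
    "segmentation": segmentation_head,
    "instances": instance_head,
    "depth": depth_head,
}

def run_separately(image):
    """Three task-specific networks: the trunk runs once per task."""
    enc = Encoder()
    out = {name: head(enc(image)) for name, head in HEADS.items()}
    return out, enc.calls

def run_jointly(image):
    """One multi-task network: the trunk runs once, all heads reuse it."""
    enc = Encoder()
    features = enc(image)
    out = {name: head(features) for name, head in HEADS.items()}
    return out, enc.calls

image = [[0.0, 0.2], [0.9, 0.7], [1.0, 1.0]]
separate_out, separate_calls = run_separately(image)
joint_out, joint_calls = run_jointly(image)
print(separate_calls, joint_calls)
```

Both variants produce the same outputs here, but the joint one evaluates the trunk once instead of once per task; when the trunk dominates the cost, that is where the speed-up comes from.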

(29)

To be expected…

(30)

I mentioned similarities between CNNs and brains… and mentioned things that brains have and that could be added to CNNs.

It is an open question whether adding those would then also do away with the differences...

For example: deeper CNNs with smaller convolution kernels work better than shallower ones with larger convolution kernels… but the brain has to limit depth because of speed, and receptive fields tend to be quite large when taking all effects into account (e.g. excitatory and inhibitory surrounds).
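One commonly cited reason why small kernels in deep stacks are attractive: they reach the same receptive field as one large kernel with fewer weights, and with extra non-linearities in between. The arithmetic can be sketched as follows, assuming stride 1, no dilation, no pooling, and the same number of channels in every layer (the value 64 is arbitrary).

```python
def receptive_field(kernel_sizes):
    """Receptive field (in pixels, along one axis) of a stack of stride-1,
    undilated convolutions: each k x k layer widens it by k - 1."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

def weight_count(kernel_sizes, channels):
    """Number of weights in a stack of 2-D convolutions with `channels`
    input and output channels per layer (biases ignored)."""
    return sum(k * k * channels * channels for k in kernel_sizes)

deep = [3, 3, 3]    # three stacked 3x3 layers
shallow = [7]       # one 7x7 layer

for name, stack in (("deep", deep), ("shallow", shallow)):
    print(name, receptive_field(stack), weight_count(stack, channels=64))
```

Both stacks see a 7x7 window, but the three 3x3 layers need 27 weights per channel pair against 49 for the single 7x7 layer.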

To be expected…

(31)

THANKS FOR YOUR ATTENTION!!!

(32)

To be expected…

Talking about automated cars…

A critical jump from ADAS to a sovereign car is still to be made... and this is hardly ever mentioned.

We work on measures to smooth that transition.

Short latencies are important but in general not enough: relying on them alone would lead to very abrupt behaviour by the car.

Some conditions are difficult but rare, like extreme weather. Synthetic data generation can help to generalize towards such conditions, i.e. to generate sufficient training material.

Speech is a useful modality for giving orders to the cars of the future, e.g. asking to take a break.

(33)