URBAN - WP #2: Object recognition
Multi-view Traffic Sign Detection,
Recognition and 3D Localisation
Problem definition
• Input: large set of views and corresponding camera locations/calibrations/poses
Outline
Single view
• Segmentation – rapidly select bounding boxes that may contain traffic signs.
– Traffic signs are designed to be well distinguishable from the background, and therefore have distinctive colors and shapes.
• Detection – classify segmented bounding boxes by AdaBoost cascades.
• Recognition – determine specific signs using SVM classifiers.
Multi-view
• Global optimization – constrain single-view detections using 3D geometry
Color-based segmentation
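As a hedged illustration of the color-based cue, a minimal numpy sketch of a "red-dominant" pixel test is shown below. The channel-ratio rule and thresholds are illustrative assumptions, not the thresholding methods actually learned in this work.

```python
import numpy as np

def red_sign_mask(rgb, ratio=1.5, min_val=60):
    """Binary mask of 'red-dominant' pixels as a crude sign-color cue.

    A pixel is kept when its red channel exceeds both green and blue
    by `ratio` and is bright enough. Thresholds are illustrative only.
    """
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r > min_val) & (r > ratio * g) & (r > ratio * b)

# toy image: a red square on a gray background
img = np.full((20, 20, 3), 120, dtype=np.uint8)
img[5:15, 5:15] = (200, 30, 30)
m = red_sign_mask(img)
```

Any connected component of the mask would then become a candidate bounding box for the later detection stage.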
Shape-based segmentation
• Not all traffic signs are separable via a local color threshold.
• Instead, search for specific shapes (rectangles, circles, triangles).
– Can be time-consuming.
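One cheap way to separate circles from polygonal signs, sketched here as an assumption (the slides do not specify the shape test), is the spread of centroid-to-boundary distances: near-constant for circles, clearly varying for triangles and rectangles.

```python
import numpy as np

def boundary_roundness(mask):
    """Coefficient of variation of centroid-to-boundary distances.

    Near 0 for circular masks, larger for polygons. A cheap stand-in
    for the shape tests mentioned above, not the method of the paper.
    """
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    # boundary = mask pixels with at least one off-mask 4-neighbour
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    by, bx = np.nonzero(mask & ~interior)
    d = np.hypot(by - cy, bx - cx)
    return d.std() / d.mean()

# raster disk vs. square of comparable size
yy, xx = np.mgrid[0:61, 0:61]
disk = (yy - 30) ** 2 + (xx - 30) ** 2 <= 25 ** 2
square = np.zeros((61, 61), bool)
square[10:51, 10:51] = True
```

A full shape search (e.g. Hough-style voting) is more robust but also the reason this stage can be time-consuming.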
Learning segmentation
• There are thousands of possible settings of such methods, e.g. different projections from the color space.
• Learning is searching for a reasonable subset of these methods/settings.
• Optimal trade-off among FN, FP and the number of methods:
  T* = argmin_T [ FP(T) + K1·FN(T) + K2·card(T) ]
• Boolean Linear Programming selects ≈ 50 methods out of 10000 in 2 hours.
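The paper solves this selection exactly with Boolean Linear Programming; as a hedged illustration of the same objective FP(T) + K1·FN(T) + K2·card(T), a greedy approximation on toy data might look like this (the greedy pass is only a sketch, not the solver used):

```python
import numpy as np

def greedy_select(covers, fp_costs, n_pos, k1=10.0, k2=1.0):
    """Greedily pick methods to reduce FP(T) + k1*FN(T) + k2*card(T).

    covers[i]: boolean vector over positive samples that method i
    segments correctly; fp_costs[i]: its false-positive count.
    Illustrative approximation of the Boolean LP in the paper.
    """
    covered = np.zeros(covers.shape[1], bool)
    chosen = []
    while True:
        fn = n_pos - covered.sum()
        best_gain, best_i = 0.0, None
        for i in range(len(covers)):
            if i in chosen:
                continue
            new_fn = n_pos - (covered | covers[i]).sum()
            gain = k1 * (fn - new_fn) - fp_costs[i] - k2
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return chosen
        chosen.append(best_i)
        covered |= covers[best_i]

# toy data: 3 candidate methods over 4 positive samples
covers = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [1, 0, 0, 0]], bool)
fp = np.array([2.0, 3.0, 0.5])
sel = greedy_select(covers, fp, n_pos=4)
```

Here methods 0 and 1 together cover all positives at acceptable FP cost, while the redundant method 2 is dropped — mirroring how ≈ 50 of 10000 methods survive the real optimization.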
• Example segmentation results (figure):
Detection
• Detection: select bounding boxes most likely to be a traffic sign.
– Haar-like features computed on each channel of HSI space.
– Separate shape-specific cascades of AdaBoost classifiers.
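Haar-like features are box-sum differences evaluated in constant time from an integral image. The sketch below shows one basic two-rectangle feature; evaluating it per HSI channel, as the slide describes, would just mean calling it on each channel's integral image. The specific feature layout is an illustrative assumption.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) from the integral image."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_two_rect(ii, y, x, h, w):
    """Left-half minus right-half box sum: a basic two-rectangle
    Haar-like feature of the kind thresholded by AdaBoost weak
    learners in a cascade."""
    half = w // 2
    left = box_sum(ii, y, x, y + h, x + half)
    right = box_sum(ii, y, x + half, y + h, x + w)
    return left - right

img = np.arange(36, dtype=np.float64).reshape(6, 6)
ii = integral_image(img)
feat = haar_two_rect(ii, 0, 0, 2, 4)
```

The cascade structure then rejects most background boxes after evaluating only a few such features.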
Results
• The summary of 3D results:
URBAN - WP #2: Object recognition
Multi-view Manhole Detection,
Recognition and 3D Localisation
Problem definition
• Input: large set of views and corresponding camera locations/calibrations/poses
Manholes
• Large variety of manhole patterns around the world.
• We use texture models for manhole validation; for each new region, we train new texture models.
Outline
Single view
• Segmentation – fast segment selection process with very few missed manholes.
– Manholes are usually distinguishable from the surrounding environment: they have distinctive textures, shapes, and symmetry.
– Mean shift method is employed for color segmentation.
• Detection – classifiers based on histograms of Local Binary Patterns as texture descriptors.
Multi-view
• Global optimization – over single-view detections constrained by 3D geometry
Edge Detection and Image Segmentation
• The image is projected on the estimated ground plane.
• Edge detection and mean shift1 in L*u*v* color space are combined for segmentation.
(Figure: original image → ground plane projection → segmented image)
1 D. Comaniciu, P. Meer, "Mean shift: A robust approach toward feature space analysis", PAMI, 2002.
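The core mean-shift idea can be sketched in a few lines: every point is moved to the mean of its neighbours within a bandwidth until it settles at a density mode. Applied to L*u*v* pixel values, coincident modes define the colour segments. This toy flat-kernel version on plain feature vectors is only a sketch of the cited algorithm.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30):
    """Flat-kernel mean shift: shift each point to the mean of the
    original points within `bandwidth` until convergence. Points that
    end at the same mode belong to the same segment."""
    points = np.asarray(points, dtype=np.float64)
    shifted = points.copy()
    for _ in range(iters):
        for i, p in enumerate(shifted):
            d = np.linalg.norm(points - p, axis=1)
            shifted[i] = points[d < bandwidth].mean(axis=0)
    return shifted

# two well-separated 2D feature clusters
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
modes = mean_shift(pts, bandwidth=1.0)
```

Production implementations add kernel weighting and neighbourhood search structures, but the fixed-point iteration is the same.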
Detection
• Local Binary Patterns2 are used as a texture descriptor model
• Radial symmetry3 is exploited for pruning.
• Each segment is classified according to its LBP histogram as manhole or background.
2 T. Ojala et al., "Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns", PAMI, 2002.
3 G. Loy and A. Zelinsky, "Fast Radial Symmetry for Detecting Points of Interest", PAMI, 2003.
(Figure: segmented image + radial symmetry + texture image = projected image)
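The basic 8-neighbour LBP descriptor can be sketched as follows: each interior pixel gets an 8-bit code, one bit per neighbour that is at least as bright as the centre, and the descriptor is the histogram of codes. Note the cited work uses the rotation-invariant uniform variant; this plain form is only an illustration.

```python
import numpy as np

def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.

    Bit b of a pixel's code is set when neighbour b is >= the centre.
    The real detector classifies each segment by such a histogram.
    """
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()

# a flat patch maps every pixel to code 255 (all neighbours >= centre)
h = lbp_histogram(np.full((10, 10), 7, dtype=np.uint8))
```

The histogram is what a classifier (here manhole vs. background) actually sees, which makes the descriptor robust to monotonic illumination changes.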
3D Localisation
• Single-view manhole detections are grouped under 3D geometric constraints.
(Figure: projected image → localised manhole)
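A hedged sketch of the grouping step: since each view yields detections at known ground-plane world coordinates, detections from different views that fall close together can be clustered, and only clusters supported by multiple views are kept. The greedy radius clustering below is a simplified stand-in for the paper's 3D geometric constraints; the radius and view count are illustrative.

```python
import numpy as np

def group_detections(world_pts, view_ids, radius=0.5, min_views=2):
    """Cluster per-view ground-plane detections within `radius` metres
    and accept clusters seen from at least `min_views` distinct views.
    Returns the mean position of each accepted cluster."""
    world_pts = np.asarray(world_pts, dtype=np.float64)
    unused = list(range(len(world_pts)))
    accepted = []
    while unused:
        seed = unused.pop(0)
        members = [seed]
        for j in list(unused):
            if np.linalg.norm(world_pts[j] - world_pts[seed]) < radius:
                members.append(j)
                unused.remove(j)
        if len({view_ids[i] for i in members}) >= min_views:
            accepted.append(world_pts[members].mean(axis=0))
    return accepted

# one true manhole seen from views 0 and 1, one spurious single-view hit
hits = group_detections([[2.0, 3.0], [2.1, 3.05], [9.0, 1.0]],
                        view_ids=[0, 1, 0])
```

This is what suppresses single-view false positives while averaging positions over views sharpens the 3D localisation.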
Evaluation
• 317 manhole and 270 non-manhole images in the testing set.
• Detection rate increases with the number of views available for each manhole.
• Single-view detection rate is about 41%.
• Multi-view evaluation achieves a 97% manhole detection rate, with very few false positives.
(Figure: missed manholes [%] against accepted backgrounds per image, plotted for each view individually and for 1–4 views per manhole)
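The jump from ~41% single-view to 97% multi-view detection is roughly what independent views would predict. As a back-of-envelope check — assuming detections in different views are independent, which is only approximately true:

```python
def multi_view_rate(p_single, n_views):
    """Probability that at least one of n independent views fires."""
    return 1.0 - (1.0 - p_single) ** n_views

# with p = 0.41 per view (the reported single-view rate):
rates = {n: round(multi_view_rate(0.41, n), 3) for n in (1, 2, 4, 7)}
```

Under this independence assumption, about 7 views per manhole would already reach ≈ 97.5%, consistent with the reported multi-view figure.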
URBAN - WP #2: Object recognition
Integrating Object Detection with
3D Tracking Towards a Better
Driver Assistance System
Radu Timofte, Karel Zimmermann, Luc van Gool
VISICS, ESAT-PSI/IBBT
Katholieke Universiteit Leuven
Victor A. Prisacariu, Ian Reid
Active Vision Laboratory University of Oxford
Problem definition
• Input: video stream from a single front camera on the vehicle
• Output: list of tracks assigned to detected traffic signs
• Need to process in real time
Importance
• Tracking provides a consistent label over time and reduces the search space for the traffic sign detector.
• 3D pose estimation gives the orientation, alerting the driver if a sign is facing the car.
Outline
• The still-image processing is similar to our "traffic sign 3D mapping" pipeline1:
– fast segmentation by an optimal set of thresholding methods
– pruning of candidates by AdaBoost cascades
– hierarchy of SVM classifiers for recognition
1 R. Timofte, K. Zimmermann, and L. van Gool, "Multi-view traffic sign detection, recognition, and 3D localisation", WACV, 2009.
Outline
• The core of our tracking is the Pixel-wise posterior 3D (PWP3D) algorithm2
2 V. Prisacariu and I. Reid, "PWP3D: Real-time segmentation and tracking of 3D objects", BMVC, 2009.
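PWP3D drives the pose with per-pixel foreground/background posteriors. A minimal sketch of the pixel-wise posterior itself, assuming simple quantised colour histograms — the full algorithm then differentiates an energy over these posteriors with respect to the 6-DoF pose, which is not shown here:

```python
import numpy as np

def pixelwise_posterior(pixels, p_fg, p_bg, eta_f=0.5, eta_b=0.5):
    """P(foreground | colour) for each pixel.

    p_fg / p_bg map a quantised colour index to its likelihood under
    the foreground / background histogram; eta_f, eta_b are the prior
    area fractions. Only the posterior is shown, not the pose
    optimisation built on top of it in PWP3D.
    """
    lf = eta_f * p_fg[pixels]
    lb = eta_b * p_bg[pixels]
    return lf / (lf + lb)

# toy 4-colour model: colours 0,1 mostly foreground, 2,3 background
p_fg = np.array([0.4, 0.4, 0.1, 0.1])
p_bg = np.array([0.1, 0.1, 0.4, 0.4])
post = pixelwise_posterior(np.array([0, 2]), p_fg, p_bg)
```

Because the posterior is smooth in the projected contour, the tracker can update the sign's 3D pose at frame rate on the GPU.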
Evaluation
System performance while tracking a sign over 70 m, with just the 4-point pose recovery (RPP) and with the tracker (PWP).
Real-time performance
• The CPU C++ implementation of the detection phase takes ~50 ms on images of 640×480 pixels.
• The GPU-based tracking needs up to 20 ms per object.
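Those timings can be turned into a rough frame budget. The interleaving scheme below — running the full detector only every few frames while the tracker covers the frames in between — is an illustrative assumption about how such a system could stay real-time, not necessarily what this system does:

```python
def frame_time_ms(n_tracked, detect_ms=50.0, track_ms=20.0,
                  detect_every=5):
    """Average per-frame cost when full detection runs only every
    `detect_every` frames and the GPU tracker runs every frame for
    each tracked object. Timings follow the slides; the interleaving
    is an illustrative assumption."""
    return detect_ms / detect_every + track_ms * n_tracked

budget = frame_time_ms(n_tracked=2)
```

With two tracked signs this averages 50 ms per frame (≈ 20 fps), illustrating why tracking is used to amortise the detector's cost.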
Conclusions
We tackled and provided solutions for:
Single view
– traffic sign detection and recognition
– manhole detection and recognition
Multi-view
– traffic sign 3D mapping
– manhole 3D mapping
Real-time
– traffic sign detection and recognition
– traffic sign 3D pose tracking