URBAN - WP #2: Object recognition
Multi-view Traffic Sign Detection,
Recognition and 3D Localisation
Problem definition
• Input: large set of views and corresponding camera locations/calibrations/poses
Outline
Single view
• Segmentation – rapidly select bounding boxes that may contain traffic signs.
– Traffic signs are designed to be well distinguishable from the background, and therefore have distinctive colors and shapes.
• Detection – classify segmented bounding boxes by AdaBoost cascades.
• Recognition – determine specific signs using SVM classifiers.
Multi-view
• Global optimization – constrain single-view detections using 3D geometry
Color-based segmentation
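As a hedged illustration of the color-based cue, a minimal numpy sketch of a "red-dominant" pixel test is shown below. The channel-ratio rule and thresholds are illustrative assumptions, not the thresholding methods actually learned in this work.

```python
import numpy as np

def red_sign_mask(rgb, ratio=1.5, min_val=60):
    """Binary mask of 'red-dominant' pixels as a crude sign-color cue.

    A pixel is kept when its red channel exceeds both green and blue
    by `ratio` and is bright enough. Thresholds are illustrative only.
    """
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r > min_val) & (r > ratio * g) & (r > ratio * b)

# toy image: a red square on a gray background
img = np.full((20, 20, 3), 120, dtype=np.uint8)
img[5:15, 5:15] = (200, 30, 30)
m = red_sign_mask(img)
```

Any connected component of the mask would then become a candidate bounding box for the later detection stage.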
Shape-based segmentation
• Not all traffic signs are separable via a local color threshold.
• Instead, search for specific shapes (rectangles, circles, triangles).
– Can be time-consuming.
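One cheap way to separate circles from polygonal signs, sketched here as an assumption (the slides do not specify the shape test), is the spread of centroid-to-boundary distances: near-constant for circles, clearly varying for triangles and rectangles.

```python
import numpy as np

def boundary_roundness(mask):
    """Coefficient of variation of centroid-to-boundary distances.

    Near 0 for circular masks, larger for polygons. A cheap stand-in
    for the shape tests mentioned above, not the method of the paper.
    """
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    # boundary = mask pixels with at least one off-mask 4-neighbour
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    by, bx = np.nonzero(mask & ~interior)
    d = np.hypot(by - cy, bx - cx)
    return d.std() / d.mean()

# raster disk vs. square of comparable size
yy, xx = np.mgrid[0:61, 0:61]
disk = (yy - 30) ** 2 + (xx - 30) ** 2 <= 25 ** 2
square = np.zeros((61, 61), bool)
square[10:51, 10:51] = True
```

A full shape search (e.g. Hough-style voting) is more robust but also the reason this stage can be time-consuming.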
Learning segmentation
• There are thousands of possible settings of such methods, e.g. different projections from the color space.
• Learning is searching for a reasonable subset of these methods/settings.
• Optimal trade-off among FN, FP and the number of methods:
  T* = argmin_T [ FP(T) + K1·FN(T) + K2·card(T) ]
• Boolean Linear Programming selects ≈ 50 methods out of 10000 in 2 hours.
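The paper solves this selection exactly with Boolean Linear Programming; as a hedged illustration of the same objective FP(T) + K1·FN(T) + K2·card(T), a greedy approximation on toy data might look like this (the greedy pass is only a sketch, not the solver used):

```python
import numpy as np

def greedy_select(covers, fp_costs, n_pos, k1=10.0, k2=1.0):
    """Greedily pick methods to reduce FP(T) + k1*FN(T) + k2*card(T).

    covers[i]: boolean vector over positive samples that method i
    segments correctly; fp_costs[i]: its false-positive count.
    Illustrative approximation of the Boolean LP in the paper.
    """
    covered = np.zeros(covers.shape[1], bool)
    chosen = []
    while True:
        fn = n_pos - covered.sum()
        best_gain, best_i = 0.0, None
        for i in range(len(covers)):
            if i in chosen:
                continue
            new_fn = n_pos - (covered | covers[i]).sum()
            gain = k1 * (fn - new_fn) - fp_costs[i] - k2
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:
            return chosen
        chosen.append(best_i)
        covered |= covers[best_i]

# toy data: 3 candidate methods over 4 positive samples
covers = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [1, 0, 0, 0]], bool)
fp = np.array([2.0, 3.0, 0.5])
sel = greedy_select(covers, fp, n_pos=4)
```

Here methods 0 and 1 together cover all positives at acceptable FP cost, while the redundant method 2 is dropped — mirroring how ≈ 50 of 10000 methods survive the real optimization.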
• Example segmentation results (figure):
Detection
• Detection: select bounding boxes most likely to be a traffic sign.
– Haar-like features computed on each channel of HSI space.
– Separate shape-specific cascades of AdaBoost classifiers.
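Haar-like features are box-sum differences evaluated in constant time from an integral image. The sketch below shows one basic two-rectangle feature; evaluating it per HSI channel, as the slide describes, would just mean calling it on each channel's integral image. The specific feature layout is an illustrative assumption.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) from the integral image."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_two_rect(ii, y, x, h, w):
    """Left-half minus right-half box sum: a basic two-rectangle
    Haar-like feature of the kind thresholded by AdaBoost weak
    learners in a cascade."""
    half = w // 2
    left = box_sum(ii, y, x, y + h, x + half)
    right = box_sum(ii, y, x + half, y + h, x + w)
    return left - right

img = np.arange(36, dtype=np.float64).reshape(6, 6)
ii = integral_image(img)
feat = haar_two_rect(ii, 0, 0, 2, 4)
```

The cascade structure then rejects most background boxes after evaluating only a few such features.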
Results
• The summary of 3D results:
URBAN - WP #2: Object recognition
Multi-view Manhole Detection,
Recognition and 3D Localisation
Problem definition
• Input: large set of views and corresponding camera locations/calibrations/poses
Manholes
• Large variety of manhole patterns around the world.
• We use texture models for manhole validation; for each new region, we train new texture models.
Outline
Single view
• Segmentation – fast segment selection process with very few missed manholes.
– Manholes are usually distinguishable from the surrounding environment: they have distinctive textures, shapes, and symmetry.
– Mean shift method is employed for color segmentation.
• Detection – classifiers based on histograms of Local Binary Patterns as texture descriptors.
Multi-view
• Global optimization – over single-view detections constrained by 3D geometry
Edge Detection and Image Segmentation
• The image is projected on the estimated ground plane.
• Edge detection and mean shift1 in L*u*v* color space are combined for segmentation.
(Figure: original image → ground plane projection → segmented image)
1 D. Comaniciu, P. Meer, "Mean shift: A robust approach toward feature space analysis", PAMI, 2002.
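The core mean-shift idea can be sketched in a few lines: every point is moved to the mean of its neighbours within a bandwidth until it settles at a density mode. Applied to L*u*v* pixel values, coincident modes define the colour segments. This toy flat-kernel version on plain feature vectors is only a sketch of the cited algorithm.

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30):
    """Flat-kernel mean shift: shift each point to the mean of the
    original points within `bandwidth` until convergence. Points that
    end at the same mode belong to the same segment."""
    points = np.asarray(points, dtype=np.float64)
    shifted = points.copy()
    for _ in range(iters):
        for i, p in enumerate(shifted):
            d = np.linalg.norm(points - p, axis=1)
            shifted[i] = points[d < bandwidth].mean(axis=0)
    return shifted

# two well-separated 2D feature clusters
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
modes = mean_shift(pts, bandwidth=1.0)
```

Production implementations add kernel weighting and neighbourhood search structures, but the fixed-point iteration is the same.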
Detection
• Local Binary Patterns2 are used as a texture descriptor model
• Radial symmetry3 is exploited for pruning.
• Each segment is classified according to its LBP histogram as manhole or background.
2 T. Ojala et al., "Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns", PAMI, 2002.
3 G. Loy and A. Zelinsky, "Fast Radial Symmetry for Detecting Points of Interest", PAMI, 2003.
(Figure: segmented image + radial symmetry + texture image = projected image)
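The basic 8-neighbour LBP descriptor can be sketched as follows: each interior pixel gets an 8-bit code, one bit per neighbour that is at least as bright as the centre, and the descriptor is the histogram of codes. Note the cited work uses the rotation-invariant uniform variant; this plain form is only an illustration.

```python
import numpy as np

def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.

    Bit b of a pixel's code is set when neighbour b is >= the centre.
    The real detector classifies each segment by such a histogram.
    """
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()

# a flat patch maps every pixel to code 255 (all neighbours >= centre)
h = lbp_histogram(np.full((10, 10), 7, dtype=np.uint8))
```

The histogram is what a classifier (here manhole vs. background) actually sees, which makes the descriptor robust to monotonic illumination changes.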
3D Localisation
• Single-view manhole detections are grouped under 3D geometric constraints.
(Figure: projected image → localised manhole)
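A hedged sketch of the grouping step: since each view yields detections at known ground-plane world coordinates, detections from different views that fall close together can be clustered, and only clusters supported by multiple views are kept. The greedy radius clustering below is a simplified stand-in for the paper's 3D geometric constraints; the radius and view count are illustrative.

```python
import numpy as np

def group_detections(world_pts, view_ids, radius=0.5, min_views=2):
    """Cluster per-view ground-plane detections within `radius` metres
    and accept clusters seen from at least `min_views` distinct views.
    Returns the mean position of each accepted cluster."""
    world_pts = np.asarray(world_pts, dtype=np.float64)
    unused = list(range(len(world_pts)))
    accepted = []
    while unused:
        seed = unused.pop(0)
        members = [seed]
        for j in list(unused):
            if np.linalg.norm(world_pts[j] - world_pts[seed]) < radius:
                members.append(j)
                unused.remove(j)
        if len({view_ids[i] for i in members}) >= min_views:
            accepted.append(world_pts[members].mean(axis=0))
    return accepted

# one true manhole seen from views 0 and 1, one spurious single-view hit
hits = group_detections([[2.0, 3.0], [2.1, 3.05], [9.0, 1.0]],
                        view_ids=[0, 1, 0])
```

This is what suppresses single-view false positives while averaging positions over views sharpens the 3D localisation.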
Evaluation
• 317 manhole and 270 non-manhole images in the testing set.
• Detection rate increases with the number of views available for each manhole.
• Single-view detection rate is about 41%.
• Multi-view evaluation achieves a 97% manhole detection rate, with very few false positives.
(Figure: missed manholes [%] against accepted backgrounds per image, plotted for each view individually and for 1–4 views per manhole)
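The jump from ~41% single-view to 97% multi-view detection is roughly what independent views would predict. As a back-of-envelope check — assuming detections in different views are independent, which is only approximately true:

```python
def multi_view_rate(p_single, n_views):
    """Probability that at least one of n independent views fires."""
    return 1.0 - (1.0 - p_single) ** n_views

# with p = 0.41 per view (the reported single-view rate):
rates = {n: round(multi_view_rate(0.41, n), 3) for n in (1, 2, 4, 7)}
```

Under this independence assumption, about 7 views per manhole would already reach ≈ 97.5%, consistent with the reported multi-view figure.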
URBAN - WP #2: Object recognition
Integrating Object Detection with
3D Tracking Towards a Better
Driver Assistance System
Radu Timofte, Karel Zimmermann, Luc van Gool
VISICS, ESAT-PSI/IBBT
Katholieke Universiteit Leuven
Victor A. Prisacariu, Ian Reid
Active Vision Laboratory University of Oxford
Problem definition
• Input: video stream from a single front camera on the vehicle
• Output: list of tracks assigned to detected traffic signs
• Need to process in real time
Importance
• Tracking provides a consistent label over time and reduces the search space for the traffic sign detector.
• 3D pose estimation gives the orientation, alerting the driver if a sign is facing the car.
Outline
• The still-image processing is similar to our "traffic sign 3D mapping" pipeline1:
– fast segmentation by an optimal set of thresholding methods
– pruning of candidates by AdaBoost cascades
– hierarchy of SVM classifiers for recognition
1 R. Timofte, K. Zimmermann, and L. van Gool, "Multi-view traffic sign detection, recognition, and 3D localisation", WACV, 2009.
Outline
• The core of our tracking is the Pixel-wise posterior 3D (PWP3D) algorithm2
2 V. Prisacariu and I. Reid, "PWP3D: Real-time segmentation and tracking of 3D objects", BMVC, 2009.
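PWP3D drives the pose with per-pixel foreground/background posteriors. A minimal sketch of the pixel-wise posterior itself, assuming simple quantised colour histograms — the full algorithm then differentiates an energy over these posteriors with respect to the 6-DoF pose, which is not shown here:

```python
import numpy as np

def pixelwise_posterior(pixels, p_fg, p_bg, eta_f=0.5, eta_b=0.5):
    """P(foreground | colour) for each pixel.

    p_fg / p_bg map a quantised colour index to its likelihood under
    the foreground / background histogram; eta_f, eta_b are the prior
    area fractions. Only the posterior is shown, not the pose
    optimisation built on top of it in PWP3D.
    """
    lf = eta_f * p_fg[pixels]
    lb = eta_b * p_bg[pixels]
    return lf / (lf + lb)

# toy 4-colour model: colours 0,1 mostly foreground, 2,3 background
p_fg = np.array([0.4, 0.4, 0.1, 0.1])
p_bg = np.array([0.1, 0.1, 0.4, 0.4])
post = pixelwise_posterior(np.array([0, 2]), p_fg, p_bg)
```

Because the posterior is smooth in the projected contour, the tracker can update the sign's 3D pose at frame rate on the GPU.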
Evaluation
System performance while tracking a sign over 70 m, with just the 4-point pose recovery (RPP) and with the tracker (PWP).
Real-time performance
• The CPU C++ implementation of the detection phase takes ~50 ms on images of 640×480 pixels.
• The GPU-based tracking needs up to 20 ms per object.
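Those timings can be turned into a rough frame budget. The interleaving scheme below — running the full detector only every few frames while the tracker covers the frames in between — is an illustrative assumption about how such a system could stay real-time, not necessarily what this system does:

```python
def frame_time_ms(n_tracked, detect_ms=50.0, track_ms=20.0,
                  detect_every=5):
    """Average per-frame cost when full detection runs only every
    `detect_every` frames and the GPU tracker runs every frame for
    each tracked object. Timings follow the slides; the interleaving
    is an illustrative assumption."""
    return detect_ms / detect_every + track_ms * n_tracked

budget = frame_time_ms(n_tracked=2)
```

With two tracked signs this averages 50 ms per frame (≈ 20 fps), illustrating why tracking is used to amortise the detector's cost.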
Conclusions
We tackled and provided solutions for:
Single view
– traffic sign detection and recognition
– manhole detection and recognition
Multi-view
– traffic sign 3D mapping
– manhole 3D mapping
Real-time
– traffic sign detection and recognition
– traffic sign 3D pose tracking