Here are the key steps in our pipeline:
1. Preprocess lidar data to generate 2D feature maps representing height, intensity, and ring number.
2. Feed the feature maps into a convolutional neural network to perform semantic segmentation, classifying each cell as car, pedestrian, or background.
3. Extract connected components from the segmentation to identify individual obstacles.
4. Use a point-cloud-based neural network to localize each obstacle's 3D position and estimate its orientation.
5. Track the obstacle state over time using a Kalman filter, fusing lidar and radar measurements.
6. Output tracked obstacle positions, orientations, and bounding box sizes at 10 Hz.
Some highlights of our approach
3. Yiming Zeng, Yu Hu, Qiankun Tang, Shice Liu, Beibei Jin
Autonomous Navigation System Research Group
State Key Laboratory of Computer Architecture
Institute of Computing Technology, Chinese Academy of Sciences
August 31st, 2017
Position and Orientation Estimation of Cars and Pedestrians
4. Sensor and Data
[Figures: sensors setup and sensors range]
For the Round 2 test data, the percentage of each obstacle detectable by each sensor is listed below:

            ford01  ford02  ford03  ford04  ford05  ford06  ford07  mustang01  pedestrian
Sensor 1    100%    42%     49%     100%    68.6%   70.5%   100%    100%       100%
Sensor 2    100%    100%    100%    100%    100%    100%    100%    71.7%      78.2%
Sensor 3    100%    54.8%   47.2%   83.8%   59.1%   64.6%   74.74%  14.9%      -
Coordinate system transformation (used in Round 1)
To detect cars at various ranges, data from several sensors was used in Round 1. Trading off detection accuracy against the 10 Hz constraint, only Velodyne data was used to detect obstacles in Round 2.
7. Round 2 Framework
1. Encode the 3D point cloud into a compact representation
2. End-to-end regression to estimate the position
3. Calculate the height of the obstacle center in the 3D point cloud
4. Track and correct with a Kalman filter (prediction / correction)
8. Representation for the 3D point cloud
Bird view features:
• Average height
• Maximum height
• Variance of height
• Density
• Gradient
• Intensity
[Figure: point cloud and its bird view]
Different representations tried:
• Height maps + density
• Average height + density + gradient
• Average height + density
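The deck gives no code for this preprocessing step, so the following is a minimal NumPy sketch of rasterizing a point cloud into a few of these bird-view channels; the grid extent, cell size, and the subset of channels are illustrative assumptions, not the team's settings.

```python
import numpy as np

def birds_eye_channels(points, x_range=(-40, 40), y_range=(-40, 40), cell=0.2):
    """points: (N, 4) array of x, y, z, intensity in the lidar frame."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, pts = ix[keep], iy[keep], points[keep]
    flat = ix * ny + iy                                   # flattened cell index

    density = np.bincount(flat, minlength=nx * ny).astype(np.float32)
    sum_z = np.bincount(flat, weights=pts[:, 2], minlength=nx * ny)
    avg_z = np.where(density > 0, sum_z / np.maximum(density, 1), 0.0)
    max_z = np.full(nx * ny, -np.inf, np.float32)
    np.maximum.at(max_z, flat, pts[:, 2])                 # per-cell height maximum
    max_z[density == 0] = 0.0
    max_i = np.zeros(nx * ny, np.float32)
    np.maximum.at(max_i, flat, pts[:, 3])                 # per-cell max intensity

    # Variance of height and gradient channels can be added the same way.
    return np.stack([avg_z, max_z, density, max_i]).reshape(4, nx, ny)
```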
9. CNN model
R-FCN [7] with a ResNet-50 backbone and OHEM [8] was used to detect obstacles in the bird view.

Training and validation
We eliminated lidar frames that had wrong GPS positions, then randomly picked frames from the remaining good ones and projected them to the bird view as training and validation data.

             car      pedestrian
Training     12436    7847
Validation   1455     872
AP           0.8169   0.6278

Implementation: Caffe, SGD solver:
base_lr: 0.001
display: 20
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.001
stepsize: 20000
10. Tracking and Correcting

Comparison between DNN and KF                                   Decision        KF action
CNN output is near the KF prediction                            CNN output      Update
CNN output with high confidence is far from the KF prediction   CNN output      Reinitialize
CNN output with low confidence is far from the KF prediction    KF prediction   Update

• For pedestrian detection, a Kalman filter was used to validate and correct the CNN output
• For car detection, the Kalman filter did not significantly improve the score, so we did not use it

Prediction synchronization*
We tried two strategies:
• Nearest interpolation
• Linear interpolation
There was no obvious difference between the two.
* We think a Kalman filter would achieve better results, but we did not have enough time to try it.
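A minimal sketch of the decision rule in the table above; the distance gate, the confidence threshold, and the filter interface (predict/update/reinitialize) are illustrative assumptions, not the team's code.

```python
import numpy as np

DIST_GATE = 1.0      # metres; assumed gating threshold
CONF_GATE = 0.8      # assumed CNN confidence threshold

def fuse(cnn_pos, cnn_conf, kf):
    """Combine a CNN detection with a Kalman filter track per the table above.
    `kf` is a hypothetical tracker object exposing predict/update/reinitialize."""
    pred = kf.predict()                       # predicted obstacle position
    far = np.linalg.norm(cnn_pos - pred) > DIST_GATE
    if not far:
        kf.update(cnn_pos)                    # near the prediction: trust the CNN, update the KF
        return cnn_pos
    if cnn_conf >= CONF_GATE:
        kf.reinitialize(cnn_pos)              # confident but far: restart the track at the detection
        return cnn_pos
    kf.update(cnn_pos)                        # low confidence and far: report the KF prediction
    return pred
```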
12. References
[1] B. Li, "3D Fully Convolutional Network for Vehicle Detection in Point Cloud," Robotics: Science and Systems, Nov. 2016.
[2] J. Schlosser, C. K. Chow, and Z. Kira, "Fusing LIDAR and images for pedestrian detection using convolutional neural networks," in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 2198–2205.
[3] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-View 3D Object Detection Network for Autonomous Driving," arXiv, 2016.
[4] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[6] W. Liu et al., "SSD: Single Shot MultiBox Detector," arXiv, 2016.
[7] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object Detection via Region-based Fully Convolutional Networks," arXiv, 2016.
[8] A. Shrivastava, A. Gupta, and R. Girshick, "Training Region-Based Object Detectors with Online Hard Example Mining," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 761–769.
18. Team background
- Ali Aliev → computer vision engineer; ROS, filtering, point cloud preprocessing, visualization
- Andres Torrubia → Udacity SDCND student; no previous ROS/lidar experience before the challenge; devised and implemented the deep learning architecture (segmentation + localization)
- We merged teams 2 weeks before the final deadline.
19. Pipeline design
- Most state-of-the-art solutions build image-like features from lidar and apply convolutional networks (YOLO, SSD, FCNs, etc.)
- We wanted to do something different, original, and new.
20. Pipeline
Lidar: n x 5 (x y z i r), n ≈ 30,000
→ Lidar: 32 x N x 3 (d i h), N = 2048
→ Obstacle segmenter
→ Segmented obstacle (m points)
→ Clustering and filtering
→ Obstacle points: M x 4 (x y z i), M = 2048
→ Obstacle localizer
→ Obstacle pose (x y z yaw) and size (h l w) @ 10 Hz
→ Filtration, fused with radar 1 x 3 (x y vx vy) @ 20 Hz
→ Obstacle pose (x y z yaw) and size (h l w) @ 24 Hz
22. Obstacle segmentation
- 32 signals x 3 ⇒ distance, intensity, height
- Nearest neighbor interpolation
- Sampled at 2048 points from -π to π
23. Obstacle segmentation
- 2048 samples split into 32 sectors (64 samples each)
- 16 sequences (we use 16 rings out of the 32 from the HDL-32E)
- Each x is a vector of 16 x 3 = 48 dimensions: 16 rings x (d i h)
- GRU = Gated Recurrent Unit (Cho et al. 2014)
- The last GRU layer uses sigmoid and dropout 0.1; the rest use tanh and dropout 0.2
- 2.6M parameters, trained with binary cross-entropy using release 3 data + augmentation
[Diagram: bidirectional GRU stack with layer widths 16, 256, and 512, mapping inputs x₀ … x₆₃ (each 48-dim) to outputs y₀ … y₆₃]
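A hedged tf.keras sketch of a bidirectional GRU segmenter with this shape (the team trained with Theano/Keras; layer widths are taken from the figure labels above, the exact stacking is our guess, not the team's code).

```python
from tensorflow.keras import layers, models

# Each training example is one 64-sample sector; every sample carries the
# 16 rings x (distance, intensity, height) = 48 input features.
inputs = layers.Input(shape=(64, 48))
x = layers.Bidirectional(layers.GRU(256, return_sequences=True,
                                    activation="tanh", dropout=0.2))(inputs)
x = layers.Bidirectional(layers.GRU(256, return_sequences=True,
                                    activation="tanh", dropout=0.2))(x)
# Per-sample, per-ring obstacle probabilities y0..y63 (sigmoid output, dropout 0.1).
outputs = layers.GRU(16, return_sequences=True,
                     activation="sigmoid", dropout=0.1)(x)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```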
25. Obstacle localization
[Network diagram: the segmented obstacle's m x 4 points (m > 1) are resampled to n = 1024 points x 4 with the x y z mean subtracted; shared-weight per-point MLPs lift each point to 64, 128, 256, and 2048 features; a max pool over all points gives a 2048-dimensional latent vector; separate fully connected regression heads (layers of 512, 256, 128, 64, and 32 units with dropout 0.1–0.2) output the centroid (3 values), the size (3 values), and the yaw (1 value)]
- See: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, CVPR 2017, Qi et al.
- Trained on fixed release 2 data (vehicle) + release 3 (pedestrian)
- Size, centroid: l2 loss
- Yaw: angle loss; activation → tanh(·) · π/2
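A rough tf.keras sketch of a PointNet-style localizer of the kind described above (shared per-point MLPs, max pool, separate regression heads); the head widths are approximations read off the figure, not the team's code.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

pts = layers.Input(shape=(1024, 4))              # resampled obstacle points (x y z i), xyz mean-centred

x = pts
for width in (64, 128, 256, 2048):               # shared-weight per-point MLPs (Conv1D with kernel size 1)
    x = layers.Conv1D(width, 1, activation="relu")(x)
latent = layers.GlobalMaxPooling1D()(x)          # max pool over points -> 2048-dim latent vector

def head(out_dim, name):
    """FC stack with dropout, ending in a linear regression output."""
    h = latent
    for units, rate in ((512, 0.1), (256, 0.2), (64, 0.2)):
        h = layers.Dense(units, activation="relu")(h)
        h = layers.Dropout(rate)(h)
    return layers.Dense(out_dim, name=name)(h)

centroid = head(3, "centroid")                   # trained with an l2 loss
size     = head(3, "size")                       # trained with an l2 loss
yaw      = layers.Lambda(lambda t: tf.tanh(t) * np.pi / 2,   # activation -> tanh(.) * pi/2
                         name="yaw")(head(1, "yaw_raw"))

model = models.Model(pts, [centroid, size, yaw])
```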
26. Filtering: obstacle pose
We used an Unscented Kalman Filter:
- Lidar-fixed coordinate frame
- Input: lidar (x, y, z, yaw), radar (x, y, vx, vy), camera ticks
- Output: pose (x, y, z, yaw)
- Internal state: S = (x, vx, ax, y, vy, ay, z, vz, az, yaw)
- Noisy input rejection based on the covariance of S
- The filter is reset when the covariance of S grows too high
Scheduling: lidar @ 10 Hz and radar @ 20 Hz trigger predict & update (after noise rejection); camera ticks @ 24 Hz trigger predict only; the pose is output @ 24 Hz.
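A minimal sketch of this multi-rate scheduling; the filter wrapper interface (predict, update, mahalanobis, reinitialize, pose, P) and both thresholds are assumptions for illustration, not the team's code.

```python
import numpy as np

COV_GATE  = 9.0     # assumed Mahalanobis gate for rejecting noisy measurements
COV_RESET = 1e3     # assumed covariance-trace threshold for re-initialising the filter

def on_message(ukf, msg, t):
    """Multi-rate fusion: ticks only predict, lidar/radar predict + gated update."""
    ukf.predict(dt=t - ukf.last_t)
    ukf.last_t = t
    if msg.kind == "tick":                       # camera tick @ 24 Hz -> prediction only
        return ukf.pose()
    z, R = msg.measurement, msg.noise            # lidar (x y z yaw) or radar (x y vx vy)
    if ukf.mahalanobis(z, R) > COV_GATE:         # noisy input rejection based on S covariance
        return ukf.pose()
    ukf.update(z, R)
    if np.trace(ukf.P) > COV_RESET:              # reset when the state covariance grows too high
        ukf.reinitialize(z)
    return ukf.pose()
```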
27. Filtering: obstacle pose
Fusion details:
- Prefer lidar measurements over radar measurements at close distances
- Use the nearest neighbour to pick the radar measurement belonging to the obstacle
[Figure: radar only vs. radar & lidar]
28. Filtering: obstacle bounding box
- Car: exponential moving average for the bbox length, width, and height
- Trick: shift the radar radius by a constant value so it better fits the car bbox centroid
- Pedestrian: constant cylinder radius and height (allowed by the rules)
[Figure: shifting the radar radius]
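A tiny sketch of the two tricks above; the smoothing factor and the radius shift are placeholder values, since the slide does not give them.

```python
ALPHA = 0.1          # assumed smoothing factor
RADAR_SHIFT = 1.5    # metres; assumed constant shift of the radar radius

def smooth_car_bbox(prev_lwh, measured_lwh):
    """Exponential moving average over the car's length, width, and height."""
    return tuple(ALPHA * m + (1 - ALPHA) * p for p, m in zip(prev_lwh, measured_lwh))

def shifted_radar_range(radar_range):
    """Shift the radar radius by a constant so it better matches the bbox centroid."""
    return radar_range + RADAR_SHIFT
```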
29. Closing thoughts
- Implementation, performance & gotchas:
  - No resolution lost when using raw lidar points
  - Substantial polishing of the noisy release 3 "ground truth"
  - Trained on a single GTX 1080 Ti
  - Code primarily in Python, with optimized lidar cloud interfacing in C++
  - Trained the GRU (RNN) with Theano (2x faster than TensorFlow)
  - Used TensorFlow for inference (Theano segfaulted when running two models sequentially)
- Areas of improvement:
  - Train the two networks end to end (needs differentiable filtering and resampling)
  - Fix the release 3 "ground truth"
  - Train the localizer with release 3 data for the car
  - Track ego and obstacle positions in a fixed global frame, separately
  - Account for the time delta within lidar frames
  - Fuse camera and odometry
  - Use phased LSTM to avoid lidar sampling
31. DiDi-Udacity Self-Driving Car Challenge 2017
Pipeline
[Pipeline diagram — input: bag file (lidar, radar, and camera messages); output: obstacle info. Python Node 1: RGB model (classifications, localizations). Python Node 2: lidar model (lidar to 2D features → classifications, localizations, orientations → yaw, location, H/w/l). C node: obstacle state tracking.]
32. Lidar Information to 2D Features
Features for the neural network:
• height: maximum z value in each cell
• intensity: maximum intensity value in each cell
• ring number: maximum ring number value in each cell
[Figures: intensity map (pedestrian), intensity map (car), ring map]
Features for calculating obstacle height:
• minimum z: minimum z value in each cell
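A compact NumPy sketch of these per-cell reductions (maximum z, intensity, and ring number for the CNN inputs, minimum z for the height calculation); the grid size, cell size, and offset are assumed values, and empty cells keep their initial value.

```python
import numpy as np

def cell_features(points, rings, grid=(400, 400), cell=0.2, offset=(-40.0, -40.0)):
    """points: (N, 4) x y z intensity; rings: (N,) ring index."""
    ix = ((points[:, 0] - offset[0]) / cell).astype(int)
    iy = ((points[:, 1] - offset[1]) / cell).astype(int)
    ok = (ix >= 0) & (ix < grid[0]) & (iy >= 0) & (iy < grid[1])
    ix, iy = ix[ok], iy[ok]

    def reduce(values, op, init):
        out = np.full(grid, init, np.float32)
        op.at(out, (ix, iy), values[ok])          # per-cell max/min via ufunc.at
        return out

    return (reduce(points[:, 2], np.maximum, -np.inf),           # height: max z per cell
            reduce(points[:, 3], np.maximum, 0.0),               # intensity: max per cell
            reduce(rings.astype(np.float32), np.maximum, 0.0),   # ring number: max per cell
            reduce(points[:, 2], np.minimum, np.inf))            # min z, for obstacle height
```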
34. Training Details
Input data:
• Bounding box
• Classification
• Orientation
Data augmentation:
• Data normalization, random crops, and horizontal flips
Batch normalization
[Figure: bounding box, object, and orientation]
35. H, W, L Calculation
Car:
• Length and width
• Height
Pedestrian:
• Height
[Figure: the bird-view bounding box with the object orientation, angles α and β, and the resulting length L and width W]
36. Obstacle Status Tracking
Car:
• Unscented Kalman Filter
  ● CTRV model
  ● State vector (see the note below)
Pedestrian:
• Standard Kalman Filter
  ● State vector (see the note below)
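For reference, typical textbook state vectors for these two filters look like the following; these are standard choices and an assumption on our part, not necessarily the team's exact vectors.

```latex
% Typical CTRV (constant turn rate and velocity) state for the car UKF:
x_{\mathrm{car}} = \begin{bmatrix} p_x & p_y & v & \psi & \dot{\psi} \end{bmatrix}^{\mathsf{T}}
% Typical constant-velocity state for the pedestrian's standard Kalman filter:
x_{\mathrm{ped}} = \begin{bmatrix} p_x & p_y & v_x & v_y \end{bmatrix}^{\mathsf{T}}
```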
43. EKF Estimation
Main features:
● The speed and rotation of the vehicle are considered
● Delays in the sensor data are taken into account
State vector and system model: [equations shown on the slide]
[Figure: tracked vehicle and ego vehicle]
44. Lidar Object Detection
Lidar →
1. Remove the ego vehicle
2. Find and remove the ground plane
3. Clusterization
4. Select the cluster related to the vehicle
5. Shape alignment around the cluster
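A minimal sketch of the first three stages with NumPy and scikit-learn; the ego radius, height threshold, and DBSCAN parameters are assumptions, and the slide does not name the clustering algorithm actually used.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_clusters(points, ego_radius=2.5, ground_z=-1.5, eps=0.7, min_samples=10):
    """points: (N, 3) x y z in the lidar frame. Returns a list of candidate clusters."""
    # 1. Remove the ego vehicle: drop points within a small radius of the sensor.
    r = np.linalg.norm(points[:, :2], axis=1)
    pts = points[r > ego_radius]
    # 2. Remove the ground plane (flat-ground height threshold; RANSAC is the robust option).
    pts = pts[pts[:, 2] > ground_z]
    # 3. Clusterization by spatial distance.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    return [pts[labels == k] for k in range(labels.max() + 1)]
```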
45. Shape alignment
● Each particle is a parallelepiped with different parameters: x, y, width, length, height
● We generate particles around the center of a found cluster using a normal distribution
● Each parallelepiped plane has a different weight; the nearest plane has the maximum weight
● The particle weight is computed from the point-to-plane distances (d_min in the figure)
[Figure: d_min distances from points to the box planes]
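The weight formula itself is not reproduced on the slide, so the following is only one plausible sketch of the idea (axis-aligned boxes, Gaussian scoring of point-to-face distances); the sampling spread, initial dimensions, and scoring are all assumptions.

```python
import numpy as np

def sample_particles(center, n=200, pos_sigma=0.3, dims0=(1.8, 4.5, 1.6), dim_sigma=0.2):
    """Draw box particles (x, y, width, length, height) around a cluster centre."""
    xy = np.random.normal(center[:2], pos_sigma, size=(n, 2))
    dims = np.random.normal(dims0, dim_sigma, size=(n, 3))
    return np.hstack([xy, dims])

def particle_weight(points, particle, sigma=0.1):
    """Score one box particle by how close the cluster points lie to its side faces."""
    x, y, w, l, h = particle
    dx = np.abs(np.abs(points[:, 0] - x) - l / 2)    # distance to the front/back faces
    dy = np.abs(np.abs(points[:, 1] - y) - w / 2)    # distance to the left/right faces
    d_min = np.minimum(dx, dy)                       # nearest face dominates the score
    return np.exp(-(d_min ** 2) / (2 * sigma ** 2)).sum()
```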
46. Object detection using Deep Learning (Camera)
[Network diagram: input 1242x375x3 image → VGG up to conv4_3 (156x47x512, with normalization) → VGG up to fc7 (78x24x1024) → additional conv layers (down to 16x2x256); a detector-and-classifier head and an orientation classifier are attached to each feature map (orientation pooling); fast NMS yields the final detections and the orientation prediction]
50. Background: research in the field of Robotics at Innopolis University
➢ Nonlinear MPC for a race car
➢ Getting ready for the Roborace: a competition of autonomous racing cars
Team
51. Reflections
- Tried different approaches and neural networks
- Increased performance by reducing the number of cloud points
- Added orientation to the SSD network instead of using a separate CNN for orientation
- Sped up the development process thanks to access to a high-performance GPU

52. Future work
- Improve detection with lidar while staying in real time
- Use a larger training dataset to improve the quality of visual detection
- Detect the steering-wheel position of a car
- Multiple object tracking in real time
56. Team Introduction
Team name: abccba
Team members:
- Zhenzhe Ying (graduated from Xi'an Jiaotong University; working as an algorithm engineer)
- Jian Li (Master's at Nanjing University of Science and Technology; research on deep learning)
57. Dataset Challenges
(1) The lidar point cloud is sparse;
(2) The target may be a long distance away;
(3) With few points it is hard to distinguish a car from a pedestrian;
(4) The camera may not see targets behind or beside the vehicle;
(5) Radar captures fewer object features.
58. Our Solutions
Multi-Sensor Coarse-to-Fine Detection Framework
Coarse Detection
• Clustering algorithm for the lidar point cloud
• Tiny YOLO for camera images
Fine Location
• Fine-tune the 3D box for each lidar point cluster
Verification
• Simple central-point rules for radar data
• Validate current results using history information
• Interpolate frames and refine the track
59. Coarse Detection: Tiny YOLO network
Tiny YOLO network:
(1) Conv + pooling + FC + multi-loss;
(2) Remove redundant code;
(3) Downsize the network structure.
Why YOLO:
(1) Developed in the C language;
(2) One-stage detection;
(3) Fast and easily deployed.
Train YOLO on the KITTI dataset; detect cars and pedestrians on the DiDi-Udacity dataset; output (x, y, w, l) and categories; transform the 2D box into a 3D box.
You Only Look Once: Unified, Real-Time Object Detection. J. Redmon, S. Divvala, R. Girshick, A. Farhadi. CVPR 2016.
60. Coarse Detection: point cloud cluster algorithm
Input: lidar point cloud. Output: point clusters.
(1) Remove the ground and objects that are too high.
(2) Swing-scan the remaining points.
(3) Cluster the point cloud into several point clusters by spatial distance.
(4) Consider each point cluster in turn.
[Figure: the five stages of the clustering process]
61. Fine Location: 3D box fine-tuning scheme
(1) Given the few lidar points in a cluster, we initialize a central point;
(2) For each point cloud cluster, we grid-search x, y, z, yaw around these points;
(3) After fixing w, h, l, we generate 3D box proposals centered at x, y, z in different orientations;
(4) We evaluate each proposal and output the one with the highest score. The score is based on the evaluation metrics on the next page.
[Figure: steps (1)–(4) of the 3D box fine-tuning scheme]
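A minimal sketch of this grid search; the search bounds, step sizes, and the score_fn callable (the metric from the next slide) are placeholders, not the team's code.

```python
import numpy as np
from itertools import product

def fine_tune_box(cluster, dims, score_fn, step=0.2, yaw_step=np.pi / 8):
    """Grid-search x, y, z, yaw around the cluster centre with the box size fixed.
    cluster: (N, 3) points; dims: fixed (w, h, l); score_fn: (cluster, proposal) -> score."""
    cx, cy, cz = cluster.mean(axis=0)               # initial central point
    best, best_score = None, -np.inf
    offsets = np.arange(-0.6, 0.6 + 1e-6, step)
    yaws = np.arange(0.0, np.pi, yaw_step)          # boxes are symmetric under a pi rotation
    for dx, dy, dz, yaw in product(offsets, offsets, offsets, yaws):
        proposal = (cx + dx, cy + dy, cz + dz, yaw, *dims)
        score = score_fn(cluster, proposal)         # e.g. the evaluation metric on the next page
        if score > best_score:
            best, best_score = proposal, score
    return best
```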
62. Fine Location: Evaluation Metrics
N: the number of points in the box;
dis: distance from a point to the surface of the box;
f(N): the more points in the box, the better the 3D box;
Lmin(V): tries to minimize the volume of the 3D box.
Car (left), pedestrian (right) parameters:
m     n     a     b     c
2.0   1.5   2.0   0.6   1.2
63. Verification
• The point cloud may fail to capture far targets
• Central-point rules for radar points to locate far targets
• Validate current results using history information
• Interpolate frames and refine the track
[Figure: lidar covers targets closer than 35 m; radar covers targets beyond 35 m; validation and interpolation combine radar, camera, and lidar along the track]
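A tiny sketch of the 35 m sensor split and the frame interpolation described above; the function names and interfaces are illustrative.

```python
RANGE_SPLIT = 35.0   # metres (from the slide)

def choose_measurement(lidar_det, radar_det, target_range):
    """Lidar localizes near targets; the radar central point is used beyond 35 m."""
    if target_range < RANGE_SPLIT and lidar_det is not None:
        return lidar_det
    return radar_det

def interpolate_track(t, t0, p0, t1, p1):
    """Linearly interpolate the obstacle position for a frame between two detections."""
    a = (t - t0) / (t1 - t0)
    return tuple((1 - a) * x0 + a * x1 for x0, x1 in zip(p0, p1))
```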
66. Summary
(1) System design
  - Agile development, easy deployment;
  - Low coupling and more flexibility;
(2) Multi-sensor info ensemble
  - Lidar, radar, camera, and GPS;
(3) Algorithm
  - Coarse-to-fine detection;
  - CNN adopted for camera images;
  - Point cloud reduction and clustering algorithm;
  - Evaluation criteria designed from the spatial distribution of points;
(4) Achieved 0.43 IoU at 20 Hz on a K80 GPU platform.
TODO
(1) Record the speed of the target so tracking can predict the next position more precisely;
(2) Fuse a small neural-network module for coarse detection from the bird view of the point cloud.
Team scores

Round 1
abccba        0.4333510468
Robodreams    0.4097831892
zbzc          0.3978965429
Tea           0.3914668045
ICTANS        0.3463341661

Round 2
abccba        0.28531890
zbzc          0.23590994
Roboauto      0.21162456
Robodreams    0.18696818
Something     0.17618155