Here are the key steps in our pipeline:
1. Preprocess lidar data to generate 2D feature maps representing height, intensity, and ring number.
2. Feed the feature maps into a convolutional neural network to perform semantic segmentation, classifying each cell as car, pedestrian, or background.
3. Extract connected components from the segmentation to identify individual obstacles.
4. Use a point-cloud-based neural network to localize each obstacle's 3D position and estimate its orientation.
5. Track the obstacle state over time using a Kalman filter, fusing lidar and radar measurements.
6. Output tracked obstacle positions, orientations, and bounding box sizes at 10 Hz.
Some highlights of our approach
3. Yiming Zeng, Yu Hu, Qiankun Tang, Shice Liu, Beibei Jin
Autonomous Navigation System Research Group
State Key Laboratory of Computer Architecture
Institute of Computing Technology, Chinese Academy of Sciences
August 31st, 2017
Position and Orientation Estimation of Cars and Pedestrians
4. Sensor and Data
[Figures: sensors setup and sensors range]
For the Round 2 test data, the percentage of each obstacle detectable by each sensor is listed below:

            ford01  ford02  ford03  ford04  ford05  ford06  ford07  mustang01  pedestrian
Sensor 1    100%    42%     49%     100%    68.6%   70.5%   100%    100%       100%
Sensor 2    100%    100%    100%    100%    100%    100%    100%    71.7%      78.2%
Sensor 3    100%    54.8%   47.2%   83.8%   59.1%   64.6%   74.74%  14.9%      -
Coordinate system transformation (used in Round 1)
To detect cars at various ranges, data from several sensors was used in Round 1. Trading off detection accuracy against the 10 Hz constraint, only Velodyne data was used to detect obstacles in Round 2.
7. Round 2 Framework
1. Encode the 3D point cloud into a compact representation
2. End-to-end regression to estimate the position
3. Calculate the height of the obstacle center in the 3D point cloud
4. Track and correct with a Kalman filter (prediction / correction)
8. Representation for the 3D point cloud
Bird view features:
• Average height
• Maximum height
• Variance of height
• Density
• Gradient
• Intensity
[Figure: point cloud and its bird view]
Different representations tried:
• Height maps + density
• Average height + density + gradient
• Average height + density
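The deck gives no code for this preprocessing step, so the following is a minimal NumPy sketch of rasterizing a point cloud into a few of these bird-view channels; the grid extent, cell size, and the subset of channels are illustrative assumptions, not the team's settings.

```python
import numpy as np

def birds_eye_channels(points, x_range=(-40, 40), y_range=(-40, 40), cell=0.2):
    """points: (N, 4) array of x, y, z, intensity in the lidar frame."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, pts = ix[keep], iy[keep], points[keep]
    flat = ix * ny + iy                                   # flattened cell index

    density = np.bincount(flat, minlength=nx * ny).astype(np.float32)
    sum_z = np.bincount(flat, weights=pts[:, 2], minlength=nx * ny)
    avg_z = np.where(density > 0, sum_z / np.maximum(density, 1), 0.0)
    max_z = np.full(nx * ny, -np.inf, np.float32)
    np.maximum.at(max_z, flat, pts[:, 2])                 # per-cell height maximum
    max_z[density == 0] = 0.0
    max_i = np.zeros(nx * ny, np.float32)
    np.maximum.at(max_i, flat, pts[:, 3])                 # per-cell max intensity

    # Variance of height and gradient channels can be added the same way.
    return np.stack([avg_z, max_z, density, max_i]).reshape(4, nx, ny)
```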
9. CNN model
R-FCN [7] with a ResNet-50 backbone and OHEM [8] was used to detect obstacles in the bird view.

Training and validation
We eliminated lidar frames that had wrong GPS positions, then randomly picked frames from the remaining good ones and projected them to the bird view as training and validation data.

             car      pedestrian
Training     12436    7847
Validation   1455     872
AP           0.8169   0.6278

Implementation: Caffe, SGD solver:
base_lr: 0.001
display: 20
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.001
stepsize: 20000
10. Tracking and Correcting

Comparison between DNN and KF                                   Decision        KF action
CNN output is near the KF prediction                            CNN output      Update
CNN output with high confidence is far from the KF prediction   CNN output      Reinitialize
CNN output with low confidence is far from the KF prediction    KF prediction   Update

• For pedestrian detection, a Kalman filter was used to validate and correct the CNN output
• For car detection, the Kalman filter did not significantly improve the score, so we did not use it

Prediction synchronization*
We tried two strategies:
• Nearest interpolation
• Linear interpolation
There was no obvious difference between the two.
* We think a Kalman filter would achieve better results, but we did not have enough time to try it.
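A minimal sketch of the decision rule in the table above; the distance gate, the confidence threshold, and the filter interface (predict/update/reinitialize) are illustrative assumptions, not the team's code.

```python
import numpy as np

DIST_GATE = 1.0      # metres; assumed gating threshold
CONF_GATE = 0.8      # assumed CNN confidence threshold

def fuse(cnn_pos, cnn_conf, kf):
    """Combine a CNN detection with a Kalman filter track per the table above.
    `kf` is a hypothetical tracker object exposing predict/update/reinitialize."""
    pred = kf.predict()                       # predicted obstacle position
    far = np.linalg.norm(cnn_pos - pred) > DIST_GATE
    if not far:
        kf.update(cnn_pos)                    # near the prediction: trust the CNN, update the KF
        return cnn_pos
    if cnn_conf >= CONF_GATE:
        kf.reinitialize(cnn_pos)              # confident but far: restart the track at the detection
        return cnn_pos
    kf.update(cnn_pos)                        # low confidence and far: report the KF prediction
    return pred
```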
12. References
[1] B. Li, "3D Fully Convolutional Network for Vehicle Detection in Point Cloud," Robotics: Science and Systems, Nov. 2016.
[2] J. Schlosser, C. K. Chow, and Z. Kira, "Fusing LIDAR and images for pedestrian detection using convolutional neural networks," in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 2198–2205.
[3] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-View 3D Object Detection Network for Autonomous Driving," arXiv, 2016.
[4] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788.
[6] W. Liu et al., "SSD: Single Shot MultiBox Detector," arXiv, 2016.
[7] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object Detection via Region-based Fully Convolutional Networks," arXiv, 2016.
[8] A. Shrivastava, A. Gupta, and R. Girshick, "Training Region-Based Object Detectors with Online Hard Example Mining," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 761–769.
18. Team background
- Ali Aliev → computer vision engineer; ROS, filtering, point cloud preprocessing, visualization
- Andres Torrubia → Udacity SDCND student; no previous ROS/lidar experience before the challenge; devised and implemented the deep learning architecture (segmentation + localization)
- We merged teams 2 weeks before the final deadline.
19. Pipeline design
- Most state-of-the-art solutions build image-like features from lidar and apply convolutional networks (YOLO, SSD, FCNs, etc.)
- We wanted to do something different, original, and new.
20. Pipeline
Lidar: n x 5 (x y z i r), n ≈ 30,000
→ Lidar: 32 x N x 3 (d i h), N = 2048
→ Obstacle segmenter
→ Segmented obstacle (m points)
→ Clustering and filtering
→ Obstacle points: M x 4 (x y z i), M = 2048
→ Obstacle localizer
→ Obstacle pose (x y z yaw) and size (h l w) @ 10 Hz
→ Filtration, fused with radar 1 x 3 (x y vx vy) @ 20 Hz
→ Obstacle pose (x y z yaw) and size (h l w) @ 24 Hz
22. Obstacle segmentation
- 32 signals x 3 ⇒ distance, intensity, height
- Nearest neighbor interpolation
- Sampled at 2048 points from -π to π
23. Obstacle segmentation
- 2048 samples split into 32 sectors (64 samples each)
- 16 sequences (we use 16 rings out of the 32 from the HDL-32E)
- Each x is a vector of 16 x 3 = 48 dimensions: 16 rings x (d i h)
- GRU = Gated Recurrent Unit (Cho et al. 2014)
- The last GRU layer uses sigmoid and dropout 0.1; the rest use tanh and dropout 0.2
- 2.6M parameters, trained with binary cross-entropy using release 3 data + augmentation
[Diagram: bidirectional GRU stack with layer widths 16, 256, and 512, mapping inputs x₀ … x₆₃ (each 48-dim) to outputs y₀ … y₆₃]
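A hedged tf.keras sketch of a bidirectional GRU segmenter with this shape (the team trained with Theano/Keras; layer widths are taken from the figure labels above, the exact stacking is our guess, not the team's code).

```python
from tensorflow.keras import layers, models

# Each training example is one 64-sample sector; every sample carries the
# 16 rings x (distance, intensity, height) = 48 input features.
inputs = layers.Input(shape=(64, 48))
x = layers.Bidirectional(layers.GRU(256, return_sequences=True,
                                    activation="tanh", dropout=0.2))(inputs)
x = layers.Bidirectional(layers.GRU(256, return_sequences=True,
                                    activation="tanh", dropout=0.2))(x)
# Per-sample, per-ring obstacle probabilities y0..y63 (sigmoid output, dropout 0.1).
outputs = layers.GRU(16, return_sequences=True,
                     activation="sigmoid", dropout=0.1)(x)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```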
25. Obstacle localization
[Network diagram: the segmented obstacle's m x 4 points (m > 1) are resampled to n = 1024 points x 4 with the x y z mean subtracted; shared-weight per-point MLPs lift each point to 64, 128, 256, and 2048 features; a max pool over all points gives a 2048-dimensional latent vector; separate fully connected regression heads (layers of 512, 256, 128, 64, and 32 units with dropout 0.1–0.2) output the centroid (3 values), the size (3 values), and the yaw (1 value)]
- See: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, CVPR 2017, Qi et al.
- Trained on fixed release 2 data (vehicle) + release 3 (pedestrian)
- Size, centroid: l2 loss
- Yaw: angle loss; activation → tanh(·) · π/2
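A rough tf.keras sketch of a PointNet-style localizer of the kind described above (shared per-point MLPs, max pool, separate regression heads); the head widths are approximations read off the figure, not the team's code.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

pts = layers.Input(shape=(1024, 4))              # resampled obstacle points (x y z i), xyz mean-centred

x = pts
for width in (64, 128, 256, 2048):               # shared-weight per-point MLPs (Conv1D with kernel size 1)
    x = layers.Conv1D(width, 1, activation="relu")(x)
latent = layers.GlobalMaxPooling1D()(x)          # max pool over points -> 2048-dim latent vector

def head(out_dim, name):
    """FC stack with dropout, ending in a linear regression output."""
    h = latent
    for units, rate in ((512, 0.1), (256, 0.2), (64, 0.2)):
        h = layers.Dense(units, activation="relu")(h)
        h = layers.Dropout(rate)(h)
    return layers.Dense(out_dim, name=name)(h)

centroid = head(3, "centroid")                   # trained with an l2 loss
size     = head(3, "size")                       # trained with an l2 loss
yaw      = layers.Lambda(lambda t: tf.tanh(t) * np.pi / 2,   # activation -> tanh(.) * pi/2
                         name="yaw")(head(1, "yaw_raw"))

model = models.Model(pts, [centroid, size, yaw])
```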
26. Filtering: obstacle pose
We used an Unscented Kalman Filter:
- Lidar-fixed coordinate frame
- Input: lidar (x, y, z, yaw), radar (x, y, vx, vy), camera ticks
- Output: pose (x, y, z, yaw)
- Internal state: S = (x, vx, ax, y, vy, ay, z, vz, az, yaw)
- Noisy input rejection based on the covariance of S
- The filter is reset when the covariance of S grows too high
Scheduling: lidar @ 10 Hz and radar @ 20 Hz trigger predict & update (after noise rejection); camera ticks @ 24 Hz trigger predict only; the pose is output @ 24 Hz.
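A minimal sketch of this multi-rate scheduling; the filter wrapper interface (predict, update, mahalanobis, reinitialize, pose, P) and both thresholds are assumptions for illustration, not the team's code.

```python
import numpy as np

COV_GATE  = 9.0     # assumed Mahalanobis gate for rejecting noisy measurements
COV_RESET = 1e3     # assumed covariance-trace threshold for re-initialising the filter

def on_message(ukf, msg, t):
    """Multi-rate fusion: ticks only predict, lidar/radar predict + gated update."""
    ukf.predict(dt=t - ukf.last_t)
    ukf.last_t = t
    if msg.kind == "tick":                       # camera tick @ 24 Hz -> prediction only
        return ukf.pose()
    z, R = msg.measurement, msg.noise            # lidar (x y z yaw) or radar (x y vx vy)
    if ukf.mahalanobis(z, R) > COV_GATE:         # noisy input rejection based on S covariance
        return ukf.pose()
    ukf.update(z, R)
    if np.trace(ukf.P) > COV_RESET:              # reset when the state covariance grows too high
        ukf.reinitialize(z)
    return ukf.pose()
```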
27. Filtering: obstacle pose
Fusion details:
- Prefer lidar measurements over radar measurements at close distances
- Use the nearest neighbour to pick the radar measurement belonging to the obstacle
[Figure: radar only vs. radar & lidar]
28. Filtering: obstacle bounding box
- Car: exponential moving average for the bbox length, width, and height
- Trick: shift the radar radius by a constant value so it better fits the car bbox centroid
- Pedestrian: constant cylinder radius and height (allowed by the rules)
[Figure: shifting the radar radius]
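A tiny sketch of the two tricks above; the smoothing factor and the radius shift are placeholder values, since the slide does not give them.

```python
ALPHA = 0.1          # assumed smoothing factor
RADAR_SHIFT = 1.5    # metres; assumed constant shift of the radar radius

def smooth_car_bbox(prev_lwh, measured_lwh):
    """Exponential moving average over the car's length, width, and height."""
    return tuple(ALPHA * m + (1 - ALPHA) * p for p, m in zip(prev_lwh, measured_lwh))

def shifted_radar_range(radar_range):
    """Shift the radar radius by a constant so it better matches the bbox centroid."""
    return radar_range + RADAR_SHIFT
```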
29. Closing thoughts
- Implementation, performance & gotchas:
  - No resolution lost when using raw lidar points
  - Substantial polishing of the noisy release 3 "ground truth"
  - Trained on a single GTX 1080 Ti
  - Code primarily in Python, with optimized lidar cloud interfacing in C++
  - Trained the GRU (RNN) with Theano (2x faster than TensorFlow)
  - Used TensorFlow for inference (Theano segfaulted when running two models sequentially)
- Areas of improvement:
  - Train the two networks end to end (needs differentiable filtering and resampling)
  - Fix the release 3 "ground truth"
  - Train the localizer with release 3 data for the car
  - Track ego and obstacle positions in a fixed global frame, separately
  - Account for the time delta within lidar frames
  - Fuse camera and odometry
  - Use phased LSTM to avoid lidar sampling
31. DiDi-Udacity Self-Driving Car Challenge 2017
Pipeline
[Pipeline diagram — input: bag file (lidar, radar, and camera messages); output: obstacle info. Python Node 1: RGB model (classifications, localizations). Python Node 2: lidar model (lidar to 2D features → classifications, localizations, orientations → yaw, location, H/w/l). C node: obstacle state tracking.]
32. Lidar Information to 2D Features
Features for the neural network:
• height: maximum z value in each cell
• intensity: maximum intensity value in each cell
• ring number: maximum ring number value in each cell
[Figures: intensity map (pedestrian), intensity map (car), ring map]
Features for calculating obstacle height:
• minimum z: minimum z value in each cell
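A compact NumPy sketch of these per-cell reductions (maximum z, intensity, and ring number for the CNN inputs, minimum z for the height calculation); the grid size, cell size, and offset are assumed values, and empty cells keep their initial value.

```python
import numpy as np

def cell_features(points, rings, grid=(400, 400), cell=0.2, offset=(-40.0, -40.0)):
    """points: (N, 4) x y z intensity; rings: (N,) ring index."""
    ix = ((points[:, 0] - offset[0]) / cell).astype(int)
    iy = ((points[:, 1] - offset[1]) / cell).astype(int)
    ok = (ix >= 0) & (ix < grid[0]) & (iy >= 0) & (iy < grid[1])
    ix, iy = ix[ok], iy[ok]

    def reduce(values, op, init):
        out = np.full(grid, init, np.float32)
        op.at(out, (ix, iy), values[ok])          # per-cell max/min via ufunc.at
        return out

    return (reduce(points[:, 2], np.maximum, -np.inf),           # height: max z per cell
            reduce(points[:, 3], np.maximum, 0.0),               # intensity: max per cell
            reduce(rings.astype(np.float32), np.maximum, 0.0),   # ring number: max per cell
            reduce(points[:, 2], np.minimum, np.inf))            # min z, for obstacle height
```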
34. Training Details
Input data:
• Bounding box
• Classification
• Orientation
Data augmentation:
• Data normalization, random crops, and horizontal flips
Batch normalization
[Figure: bounding box, object, and orientation]
35. H, W, L Calculation
Car:
• Length and width
• Height
Pedestrian:
• Height
[Figure: the bird-view bounding box with the object orientation, angles α and β, and the resulting length L and width W]
36. Obstacle Status Tracking
Car:
• Unscented Kalman Filter
  ● CTRV model
  ● State vector (see the note below)
Pedestrian:
• Standard Kalman Filter
  ● State vector (see the note below)
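For reference, typical textbook state vectors for these two filters look like the following; these are standard choices and an assumption on our part, not necessarily the team's exact vectors.

```latex
% Typical CTRV (constant turn rate and velocity) state for the car UKF:
x_{\mathrm{car}} = \begin{bmatrix} p_x & p_y & v & \psi & \dot{\psi} \end{bmatrix}^{\mathsf{T}}
% Typical constant-velocity state for the pedestrian's standard Kalman filter:
x_{\mathrm{ped}} = \begin{bmatrix} p_x & p_y & v_x & v_y \end{bmatrix}^{\mathsf{T}}
```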
43. EKF Estimation
Main features:
● The speed and rotation of the vehicle are considered
● Delays in the sensor data are taken into account
State vector and system model: [equations shown on the slide]
[Figure: tracked vehicle and ego vehicle]
44. Lidar Object Detection
Lidar →
1. Remove the ego vehicle
2. Find and remove the ground plane
3. Clusterization
4. Select the cluster related to the vehicle
5. Shape alignment around the cluster
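A minimal sketch of the first three stages with NumPy and scikit-learn; the ego radius, height threshold, and DBSCAN parameters are assumptions, and the slide does not name the clustering algorithm actually used.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_clusters(points, ego_radius=2.5, ground_z=-1.5, eps=0.7, min_samples=10):
    """points: (N, 3) x y z in the lidar frame. Returns a list of candidate clusters."""
    # 1. Remove the ego vehicle: drop points within a small radius of the sensor.
    r = np.linalg.norm(points[:, :2], axis=1)
    pts = points[r > ego_radius]
    # 2. Remove the ground plane (flat-ground height threshold; RANSAC is the robust option).
    pts = pts[pts[:, 2] > ground_z]
    # 3. Clusterization by spatial distance.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    return [pts[labels == k] for k in range(labels.max() + 1)]
```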
45. Shape alignment
● Each particle is a parallelepiped with different parameters: x, y, width, length, height
● We generate particles around the center of a found cluster using a normal distribution
● Each parallelepiped plane has a different weight; the nearest plane has the maximum weight
● The particle weight is computed from the point-to-plane distances (d_min in the figure)
[Figure: d_min distances from points to the box planes]
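The weight formula itself is not reproduced on the slide, so the following is only one plausible sketch of the idea (axis-aligned boxes, Gaussian scoring of point-to-face distances); the sampling spread, initial dimensions, and scoring are all assumptions.

```python
import numpy as np

def sample_particles(center, n=200, pos_sigma=0.3, dims0=(1.8, 4.5, 1.6), dim_sigma=0.2):
    """Draw box particles (x, y, width, length, height) around a cluster centre."""
    xy = np.random.normal(center[:2], pos_sigma, size=(n, 2))
    dims = np.random.normal(dims0, dim_sigma, size=(n, 3))
    return np.hstack([xy, dims])

def particle_weight(points, particle, sigma=0.1):
    """Score one box particle by how close the cluster points lie to its side faces."""
    x, y, w, l, h = particle
    dx = np.abs(np.abs(points[:, 0] - x) - l / 2)    # distance to the front/back faces
    dy = np.abs(np.abs(points[:, 1] - y) - w / 2)    # distance to the left/right faces
    d_min = np.minimum(dx, dy)                       # nearest face dominates the score
    return np.exp(-(d_min ** 2) / (2 * sigma ** 2)).sum()
```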
46. Object detection using Deep Learning (Camera)
[Network diagram: input 1242x375x3 image → VGG up to conv4_3 (156x47x512, with normalization) → VGG up to fc7 (78x24x1024) → additional conv layers (down to 16x2x256); a detector-and-classifier head and an orientation classifier are attached to each feature map (orientation pooling); fast NMS yields the final detections and the orientation prediction]
50. Background: research in the field of Robotics at Innopolis University
➢ Nonlinear MPC for a race car
➢ Getting ready for the Roborace: a competition of autonomous racing cars
Team
51. Reflections
- Tried different approaches and neural networks
- Increased performance by reducing the number of cloud points
- Added orientation to the SSD network instead of using a separate CNN for orientation
- Sped up the development process thanks to access to a high-performance GPU

52. Future work
- Improve detection with lidar while staying in real time
- Use a larger training dataset to improve the quality of visual detection
- Detect the steering-wheel position of a car
- Multiple object tracking in real time
56. Team Introduction
Team name: abccba
Team members:
- Zhenzhe Ying (graduated from Xi'an Jiaotong University; working as an algorithm engineer)
- Jian Li (Master's at Nanjing University of Science and Technology; research on deep learning)
57. Dataset Challenges
(1) The lidar point cloud is sparse;
(2) The target may be a long distance away;
(3) With few points it is hard to distinguish a car from a pedestrian;
(4) The camera may not see targets behind or beside the vehicle;
(5) Radar captures fewer object features.
58. Our Solutions
Multi-Sensor Coarse-to-Fine Detection Framework
Coarse Detection
• Clustering algorithm for the lidar point cloud
• Tiny YOLO for camera images
Fine Location
• Fine-tune the 3D box for each lidar point cluster
Verification
• Simple central-point rules for radar data
• Validate current results using history information
• Interpolate frames and refine the track
59. Coarse Detection: Tiny YOLO network
Tiny YOLO network:
(1) Conv + pooling + FC + multi-loss;
(2) Remove redundant code;
(3) Downsize the network structure.
Why YOLO:
(1) Developed in the C language;
(2) One-stage detection;
(3) Fast and easily deployed.
Train YOLO on the KITTI dataset; detect cars and pedestrians on the DiDi-Udacity dataset; output (x, y, w, l) and categories; transform the 2D box into a 3D box.
You Only Look Once: Unified, Real-Time Object Detection. J. Redmon, S. Divvala, R. Girshick, A. Farhadi. CVPR 2016.
60. Coarse Detection: point cloud cluster algorithm
Input: lidar point cloud. Output: point clusters.
(1) Remove the ground and objects that are too high.
(2) Swing-scan the remaining points.
(3) Cluster the point cloud into several point clusters by spatial distance.
(4) Consider each point cluster in turn.
[Figure: the five stages of the clustering process]
61. Fine Location: 3D box fine-tuning scheme
(1) Given the few lidar points in a cluster, we initialize a central point;
(2) For each point cloud cluster, we grid-search x, y, z, yaw around these points;
(3) After fixing w, h, l, we generate 3D box proposals centered at x, y, z in different orientations;
(4) We evaluate each proposal and output the one with the highest score. The score is based on the evaluation metrics on the next page.
[Figure: steps (1)–(4) of the 3D box fine-tuning scheme]
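A minimal sketch of this grid search; the search bounds, step sizes, and the score_fn callable (the metric from the next slide) are placeholders, not the team's code.

```python
import numpy as np
from itertools import product

def fine_tune_box(cluster, dims, score_fn, step=0.2, yaw_step=np.pi / 8):
    """Grid-search x, y, z, yaw around the cluster centre with the box size fixed.
    cluster: (N, 3) points; dims: fixed (w, h, l); score_fn: (cluster, proposal) -> score."""
    cx, cy, cz = cluster.mean(axis=0)               # initial central point
    best, best_score = None, -np.inf
    offsets = np.arange(-0.6, 0.6 + 1e-6, step)
    yaws = np.arange(0.0, np.pi, yaw_step)          # boxes are symmetric under a pi rotation
    for dx, dy, dz, yaw in product(offsets, offsets, offsets, yaws):
        proposal = (cx + dx, cy + dy, cz + dz, yaw, *dims)
        score = score_fn(cluster, proposal)         # e.g. the evaluation metric on the next page
        if score > best_score:
            best, best_score = proposal, score
    return best
```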
62. Fine Location: Evaluation Metrics
N: the number of points in the box;
dis: distance from a point to the surface of the box;
f(N): the more points in the box, the better the 3D box;
Lmin(V): tries to minimize the volume of the 3D box.
Car (left), pedestrian (right) parameters:
m     n     a     b     c
2.0   1.5   2.0   0.6   1.2
63. Verification
• The point cloud may fail to capture far targets
• Central-point rules for radar points to locate far targets
• Validate current results using history information
• Interpolate frames and refine the track
[Figure: lidar covers targets closer than 35 m; radar covers targets beyond 35 m; validation and interpolation combine radar, camera, and lidar along the track]
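A tiny sketch of the 35 m sensor split and the frame interpolation described above; the function names and interfaces are illustrative.

```python
RANGE_SPLIT = 35.0   # metres (from the slide)

def choose_measurement(lidar_det, radar_det, target_range):
    """Lidar localizes near targets; the radar central point is used beyond 35 m."""
    if target_range < RANGE_SPLIT and lidar_det is not None:
        return lidar_det
    return radar_det

def interpolate_track(t, t0, p0, t1, p1):
    """Linearly interpolate the obstacle position for a frame between two detections."""
    a = (t - t0) / (t1 - t0)
    return tuple((1 - a) * x0 + a * x1 for x0, x1 in zip(p0, p1))
```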
66. Summary
(1) System design
  - Agile development, easy deployment;
  - Low coupling and more flexibility;
(2) Multi-sensor info ensemble
  - Lidar, radar, camera, and GPS;
(3) Algorithm
  - Coarse-to-fine detection;
  - CNN adopted for camera images;
  - Point cloud reduction and clustering algorithm;
  - Evaluation criteria designed from the spatial distribution of points;
(4) Achieved 0.43 IoU at 20 Hz on a K80 GPU platform.
TODO
(1) Record the speed of the target so tracking can predict the next position more precisely;
(2) Fuse a small neural-network module for coarse detection from the bird view of the point cloud.
Team scores

Round 1
abccba        0.4333510468
Robodreams    0.4097831892
zbzc          0.3978965429
Tea           0.3914668045
ICTANS        0.3463341661

Round 2
abccba        0.28531890
zbzc          0.23590994
Roboauto      0.21162456
Robodreams    0.18696818
Something     0.17618155