Transfer Learning for Autonomous Driving in Duckietown

Charles Schaff, Ruotian Luo

DuckieBook links: [project description] [instructions to replicate results]

The goal of this project was to replace part or all of the duckiebot control pipeline with deep models trained in simulation and transferred to the real town. We replaced the pose estimation module in the control pipeline with a convolutional neural network trained to estimate the pose of the duckiebot directly from images.

Control in Duckietown

Each duckiebot is equipped with a forward-facing camera, and images from that camera are used to drive the duckiebot around Duckietown. The control pipeline is split into two phases: pose estimation and PID control. The pose of the duckiebot is described by two numbers: its displacement from the center of the right lane, and the angle between its heading and the direction of the road. A duckiebot is driving successfully in its lane when its pose is near [0, 0]. The estimated pose is used as the error signal for the PID controller.
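To make the second phase concrete, here is a minimal sketch of a textbook PID controller driven by the two pose numbers. It is not the Duckietown controller: the gains, the weighted sum that folds displacement and angle into one scalar error (angle_weight), the sign convention, and the names PID and steering_command are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook PID controller. Gains are illustrative, not tuned values."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and estimate the derivative
        # from the previous error (zero on the very first call).
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def steering_command(pid: PID, displacement: float, angle: float,
                     dt: float, angle_weight: float = 1.0) -> float:
    """Turn an estimated pose into a steering correction.

    Assumption: the two pose numbers are combined into a single scalar
    error by a weighted sum, and the command opposes that error.
    """
    error = displacement + angle_weight * angle
    return -pid.step(error, dt)
```

With this convention a pose of [0, 0] yields no correction, and any offset from the lane center or misalignment with the road produces a command that pushes the pose back toward zero.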

Training a Pose Estimation Network

To train a pose estimation network, we use a very basic Duckietown simulator. The simulator contains a single patch of straight road in an otherwise empty world. More sophisticated simulators exist now, but this was close to the best available to us in Fall 2017 given the time constraints of the project.
[Figure: Simulator, Duckietown]
As you can see, the simulator looks quite different from the real thing. When dealing directly with images, this can create quite a problem. Even if our model is performing well on simulated data, the perceptual differences between simulated images and real images might be enough to fool the model. To solve this problem, we need a way to make our model robust to these differences.

Domain Randomization

Images taken from the simulator before and after domain randomization was applied.
Domain randomization is a common technique for enabling transfer from simulation to the real world. The idea is to continually randomize the dynamics or appearance of the simulator. The intuition is simple: the real world will look and behave differently from the simulator, so we should force the model to be robust to those differences. In this project, the only important information is the location and orientation of the lane lines; the exact look of the environment is irrelevant to the task. However, a model trained in simulation can easily come to rely on the exact look of the simulator, which makes it useless in Duckietown. When the lighting, coloring, and textures in the simulator are randomized, the model is pushed to focus on the information that remains in the image, such as the position and shape of objects. A model that relies only on this information is much more likely to work in Duckietown. For more information, see this paper.
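As a rough illustration of the idea, the sketch below randomizes the brightness and color tint of an image while leaving its geometry untouched. This is a stand-in technique: the project randomized lighting, coloring, and textures inside the simulator, whereas this example perturbs an already rendered image. The function name randomize_appearance, the sampling ranges, and the nested-list image format are assumptions chosen to keep the example dependency-free.

```python
import random


def randomize_appearance(image, rng):
    """Return a copy of `image` with a random brightness and color tint.

    `image` is a list of rows, each row a list of (r, g, b) tuples with
    channel values in [0, 255]. Pixel positions are left unchanged, so
    the geometry of the scene (e.g. where the lane lines are) survives.
    The sampling ranges below are illustrative assumptions.
    """
    brightness = rng.uniform(0.5, 1.5)                # global lighting change
    tint = [rng.uniform(0.7, 1.3) for _ in range(3)]  # per-channel color shift

    def clamp(value):
        return max(0, min(255, int(round(value))))

    return [
        [tuple(clamp(c * brightness * t) for c, t in zip(pixel, tint))
         for pixel in row]
        for row in image
    ]


# Usage: draw a fresh randomization for the same simulated frame.
rng = random.Random(0)
frame = [[(100, 150, 200), (0, 0, 0)],
         [(255, 255, 255), (10, 20, 30)]]
variant = randomize_appearance(frame, rng)
```

Each call draws new parameters, so the same simulated frame can be presented to the model with many different appearances while its pose label stays the same.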

Training Process

We train a five-layer convolutional neural network (three convolutional layers followed by two fully connected layers) to estimate the pose of a duckiebot from images taken by its camera. First, we collect a large number of images from the simulator with the duckiebot in various poses. We then train on that dataset until convergence, applying a different domain randomization in each epoch. Finally, to further help the transfer, we collect a small dataset of about 70 images from real duckiebots, label the pose associated with each image by hand, and fine-tune the network on this real data.
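The two-stage schedule described above can be sketched as follows. This is only scaffolding under stated assumptions: the network and optimizer are hidden behind a hypothetical update callback, randomize is any per-image randomization function, and fixed epoch counts stand in for training until convergence. None of these names come from the project code.

```python
import random


def two_stage_schedule(sim_data, real_data, randomize, update,
                       sim_epochs, finetune_epochs, seed=0):
    """Train on randomized simulated data, then fine-tune on real data.

    sim_data / real_data: lists of (image, pose) pairs.
    randomize(image, rng): returns a randomized copy of a simulated image.
    update(image, pose, stage): performs one training step (hypothetical).
    """
    rng = random.Random(seed)

    # Stage 1: simulated data, with a fresh randomization drawn every
    # epoch so each pose is seen under many different appearances.
    for _ in range(sim_epochs):
        epoch = [(randomize(image, rng), pose) for image, pose in sim_data]
        rng.shuffle(epoch)
        for image, pose in epoch:
            update(image, pose, stage="sim")

    # Stage 2: fine-tune on the small hand-labeled real dataset,
    # without any randomization.
    for _ in range(finetune_epochs):
        epoch = list(real_data)
        rng.shuffle(epoch)
        for image, pose in epoch:
            update(image, pose, stage="real")
```

In a real training loop, update would run a forward and backward pass of the network on the given image and pose label; here it is left abstract so that only the ordering of the two stages is shown.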


We deploy the trained model in the control pipeline described above on real duckiebots. Videos of its performance are shown below. Interestingly, the model was able to transfer from the straight road in the simulator to a curved road in Duckietown. Additionally, the real data was collected only from the outside lane, yet the model was able to transfer to the inside lane as well.
[Videos: Full Model (outside lane), Full Model (inside lane)]
In addition to testing our full model, we ran ablation studies, testing our approach without domain randomization, without real data, and with only real data. We find that both training in simulation with domain randomization and fine-tuning on real data are crucial to the final performance of the model.
[Videos: No Domain Randomization (outside lane), No Domain Randomization (inside lane)]
[Videos: No Real Data (outside lane), No Real Data (inside lane)]
[Videos: No Simulated Data (outside lane), No Simulated Data (inside lane)]