Understanding the 3D world is one of the fundamental challenges in computer vision. A wide variety of approaches have been developed to either reconstruct the 3D world or recognize it. However, until very recently the interactions between these two tasks were mostly ignored. This is perhaps surprising, as knowing the 3D world greatly simplifies the recognition task. Conversely, knowing that we are looking at a particular object greatly constrains 3D reconstruction; e.g., we expect a wall to be planar.
Inspired by the great success of the PASCAL VOC challenge, we propose a set of challenges to study how reconstruction and recognition algorithms can be jointly exploited to push forward the state of the art in visual perception. Towards this goal, we propose a set of benchmarks that cover both outdoor scenarios in the context of autonomous driving and indoor scenes for personal robotics. We take advantage of the KITTI, NYU and Sun3D datasets and extend them in a variety of ways to provide the community with a set of challenges ranging from low-level to high-level vision. We envision this workshop to be the first in a series that will help push forward the performance of the field.
Towards this goal, we have created two training sets, one outdoor and one indoor, which contain labels for all reconstruction and recognition tasks. This way, participants can exploit semantics for reconstruction and reconstruction for semantic analysis. Participants are allowed to use as many sources of information as they want to solve each challenge. In the outdoor scenario, we will provide stereo imagery, point clouds from a laser scanner, and video. In the indoor case, we will provide RGB-D data captured by a set of different devices. The following table shows the tasks that compose our challenges.
| Task |
| --- |
| Depth from Kinect |
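To illustrate how the RGB-D data connects the reconstruction and recognition tasks above, the sketch below back-projects a Kinect-style depth image into a 3D point cloud with a pinhole camera model. This is a minimal illustration, not part of the challenge toolkit; the focal length and principal point used here are assumed placeholder values, not the calibration of any provided sensor.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an Nx3 point cloud.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Toy example: a flat wall 2 m away fills a 4x4 depth map.
# fx = fy = 525.0 is a commonly quoted Kinect focal length (an assumption here).
depth = np.full((4, 4), 2.0)
pts = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=2.0, cy=2.0)
```

Because the input is a fronto-parallel wall, all returned points share Z = 2.0, which is exactly the planarity prior mentioned above: knowing the object is a wall tells us the reconstruction should be planar.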