Holistic Scene Understanding

Jian Yao, Sanja Fidler, Raquel Urtasun

[Figures: model visualization; visualization of the task]

In this project we propose an approach to holistic scene understanding that reasons jointly about regions, the location, class, and spatial extent of objects, the presence of each class in the image, and the scene type. Learning and inference in our model are efficient, as we reason at the segment level and introduce auxiliary variables that allow us to decompose the inherent high-order potentials into pairwise potentials between a few variables with a small number of states (at most the number of classes). Inference is performed via a convergent message-passing algorithm which, unlike graph-cuts inference, has no submodularity restrictions and does not require potential-specific moves. We believe this is very important, as it allows us to encode our ideas and prior knowledge about the problem without the need to change the inference engine every time we introduce a new potential. Our approach outperforms the state of the art on the MSRC-21 benchmark while being much faster. Importantly, our holistic model improves performance on all tasks.
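The decomposition idea above can be illustrated on a toy case. The sketch below is not the project's code: it uses an illustrative P^n Potts-style consistency potential (all segments in a region should share a label, or a penalty GAMMA is paid) and shows that adding one auxiliary variable z turns this high-order potential into a sum of pairwise terms between z and each segment. The names (FREE, GAMMA, BIG) are assumptions for the example; note that this particular construction gives z one state more than the number of classes, whereas the paper's auxiliary variables have at most as many states as classes.

```python
import itertools

L = 3          # number of classes (illustrative)
GAMMA = 1.0    # assumed penalty when segments in a region disagree
BIG = 1e9      # effectively infinite pairwise cost
FREE = L       # extra "free" state of the auxiliary variable z

def high_order(labels):
    """P^n Potts-style high-order potential: 0 if all segments agree, GAMMA otherwise."""
    return 0.0 if len(set(labels)) == 1 else GAMMA

def pairwise_decomposition(labels):
    """Minimize over the auxiliary variable z: unary(z) + sum_i pair(z, x_i).

    pair(z, x_i) is 0 if z == FREE or z == x_i, and BIG otherwise;
    the unary on z charges GAMMA only in the FREE state.
    """
    best = float("inf")
    for z in range(L + 1):
        cost = GAMMA if z == FREE else 0.0
        for x in labels:
            if z != FREE and z != x:
                cost += BIG
        best = min(best, cost)
    return best

# Brute-force check: the pairwise construction reproduces the high-order
# energy for every labeling of a 4-segment region.
for labels in itertools.product(range(L), repeat=4):
    assert abs(high_order(labels) - pairwise_decomposition(labels)) < 1e-9
print("decomposition matches the high-order potential on all labelings")
```

Because each added term is pairwise and z has only a handful of states, a standard message-passing scheme can optimize over z jointly with the segment labels, which is the property the model exploits.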

Related Papers

  • Jian Yao, Sanja Fidler and Raquel Urtasun
    Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation
    In Conference on Computer Vision and Pattern Recognition (CVPR), 2012  [PDF]

Please cite the above paper if you use the code or the data.

[code] [ReadMe]
Note: All source code is available; please refer to the ReadMe file.
You also need to download the MSRC/PASCAL datasets and place them in the corresponding folders.

Please feel free to email yaojian@ttic.edu or fidler@cs.toronto.edu if you have any suggestions or questions.