Research
My primary research interest is the intersection of machine learning and traditional 3D computer vision, including depth estimation, multi-camera geometry, and camera calibration.
Full Surround Monodepth from Multiple Cameras
Vitor Guizilini*,
Igor Vasiljevic*,
Rares Ambrus,
Greg Shakhnarovich,
Adrien Gaidon
To appear in RA-L, 2022.
arXiv / video
In this work, we extend monocular self-supervised depth and ego-motion estimation to large-baseline multi-camera rigs. Using generalized spatio-temporal contexts, pose consistency constraints, and carefully designed photometric loss masking, we learn a single network that generates dense, consistent, and scale-aware point clouds covering the same full-surround 360° field of view as a typical LiDAR scanner. We also propose a new scale-consistent evaluation metric better suited to multi-camera settings. Experiments on two challenging benchmarks illustrate the benefits of our approach over strong baselines.
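The core self-supervision signal in this line of work is a photometric error between a target image and a view synthesized from another camera or time step, averaged only over valid pixels. The sketch below is a generic, minimal version of that masked L1 loss (illustrative only, not the paper's implementation; the function name and array shapes are assumptions):

```python
import numpy as np

def photometric_loss(target, warped, valid_mask):
    """Masked photometric error between a target image and a view
    synthesized (warped) from another camera or time step.

    A generic sketch of the self-supervised depth recipe, not the
    authors' exact loss: a per-pixel L1 term averaged over valid
    pixels only. `valid_mask` zeroes out pixels that fall outside
    the source view or in non-overlapping regions -- the kind of
    masking that matters for wide-baseline multi-camera rigs.

    target, warped: (H, W, 3) float images
    valid_mask:     (H, W)    1.0 where the warp is valid, else 0.0
    """
    l1 = np.abs(target - warped).mean(axis=-1)   # per-pixel L1 error
    masked = l1 * valid_mask                     # ignore invalid pixels
    return masked.sum() / np.maximum(valid_mask.sum(), 1.0)
```

Minimizing this loss over depth and pose networks is what removes the need for ground-truth depth labels.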
Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion
Igor Vasiljevic,
Vitor Guizilini,
Rares Ambrus,
Sudeep Pillai,
Wolfram Burgard,
Greg Shakhnarovich,
Adrien Gaidon
3DV, 2020   (Oral Presentation)
arXiv / video
Self-supervised depth methods assume a known parametric camera model (usually pinhole), leading to failure when applied to imaging systems that deviate significantly from this assumption (e.g., catadioptric cameras or underwater imaging). We introduce a differentiable version of the general camera model of Grossberg and Nayar, learning depth, pose, and a per-pixel general camera model in a fully self-supervised way. We demonstrate our model with experiments on perspective, fisheye, catadioptric, and underwater datasets.
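In the generic camera model, each pixel carries its own viewing ray rather than sharing a single optical center and intrinsics. Unprojection then reduces to moving a depth along each per-pixel ray. A minimal sketch in the spirit of Grossberg and Nayar's formulation (the function name and shapes are assumptions, not the paper's code):

```python
import numpy as np

def unproject(depth, ray_origins, ray_dirs):
    """Lift a depth map to a 3D point cloud under a generic
    per-pixel ray model (a sketch, not the paper's exact code).

    depth:       (H, W)    per-pixel depth along each ray
    ray_origins: (H, W, 3) per-pixel ray origin
    ray_dirs:    (H, W, 3) per-pixel unit ray direction

    A pinhole camera is the special case where every origin is the
    optical center and the directions come from the intrinsics;
    freeing both per pixel is what lets one model cover fisheye,
    catadioptric, or underwater optics.
    """
    return ray_origins + depth[..., None] * ray_dirs
```

Making the ray surface itself a learnable, differentiable quantity is what allows it to be trained jointly with depth and pose from the photometric loss alone.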
DIODE: A Dense Indoor and Outdoor DEpth Dataset
Igor Vasiljevic,
Nick Kolkin,
Shanyi Zhang,
Ruotian Luo,
Haochen Wang,
Falcon Z. Dai,
Andrea F. Daniele,
Mohammadreza Mostajabi,
Steven Basart,
Matthew R. Walter,
Gregory Shakhnarovich
CVPRW, 2019   (Oral Presentation)
DIODE is a dataset of thousands of diverse, high-resolution color images with accurate, dense, long-range depth measurements. It was the first public dataset to include RGBD images of both indoor and outdoor scenes captured with a single sensor suite (a high-resolution FARO scanner). This is in contrast to existing datasets, which focus on a single domain/scene type and employ different sensors, making generalization across domains difficult.
Examining the Impact of Blur on Recognition by Convolutional Networks
Igor Vasiljevic,
Ayan Chakrabarti,
Gregory Shakhnarovich
UChicago Statistics MS thesis, 2016.
CNNs for semantic visual tasks are generally trained (and evaluated) on large annotated datasets of artifact-free, high-quality images. However, in real-world applications many images are marred by various forms of blur.
We show that standard network models, trained only on high-quality images, suffer a significant degradation in performance when applied to images degraded by blur from defocus or from subject or camera motion. We investigate the extent to which this degradation is due to the mismatch between training and input image statistics.
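This train/test mismatch can be probed by synthetically blurring evaluation images before feeding them to a network trained on clean data. Below is a minimal stand-in for such a degradation, a simple box blur (illustrative only; the thesis studies defocus and motion blur, and this helper is an assumption, not its code):

```python
import numpy as np

def box_blur(img, k=3):
    """Apply a k x k box blur to a grayscale image -- a simple
    stand-in for the defocus and motion blurs studied in the
    thesis (illustrative only).

    img: (H, W) float array
    k:   odd kernel size
    """
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")   # replicate borders
    out = np.zeros_like(img)
    # Accumulate each shifted copy of the image, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)
```

Comparing accuracy on clean versus blurred inputs (or fine-tuning on blurred data) is the basic experimental lever for separating the statistics-mismatch effect from genuine information loss.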