Introduction
Figure/ground organization, the binding of contours to surfaces, is a classical
problem in vision. In this work we study a simplified task of figure/ground
labeling in which the goal is to label every pixel as belonging to either a
figural object or background. Our goal is to understand the role of different
cues in this process, including low-level cues, such as edge contrast and
texture similarity; mid-level cues, such as curvilinear continuity; and
high-level cues, such as characteristic shape or texture of the object.
We develop a conditional random field model over edges, regions and objects to
integrate these cues. This random field model is built upon the CDT graph, a
discrete scale-invariant image representation we have recently developed. We
train the model from human-marked groundtruth labels and quantify the relative
contributions of each cue on a large collection of horse images.
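Concretely, the CDT graph is a constrained Delaunay triangulation whose constraint segments come from pieces of detected contours, so each triangulation edge is a candidate boundary and each triangle a candidate figure/ground region. The snippet below is a minimal sketch of building such a triangulation with the third-party "triangle" package (a Python wrapper around Shewchuk's Triangle); the vertices and segments are hypothetical placeholders rather than the output of an actual contour detector.

    import numpy as np
    import triangle  # third-party wrapper around Shewchuk's Triangle library

    # Hypothetical planar straight-line graph: image border plus one detected contour loop.
    vertices = np.array([[0, 0], [10, 0], [10, 10], [0, 10],   # image corners
                         [3, 3], [7, 3], [7, 7], [3, 7]])      # contour fragment endpoints
    segments = np.array([[0, 1], [1, 2], [2, 3], [3, 0],       # image border
                         [4, 5], [5, 6], [6, 7], [7, 4]])      # contour constraint segments

    # The 'p' flag requests a constrained Delaunay triangulation of the input
    # planar straight-line graph; every constraint segment survives as a CDT edge.
    cdt = triangle.triangulate({'vertices': vertices, 'segments': segments}, 'p')

    print(cdt['triangles'].shape)  # (num_triangles, 3) vertex indices, one Y_t per triangle
    print(cdt['segments'])         # constraint edges, candidate boundaries X_e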
We have previously applied this CDT/CRF framework to the problem of contour
grouping/completion. Here we extend the approach in a few key directions:
- We extend the framework to joint modeling and inference of both contours
and regions, allowing a much richer set of cues to be incorporated and
studied. The output of our model is now the posterior marginal distributions of
both boundary contours and figure regions.
- We incorporate high-level knowledge, including shape, texture and regional
support, into the grouping mechanism. As we will show, such object-specific
knowledge greatly improves grouping performance.
Cues for Figure/Ground Labeling
We study the interactions of figure/ground cues at three distinct levels:
low-level cues, which can be computed in local neighborhoods; mid-level cues,
which encode generic relations between elements without object knowledge; and
high-level cues, which are specific to an object category.
We define these cues on top of the CDT graph, where each edge e in the
triangulation is associated with a binary random variable X_e and each
triangle t with a binary random variable Y_t. Each cue acts as a constraint on
a subset of these random variables, as outlined below; a schematic sketch of
how the cues combine in the model follows the list.
L1: edge energy along an edge e.
L2: brightness/texture similarity between two regions s and t.
M1: colinearity and junction frequency at a vertex V.
M2: consistency of edge labels and adjoining region labels.
H1: similarity of a region t to exemplar texture.
H2: compatibility of local region support with pose.
H3: compatibility of local edge shape with pose.
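To make the combination concrete, the cues above enter a log-linear conditional random field over the X_e and Y_t variables, and inference produces posterior marginals for boundaries and figure regions. The following is a toy sketch under stated assumptions: the graph is a hand-built two-triangle fragment, the cue values and weights are made up, only the L1, H1 and M2 cues appear, and marginals are computed by brute-force enumeration rather than the approximate inference a real CDT graph would require.

    import itertools
    import numpy as np

    # Toy CDT fragment: 3 candidate boundary edges (X_e) and 2 triangles (Y_t).
    n_edges, n_tris = 3, 2
    tri_of_edge = {1: (0, 1)}          # edge 1 is the interior edge shared by both triangles

    # Stand-in cue values (in the real model these are measured from the image).
    pb = np.array([0.8, 0.2, 0.7])     # L1: edge energy along each edge
    tex = np.array([0.9, 0.1])         # H1: similarity of each triangle to exemplar texture

    # Hypothetical learned cue weights.
    w_pb, w_tex, w_consist = 2.0, 1.5, 1.0

    def log_potential(x, y):
        """Unnormalized log-probability of a joint labeling x (edges) and y (triangles)."""
        s = w_pb * np.dot(pb, x) + w_tex * np.dot(tex, y)
        # M2: an interior edge should be 'on' exactly when its two triangles disagree.
        for e, (t1, t2) in tri_of_edge.items():
            s += w_consist * float(x[e] == int(y[t1] != y[t2]))
        return s

    # Brute-force posterior marginals P(X_e = 1) and P(Y_t = 1).
    Z, p_edge, p_tri = 0.0, np.zeros(n_edges), np.zeros(n_tris)
    for x in itertools.product([0, 1], repeat=n_edges):
        for y in itertools.product([0, 1], repeat=n_tris):
            w = np.exp(log_potential(np.array(x), np.array(y)))
            Z += w
            p_edge += w * np.array(x)
            p_tri += w * np.array(y)
    print("boundary marginals:", p_edge / Z)
    print("figure marginals:  ", p_tri / Z)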
Quantitative Evaluation
Performance evaluation on the horse dataset: (a) precision-recall curves for
horse boundaries, comparing models with low-level cues only (Pb), low- plus
mid-level cues (Pb+M), low- plus high-level cues (Pb+H), and all three classes
of cues combined (Pb+M+H). The F-measure reported in the legend is the maximal
harmonic mean of precision and recall and provides an overall ranking. Using
high-level cues greatly improves boundary detection performance, and mid-level
continuity cues are useful both with and without high-level cues.
(b) Precision-recall curves for figure regions. The poor performance of the
baseline model with low- and mid-level cues only shows that figure/ground
labeling remains ambiguous without object knowledge, even when boundary
detection succeeds. High-level shape knowledge is the key, consistent with
evidence from psychophysics [Peterson and Gibson 1994]. In both the boundary
and region cases, the groundtruth labels on CDTs are nearly perfect, indicating
that the CDT graphs preserve most of the image structure.
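For reference, the F-measure used for ranking is F = 2PR/(P+R), the harmonic mean of precision P and recall R, maximized over the threshold applied to the model's boundary marginals. Below is a minimal sketch of that computation on made-up scores and labels; it ignores the spatial matching tolerance that the real boundary benchmark uses when pairing machine and human boundaries.

    import numpy as np

    def max_f_measure(scores, labels):
        """Sweep a threshold over the scores; return the best harmonic mean of precision and recall."""
        best = 0.0
        for thr in np.unique(scores):
            pred = scores >= thr
            tp = np.sum(pred & (labels == 1))
            if tp == 0:
                continue
            precision = tp / np.sum(pred)
            recall = tp / np.sum(labels == 1)
            best = max(best, 2 * precision * recall / (precision + recall))
        return best

    # Made-up boundary marginals and groundtruth labels, purely for illustration.
    scores = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.1])
    labels = np.array([1,   1,   0,   1,   0,   0])
    print(max_f_measure(scores, labels))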
Some Results
References
- Xiaofeng Ren, Charless Fowlkes and Jitendra Malik. Cue Integration in
Figure/Ground Labeling. In NIPS '05, Vancouver, 2005.
[abstract] [pdf] [bibtex]