Using Shapemes for Mid-level Vision

(under construction)

Introduction

If low-level vision is about how much information one can independently extract for each image element, mid-level vision is about the interaction between elements or, roughly speaking, context.

How do we model context? One way is to find a computational/operational definition for each mid-level visual cue, such as continuity, convexity, or parallelism. An alternative is to use a generic "context" descriptor, such as shape context or geometric blur.

Shapemes

Shapemes, by analogy to phonemes, is the nickname we use for prototypical local shapes. As we have found in our experiments, such prototypical shapes capture mid-level cues such as convexity and parallelism quite well, without any explicit notion of what convexity or parallelism is. This demonstrates the ecological grounding of mid-level vision (Egon Brunswik).

Figure 1 shows an example of shapemes. We compute geometric blur descriptors on human-marked boundaries in a collection of baseball player photos, align the blurred descriptors according to local orientation (so that all straight lines are oriented vertically and hence belong to a single shapeme), and use k-means to cluster them into 64 shapemes.

Figure 1: an example of shapemes. Shown here is the "average" shape in each shapeme cluster. Mid-level cues, such as convexity (e.g. row 1, col 1) or parallelism (e.g. row 1, col 2), can be easily found in this shapeme representation.
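
The clustering step described above can be sketched in a few lines. This is only an illustrative sketch, not our implementation: it assumes the geometric blur descriptors have already been computed and orientation-aligned, and that each one is given as a flat list of numbers. The function names (`squared_distance`, `nearest`, `kmeans`) and the parameters `iterations` and `seed` are hypothetical. In the setting above, `k` would be 64.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, descriptor):
    """Index of the cluster center closest to the descriptor."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(centers[j], descriptor))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) over pre-aligned descriptors.

    Assumption: `descriptors` is a list of equal-length numeric lists that
    were already aligned by local orientation. Returns (centers, labels),
    where each center plays the role of the "average" shape of a shapeme.
    """
    rng = random.Random(seed)
    # Initialize centers from k distinct randomly chosen descriptors.
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        labels = [nearest(centers, d) for d in descriptors]
        for j in range(k):
            members = [d for d, lab in zip(descriptors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment, consistent with the returned centers.
    labels = [nearest(centers, d) for d in descriptors]
    return centers, labels
```

Given new boundary points, `nearest(centers, descriptor)` would then assign each aligned descriptor to its shapeme.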

Shapemes for Figure/Ground Organization




Figure 2: some results on shapeme-based figure/ground organization, obtained by averaging the soft shapeme classifier output over human-marked boundaries. Human subjects provide ground-truth figure/ground labels for images from the BSDS dataset. Red indicates a correct classification and blue an incorrect one.
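
The averaging step mentioned in the caption can be illustrated as follows. This is a minimal sketch under stated assumptions, not the classifier itself: it assumes that each point on a boundary already has a soft score in [0, 1] giving the estimated probability that the left side of the boundary is figure. The "left"/"right" convention, the function name `assign_figure_side`, and the 0.5 threshold are hypothetical choices made for illustration.

```python
def assign_figure_side(soft_scores, threshold=0.5):
    """Average per-point soft scores along one boundary and pick a side.

    Assumption: soft_scores[i] is the estimated probability that the left
    side of the boundary is figure at point i. Returns (side, mean_score).
    """
    if not soft_scores:
        raise ValueError("a boundary needs at least one scored point")
    mean_score = sum(soft_scores) / len(soft_scores)
    side = "left" if mean_score >= threshold else "right"
    return side, mean_score


def is_correct(soft_scores, true_side):
    """Compare the averaged decision with a human-provided label."""
    side, _ = assign_figure_side(soft_scores)
    return side == true_side
```

Averaging over the whole boundary lets confident points outvote locally ambiguous ones, which is why a single noisy point score does not flip the decision.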

Shapemes for Boundary Detection


Figure 3: precision-recall curves for the baseball player and the horse datasets. Shapemes, which encode information from a larger context, improve boundary detection. However, they encode this information in a generic way and are therefore less effective than contour continuity cues for the boundary detection problem.
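
For readers unfamiliar with precision-recall curves, the sketch below shows how one point on such a curve is computed from a soft boundary map. It is a simplified illustration, not the evaluation code behind Figure 3: it assumes per-pixel boundary scores and binary ground-truth labels in exact pixel correspondence, whereas boundary benchmarks typically allow a small localization tolerance when matching detections to ground truth. The function names and the convention for empty sets are hypothetical.

```python
def precision_recall(scores, truth, threshold):
    """Precision and recall of thresholded boundary scores.

    Assumptions: scores[i] is a soft boundary strength for pixel i and
    truth[i] is truthy when pixel i lies on a human-marked boundary.
    Pixels are compared one to one, with no localization tolerance.
    """
    detected = [s >= threshold for s in scores]
    true_positives = sum(1 for d, t in zip(detected, truth) if d and t)
    n_detected = sum(detected)
    n_true = sum(1 for t in truth if t)
    # Convention chosen here: an empty denominator yields a score of 1.0.
    precision = true_positives / n_detected if n_detected else 1.0
    recall = true_positives / n_true if n_true else 1.0
    return precision, recall


def pr_curve(scores, truth, thresholds):
    """One (threshold, precision, recall) triple per threshold."""
    return [(t,) + precision_recall(scores, truth, t) for t in thresholds]
```

Sweeping the threshold from high to low traces the curve: recall rises as more pixels are accepted, usually at the cost of precision.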

References

Figure/Ground Assignment in Natural Images.
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, in ECCV '06, volume 2, pages 614-627, Graz 2006.
Familiar Configuration Enables Figure/Ground Assignment in Natural Scenes.
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, in VSS '05, Sarasota, FL 2005.
Mid-level Cues Improve Boundary Detection.
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, Berkeley Technical Report 05-1382, CSD 2005.