We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pair-wise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
Recent attempts to fabricate surfaces with custom reflectance functions boast impressive angular resolution, yet their spatial resolution is limited. In this paper we present a method to construct spatially varying reflectance at a high resolution of up to 220dpi, orders of magnitude greater than previous attempts, albeit with a lower angular resolution. The resolution of previous approaches is limited by the machining, but more fundamentally, by the geometric optics model on which they are built. Beyond a certain scale geometric optics models break down and wave effects must be taken into account. We present an analysis of incoherent reflectance based on wave optics and gain important insights into reflectance design. We further suggest and demonstrate a practical method, which takes into account the limitations of existing micro-fabrication techniques such as photolithography to design and fabricate a range of reflection effects, based on wave interference.
Consumer digital cameras use tone-mapping to produce compact, narrow-gamut images that are nonetheless visually pleasing. In doing so, they discard or distort substantial radiometric signal that could otherwise be used for computer vision. Existing methods attempt to undo these effects through deterministic maps that de-render the reported narrow-gamut colors back to their original wide-gamut sensor measurements. Deterministic approaches are unreliable, however, because the reverse narrow-to-wide mapping is one-to-many and has inherent uncertainty. Our solution is to use probabilistic maps, providing uncertainty estimates useful to many applications. We use a non-parametric Bayesian regression technique---local Gaussian process regression---to learn for each pixel's narrow-gamut color a probability distribution over the scene colors that could have created it. Using a variety of consumer cameras we show that these distributions, once learned from training data, are effective in simple probabilistic adaptations of two popular applications: multi-exposure imaging and photometric stereo. Our results on these applications are better than those of corresponding deterministic approaches, especially for saturated and out-of-gamut colors.
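The one-to-many nature of the reverse mapping can be seen in a toy sketch. The tone map below (gamma compression, clipping, and 8-bit quantization) is a hypothetical stand-in for a camera pipeline, and the empirical mean and spread of the pre-images are only a crude substitute for the local Gaussian process regression described in the abstract; every name and constant is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def toy_render(x):
    """Hypothetical tone map: clip to [0, 1], gamma-compress, quantize to 8 bits."""
    compressed = min(max(x, 0.0), 1.0) ** (1 / 2.2)
    return round(compressed * 255)


# Assumed scene intensities from 0.000 to 2.000 (values above 1.0 saturate).
scene_values = [i / 1000 for i in range(0, 2001)]

# Group scene intensities by the rendered value they produce.
preimages = defaultdict(list)
for x in scene_values:
    preimages[toy_render(x)].append(x)


def derender(y):
    """Return (mean, standard deviation) of the scene values that render to y."""
    xs = preimages[y]
    return mean(xs), pstdev(xs)


mid_mean, mid_std = derender(128)   # mid-tone: few candidate scene values
sat_mean, sat_std = derender(255)   # saturated: many candidate scene values
```

In this toy setting the saturated output value has a far wider set of plausible scene values than the mid-tone one, which is the kind of uncertainty a single deterministic inverse map cannot express.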
We propose modifying the aperture of a conventional color camera so that the effective aperture size for one color channel is smaller than that for the other two. This produces an image where different color channels have different depths-of-field, and from this we can computationally recover scene depth, reconstruct an all-focus image and achieve synthetic re-focusing, all from a single shot. These capabilities are enabled by a spatio-spectral image model that encodes the statistical relationship between gradient profiles across color channels. This approach substantially improves depth accuracy over alternative single-shot coded-aperture designs, and since it avoids introducing additional spatial distortions and is light efficient, it allows high-quality deblurring and lower exposure times. We demonstrate these benefits with comparisons on synthetic data, as well as results on images captured with a prototype lens.
We propose an approach for cross-view action recognition by way of ‘virtual views’ that connect the action descriptors extracted from one (source) view to those extracted from another (target) view. Each virtual view is associated with a linear transformation of the action descriptor, and the sequence of transformations arising from the sequence of virtual views aims at bridging the source and target views while preserving discrimination among action categories. Our approach is capable of operating without access to labeled action samples in the target view and without access to corresponding action instances in the two views, and it can also naturally incorporate and exploit corresponding instances or partial labeling in the target view when they are available. The proposed approach achieves improved or competitive performance relative to existing methods when instance correspondences or target labels are available, and it goes beyond the capabilities of these methods by providing some level of discrimination even when neither correspondences nor target labels exist.
Achieving computer vision on micro-scale devices is a challenge. On these platforms, the power and mass constraints are severe enough for even the most common computations (matrix manipulations, convolution, etc.) to be difficult. This paper proposes and analyzes a class of miniature vision sensors that can help overcome these constraints. These sensors reduce power requirements through template-based optical convolution, and they enable a wide field-of-view within a small form through a novel optical design. We describe the trade-offs between the field of view, volume, and mass of these sensors and we provide analytic tools to navigate the design space. We demonstrate milli-scale prototypes for tasks such as locating edges, tracking targets, and detecting faces.
We explore a polar representation of optical flow in which each element of the brightness motion field is represented by its magnitude and orientation instead of its Cartesian projections. This seemingly small change in representation provides more direct access to the intrinsic structure of a flow field, and when used with existing variational inference procedures it provides a framework in which regularizers can be intuitively tailored for very different classes of motion. Our evaluations reveal that a flow estimation algorithm that is based on a polar representation can perform as well as or better than the state of the art when applied to traditional optical flow problems concerning camera or rigid scene motion, and at the same time, it facilitates both qualitative and quantitative improvements for non-traditional cases such as fluid flows and specular flows, whose structure is very different.
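The change of representation itself is simple to state in code. The following is a minimal sketch, assuming a flow field stored as a list of Cartesian vectors (u, v); the function names and the sample rotational field are hypothetical, and the variational estimation and regularizers discussed in the abstract are not reproduced here.

```python
import math


def to_polar(u, v):
    """Convert one Cartesian flow vector to (magnitude, orientation in radians)."""
    return math.hypot(u, v), math.atan2(v, u)


def to_cartesian(magnitude, orientation):
    """Convert (magnitude, orientation) back to Cartesian components (u, v)."""
    return magnitude * math.cos(orientation), magnitude * math.sin(orientation)


# Assumed example: in-plane rotation about the origin with angular rate omega,
# so the flow at image point (x, y) is (-omega * y, omega * x).
omega = 0.1
points = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)]
flow = [(-omega * y, omega * x) for x, y in points]

polar_flow = [to_polar(u, v) for u, v in flow]
```

For this rotational field the magnitude depends only on distance from the center and the orientation only on the angular position, so each polar component has a simpler structure than either Cartesian component.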
Color is known to be highly discriminative for many object recognition tasks, but is difficult to infer from uncontrolled images in which the illuminant is not known. Traditional methods for color constancy can improve surface reflectance estimates from such uncalibrated images, but their output depends significantly on the background scene. In many recognition and retrieval applications, we have access to image sets that contain multiple views of the same object in different environments; we show in this paper that correspondences between these images provide important constraints that can improve color constancy. We introduce the multi-view color constancy problem, and present a method to recover estimates of underlying surface reflectance based on joint estimation of these surface properties and the illuminants present in multiple images. The method can exploit image correspondences obtained by various alignment techniques, and we show examples based on matching local region features. Our results show that multi-view constraints can significantly improve estimates of both scene illuminants and object color (surface reflectance) when compared to a baseline single-view method.
Specular flow is the motion field induced on the image plane by the movement of points reflected by a curved, mirror-like surface. This flow provides information about surface shape, and when the camera and surface move as a fixed pair, shape can be recovered by solving linear differential equations along integral curves of flow. Previous analysis has shown that two distinct motions (i.e., two flow fields) are generally sufficient to guarantee a unique solution without externally-provided initial conditions. In this work, we show that we can often succeed with only one flow. The key idea is to exploit the fact that smooth surfaces induce integrability constraints on the surface normal field. We show that this induces a new differential equation that facilitates the propagation of shape information between integral curves of flow, and that combining this equation with known methods often permits the recovery of unique shape from a single specular flow given only a single seed point.
Biological visual systems are currently unrivaled by artificial systems in their ability to recognize faces and objects in highly variable and cluttered real-world environments. Biologically-inspired computer vision systems seek to capture key aspects of the computational architecture of the brain, and such approaches have proven successful across a range of standard object and face recognition tasks (e.g. [23, 8, 9, 18]). Here, we explore the effectiveness of these algorithms on a large-scale unconstrained real-world face recognition problem based on images taken from the Facebook social networking website. In particular, we use a family of biologically-inspired models derived from a high-throughput feature search paradigm [19, 15] to tackle a face identification task with up to one hundred individuals (a number that approaches the reasonable size of real-world social networks). We show that these models yield high levels of face-identification performance even when large numbers of individuals are considered; this performance increases steadily as more examples are used, and the models outperform a state-of-the-art commercial face recognition system. Finally, we discuss current limitations and future opportunities associated with datasets such as these, and we argue that careful creation of large sets is an important future direction.
Different materials reflect light in different ways, and this reflectance interacts with shape, lighting, and viewpoint to determine an object’s image. Common materials exhibit diverse reflectance effects, and this is a significant source of difficulty for image analysis. One strategy for dealing with this diversity is to build computational tools that exploit reflectance symmetries, such as reciprocity and isotropy, that are exhibited by broad classes of materials. By building tools that exploit these symmetries, one can create vision systems that are more likely to succeed in real-world, non-Lambertian environments. In this paper, we develop a framework for representing and exploiting reflectance symmetries. We analyze the conditions for distinct surface points to have local view and lighting conditions that are equivalent under these symmetries, and we represent these conditions in terms of the geometric structure they induce on the Gaussian sphere and its abstraction, the projective plane. We also study the behavior of these structures under perturbations of surface shape and explore applications to both calibrated and un-calibrated photometric stereo.
We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals.
Blur is caused by a pixel receiving light from multiple scene points, and in many cases, such as object motion, the induced blur varies spatially across the image plane. However, the seemingly straightforward task of estimating spatially-varying blur from a single image has proved hard to accomplish reliably. This work considers such blur and makes two contributions: a local blur cue that measures the likelihood of a small neighborhood being blurred by a candidate blur kernel; and an algorithm that, given an image, simultaneously selects a motion blur kernel and segments the region that it affects. The methods are shown to perform well on a diversity of images.
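As a small illustration of the image formation model in the first sentence, the sketch below blurs a one-dimensional step edge with a box kernel, a common idealization of constant-velocity horizontal motion. It is only a toy forward model under that assumption; the function name, the kernel, and the edge-replication boundary handling are hypothetical choices, and neither the blur cue nor the segmentation algorithm from the abstract is implemented.

```python
def motion_blur_1d(row, length):
    """Box-blur a row: each output pixel averages itself and the
    (length - 1) pixels to its left, replicating the left edge."""
    blurred = []
    for i in range(len(row)):
        window = [row[max(i - k, 0)] for k in range(length)]
        blurred.append(sum(window) / length)
    return blurred


# Assumed sharp signal: a step edge between dark and bright regions.
sharp = [0.0] * 4 + [1.0] * 4

# A three-pixel kernel spreads the edge over three output pixels.
blurred = motion_blur_1d(sharp, 3)
```

Each blurred pixel near the edge mixes light from both sides of it, which is why the extent of the ramp carries information about the kernel length.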
We present a novel optical setup and processing pipeline for measuring the 3D geometry and spatially-varying surface reflectance of physical objects. Central to our design is a digital camera and a high frequency spatially-modulated light source aligned to share a common focal point and optical axis. Pairs of such devices allow capturing a sequence of images from which precise measurements of geometry and reflectance can be recovered. Our approach is enabled by two technical contributions: a new active multiview stereo algorithm and an analysis of light descattering that has important implications for image-based reflectometry. We show that the geometry measured by our scanner is accurate to within 50 microns at a resolution of roughly 200 microns and that the reflectance agrees with reference data to within 5.5%. Additionally, we present an image relighting application and show renderings that agree very well with reference images at light and view positions far from those that were initially measured.
We present a new Precomputed Radiance Transfer (PRT) algorithm based on a two-dimensional representation of isotropic BRDFs. Our approach involves precomputing matrices that allow quickly mapping environment lighting, which is represented in the global coordinate system, and the surface BRDFs, which are represented in a bivariate domain, to the local hemisphere at a surface location where the reflection integral is evaluated. When the lighting and BRDFs are represented in a wavelet basis, these rotation matrices are sparse and can be efficiently stored and combined with pre-computed visibility at run-time. Compared to prior techniques that also precompute wavelet rotation matrices, our method allows full control over the lighting and materials due to the way the BRDF is represented. Furthermore, this bivariate parameterization preserves sharp specular peaks and grazing effects that are attenuated in conventional parameterizations. We demonstrate a prototype rendering system that achieves real-time framerates while lighting and materials are edited.
We address the problem of inferring homogeneous reflectance (BRDF) from a single image of a known shape in an unknown real-world lighting environment. With appropriate representations of lighting and reflectance, the image provides bilinear constraints on the two signals, and our task is to blindly isolate the latter. We achieve this by leveraging the statistics of real-world illumination and estimating the reflectance that is most likely under a distribution of probable illumination environments. Experimental results with a variety of real and synthetic images suggest that useable reflectance information can be inferred in many cases, and that these estimates are stable under changes in lighting.
Different materials reflect light in different ways, and reflectance interacts with shape, lighting, and viewpoint to determine an object's image. Common materials exhibit diverse reflectance effects, and this is a significant source of difficulty for radiometric image analysis. One strategy for dealing with this diversity is to build computational tools that exploit reflectance symmetries, such as reciprocity and isotropy, that are exhibited by broad classes of materials. In this paper, we advocate the real projective plane as a tool for representing and exploiting these symmetries. In this approach, each point in the plane represents a surface normal that is visible from a fixed viewpoint, and reflectance symmetries are analyzed in terms of the geometric structures that they induce. We provide an overview of these structures and explore applications to both calibrated and un-calibrated photometric stereo.
When a curved mirror-like surface moves relative to its environment, it induces a motion field---or specular flow---on the image plane that observes it. This specular flow is related to the mirror's shape through a non-linear partial differential equation, and there is interest in understanding when and how this equation can be solved for surface shape. Existing analyses of this `shape from specular flow equation' have focused on closed-form solutions, and while they have yielded insight, their critical reliance on externally-provided initial conditions and/or specific motions makes them difficult to apply in practice. This paper resolves these issues. We show that a suitable reparameterization leads to a linear formulation of the shape from specular flow equation. This formulation radically simplifies the reconstruction process and allows, for example, both motion and shape to be recovered from as few as two specular flows even when no externally-provided initial conditions are available. Our analysis moves us closer to a practical method for recovering shape from specular flow that operates under arbitrary, unknown motions in unknown illumination environments and does not require additional shape information from other sources.
Images harvested from the Web are proving to be useful for many visual tasks, including recognition, geo-location, and three-dimensional reconstruction. These images are captured under a variety of lighting conditions by consumer-level digital cameras, and these cameras have color processing pipelines that are diverse, complex, and scene dependent. As a result, the color information contained in these images is difficult to exploit. In this paper, we analyze the factors that contribute to the color output of a typical camera, and we explore the use of parametric models for relating these output colors to meaningful scene properties. We evaluate these models using a database of registered images captured with varying camera models, camera settings, and lighting conditions. The database is available online at http://vision.middlebury.edu/color/.
Most personal photos that are shared online are embedded in some form of social network, and these social networks are a potent source of contextual information that can be leveraged for automatic image understanding. In this paper, we investigate the utility of social network context for the task of automatic face recognition in personal photographs. We combine face recognition scores with social context in a conditional random field (CRF) model and apply this model to label faces in photos from the popular online social network Facebook, which is now the top photo-sharing site on the Web with billions of photos in total. We demonstrate that our simple method of enhancing face recognition with social network context substantially increases recognition performance beyond that of a baseline face recognition system.
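The general idea of combining per-face recognition scores with pairwise social context can be sketched with a tiny pairwise model whose best joint labeling is found by exhaustive enumeration. This is not the model or the inference procedure from the paper: the identities, scores, co-occurrence weights, and function names below are all hypothetical, and the potentials are simple unnormalized products chosen for illustration.

```python
from itertools import product

# Hypothetical appearance scores for two faces detected in one photo.
face_scores = [
    {"alice": 0.50, "bob": 0.40, "carol": 0.10},
    {"alice": 0.45, "bob": 0.40, "carol": 0.15},
]

# Hypothetical social context: how often each pair appears together in photos.
co_occurrence = {
    frozenset(("alice", "bob")): 0.6,
    frozenset(("alice", "carol")): 0.1,
    frozenset(("bob", "carol")): 0.3,
}


def pairwise(a, b):
    """Pairwise potential; one person cannot appear twice in the same photo."""
    if a == b:
        return 0.0
    return co_occurrence[frozenset((a, b))]


def joint_score(labels):
    """Unnormalized score: product of unary and pairwise potentials."""
    score = 1.0
    for scores, label in zip(face_scores, labels):
        score *= scores[label]
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            score *= pairwise(labels[i], labels[j])
    return score


identities = sorted(face_scores[0])

# Baseline: label each face independently by its best appearance score.
independent = tuple(max(s, key=s.get) for s in face_scores)

# Joint labeling: exhaustive search over all label assignments.
joint = max(product(identities, repeat=len(face_scores)), key=joint_score)
```

Here the independent baseline assigns the same identity to both faces, while the joint labeling uses the context terms to pick a consistent pair. Exhaustive search is only feasible for a handful of faces and identities; larger problems need approximate inference.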