Currently, computer search and classification of images is based on the name of the file or folder, or on features such as size and date. That’s fine if the file name reflects its content, but it isn’t much good when the file is given an abstract name that only holds meaning for the person who chose it. This drawback means companies in the search business, such as Google and Microsoft, are extremely interested in giving computers the ability to automatically interpret the visual content of images and video. A technique developed at the University of Granada does just that, allowing pictures to be classified automatically based on whether individuals or specific objects are present in the images.
One of the difficulties the researchers faced in getting a computer to recognize a person is that in many images the person is only partially visible, usually just the upper body. So although successful full-body detectors were already available, the team decided to develop an upper-body detector designed to find the region between the top of the head and the upper half of the torso from a near-frontal viewpoint. The researchers say this near-frontal detector works well for viewpoints up to 30 degrees away from straight frontal, and also detects back views.
The system first detects the upper body of the person in the frame, which gives the approximate location and scale of the person and roughly where the torso and head should lie. This allows the system to restrict the search area. The area is then narrowed further using color models, which estimate the person’s appearance automatically from subregions of the detection window that are likely to contain the person. These color models are then used to initialize a segmentation algorithm, and the search area for body parts is progressively reduced, eventually resulting in an estimate of the person’s 2D pose.
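As a rough illustration of the color-model step, the sketch below takes an upper-body detection window, enlarges it into a search area, and builds simple foreground and background color histograms. It is not the researchers' implementation: the function names, the choice of the central half of the window as "likely person" pixels, the enlargement factor and the 8-bin RGB histograms are all assumptions made for the example. The resulting probability map is the kind of input that could seed a segmentation algorithm, which is not shown here.

```python
import numpy as np


def expand_box(box, image_shape, scale=2.0):
    """Enlarge a detection box (x, y, w, h) about its centre, clipped to the image."""
    x, y, w, h = box
    img_h, img_w = image_shape[:2]
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    x0 = int(max(0, round(cx - new_w / 2.0)))
    y0 = int(max(0, round(cy - new_h / 2.0)))
    x1 = int(min(img_w, round(cx + new_w / 2.0)))
    y1 = int(min(img_h, round(cy + new_h / 2.0)))
    return x0, y0, x1 - x0, y1 - y0


def color_histogram(pixels, bins=8):
    """Normalised joint RGB histogram of uint8 pixels (any shape ending in 3)."""
    samples = pixels.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(samples, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist / max(hist.sum(), 1.0)


def foreground_probability(image, box, bins=8):
    """Per-pixel probability of 'person' inside the enlarged search area."""
    x, y, w, h = box
    sx, sy, sw, sh = expand_box(box, image.shape)
    search = image[sy:sy + sh, sx:sx + sw]

    # Assumption: the central half of the detection window is mostly person.
    fg = image[y + h // 4:y + 3 * h // 4, x + w // 4:x + 3 * w // 4]

    # Assumption: search-area pixels outside the window are mostly background.
    outside = np.ones(search.shape[:2], dtype=bool)
    outside[y - sy:y - sy + h, x - sx:x - sx + w] = False
    bg = search[outside]

    fg_hist = color_histogram(fg, bins)
    bg_hist = color_histogram(bg, bins)

    # Look up each search-area pixel in both histograms.
    idx = (search.astype(int) * bins) // 256
    p_fg = fg_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    p_bg = bg_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    return p_fg / (p_fg + p_bg + 1e-9)
```

Pixels whose color is common in the window's centre but rare around it score close to 1, so later stages can ignore most of the frame.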
The ability to estimate a 2D pose allows the system to retrieve shots containing a particular pose from a video database, in what the researchers call a Pose Search. It does this by comparing the spatial configuration of body parts returned by the pose estimator, using features that are independent of the person, clothing, background and lighting. Using this technique, the system is able to automatically classify video scenes in which people appear in a specific pose. It also allows human actions such as walking, jumping and bending down to be detected in video sequences.
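To give a feel for how a pose can be compared without looking at appearance, here is a minimal sketch of a pose search. The article does not say which features the researchers use, so everything here is an assumption: the six-part body model, the segment-endpoint representation, the descriptor (part directions plus part positions relative to the torso, scaled by torso length) and the plain Euclidean ranking are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical upper-body model: each part is a line segment.
PARTS = (
    "torso", "head",
    "left_upper_arm", "right_upper_arm",
    "left_lower_arm", "right_lower_arm",
)


def pose_descriptor(pose):
    """Describe a pose using geometry only.

    `pose` maps each part name to ((x0, y0), (x1, y1)) in pixels.
    The result ignores where the person stands and how large they appear.
    """
    segments = np.array([pose[p] for p in PARTS], dtype=float)  # (6, 2, 2)
    vectors = segments[:, 1] - segments[:, 0]
    centres = segments.mean(axis=1)
    torso_length = np.linalg.norm(vectors[0])

    # Unit direction of every part: independent of scale.
    directions = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Part centres relative to the torso centre, in torso lengths:
    # independent of position and scale.
    offsets = (centres - centres[0]) / torso_length
    return np.concatenate([directions.ravel(), offsets.ravel()])


def pose_search(query_pose, database, top_k=3):
    """Return the ids of the `top_k` stored poses closest to the query."""
    query = pose_descriptor(query_pose)
    distances = {
        shot_id: float(np.linalg.norm(query - pose_descriptor(pose)))
        for shot_id, pose in database.items()
    }
    return sorted(distances, key=distances.get)[:top_k]
```

Because the descriptor uses only the layout of the parts, two people in different clothes and lighting who strike the same pose would produce similar descriptors, provided the pose estimator locates their body parts correctly.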
The research is being carried out by Manuel Jesús Marín Jiménez, who is currently working at the University of Córdoba, and coordinated by Professor Nicolás Pérez de la Blanca Capilla of the Department of Computer Science and Artificial Intelligence at the University of Granada. The results have been presented at a number of international conferences, including the International Conference on Pattern Recognition (2006) and the Conference on Computer Vision and Pattern Recognition (2008 and 2009).