Music video used to build computer vision algorithm

May 19, 2011

An interactive music video is being used to develop an algorithm that helps computer vision systems recognize body poses
(Image: C-Mon & Kypski)

Although already incorporated into devices such as Microsoft's Kinect motion sensor for the Xbox 360, the ability of computer vision systems to recognize specific body poses is still very much a work in progress. One of the big challenges involves the chaos that such systems encounter in real-world use - while it's one thing to train a computer to recognize a given person standing and pointing against a neutral background, for instance, it's quite another to expect it to recognize that same stance in visual data where variables such as background, clothing and body type are constantly changing. A new interactive music video from Dutch electronic band C-Mon & Kypski, however, may help address that problem.

The band is inviting people to go to the One Frame of Fame website, where they are presented with a variety of freeze frames of band members from the video for the song Less is More. Participants can then use their webcam to take a picture of themselves in the same pose as the band member, and submit it to the project. Each snapshot is edited into the video in place of the original frame that it copies, with a newly updated version of the complete video going up once an hour.

Any one of the original "seed" frames displays a band member against a blank background, while the multiple submitted imitations of that seed display a variety of people in a variety of settings - usually with chaotic backgrounds. The only thing that the seeds and their various imitations have in common is the pose of the human subjects.

A team at New York University's Courant Institute of Mathematical Sciences is using this visual database to develop a "pose estimation" algorithm for use in computer vision systems. Having an initial clean base image of each pose, combined with a variety of images of that same pose shot under different conditions, is ideal for the purpose.
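To see why such a collection is convenient for training, consider the rough Python sketch below. It is not the NYU team's code; the image names, frame numbers and the `build_pairs` helper are all hypothetical. It simply assumes that every image is tagged with the seed frame it imitates, so that two images sharing a frame can be treated as "same pose" examples and two images from different frames as "different pose" examples.

```python
from itertools import combinations


def build_pairs(images):
    """Split all image pairs into same-pose and different-pose lists.

    `images` is a list of (image_id, seed_frame) tuples, where seed_frame
    is the number of the video frame the image shows or imitates.
    This layout is an assumption made for illustration only.
    """
    positives, negatives = [], []
    for (id_a, seed_a), (id_b, seed_b) in combinations(images, 2):
        if seed_a == seed_b:
            # Same seed frame: the two images should show the same pose.
            positives.append((id_a, id_b))
        else:
            # Different seed frames: treated here as different poses.
            negatives.append((id_a, id_b))
    return positives, negatives


# Hypothetical data: two seed frames and three webcam imitations.
images = [
    ("seed_0012", 12),
    ("webcam_a", 12),
    ("webcam_b", 12),
    ("seed_0047", 47),
    ("webcam_c", 47),
]

positives, negatives = build_pairs(images)
print(len(positives), "same-pose pairs;", len(negatives), "different-pose pairs")
```

Treating different frames as different poses is itself a simplification, since two nearby frames of a video can look almost identical.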

"This turned out to be the perfect data source for developing an algorithm that learns to compute similarity based on pose," said NYU researcher Graham Taylor. "Armed with the band's data and a few machine learning tricks up our sleeves, we built a system that is highly effective at matching people in similar pose but under widely different settings."
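The article does not describe how the NYU system measures similarity. As a generic illustration only, the sketch below assumes that some learned model has already turned each image into a short list of numbers (a "pose descriptor"), and then finds the stored pose closest to a new image using cosine similarity. The descriptor values, pose names and function names are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def most_similar(query, gallery):
    """Return the name of the gallery entry whose descriptor best matches query."""
    return max(gallery, key=lambda name: cosine_similarity(query, gallery[name]))


# Hypothetical descriptors for three seed poses (made-up numbers).
gallery = {
    "arms_raised": [0.9, 0.1, 0.0],
    "pointing": [0.1, 0.9, 0.2],
    "crouching": [0.0, 0.2, 0.9],
}

# Hypothetical descriptor for a webcam snapshot with a cluttered background.
query = [0.2, 0.8, 0.1]

best = most_similar(query, gallery)
print("Closest pose:", best)
```

In a system of this kind, the hard part is learning descriptors that ignore background, clothing and body type; the lookup step itself can be as simple as the comparison shown here.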
