Monday, May 28, 2012

Project Dance Controller: Report 1

This post is part of Project Dance Controller.

It's a bachelor thesis project with the aim of letting the quantity of dance movements in the room control the volume level using the Kinect for Windows hardware.

You can read all reports here.

Last week

Over the last week I have been going through previous research in the area of crowd analysis and human detection. I think the best approach for me is to use the depth image that I get from the IR sensor. This makes it easier to detect humans against cluttered backgrounds or when occlusion occurs.

Skeleton tracking can only track two people at a time and puts serious constraints on poses and positions, which makes it unusable for my scenario.

I read a research report by Ikemura and Fujiyoshi in which they used a depth image from a time-of-flight (TOF) camera to detect humans in real time. They used a window-based approach and were able to get the detection calculations down to 100 ms on a 3 GHz Intel CPU. Their approach was not very robust against certain poses and positions, though. It also wasn't able to handle occlusions very well.

The work of Xia, Chen and Aggarwal from the University of Texas presented a different approach to detecting humans in depth images. They used the Kinect for Xbox 360 device to retrieve the depth array and detected humans in three steps. First, they narrowed down the areas where a human head might be, using 2D chamfer distance matching. They then confirmed each head by fitting a 3D model onto the area. Lastly, they expanded the region from the head to include the rest of the visible body. This method improves on the window-based method by Ikemura and Fujiyoshi, but it still suffers from some limitations. It won't work very well if the person is wearing a hat or if part of the head is hidden.
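To make the first step more concrete, here is a rough sketch of 2D chamfer distance matching in plain Python. It is not the code from the paper: the function names, the city-block distance, and the brute-force search over every position are assumptions chosen to keep the example short. The idea is to compute, for every pixel, the distance to the nearest edge pixel, and then slide a template of edge points over that distance map. A low mean distance means the template shape lines up well with edges in the image.

```python
# Illustrative sketch of 2D chamfer matching (all names are hypothetical).
INF = 10**9


def distance_transform(edges):
    """Two-pass city-block distance from each pixel to the nearest edge pixel."""
    h, w = len(edges), len(edges[0])
    dist = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
            if x > 0:
                dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
            if x < w - 1:
                dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


def chamfer_score(dist, template_points, top, left):
    """Mean distance from the template's edge points to the nearest image edge."""
    total = sum(dist[top + dy][left + dx] for dy, dx in template_points)
    return total / len(template_points)


def best_match(edges, template_points, t_h, t_w):
    """Try every template position and return (score, top, left) of the best one."""
    dist = distance_transform(edges)
    h, w = len(edges), len(edges[0])
    candidates = [
        (chamfer_score(dist, template_points, top, left), top, left)
        for top in range(h - t_h + 1)
        for left in range(w - t_w + 1)
    ]
    return min(candidates)
```

Here `edges` is a binary edge map (a list of rows of 0/1) and `template_points` is a list of `(dy, dx)` offsets describing the outline to look for, for example a head-shaped contour. In a real detector one would keep every position whose score falls below a threshold as a head candidate, rather than only the single best one.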

In addition to reading up on some previous research I was also able to install my newly arrived Kinect for Windows device and write a skeleton report which I will be uploading to the repository on Wednesday when I get back home again.


I anticipated that it would be difficult to pick out good reference material, and I still think I need to find more. But the four reports I've read so far have given me some very important insight into just what kinds of methods may work and which won't.

This week

This week I will start implementing a skeleton library and documenting its API calls. This will help me get the API structure in place so that I can see early on what works and what does not. At a minimum, my API should be able to detect whether a Kinect device is connected to the computer. It should also expose raw data from the depth sensor (possibly all raw data) so that the caller can get fine-grained control if needed. That way, anyone who wants to use my dance quantifier in their own Kinect code can piggyback on my initialization and sensor detection.
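As a rough illustration of those two basic features, the sketch below shows what such an API surface could look like. It is written in Python for brevity and does not call the real Kinect SDK. Every name in it (`DanceController`, `FakeDepthSensor`, `is_sensor_connected`, `raw_depth_frame`) is hypothetical, and the fake sensor only stands in for the hardware so that the example can run on its own.

```python
# Hypothetical API sketch; none of these names come from the Kinect SDK.
class FakeDepthSensor:
    """Stand-in for real hardware: hands out canned depth frames (values in mm)."""

    def __init__(self, frames, connected=True):
        self._frames = list(frames)
        self.connected = connected

    def read_frame(self):
        # Return the next queued frame, or None when no frame is available.
        return self._frames.pop(0) if self._frames else None


class DanceController:
    """Thin wrapper that owns sensor detection and exposes raw depth data."""

    def __init__(self, sensor=None):
        self._sensor = sensor

    def is_sensor_connected(self):
        """Rudimentary check for whether a usable sensor is attached."""
        return self._sensor is not None and self._sensor.connected

    def raw_depth_frame(self):
        """Give callers the unprocessed depth frame for fine-grained control."""
        if not self.is_sensor_connected():
            raise RuntimeError("no depth sensor connected")
        return self._sensor.read_frame()
```

A caller would create a `DanceController`, check `is_sensor_connected()` once, and then pull frames with `raw_depth_frame()` for its own processing. Passing the sensor in through the constructor keeps the wrapper testable without hardware.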


I think the hardest part this week will be finding a good balance in the API between full coverage and simplicity. It needs to be very easy to use while still giving full control of the device to advanced developers who want to make their own Kinect code work with mine.