Monday, June 11, 2012

Project Dance Controller: Report 3


This post is part of Project Dance Controller.

It's a bachelor thesis project with the aim of letting the amount of dance movement in the room control the volume level, using the Kinect for Windows hardware.

You can read all reports here.

Last week

I have successfully detected the presence of the Kinect device. My events Connected and Disconnected fire when the device is plugged in or unplugged, respectively. I also managed to add code to control the motor and retrieve the depth image from the sensor.

As for exposing the raw data, there is a sensor class that I can simply forward through a property, so any caller can manipulate the sensor directly. I have not yet added that code but aim to do so this week.

Challenges

I did not really run into any problems, even though I was pretty sure last week that there would be some. It's a really nice surprise when things just work out.

This week

This week I will start actually analyzing the depth image. The data is a simple array of shorts of length FrameWidth x FrameHeight (640x480). There are several things I need to do before I can even start the analysis. First of all, the low three bits of each 16-bit short are used to identify "layers" where players have been detected. This detection is not very reliable and mostly works only in ideal conditions, so I will discard those bits by shifting each short right by three. The remaining 13 bits are the distance to the point in millimeters.
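To make the bit layout concrete, here is a minimal sketch (in Python, not the project's actual code) of stripping the player-index bits from a raw depth sample:

```python
# Illustrative sketch: a raw 16-bit Kinect depth sample keeps the
# player-index "layer" in its low three bits, so shifting right by
# three leaves the distance in millimeters.

def raw_to_depth_mm(raw_sample: int) -> int:
    """Strip the 3 player-index bits from a 16-bit depth sample."""
    return raw_sample >> 3

# Example: a raw value whose upper 13 bits encode 1500 mm,
# with an arbitrary player-index pattern in the low bits.
raw = (1500 << 3) | 0b101
print(raw_to_depth_mm(raw))  # 1500
```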

Next, there are a lot of "holes" in the depth image where the infrared sensor fails to read certain surfaces. My coffee table, my hair, glass bottles and my shelf are some of the things that create these holes. These holes don't have a clear border or edge; they flicker, which creates "movement" in the image even though nothing is actually moving. I need a way to remove that "noise movement" from the image.
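One simple way to handle the flicker, sketched below under my own assumptions (the function and threshold are mine, not the project's): treat zero-depth pixels as invalid in both frames before differencing, so a pixel that flickers in and out of a hole never counts as movement.

```python
# Hedged sketch: compare two depth frames pixel by pixel, ignoring
# "holes" (depth 0) so flickering hole edges don't register as motion.

def movement_mask(prev, curr, threshold_mm=50):
    """Per-pixel movement flags; hole pixels (depth 0) are ignored."""
    mask = []
    for p, c in zip(prev, curr):
        valid = p > 0 and c > 0          # both samples must be real readings
        mask.append(valid and abs(c - p) > threshold_mm)
    return mask

prev = [1500, 0, 2000]     # middle pixel is a hole
curr = [1510, 1800, 2100]  # hole "flickered" back in; last pixel moved
print(movement_mask(prev, curr))  # [False, False, True]
```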

Challenges

This preprocessing adds overhead to my per-frame calculations. After reading up on previous research on Kinect depth images, it seems it will be hard to process around 30 frames per second if I also intend to detect heads. As an example, one of my references, which detects heads and then continues in that layer to extract the body (with mixed results), takes over 27 seconds per frame. That is totally unacceptable for me; my analysis cannot take more than 60 milliseconds.

So the solution is to revise the plan again. Instead of detecting heads, I will just preprocess the depth image this week and prepare it for next week, when I will detect movement by comparing two images. I believe that if I relate the movement to the distance, I don't need to know the number of people in the room.
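The distance-weighting idea can be sketched like this (again an illustration under my own assumptions, not the project's code): weight each moving pixel by its depth, so a person far from the sensor, who covers fewer pixels, contributes comparably to one standing close to it.

```python
# Hedged sketch: sum moving pixels between two depth frames, weighting
# each by its distance, so far-away movement is not drowned out by the
# larger pixel footprint of near-field movement.

def movement_score(prev, curr, threshold_mm=50):
    """Distance-weighted movement score between two depth frames."""
    score = 0.0
    for p, c in zip(prev, curr):
        if p > 0 and c > 0 and abs(c - p) > threshold_mm:
            score += c / 1000.0  # weight by current distance in meters
    return score

# Two pixels moved: one at ~1.2 m, one at ~3.3 m.
print(movement_score([1000, 3000], [1200, 3300]))  # 4.5
```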

This should also help me get the whole analysis done quickly and let me keep a high frame rate from the sensor.