Monday, June 4, 2012

Project Dance Controller: Report 2

This post is part of Project Dance Controller.

It's a bachelor thesis project with the aim of letting the amount of dance movement in the room control the volume level, using the Kinect for Windows hardware.

You can read all reports here.

Last week

I have created a skeleton library with some API calls, which I have documented in code. I will use the generator from the Stoffi project to convert the specification comments into wiki pages so the documentation can be read directly on the wiki. The code is available on GitHub.

I have also started to look through some of the code examples for dealing with the Kinect for Windows device in C#. It seems pretty straightforward to get the raw data; the tricky part will be doing something useful with it.
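
To give an idea of what that looks like, here is a minimal sketch of reading raw depth frames, pieced together from the SDK 1.x examples I've been reading. The class and handler names are my own placeholders, and it assumes a connected sensor has already been found and passed in.

```csharp
using System;
using Microsoft.Kinect;

// Minimal sketch of reading raw depth frames with the Kinect SDK 1.x.
// Class and handler names are placeholders, not part of my actual library yet.
class DepthReader
{
    private short[] depthPixels;

    public void Start(KinectSensor sensor)
    {
        sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
        depthPixels = new short[sensor.DepthStream.FramePixelDataLength];
        sensor.DepthFrameReady += OnDepthFrameReady;
        sensor.Start();
    }

    private void OnDepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
    {
        using (DepthImageFrame frame = e.OpenDepthImageFrame())
        {
            if (frame == null)
                return; // frames can be skipped if we're too slow

            frame.CopyPixelDataTo(depthPixels);

            // Each value packs the distance in millimeters in the upper bits
            // and a player index in the lowest three bits.
            int centerDepthMm = depthPixels[depthPixels.Length / 2]
                                >> DepthImageFrame.PlayerIndexBitmaskWidth;
            Console.WriteLine("Depth at center pixel: " + centerDepthMm + " mm");
        }
    }
}
```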

I've also uploaded the report as I promised in the last report. It's in docx format since it's a work in progress. I will create a PDF version when it's finished.

Challenges

There were no problems with the code, but I did find some problems with the plan to quantify movement by looking at fine movements in POIs (points of interest).

After studying the reference material I mentioned in the last report, I've come to the conclusion that I should use the depth image for analysis. It provides better means of finding edges (and thus humans) than the visible light camera, since it can handle strange backgrounds, weird colors and bad or non-existent lighting. Skeleton tracking is a total no-go: it can only track two people at a time, and it's very much hit and miss unless the person is positioned right in front of the camera, facing forward, with all limbs visible.
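
To make the edge-finding idea a bit more concrete: an edge in the depth image is simply a large jump in distance between neighboring pixels, which holds regardless of colors or lighting. A rough sketch of what I have in mind, assuming the depth values are already unpacked to millimeters (the 100 mm threshold is a pure guess at this point):

```csharp
using System;

// Rough sketch of depth-based edge detection on a single depth frame.
// The 100 mm threshold is an assumption, not a tested value.
static class DepthEdges
{
    public static bool[] Find(short[] depthMm, int width, int height)
    {
        const int threshold = 100; // jump in distance (mm) that counts as an edge
        var edges = new bool[depthMm.Length];

        for (int y = 1; y < height; y++)
        {
            for (int x = 1; x < width; x++)
            {
                int i = y * width + x;
                int dx = Math.Abs(depthMm[i] - depthMm[i - 1]);     // compare with left neighbor
                int dy = Math.Abs(depthMm[i] - depthMm[i - width]); // compare with pixel above
                edges[i] = dx > threshold || dy > threshold;
            }
        }
        return edges;
    }
}
```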

So I have revised the plans a little. I will not try to filter out movement such as walking or going from sitting to standing. Instead I will do my analysis in two stages: first I will detect heads and count the number of people in the room, then I will quantify the total amount of movement, normalize it to a scale, and balance it toward the number of people in the room. The downside is that funny hats or other stuff touching the head will create false negatives, but since the head count is only used to better interpret the total amount of movement, it should not affect the outcome too much.
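
As a very rough sketch of what "quantify the total amount of movement and balance it toward the number of people" could look like in code (every threshold and factor here is a guess that will change once I have real data):

```csharp
using System;

// Hypothetical sketch of the second analysis stage: diff two consecutive depth
// frames, count how many pixels changed noticeably, and balance the result
// toward the number of detected people. All constants are guesses.
static class MovementQuantifier
{
    const int NoiseThresholdMm = 50; // depth change smaller than this is treated as sensor noise

    public static double Quantify(short[] previousMm, short[] currentMm, int peopleCount)
    {
        if (peopleCount == 0)
            return 0.0; // nobody in the room, nothing to dance

        int movedPixels = 0;
        for (int i = 0; i < currentMm.Length; i++)
        {
            if (Math.Abs(currentMm[i] - previousMm[i]) > NoiseThresholdMm)
                movedPixels++;
        }

        // Share of the frame that moved, scaled per person so one wild dancer
        // doesn't max out the volume in a crowded room. The gain of 10 is arbitrary.
        double movementRatio = (double)movedPixels / currentMm.Length;
        return Math.Min(1.0, movementRatio * 10.0 / peopleCount);
    }
}
```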

This means that the plan for two weeks will change. Instead of identifying POIs (head, hands, etc.), I will spend one week detecting heads, and the week after I will quantify all movement in the depth image and relate it to the number of people detected. I have updated the schedule accordingly.

This week

Today I will start working on detecting the presence of a Kinect device and try to control its motor, as well as get the depth image from it. I will finally start to get acquainted with the Kinect SDK and really get to know the device. Most probably, I will learn things that force me to rethink the API, or at least extend it a little; that's why I've kept the API very simple and basic.
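
My starting point will probably look something like this, based on what I've seen in the SDK samples so far (the StatusChanged part in particular is my assumption about how hot-plugging is handled):

```csharp
using System;
using Microsoft.Kinect;

// Sketch of device detection and motor control with the Kinect SDK 1.x.
class SensorWatcher
{
    private KinectSensor sensor;

    public void Run()
    {
        // React to devices being plugged in or pulled out while the app runs.
        KinectSensor.KinectSensors.StatusChanged += (s, e) =>
            Console.WriteLine("Kinect status: " + e.Status);

        // Pick the first sensor that reports itself as connected.
        foreach (KinectSensor candidate in KinectSensor.KinectSensors)
        {
            if (candidate.Status == KinectStatus.Connected)
            {
                sensor = candidate;
                break;
            }
        }

        if (sensor == null)
        {
            Console.WriteLine("No Kinect device connected.");
            return;
        }

        sensor.Start();
        sensor.ElevationAngle = 10; // tilt the camera up a bit; the valid range is roughly -27..27 degrees
    }
}
```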

Challenges

The biggest problem I anticipate will be getting the device and my code to work together. It's the first time I've ever controlled an external device from code (I've never even read data from a webcam or microphone before), so there's a lot that could prove challenging. Hopefully there are plenty of resources to guide me if I get stuck.