Monday, June 18, 2012

Project Dance Controller: Report 4

This post is part of Project Dance Controller.

It's a bachelor thesis project with the aim of letting the quantity of dance movements in the room control the volume level using the Kinect for Windows hardware.

You can read all reports here.

Last week

Over the last few days I have been busy trying to smooth out and fix the depth image that I get from the IR sensor. I have successfully interpolated missing pixels and applied a bitmask to isolate only the bits that indicate the distance to the object.

To start, here's a picture in which every depth value has been shifted three bits. I have also translated the various depths into the colors blue (far away), green (middle) and red (near). Black areas are missing pixels. This is where the IR laser isn't able to determine the distance because the material absorbs, refracts or diffracts the light, preventing it from reflecting back to the sensor. I also encoded pixels whose depth exceeds the max depth as white. There are no such pixels in this picture, but they appear if I aim the Kinect toward something that's farther away than 4 meters.

A raw depth image.
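
The bit shift and color coding described above can be sketched roughly as follows. This is an illustration in Python, not the project's actual code. It assumes the raw pixel is a 16-bit value whose low three bits are not part of the distance, that a missing pixel has depth 0, and that the three color bands split the 0 to 4000 range into equal thirds. The names depth_from_raw and depth_to_color are made up for this example.

```python
MAX_DEPTH_MM = 4000      # max depth mentioned in the post
NON_DEPTH_BITS = 3       # assumption: the low three bits are not distance

BLACK = (0, 0, 0)        # missing pixel
WHITE = (255, 255, 255)  # farther away than the max depth
RED = (255, 0, 0)        # near
GREEN = (0, 255, 0)      # middle
BLUE = (0, 0, 255)       # far away


def depth_from_raw(raw):
    """Shift away the low three bits so only the distance bits remain."""
    return raw >> NON_DEPTH_BITS


def depth_to_color(depth_mm):
    """Map a depth in millimeters to an (R, G, B) tuple."""
    if depth_mm == 0:               # assumption: 0 means "unknown depth"
        return BLACK
    if depth_mm > MAX_DEPTH_MM:
        return WHITE
    third = MAX_DEPTH_MM / 3        # assumption: three equal bands
    if depth_mm < third:
        return RED
    if depth_mm < 2 * third:
        return GREEN
    return BLUE
```

With these band limits, a pixel at roughly 30 centimeters comes out red and one at roughly 2 meters comes out green, which matches the colors mentioned in this post.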

Removing white pixels is very easy: I set them to the max depth (4000) and they appear blue. The black pixels, however, are more difficult. Here I decided to use nearest neighbor interpolation to determine the value for each black pixel. I start by looking around the pixel for any non-black pixel, extending the search farther and farther out until I find a pixel with a valid value. When I do, I give my black pixel that value.

An interpolated depth image.
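
A minimal sketch of that outward search, again in Python and under stated assumptions: the frame is a list of rows of depths in millimeters, a missing pixel is 0, and the search grows in square rings around the pixel. The function names are hypothetical. This version reads only from the input frame, so already filled pixels never feed later ones. Whether the original code works that way is not stated.

```python
def clamp_far(depth, max_depth=4000):
    """Set every pixel beyond the max depth to the max depth."""
    return [[min(d, max_depth) for d in row] for row in depth]


def nearest_valid(depth, x, y, missing=0):
    """Search square rings of growing radius around (x, y).

    Returns the first non-missing depth found, or None if the whole
    frame is missing. Each ring is walked top to bottom, left to right.
    """
    height, width = len(depth), len(depth[0])
    for r in range(1, max(height, width)):
        for ny in range(y - r, y + r + 1):
            if not 0 <= ny < height:
                continue
            if ny in (y - r, y + r):
                xs = range(x - r, x + r + 1)   # full top or bottom edge
            else:
                xs = (x - r, x + r)            # only the two side pixels
            for nx in xs:
                if 0 <= nx < width and depth[ny][nx] != missing:
                    return depth[ny][nx]
    return None


def fill_missing(depth, missing=0):
    """Return a copy of the frame with each missing pixel filled in."""
    out = [row[:] for row in depth]
    for y, row in enumerate(depth):
        for x, value in enumerate(row):
            if value == missing:
                found = nearest_valid(depth, x, y, missing)
                if found is not None:
                    out[y][x] = found
    return out
```

Because each ring is walked in a fixed order, ties between equally distant neighbors always go to the one that is visited first. A whole region of missing pixels therefore tends to take the same value, which is one plausible source of blocky results.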

This produces some artifacts since the scanning is linear (left to right, top to bottom). The result is a lot of blocking, and this actually creates more "noise movement" than the original depth image had.

The interpolation plus bit shifting is done in 8-9 milliseconds (ms) which is pretty fast and allows me to process images at a rate of over 100 frames per second (fps).

I also tried applying a mean filter after the interpolation to smooth out the blocking without removing edges, but that cost a serious amount of CPU cycles and increased the time to over 30 ms, giving me a rate of less than 30 fps.
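
For reference, a plain mean filter looks roughly like the sketch below. The window size used in the project is not stated, so the 3x3 window (radius 1) is an assumption, as are the function name and the integer rounding.

```python
def mean_filter(depth, radius=1):
    """Replace every pixel with the mean of its square neighborhood.

    radius=1 gives a 3x3 window. At the borders the window is cut off,
    so fewer pixels go into the mean there.
    """
    height = len(depth)
    width = len(depth[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            count = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += depth[ny][nx]
                    count += 1
            out[y][x] = total // count   # integer mean, in millimeters
    return out
```

Every output pixel reads up to nine input pixels with a 3x3 window, and the count grows with the square of the window width, so a noticeable jump in processing time is expected with a straightforward implementation like this one.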


The biggest issue here is that the interpolation may remove "pixel noise" (pixels with unknown depth) but it introduces more "movement noise": the interpolation algorithm switches some blocks of pixels from green (around 2 meters) to red (around 30 centimeters). That's a lot of movement that does not actually occur.

So as much as it hurts, I may have to skip the interpolation and mean filter altogether and declare this week a week in which I learned why certain techniques do not fit the purpose of my code. I should not see this as a failure. Instead, this was the week I did interpolation and mean filtering for the first time, learning new and well-known algorithms in the field of image processing. However, neither of those algorithms works for me, so I will just use the raw depth image when I move on to the next step: analyzing the difference between two frames.

This week

So this week I will try to actually get a value between 0 and 10 out of the images. Even without interpolation there's still some noise movement, so I need a way to remove that. I was thinking of dividing the image into squares and analyzing each square separately. To remove noise movement I will take 3-5 frames and calculate the average movement between the frames, then compare it to the average movement of the next 3-5 frames. This should keep my algorithm fast enough while still smoothing out movements caused by missing pixels.
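
One possible reading of this plan, sketched in Python. Nothing here is implemented in the project yet. The block size, the 100 mm threshold, the choice to ignore missing pixels (depth 0), and all function names are assumptions made for the example.

```python
def block_means(depth, block=2, missing=0):
    """Mean depth of every block x block square, ignoring missing pixels.

    A square without any valid pixel gets None.
    """
    height, width = len(depth), len(depth[0])
    means = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            vals = [depth[y][x]
                    for y in range(by, min(by + block, height))
                    for x in range(bx, min(bx + block, width))
                    if depth[y][x] != missing]
            row.append(sum(vals) / len(vals) if vals else None)
        means.append(row)
    return means


def moving_fraction(prev, curr, block=2, threshold_mm=100):
    """Fraction of squares whose mean depth changed by more than the threshold."""
    moved = 0
    total = 0
    for row_a, row_b in zip(block_means(prev, block), block_means(curr, block)):
        for a, b in zip(row_a, row_b):
            if a is None or b is None:
                continue              # no valid depth in one of the frames
            total += 1
            if abs(a - b) > threshold_mm:
                moved += 1
    return moved / total if total else 0.0


def movement_score(frames, block=2, threshold_mm=100):
    """Average the moving fraction over a short run of frames, scaled to 0..10."""
    pairs = list(zip(frames, frames[1:]))
    if not pairs:
        return 0
    avg = sum(moving_fraction(p, c, block, threshold_mm) for p, c in pairs) / len(pairs)
    return round(avg * 10)
```

Passing a list of 3-5 consecutive frames to movement_score gives one value for that window, and the value for the next window can then be compared against it. Averaging inside each square means a single pixel flickering between known and unknown depth changes the square's mean very little, which is the effect the plan is after.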


The biggest challenge will be to account for the false movement of the missing pixels while still doing the processing fast enough that it both feels responsive to the user and properly detects real movement in the picture.

If I have the time and ability, I should try to make the algorithm not give false positives when the camera is moved, but I consider that low priority right now.