DummiesImageProcessing: July 2012

Tuesday, 3 July 2012

Important techniques in Vision for Robotics

As I understand it, some of the key computer vision skills required for humanoid robot vision are:

1.) Hough transforms to detect lines, planes, contours, etc.

2.) Perspective vision (especially if we use two cameras for eye-like vision and for depth mapping).

3.) Block matching to reconcile the differences between the left and right views. (Just close one eye and you'll find the image significantly different from what you see with both eyes open. Simulating this kind of correction on a computer is a pretty difficult task and is the subject of ongoing research.)

4.) Calculating odometry information by calculating the kinematic transforms for each step the robot takes. This is necessary for keeping balance without a special sensor for balance.

5.) Calculating the floor planes from the visual information using standard 3-D reconstruction techniques and/or depth maps.

6.) Creating an occupancy grid once we have found the floor plane and clustered it into object-blocked clusters and free clusters.

7.) Making an efficient algorithm to plan paths based on the Occupancy Grid. (Maybe A* or Dijkstra's or something)
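
To make points 6 and 7 a bit more concrete, here is a minimal sketch of A* on an occupancy grid. The grid encoding (0 = free, 1 = blocked), the 4-connected moves with unit cost, and the function name `astar` are my own assumptions for illustration, not something a particular robot stack prescribes.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (0 = free, 1 = blocked).

    Returns a list of (row, col) cells from start to goal, or None if the
    goal cannot be reached. Assumes every move costs 1.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates for 4-connected unit moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = cell
                    heapq.heappush(
                        open_heap,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None

if __name__ == "__main__":
    # Hypothetical 4x4 occupancy grid with a wall across the second row.
    grid = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
    ]
    print(astar(grid, (0, 0), (3, 0)))
```

If the heuristic is replaced by a function that always returns 0, the same loop behaves like Dijkstra's algorithm, so trying both on a real grid is cheap.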

Monday, 2 July 2012

No Kinect for External Environments: Why?

Here's why the skeletal tracking algorithms used by Kinect don't work so well in external environments:

The Kinect sensor isn't suited for outdoor/external environments. It is based on skeletal tracking. Here's how skeletal tracking works:

For the depth image we:
1) Threshold the depth image to extract the foreground.
2) Remove noise using the morphological operations of erosion and dilation.
3) Remove any remaining small blobs to get a clean foreground-segmented image.
4) This lets us focus on only the subject in the image and calculate the Extended Distance Transform.
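
The first three depth-image steps can be sketched in plain Python as below. This is only a rough illustration under my own assumptions: the depth image is a list of rows of numbers (say millimetres), the structuring element is a 3x3 square, blobs are 4-connected, and all function names and thresholds are made up for the example. The Extended Distance Transform of step 4 is not covered here.

```python
from collections import deque

def threshold_depth(depth, near, far):
    """Step 1: mark pixels whose depth lies in [near, far] as foreground (1)."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth]

def _morph(mask, rule):
    """Apply `rule` (all or any) over each pixel's 3x3 window, clipped at borders."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[rr][cc]
                      for rr in range(r - 1, r + 2)
                      for cc in range(c - 1, c + 2)
                      if 0 <= rr < rows and 0 <= cc < cols]
            out[r][c] = 1 if rule(window) else 0
    return out

def erode(mask):
    return _morph(mask, all)   # keep a pixel only if its whole window is set

def dilate(mask):
    return _morph(mask, any)   # set a pixel if anything in its window is set

def remove_small_blobs(mask, min_area):
    """Step 3: drop 4-connected foreground blobs with fewer than min_area pixels."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                blob, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(blob) >= min_area:
                    for y, x in blob:
                        out[y][x] = 1
    return out

def segment_foreground(depth, near, far, min_area):
    mask = threshold_depth(depth, near, far)
    mask = dilate(erode(mask))  # step 2: erosion then dilation (an "opening")
    return remove_small_blobs(mask, min_area)
```

Erosion followed by dilation wipes out specks smaller than the 3x3 window while roughly restoring larger shapes, which is why it is a common choice for cleaning up a thresholded mask.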

For the RGB image we:
1) Detect the face and upper body.
2) Segment the skin.
3) Fit the arms.

For the face and upper body detection, we use Haar features in the Viola-Jones face detection algorithm. The Haar features used are edge features, line features and centre-surround features.
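
To show what such a feature actually computes, here is a small sketch of an edge feature and a line feature evaluated on an integral image (summed-area table), which is the trick that makes Viola-Jones rectangle features cheap. The function names, the side-by-side rectangle layout and the sign convention are my own assumptions for illustration. A real detector evaluates many such features at many positions and scales inside a trained cascade, which is not shown here.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def edge_feature(ii, top, left, height, width):
    """Two-rectangle feature: left half minus right half (assumed sign)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

def line_feature(ii, top, left, height, width):
    """Three-rectangle feature: the two outer thirds minus the middle third."""
    third = width // 3
    outer = (rect_sum(ii, top, left, height, third)
             + rect_sum(ii, top, left + 2 * third, height, third))
    return outer - rect_sum(ii, top, left + third, height, third)

if __name__ == "__main__":
    # Hypothetical 4x4 patch: bright left half, dark right half.
    patch = [[10, 10, 0, 0] for _ in range(4)]
    ii = integral_image(patch)
    print(edge_feature(ii, 0, 0, 4, 4))
```

A strong response from `edge_feature` means the patch has a vertical brightness edge, while a flat patch gives a response of zero.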

Problems in the external environment are:
1) Skeletal tracking doesn't work for non-humans.
2) Gestures and human or environment movements cannot be tracked as easily with skeletal tracking when the subject is viewed from a different perspective, say sideways. This is a limitation of the extended distance transform used.
3) It's far more difficult to morphologically clean up an image in the wide variety of external environments we can think of.

Ref: http://home.iitk.ac.in/~akar/cs397/Skeletal%20Tracking%20Using%20Microsoft%20Kinect.pdf