My Senior Design Project was undertaken with Girish Jain, Jon Stokes and Dhruv Srivastava during Spring 2007 at Georgia Tech under the guidance of Professor David Anderson, then Ph.D. student Rajbabu Velmurugan, and with lots of support from Steve Pressig of Texas Instruments. We implemented real-time object tracking on the video stream from a camera, using a Texas Instruments DaVinci SoC that pairs a general-purpose ARM core running Linux with a TI C64x+ DSP core. DaVinci SoCs were fairly new at the time, and I think we had one of the few development boards around thanks to Steve Pressig. The novelty of the platform was both exciting and challenging (we often ran into driver, compiler, and firmware issues) when trying to get code to run reliably across both the ARM and DSP cores. The most entertaining part of the semester was explaining why I needed to “borrow” a set of pool balls for my “project” to the amused and somewhat suspicious Georgia Tech Recreation Center staff (see the “Results” video at the end).
System Architecture
The figure below provides a high-level visualization of information flow through the system and identifies the core components of the real-time object detection and tracking algorithm.
Hardware and Software Stacks
In the figure below, the left panel shows the formidable amount of hardware available on the DaVinci SoC, which includes the ARM microprocessor (red) and the DSP core (blue) in addition to memory, interfaces, and peripherals.
The right panel provides a picture of the software stack. The ARM core ran MontaVista Linux and multi-threaded C code using POSIX threads. Our main program ran in the user layer on the ARM CPU and was responsible for reading frames off the camera, passing them through the algorithm, caching the algorithm's results (object locations and sizes), and passing the modified frames to the display for rendering. The object detection algorithm was written as a separate subroutine that ran on the DSP core. We used the lightweight RCS version control system to keep track of all the code we tweaked after we got our first prototype working. Our compilation script built and linked both the DSP and ARM code.
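The original source isn't reproduced here, but the ARM-side loop looked roughly like the sketch below. The names (capture_frame, detect_objects_on_dsp, draw_boxes, display_frame) and the D1 frame size are hypothetical placeholders for the camera driver, the call into the DSP-side detector, and the display output.

```c
/* Sketch of the ARM-side capture -> detect -> annotate -> display loop.
 * The extern functions are hypothetical placeholders, not the real APIs. */
#include <pthread.h>
#include <stdint.h>

#define MAX_OBJECTS 16

typedef struct {
    int cx, cy;        /* object centroid in pixel coordinates */
    int width, height; /* bounding-box size */
} object_t;

extern int  capture_frame(uint8_t *luma, int w, int h);
extern int  detect_objects_on_dsp(const uint8_t *luma, int w, int h,
                                  object_t *objs, int max_objs);
extern void draw_boxes(uint8_t *luma, int w, int h,
                       const object_t *objs, int n);
extern void display_frame(const uint8_t *luma, int w, int h);

static void *tracking_loop(void *arg)
{
    static uint8_t frame[720 * 480];     /* NTSC D1 luma plane (assumed) */
    object_t objs[MAX_OBJECTS];
    object_t prev[MAX_OBJECTS];          /* cached results from the last frame */
    int n_prev = 0;

    (void)arg;
    for (;;) {
        if (capture_frame(frame, 720, 480) != 0)
            break;

        int n = detect_objects_on_dsp(frame, 720, 480, objs, MAX_OBJECTS);

        /* Cache this frame's detections so the next iteration can match
         * objects across frames (tracking state lives on the ARM side). */
        for (int i = 0; i < n; i++)
            prev[i] = objs[i];
        n_prev = n;
        (void)prev; (void)n_prev;

        draw_boxes(frame, 720, 480, objs, n);
        display_frame(frame, 720, 480);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, tracking_loop, NULL);
    pthread_join(tid, NULL);
    return 0;
}
```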
Object Detection Algorithm
We used a segmentation scheme that builds regions (of an object) around leader pixels. Leader pixels are defined as pixels whose luminosity gradient (with respect to neighboring pixels) exceeds a certain threshold. We are essentially doing edge detection under the assumption that objects are homogeneous in their luminosity. We ran into a significant issue because the algorithm would identify too many leader pixels: growing a homogeneous region around every leader pixel led to many objects being detected in a frame that were not actually there.
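A minimal sketch of the leader-pixel test, assuming an 8-bit luma plane and a 4-connected neighborhood (the exact neighborhood and threshold we used are not reproduced here):

```c
/* Leader-pixel test: a pixel is a leader if the luminosity gradient with
 * respect to its immediate neighbors exceeds a threshold. */
#include <stdint.h>
#include <stdlib.h>

static int is_leader_pixel(const uint8_t *luma, int w, int h,
                           int x, int y, int threshold)
{
    if (x < 1 || y < 1 || x >= w - 1 || y >= h - 1)
        return 0;  /* skip the image border */

    int c = luma[y * w + x];

    /* Maximum absolute difference against the 4-connected neighbors. */
    int grad = abs(c - luma[y * w + (x - 1)]);
    int d = abs(c - luma[y * w + (x + 1)]);
    if (d > grad) grad = d;
    d = abs(c - luma[(y - 1) * w + x]);
    if (d > grad) grad = d;
    d = abs(c - luma[(y + 1) * w + x]);
    if (d > grad) grad = d;

    return grad > threshold;
}
```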
To address this issue, we divided each frame into a number of smaller segments called subframes (green boxes in the right panel). If a subframe contains more than a certain threshold number of leader pixels, we assume the subframe contains the edge of an object. Next, we consider all contiguous such subframes as part of the same object and compute its centroid and size, which lets us draw a bounding box around it. The subframe approach smooths (over space) the noise captured by the leader pixels. We preserved the centroid locations of objects across frames, keeping state in the ARM-side portion of the algorithm, so we could continue to track objects across frames and resolve cases where objects came close to each other or collided. This algorithm was very effective on video streams that met the underlying assumption about luminosity (gradients exist at object boundaries and objects have relatively homogeneous luminosity).
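The subframe grouping step can be sketched as follows; the grid dimensions, the leader-pixel threshold, and the function names are illustrative rather than the values we actually used. Contiguous marked subframes are grouped with a simple flood fill, and each group's centroid (in subframe coordinates) stands in for the object's location:

```c
/* Sketch of the subframe stage: mark subframes with enough leader pixels,
 * group contiguous marked subframes, and compute each group's centroid. */
#include <stdint.h>

#define SUB_W 16          /* subframes per row (assumed) */
#define SUB_H 12          /* subframes per column (assumed) */
#define LEADER_THRESH 40  /* leader pixels needed to mark a subframe (assumed) */

/* counts[r][c] holds the number of leader pixels found in each subframe;
 * labels[r][c] receives a component id per marked subframe, 0 otherwise. */
static void flood(int labels[SUB_H][SUB_W], const int counts[SUB_H][SUB_W],
                  int r, int c, int id)
{
    if (r < 0 || c < 0 || r >= SUB_H || c >= SUB_W) return;
    if (labels[r][c] != 0 || counts[r][c] < LEADER_THRESH) return;
    labels[r][c] = id;
    flood(labels, counts, r - 1, c, id);
    flood(labels, counts, r + 1, c, id);
    flood(labels, counts, r, c - 1, id);
    flood(labels, counts, r, c + 1, id);
}

static int group_subframes(const int counts[SUB_H][SUB_W],
                           int labels[SUB_H][SUB_W])
{
    int next_id = 0;
    for (int r = 0; r < SUB_H; r++)
        for (int c = 0; c < SUB_W; c++)
            labels[r][c] = 0;
    for (int r = 0; r < SUB_H; r++)
        for (int c = 0; c < SUB_W; c++)
            if (labels[r][c] == 0 && counts[r][c] >= LEADER_THRESH)
                flood(labels, counts, r, c, ++next_id);
    return next_id;  /* number of objects found in this frame */
}

/* Centroid of one labeled group, in subframe coordinates; multiply by the
 * subframe pixel size to get image coordinates. */
static void group_centroid(const int labels[SUB_H][SUB_W], int id,
                           float *cx, float *cy)
{
    int n = 0, sum_x = 0, sum_y = 0;
    for (int r = 0; r < SUB_H; r++)
        for (int c = 0; c < SUB_W; c++)
            if (labels[r][c] == id) { sum_x += c; sum_y += r; n++; }
    *cx = n ? (float)sum_x / n : 0.0f;
    *cy = n ? (float)sum_y / n : 0.0f;
}
```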
Final Results: Object Tracking Video
This is the demo video from our final project presentation. Credit to Jon Stokes for surprising us all with a well-chosen soundtrack for a movie we all expected to be silent. No pool balls were harmed in the making of this movie.