A programmable System on a Chip (SoC) with a DCT accelerator

I worked on this project in Spring 2007 at Georgia Tech as part of ECE 4273: DSP Chip Design, under the guidance of Professor Vijay Madisetti and in collaboration with Chung Ching Lin, Dhruv Srivastava, Girish Jain, Michael Smith, Navraj Singh, Ramanathan Palaniappan and Sunpyo Hong. The project integrates a Discrete Cosine Transform (DCT) accelerator and additional memory with a 16-bit microcontroller that runs its own Real Time Operating System (RTOS). The idea was to build a programmable System on a Chip (SoC) for applications that need fast DCT computation for image processing (e.g. compressing a raw image into JPEG) alongside general-purpose computation. We finished the project with a multi-threaded application that ran on the microcontroller's RTOS and performed two tasks: Finite Impulse Response (FIR) filtering and DCT-based image compression. The RTOS offloaded the image compression to the DCT core of our SoC, while the FIR filtering ran on the microcontroller itself. Although SoCs are commonplace now, they were only rising in prominence at the time of this project. Our code and documentation are open source.

Let us start with a 10,000-foot view of what we built. The system architecture figure below shows the complete system as instantiated in a testbench.

The core components (16-bit microcontroller, DCT core and external memory) are shown in grey, and the additional glue logic that ties the system together is shown in cyan boxes. Because the microcontroller has 8 Kb of internal memory, it cannot address the first 8 Kb of external memory (shown in red). Boxes with blue dots are testbench-only modules that facilitate I/O and testing. Verification was limited to Xilinx ModelSim simulations; the design was not synthesized onto a real FPGA.
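The shadowing of the lowest external addresses can be illustrated with a small behavioral model of chip-select decoding. This is a sketch in C, not the project's HDL glue logic, and every value in it is hypothetical: a 16-bit address space, the internal memory treated as the first 8 K addressable locations (boundary at `0x2000`), and a DCT core window at `0xF000`. The names `decode_address` and `chip_select_t` are also illustrative.

```c
#include <stdint.h>

/* Hypothetical memory map, for illustration only. */
#define INTERNAL_MEM_END 0x2000u /* first 8 K addresses go to the MCU's internal memory */
#define DCT_BASE         0xF000u /* hypothetical window for the DCT core */
#define DCT_END          0xF100u

typedef enum { SEL_INTERNAL, SEL_EXTERNAL, SEL_DCT } chip_select_t;

/* Decide which device responds to a given 16-bit address. */
static chip_select_t decode_address(uint16_t addr)
{
    if (addr < INTERNAL_MEM_END)
        return SEL_INTERNAL;            /* external memory below 8 K is shadowed */
    if (addr >= DCT_BASE && addr < DCT_END)
        return SEL_DCT;                 /* accesses routed to the DCT core */
    return SEL_EXTERNAL;                /* everything else reaches external memory */
}
```

Under these assumptions, `decode_address(0x1FFF)` selects internal memory, and `0x2000` is the first external-memory address the microcontroller can reach, so the external cells below it exist but are never selected.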

The group's work had four main components.

  1. Wishbone compliance of the MDCT core (Wishbone wrapper): We developed a Wishbone wrapper for the DCT core so that the processor, acting as a Wishbone master, could communicate with and control it. The ‘DCT Core’ in the first figure is actually the entire Wishbone-compliant DCT core (with memory and additional Wishbone protocol logic) shown below.
    [Figure: Wishbone-compliant DCT core]

  2. Integration of the 16-bit MCU, the Wishbone-compliant MDCT core, and memory into a Wishbone-interconnected system: After making the DCT core Wishbone compliant, we integrated all the components (microcontroller, DCT core, external memory), together with some extra chip-select logic, to complete the SoC shown in the first figure.

  3. 16-bit MCU assembly code to execute DCTs and create 4x4 and 2x2 tiles: We developed assembly subroutines, integrated into the microcontroller's RTOS, that an application programmer could call from their code, much like APIs, to execute DCTs on the accelerator.

  4. A high-level multi-threaded application: Finally, we stepped into the shoes of our customer, the application programmer we built this system for, and wrote a C program that performed multiple tasks, including FIR filtering (ostensibly on an audio signal) and DCTs (ostensibly for image compression), using all the available hardware (microcontroller + DCT core). We verified that the entire software and hardware stack worked as designed, giving application programmers who use a high-level language (C) control over both the general-purpose hardware (16-bit microcontroller) and the dedicated application-specific hardware (DCT core) on the SoC, without getting bogged down in system-level details. The most interesting part of this process was discovering some bugs in the C compiler and assembler for the 16-bit microcontroller, and fixing a few of them.
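The FIR task that stayed on the microcontroller is a standard direct-form filter. Below is a minimal C sketch of such a filter; it is not the project's code. It assumes 16-bit integer samples and Q15 fixed-point coefficients (a common choice on 16-bit hardware, not something stated above), and the name `fir_q15` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct-form FIR filter:
 *   y[n] = (sum over k of h[k] * x[n-k]) / 2^15, treating x[n-k] as 0 for n-k < 0.
 * Samples are 16-bit integers; coefficients are Q15 fractions (32767 ~ +1.0).
 * The accumulator is 64-bit so this sketch cannot hit signed overflow on a
 * host machine; a 16-bit MCU build would choose its own accumulator width. */
static void fir_q15(const int16_t *x, int16_t *y, size_t len,
                    const int16_t *h, size_t taps)
{
    for (size_t n = 0; n < len; n++) {
        int64_t acc = 0;
        for (size_t k = 0; k < taps && k <= n; k++)
            acc += (int64_t)h[k] * x[n - k];  /* Q15 coefficient * integer sample */
        acc /= 32768;                         /* remove Q15 scaling (truncates toward zero) */
        if (acc > INT16_MAX) acc = INT16_MAX; /* saturate instead of wrapping */
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[n] = (int16_t)acc;
    }
}
```

For example, two taps of 0.5 (`h = {16384, 16384}`) turn `x = {1000, 2000, 3000, -4000}` into `y = {500, 1500, 2500, -500}`, a two-point moving average. Saturation matters on 16-bit hardware: clipping a loud sample to the maximum is far less audible than letting it wrap around to a large negative value.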

Acknowledgements: The 16-bit microcontroller used in this system comes with an assembler, a C compiler and an RTOS, and is Wishbone compliant. It was written by Dr. Juergen Sauermann. The 64-point DCT core was written by Mr. Michael Krepa. The additional memory in the system uses a memory model developed by Mr. Jamil Khatib. All of these cores were obtained from www.opencores.org.

For readers who want to use this code base or dig into implementation details, I think you will find our Systems Specification a useful starting point.