This page is an overview of existing RACORS non-processor IPs. For each IP the most important features and properties are listed. Some common properties of all RACORS non-processor IPs can be found here. For more detailed information please send an email to

Most of the IP blocks on this page have been designed as peripherals for RACORS processors and have peripheral interfaces for glueless connection to a 16-bit or 32-bit RACORS peripheral bus. Memory master interfaces are designed to connect to multi-port memory arbiters as described in the SDRAM/DDR-SDRAM section of this page. Blocks with other standard or non-standard peripheral and memory interfaces can be derived on request.

IP Groups

Fast Ethernet MAC

These MAC modules are optimized for low resource consumption. FPGA implementations are in the range of 500 LEs and below. The PHY management interface is implemented in software via dedicated GPIO signals of the MAC module. Multiple variants exist with different peripheral interfaces (16-bit or 32-bit), PHY interfaces (MII or RGMII) and packet-data interfaces (local memory DMA or external memory DMA).

Common Features

  • Full Duplex only

  • Hardware CRC32 generation and checking

  • MAC Address filter (can be disabled)

  • DMA for packet data (no local packet buffers)

  • GPIOs for software access to the PHY management registers

  • Interrupt generation (various programmable events)

  • FPGA Size: ~460 LEs with 16-bit peripheral-I/f, 16-bit local DMA-I/f and MII PHY interface

The fast Ethernet MAC modules are fully synchronous designs intended to run at the peripheral bus clock. With an MII PHY interface the receive and transmit clocks are inputs of the MAC module. All input signals driven by the PHY are synchronized to the system clock. Edge detection circuitry is used to synchronize input and output signals of the PHY interface to the receive and transmit clocks.

With an RGMII interface the transmit clock is an output signal of the MAC module, running at 25MHz/2.5MHz for 100/10Mbit Ethernet. To keep the MAC module fully synchronous, the system clock must be a multiple of 25MHz (e.g. 100MHz) so that the transmit clock can be derived from the system clock.
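A quick way to check the clocking constraint for a candidate system clock is to compute the divider ratios. The sketch below (the function name is illustrative) only checks that both RGMII transmit clocks are integer fractions of the system clock; it says nothing about how the divider is built in hardware.

```python
def tx_clock_divider(sys_clk_hz: int, tx_clk_hz: int) -> int:
    """Return sys_clk_hz / tx_clk_hz if it is an integer, else raise ValueError."""
    if sys_clk_hz % tx_clk_hz:
        raise ValueError(f"{sys_clk_hz} Hz is not a multiple of {tx_clk_hz} Hz")
    return sys_clk_hz // tx_clk_hz

# 100 MHz system clock: 25 MHz for 100 Mbit/s, 2.5 MHz for 10 Mbit/s
for speed, tx_clk_hz in (("100Mbit", 25_000_000), ("10Mbit", 2_500_000)):
    print(speed, "divide by", tx_clock_divider(100_000_000, tx_clk_hz))
```

An 80MHz system clock, for example, is rejected because it is not a multiple of 25MHz.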

Typical use cases for variants with local DMA for packet data are small processor systems with no external memory. Packet data are transferred to/from the processor's data RAM via dedicated DMA channels. This is more flexible than using packet buffer RAMs connected to the MAC module: with the local DMA concept, software determines the number and size of receive/transmit packet buffers and where they are located in the processor's data RAM.

Gigabit Ethernet MAC

As with the fast Ethernet MAC modules the GBit MAC is optimized for low resource consumption. The PHY management interface is implemented in software via dedicated GPIO signals of the MAC module. At the moment a single implementation with 32-bit peripheral-I/f and RGMII PHY-I/f is available. Other variants can be derived on request.

Features

  • Full Duplex only

  • Hardware CRC32 generation and checking

  • Preamble filter: if enabled, received packets with a corrupt preamble are discarded

  • MAC Address filter (can be disabled)

  • Separate memory master interfaces for transmit and receive packets

  • Packet data transfers with n x 16-byte bursts (n=1-8)

  • GPIOs for software access to the PHY management registers

  • RGMII PHY interface @125MHz DDR

  • 32-bit peripheral interface

  • Interrupt generation (various programmable events)

  • FPGA Size: ~1250 LEs

  • Separate 250MHz clock input for the RGMII transmitter
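The n x 16-byte burst rule means that a packet buffer is moved as a sequence of transfers of at most 128 bytes. The following sketch splits a buffer accordingly; it assumes 16-byte aligned buffers and that a partial last unit is rounded up to a full 16-byte burst, neither of which is stated above.

```python
BURST_UNIT = 16   # bytes per burst unit
MAX_UNITS = 8     # n = 1..8, i.e. at most 128 bytes per transfer

def split_into_bursts(addr: int, length: int) -> list[tuple[int, int]]:
    """Split a packet buffer into (address, n) transfers of n x 16 bytes.

    Assumes a 16-byte aligned buffer; a partial last unit is rounded up.
    """
    if addr % BURST_UNIT:
        raise ValueError("buffer must be 16-byte aligned")
    units = -(-length // BURST_UNIT)          # ceiling division
    bursts = []
    while units > 0:
        n = min(units, MAX_UNITS)
        bursts.append((addr, n))
        addr += n * BURST_UNIT
        units -= n
    return bursts

# A 1518-byte frame needs 95 units: 11 transfers with n=8 and one with n=7
print(split_into_bursts(0x1000, 1518))
```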

Due to the special clocks required for the RGMII PHY interface, the GBit MAC cannot be fully synchronous. Most of the logic, including the peripheral and memory interfaces, runs from the system clock. A separate 250MHz clock is required for the RGMII transmitter. The RGMII receiver is clocked by the 125MHz receive clock, either directly or phase-corrected by a PLL (depending on the phase relation of the RGMII receiver input signals).

Ethernet System IP

These are intelligent IP blocks containing a processor and firmware. An Ethernet MAC, a DMA controller and instruction and data RAMs are connected to the processor. Local firmware handles Ethernet packet transmission and reception with buffer management, optional packet filtering and local packet generation. With a small processor like the sf16bu, the entire block is still rather small, e.g. < 2500 LEs for the examples described below.

There are many applications for this type of intelligent IP which can be used either as an intelligent peripheral of a processor system (example 1) or as an autonomous unit (example 2).

Example 1: Ethernet MAC with debug interface

In this example the IP block is a peripheral of a processor. In addition the block provides the debug-interface for this processor (e.g. JTAG, ...). A single Ethernet port is used for network access and as debug interface.

A dedicated UDP port is used for debug messages. Firmware on the local processor inside the IP block filters packets addressed to the debug UDP port and directs them to the debug interface. Data received from the debug interface is sent back to the debug Host using the same UDP port. All remaining receive/transmit packets are transferred to the main memory of the system for access by the main processor. The filtering of debug messages is completely transparent to the main processor.
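The debug filter only needs a few header fields. The sketch below shows the routing decision for one received frame; the port number `DEBUG_UDP_PORT` is hypothetical (the actual port is not given here), and checks such as IP fragmentation or the destination IP address are left out.

```python
import struct

DEBUG_UDP_PORT = 50000   # hypothetical port number, not specified by the source

def route_frame(frame: bytes) -> str:
    """Return 'debug' for UDP/IPv4 frames sent to the debug port,
    otherwise 'main_memory' (forwarded to the main processor)."""
    if len(frame) < 14 + 20 + 8:
        return "main_memory"
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != 0x0800:                 # not IPv4
        return "main_memory"
    ip = 14
    ihl = (frame[ip] & 0x0F) * 4            # IPv4 header length in bytes
    if frame[ip + 9] != 17:                 # IP protocol 17 = UDP
        return "main_memory"
    udp = ip + ihl
    if len(frame) < udp + 8:
        return "main_memory"
    dst_port = struct.unpack_from("!H", frame, udp + 2)[0]
    return "debug" if dst_port == DEBUG_UDP_PORT else "main_memory"
```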

With ~2500 LEs + 13kBytes block RAM for 100Mbit Ethernet, this block provides a very efficient Network+Debug solution for FPGAs.

Example 2: PCM audio data streaming to an audio DAC

This is an example of an autonomous IP block. It contains a processor with firmware, an Ethernet MAC, a DMA controller, and instruction and data RAMs. UDP packets with 1kBytes PCM payload are sent to the block, which outputs the PCM data as a continuous stream to an audio DAC. A 6-deep receive packet buffer is implemented to enable uninterrupted audio. Depending on the network environment (load, latency of handshake packets) a larger packet buffer may be required.
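To get a feel for what a 6-deep buffer means in time, the playback time covered by the queued packets can be computed. The sample format is not stated above, so the sketch assumes 44.1kHz, 16-bit stereo PCM and 1kByte = 1024 bytes of payload per packet.

```python
def buffered_playback_ms(packets: int, payload_bytes: int = 1024,
                         sample_rate_hz: int = 44_100, channels: int = 2,
                         bytes_per_sample: int = 2) -> float:
    """Playback time in ms covered by `packets` queued PCM packets."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return 1000.0 * packets * payload_bytes / bytes_per_second

print(f"{buffered_playback_ms(1):.2f} ms per packet")
print(f"{buffered_playback_ms(6):.2f} ms with 6 buffers")
```

Under these assumptions the 6-deep buffer bridges roughly 35ms of network jitter or handshake latency.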

The firmware inside the IP block contains a partial protocol stack up to the UDP layer. To keep the instruction and data RAM small only protocol functions that are required for the PCM streaming are implemented:

  • MAC Layer: MAC address generation and checking, filtering based on the next layer protocol, only ARP and IP packets are further processed.

  • Ping (echo response) for testing, to check whether the module can be reached from other nodes of the network

  • ARP (Address Resolution Protocol) to enable the streaming server to find out the MAC address of the streaming target.

  • IP: filter out UDP packets

  • UDP: port filter, manage the PCM buffer queue
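The layered filtering above boils down to a few header comparisons. This sketch (names are illustrative, not the firmware's) uses the standard EtherType values 0x0806 (ARP) and 0x0800 (IPv4) and the IP protocol numbers 1 (ICMP) and 17 (UDP); a complete ping handler would additionally check for ICMP type 8 (echo request).

```python
import struct

ETH_ARP, ETH_IPV4 = 0x0806, 0x0800
IP_ICMP, IP_UDP = 1, 17
BROADCAST = b"\xff" * 6

def classify(frame: bytes, my_mac: bytes) -> str:
    """Pick the firmware handler for a received frame (illustrative sketch)."""
    if len(frame) < 14:
        return "drop"
    dst = frame[0:6]
    if dst not in (my_mac, BROADCAST):      # MAC layer: own address or broadcast only
        return "drop"
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype == ETH_ARP:
        return "arp"                        # answer address resolution requests
    if ethertype != ETH_IPV4 or len(frame) < 34:
        return "drop"                       # only ARP and IPv4 are processed further
    protocol = frame[14 + 9]                # protocol field of the IPv4 header
    if protocol == IP_ICMP:
        return "icmp"                       # ping handling
    if protocol == IP_UDP:
        return "udp"                        # port filter, PCM buffer queue
    return "drop"
```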

On an FPGA the example IP block takes ~2300 LEs and 13kBytes of block RAM. The same basic concept can be used for streaming or non-streaming data transfers to any standard or custom interface.

SD/SDHC/eMMC Host Controller

These modules provide full SD/SDHC/eMMC Host controller functionality and require little attention from the system processor. Multiple variants are available with different peripheral interfaces (16-bit or 32-bit), different card-interfaces (1-bit, 4-bit or 8-bit) and different data-interfaces (local DMA or external memory DMA).

Common Features

  • Fully synchronous design with 100MHz system clock

  • Programmable card clock from 400kHz to 50MHz

  • CRC7 generation and checking for commands and responses

  • CRC16 generation and checking for block data

  • DMA for block data (no local block buffers)

  • Buffered commands for seamless operation

  • Interrupt generation (various programmable events)

  • FPGA size: ~310 LEs with 1-bit card-I/f, 16-bit peripheral-I/f, 16-bit local DMA-I/f

  • FPGA size: ~780 LEs with 4-bit card-I/f, 32-bit peripheral-I/f, 32-bit local DMA-I/f
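The CRC7 and CRC16 checks use the polynomials defined by the SD specification: x^7 + x^3 + 1 for commands and responses, and x^16 + x^12 + x^5 + 1 (initial value 0) for block data. A bit-serial software model such as the one below is handy as a reference when verifying the hardware in simulation; it is a sketch, not the RACORS implementation.

```python
def crc7(data: bytes) -> int:
    """CRC7 (x^7 + x^3 + 1, initial value 0), MSB first: SD commands/responses."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            feedback = ((byte >> bit) & 1) ^ ((crc >> 6) & 1)
            crc = (crc << 1) & 0x7F
            if feedback:
                crc ^= 0x09
    return crc

def crc16_ccitt(data: bytes) -> int:
    """CRC16 (x^16 + x^12 + x^5 + 1, initial value 0), MSB first: SD block data."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

print(hex(crc7(bytes([0x40, 0, 0, 0, 0]))))   # CMD0, argument 0
```

For CMD0 the CRC7 is 0x4A, which is sent as the final command byte 0x95 together with the end bit. In 4-bit bus mode a separate CRC16 is computed for each DAT line, so a 4-bit controller would run four such generators in parallel.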

The existing variants are fully synchronous and have been designed for systems with a 100MHz peripheral bus clock. Typical card clocks of 50MHz, 25MHz and lower can be generated from the 100MHz system clock by integer division. On request, derivatives with more flexible card clock generation can be created, e.g. with a separate 2 x card clock input that is independent of the system clock.
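Since the card clock is obtained by integer division, the divider must be rounded up whenever the target is not an exact fraction of the system clock, so that the card clock never exceeds the requested value (important during card identification, where it must stay at or below 400kHz). The sketch assumes a minimum divider of 2, matching the 50MHz maximum card clock listed above; the names are illustrative.

```python
SYS_CLK_HZ = 100_000_000

def card_clock_divider(target_hz: int, sys_clk_hz: int = SYS_CLK_HZ) -> tuple[int, float]:
    """Smallest integer divider (at least 2) whose card clock does not exceed target_hz."""
    divider = max(2, -(-sys_clk_hz // target_hz))   # ceiling division
    return divider, sys_clk_hz / divider

for target in (400_000, 25_000_000, 50_000_000, 30_000_000):
    print(target, card_clock_divider(target))
```

A 30MHz request, for example, ends up at 25MHz because 100MHz/3 would exceed it.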

SD/SDHC/eMMC System IP

Intelligent IP blocks for accessing SD/SDHC/eMMC cards can be created by combining a small processor system with an SDHC Host controller and firmware. Data can be transferred directly between user interfaces and files on the SD/SDHC card, which leads to many interesting use cases and applications. Users don't have to be familiar with the SD card command protocol.

Depending on the user interface(s) and the SD/SDHC/eMMC Host controller variant, total IP block sizes for FPGAs in the range of 2000-3000 LEs plus 10-20 block RAMs can be realized.

The maximum read/write data rates that can be achieved depend on the card speed, the RAM available for buffering and the way data is stored on the card (file system or raw, sequential block read/write). With fast SDHC cards, sufficient buffering and sequential access, read rates of 20MBytes/s and write rates of 10MBytes/s can be achieved. With eMMC chips significantly higher speeds with read rates >100MBytes/s and write rates >50MBytes/s are possible.

DMA Controllers

Most systems based on RACORS processors have a DMA controller. In MCU-like FPGA systems, DMA to/from the processor's local (on-chip) data RAM is used for data transfers to/from peripherals and for transfers to/from external memory.

For FPGAs a generic DMA controller architecture with a configurable number of channels and channel parameters would waste resources. As a consequence many variants exist that are optimized for specific use cases. For new projects the variant with the best matching features is used as a template and adapted to meet the project specific requirements.

DMA to/from external memory instead of a data cache

These special purpose DMA channels are used to copy data from the processor's local on-chip data RAM to external memory (typically SDRAM or DDR-SDRAM) and vice versa. They are special not from a DMA controller point of view but from the overall system concept. The channels are used as alternative to a data cache.

In I/O intensive systems with a moderate share of control software, using DMA instead of a data cache is in many cases more efficient. Because data from I/O channels is volatile (e.g. network packets), the hit rates of a data cache are low. With DMA, data can be pre-fetched from external memory and then accessed in the processor's local data memory with zero wait states. In a similar way, output data is copied to external memory by DMA in the background while the processor works on other tasks.

The concept is particularly useful with large data access units such as Ethernet packets, mass-storage data blocks or graphics line buffers. With larger access units, the software overhead for setting up the DMA operations is small compared to the performance gain over a data cache solution.

SPI-FLASH

This memory type gets a separate section here because it is so important for FPGAs. Most of the high density FPGA families for digital designs need external non-volatile memory to load the FPGA hardware configuration at power up time. SPI-FLASH is the most attractive option because of its low pin count and high speed. Besides startup configuration data, other data for different purposes can be stored in the same SPI-FLASH device:

  • System software: in processor systems with external DRAM the system software can be copied from the configuration SPI-FLASH to the DRAM at system startup time.

  • System software: in MCU-like systems with no external DRAM, program code and constant data can be mapped into the configuration SPI-FLASH and accessed by the processor either directly or with instruction and data caches in between.

  • Some FPGA families support self-reconfiguration. Multiple configurations can be stored in the same FLASH device and can be loaded dynamically when needed.

Multiple variants of SPI-FLASH controllers are available that can be used e.g. to access the FPGA configuration device during normal operation. Debuggers for RACORS processors support erase, program and reading of the configuration SPI-FLASH via the debug interface.

SDRAM/DDR-SDRAM Controllers

This functionality is divided into two separate modules:

  • A physical interface module (memory type specific)

  • A multi-port memory arbiter module with client specific interfaces and burst buffers

Physical Interface Controller

These modules are optimized for high throughput and resource efficiency. Parameters like burst length, CAS latency and address segmentation (page, row, bank-select) are fixed for a particular memory type and use case. For best performance, memory banks are kept open as long as possible. Bank precharge commands are issued only in case of a time-out (timer per bank) or a page miss. The refresh controller uses periods without memory access for refresh commands as much as possible. Only if there are un-refreshed rows left close to the end of the maximum refresh period do refresh commands take priority over memory access requests.
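The refresh policy described above can be modeled in a few lines. The sketch simplifies heavily: each command (access or refresh) occupies one time slot, rows are counted per refresh period, and `margin` is a hypothetical safety reserve; bank timeouts and page misses are not modeled.

```python
class RefreshScheduler:
    """Opportunistic refresh with a deadline (simplified model).

    Rows are refreshed in idle slots. Refresh only takes priority over a
    pending access once the rows still to be refreshed in this period
    (plus a safety margin) no longer fit into the slots that remain.
    """

    def __init__(self, rows: int, period_slots: int, margin: int = 2) -> None:
        self.rows = rows
        self.period_slots = period_slots
        self.margin = margin
        self._new_period()

    def _new_period(self) -> None:
        self.remaining = self.rows             # rows not yet refreshed
        self.slots_left = self.period_slots    # slots until the period ends

    def next_command(self, access_pending: bool) -> str:
        """Decide what the controller does in the next slot."""
        if self.slots_left == 0:
            self._new_period()
        self.slots_left -= 1
        urgent = self.remaining + self.margin > self.slots_left
        if self.remaining and (urgent or not access_pending):
            self.remaining -= 1
            return "REFRESH"
        return "ACCESS" if access_pending else "NOP"
```

With `margin=0`, a continuously busy memory gets its refreshes squeezed into the last slots of the period, while an idle memory refreshes immediately.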

Multiple variants are available for different memory types (SDRAM or DDR-SDRAM), memory data widths (16-bit or 32-bit) and sizes (address bits). The memory architecture determines the number of row address bits and column address bits. In the overall address space as seen by the arbiter module, the bank select bits (typically 2 bits for 4 banks) are placed in the middle between the page address bits (low bits) and the row address bits (high bits). This enables efficient linear access across page boundaries.
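The following sketch shows the address split as seen by the arbiter. The geometry (10 column bits, 2 bank bits, 13 row bits, i.e. 2^25 words of 32 bits = 128MBytes) is an assumption chosen to match the 128MBytes SDRAM variant listed below, not a statement about a specific controller.

```python
from typing import NamedTuple

class SdramAddr(NamedTuple):
    row: int
    bank: int
    column: int

# Assumed geometry for 128 MBytes of 32-bit wide SDRAM
COL_BITS, BANK_BITS, ROW_BITS = 10, 2, 13

def split_address(word_addr: int) -> SdramAddr:
    """Column (page) bits at the bottom, bank bits in the middle, row bits on top."""
    column = word_addr & ((1 << COL_BITS) - 1)
    bank = (word_addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = word_addr >> (COL_BITS + BANK_BITS)
    if row >= 1 << ROW_BITS:
        raise ValueError("address out of range")
    return SdramAddr(row, bank, column)

print(split_address(1023), split_address(1024))
```

Consecutive pages land in different banks, so the next row can be activated while the current page is still being accessed.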

Physical Interface Common Properties

  • Optimized for high throughput and resource efficiency

  • Parameters like burst-length, CAS-latency and address segmentation are fixed for a particular memory type and use case

  • Banks are kept open as long as possible to avoid unnecessary pre-charging

  • Bank select bits are placed in the middle for efficient linear accesses across page boundaries

  • Efficient refresh controller, uses non-access periods whenever possible

  • Back to back read & write bursts

  • Byte enables for incomplete write bursts

  • FPGA size ~370 LEs for SDRAM, 128MBytes, 32-bit width

  • FPGA size ~410 LEs for DDR-SDRAM, 32MBytes, 16-bit width

Physical controllers are fully synchronous designs except for a small part of DDR-SDRAM controllers that runs at twice the system clock. The clock output to the memory device(s) requires a separate PLL output with a special (tuned) phase relation to the system clock.

Multi-port Arbiter

Instead of a memory bus a multi-port arbiter is used with separate address and data channels for each client. This concept has the following advantages:

  • Client interfaces can be optimized for each client (width, throughput, burst-buffers ...)

  • Consumes less logic/routing resources: lower driver load, lower data-width

  • Higher throughput, no blocking between clients

  • Shared burst buffer RAMs in the arbiter instead of client local burst buffers

To further illustrate the concept here is an example of an arbiter from an existing FPGA design:

  • Arbiter for 128MBytes, 32-bit wide SDRAM @100MHz, max bandwidth of 400MBytes/s (theoretical)

  • Fixed memory burst length of 16 bytes (4 x 32-bit words)

  • Six client interfaces with fixed priority from 0 (highest) to 5 (lowest)

  • Prio0: Debug-Interface, read/write, fixed burst length of 16 bytes, no burst buffers, max 23kBytes/s

  • Prio1: SD/SDHC card, 1-8 x 16 byte bursts, 2 x 128 bytes burst buffers, max 25MBytes/s

  • Prio2: Gbit Ethernet, write-only 1-8 x 16 byte bursts, 2 x 128 bytes burst buffers, max 120MBytes/s

  • Prio3: Gbit Ethernet, read-only 1-8 x 16 byte bursts, 2 x 128 bytes burst buffers, max 120MBytes/s

  • Prio4: DMA, write-only 1-8 x 16 byte bursts, 2 x 128 bytes burst buffers, max 133MBytes/s

  • Prio5: DMA, read-only 1-8 x 16 byte bursts, 2 x 128 bytes burst buffers, max 133MBytes/s

  • FPGA size ~850 LEs + 2 x 1kBytes block RAMs

The debug interface in this case has a UART interface to the Host PC. It is given the highest priority because with the low bandwidth and short burst it cannot negatively impact available bandwidth or latency of the other clients.

The SD card has the next highest priority because of the relatively low bandwidth compared to the remaining clients. Next are the Ethernet clients followed by the DMA to/from the processor's local data RAM which uses the left over bandwidth.

The fast clients have burst buffers of 2 x max burst-length size to de-couple the data rates of the memory and the client interfaces. All burst buffers of one direction (read or write) are mapped into a single 1kBytes block RAM which is more efficient than using local buffers in the clients.
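The fixed-priority selection of the example arbiter reduces to a priority encoder. The sketch below models only that decision; the client names are made up, and in use it is assumed (not stated above) that a grant is held for a complete burst and re-evaluated afterwards.

```python
from typing import Optional, Sequence

# Client order of the example arbiter, priority 0 (highest) to 5 (lowest);
# the short names are made up for this sketch
CLIENTS = ("debug", "sd_card", "eth_write", "eth_read", "dma_write", "dma_read")

def grant(requests: Sequence[bool]) -> Optional[int]:
    """Return the highest-priority requesting client, or None if no client requests."""
    for prio, requesting in enumerate(requests):
        if requesting:
            return prio
    return None

# Ethernet (write) and DMA (read) request at the same time: Ethernet wins
winner = grant([False, False, True, False, False, True])
print(CLIENTS[winner])
```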

Serial I/O

Following a resource optimized concept for FPGAs, most serial I/O functions are rather small and simple designs. In many cases adapting the interfaces to the target system (DMA, interrupts, etc.) takes more effort than developing the actual I/O function. The following paragraphs provide some general properties and information about the available blocks.

SPI

Probably the most popular of the 'old' serial interfaces because it is used for chip to chip connections and is not a highly standardized device interface. Recently SPI has become even more popular as a low pin count alternative for connecting NOR-FLASH memories to SoCs. For higher data rates, dual and quad SPI versions with 2 or 4 data lines and clock rates > 100MHz are available.

Resource efficient implementations optimized for a particular use case are either master or slave. To support high data rates receive/transmit FIFOs or DMA are required. Many variants are available that differ in one or more of the following properties:

  • Master or slave

  • DMA or FIFO, FIFO size

  • Peripheral interface 16-bit or 32-bit

  • Master clocking scheme: derived from system clock or separate PLL output

  • Slave clocking scheme: sampled with system clock or used directly

  • Special support for SPI-FLASH, e.g. insertion of dummy cycles

A special feature implemented in some variants with FIFOs is support for misaligned access to the FIFO. With protocols that contain e.g. a mixture of 8-bit, 16-bit and 32-bit data objects these objects can be written to and read from the FIFOs with one processor access regardless of their order and alignment in the byte stream.
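The effect of misaligned FIFO access can be modeled as a byte stream that accepts 8-, 16- or 32-bit objects in a single access each. This is a behavioral sketch only: big-endian (network) byte order is an assumption, and real hardware would implement it with byte-lane steering rather than a byte queue.

```python
from collections import deque

class ByteStreamFifo:
    """FIFO model that accepts 8/16/32-bit accesses at any byte position
    of the stream, one access per object (big-endian order assumed)."""

    def __init__(self) -> None:
        self._bytes = deque()

    def write(self, value: int, size: int) -> None:
        if size not in (1, 2, 4):
            raise ValueError("size must be 1, 2 or 4 bytes")
        self._bytes.extend(value.to_bytes(size, "big"))

    def read(self, size: int) -> int:
        if size not in (1, 2, 4) or len(self._bytes) < size:
            raise ValueError("invalid size or not enough data")
        return int.from_bytes(bytes(self._bytes.popleft() for _ in range(size)), "big")

# An 8-bit, a 16-bit and a 32-bit object packed back to back
fifo = ByteStreamFifo()
fifo.write(0x12, 1)
fifo.write(0x3456, 2)
fifo.write(0x789ABCDE, 4)
print(hex(fifo.read(4)), hex(fifo.read(2)), hex(fifo.read(1)))
```

The reader sees one continuous byte stream, independent of how the writer grouped the objects.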

IIC

Slow but still popular as a low pin count (2-wire) chip to chip interface to access configuration or status registers. IIC is also the most common interface for EEPROMs. In most use cases today the clock line is a master output and slave input and not bidirectional as in the original specification.

In many use cases software that emulates the protocol via GPIO pins is sufficient especially if the interface is used only for initialization to setup configuration parameters. A hardware implementation makes sense e.g. to access EEPROMs in the background with little or no processor load. A special property of EEPROM IIC slaves is that they use the acknowledge mechanism to indicate when a program operation is finished. The master has to implement so called acknowledge polling to sense when the EEPROM is ready for the next program or read operation.
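Acknowledge polling can be sketched as a loop around the address phase. `send_address` stands for a hypothetical bus primitive (START, address byte, sample ACK) and is not part of any RACORS API; 0x50 is the usual 7-bit address of 24Cxx-style EEPROMs.

```python
import time
from typing import Callable

def ack_poll(send_address: Callable[[int], bool], dev_addr: int,
             max_tries: int = 1000, delay_s: float = 0.0) -> int:
    """Repeat START + device address (write) until the EEPROM acknowledges,
    i.e. its internal program cycle has finished. Returns the attempt count."""
    for attempt in range(1, max_tries + 1):
        if send_address(dev_addr << 1):    # address byte with R/W = 0 (write)
            return attempt
        time.sleep(delay_s)
    raise TimeoutError("EEPROM did not acknowledge")

# Simulated EEPROM that is busy for two polls
responses = iter([False, False, True])
print(ack_poll(lambda byte: next(responses), 0x50))
```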

Besides software implementations, two variants are available, mainly targeted at EEPROM slaves. The first variant is interrupt driven and transmits/receives one byte at a time. The second variant has block RAM based read/write FIFOs to program or read longer byte sequences autonomously. Both variants support acknowledge polling to efficiently handle EEPROM slaves.

UART

Not very popular anymore because of the low speed and because new PCs no longer have a UART interface. Still useful as a debugging interface for processor systems with small memory footprints. In most cases the protocol setting is 1 start bit, 8 data bits, no parity, 1 stop bit. Programmability of these parameters is not really necessary.

The low end debug modules for RACORS processor systems have a half-duplex UART interface to connect to the debug Host. A special feature of this UART is the auto-baud rate detection. After a reset the Host sends a 0x80 character. The UART module measures the length of the low period, which is 8 bit times, sets its baud rate generator accordingly and sends back a 0x47 character to indicate to the Host that the connection was successful.
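Since the low period of 0x80 (start bit plus data bits D0 to D6) lasts exactly 8 bit times, the baud rate divider is simply the measured cycle count divided by 8. The sketch assumes a 100MHz system clock and a counter that counts system clock cycles; the function names are illustrative.

```python
SYS_CLK_HZ = 100_000_000   # assumed system clock

def autobaud_divisor(low_cycles: int) -> int:
    """System clock cycles per bit from the measured low period of 0x80 (8 bit times)."""
    return (low_cycles + 4) // 8          # divide by 8, rounded to nearest

def low_period_cycles(baud: int) -> int:
    """Low period the counter would measure for a given Host baud rate."""
    return round(8 * SYS_CLK_HZ / baud)

for baud in (9_600, 115_200, 230_400):
    div = autobaud_divisor(low_period_cycles(baud))
    print(f"{baud} baud: divisor {div}, actual rate {SYS_CLK_HZ / div:.0f} baud")
```

Measuring 8 bit times instead of one reduces the quantization error of the divisor by a factor of 8.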

Some variants of the available UART based debug modules allow the processor to use the UART as a communication peripheral. This is particularly useful in FPGA systems where resource efficiency is most important. A select input from a pin with a jumper or switch determines whether the UART is used as a debugging interface or as a terminal/console interface.

IIS

The most common interface for audio ADCs/DACs. The bandwidth is rather low, e.g. for 44.1kHz stereo in/out with 16-bit samples the bandwidth is 352.8 kBytes/s. Most audio ADCs/DACs have an additional I2C interface to access control, configuration and status registers.
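The bandwidth figure follows directly from sample rate, channel count, sample width and the number of directions; the small helper below (name illustrative) reproduces it.

```python
def pcm_bytes_per_second(sample_rate_hz: int, channels: int,
                         bits_per_sample: int, directions: int = 1) -> int:
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * channels * (bits_per_sample // 8) * directions

print(pcm_bytes_per_second(44_100, 2, 16))                 # one direction
print(pcm_bytes_per_second(44_100, 2, 16, directions=2))   # in + out
```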

Multiple variants are available most of them with small transmit/receive FIFOs and no DMA. The following example describes an IIS block with 16-bit peripheral interface and simultaneous transmit/receive operation.

  • 5 x 32-bit transmit FIFO (2 x 16-bit left/right samples)

  • 5 x 32-bit receive FIFO (2 x 16-bit left/right samples)

  • interrupt generation on transmit-FIFO low, receive FIFO almost full

  • Clock generation for 44.1kHz or 48kHz sample rates

  • FPGA size ~470 LEs incl. FIFO registers

The concept is that interrupts are generated when 1 sample pair is left in the transmit FIFO and when 4 sample pairs are contained in the receive FIFO. This provides enough time for the processor to react and write the next 4 sample pairs into the transmit FIFO or read the next 4 sample pairs from the receive FIFO.

PS/2

Has been the standard interface to connect mice and keyboards to PCs for a long time, but was replaced by USB quite a while ago. Still useful to connect mice and/or keyboards to low cost FPGA processor systems with no USB: PS/2 is much smaller than USB and the software driver is much simpler.

Some RACORS processor systems have a PS/2 interface to connect a keyboard. Typically the PS/2 function is part of a system integration module that also contains a periodic interrupt timer and interrupt controller. Because of the low speed the PS/2 function is interrupt driven and very small. The hardware is < 100 LEs and consists only of a clock pre-scaler, a shift register, synchronizers and a few status flags.
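A PS/2 device sends 11-bit frames: a start bit (0), 8 data bits LSB first, an odd parity bit and a stop bit (1). The sketch shows the checks applied to the received shift-register contents; whether they are done in hardware or in the interrupt handler is not specified above.

```python
def decode_ps2_frame(bits: list[int]) -> int:
    """Decode one 11-bit PS/2 device-to-host frame, bits in the order received."""
    if len(bits) != 11:
        raise ValueError("a PS/2 frame has 11 bits")
    start, data_bits, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    if (sum(data_bits) + parity) % 2 != 1:     # odd parity over data + parity bit
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data_bits))   # LSB first

# Scan code 0x1C: data bits 0,0,1,1,1,0,0,0 (three ones -> parity bit 0)
print(hex(decode_ps2_frame([0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1])))
```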

IR Remote

Because of the low bit rates almost no hardware resources are required for Infra Red remote control receivers. The hardware part consists only of an edge detector. Every edge of the input signal generates an interrupt. Software measures the time from edge to edge using a system timer (available anyway for other purposes). The sequences of time intervals are analyzed to determine the codes sent by the remote control.
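The interval analysis depends on the remote control protocol, which is not specified above. As an example, the sketch decodes the widely used NEC protocol (9ms leader mark, 4.5ms space, 562.5µs marks, 562.5µs or 1687.5µs spaces for 0 or 1, 32 bits LSB first) from a list of edge-to-edge intervals in microseconds.

```python
def decode_nec(intervals_us: list[int], tol: float = 0.25) -> tuple[int, int]:
    """Decode one NEC frame (alternating mark/space intervals starting with
    the leader mark) into (address, command)."""
    def near(value: float, nominal: float) -> bool:
        return abs(value - nominal) <= tol * nominal

    if len(intervals_us) < 66 or not (near(intervals_us[0], 9000)
                                      and near(intervals_us[1], 4500)):
        raise ValueError("no valid NEC leader")
    value = 0
    for i in range(32):
        mark, space = intervals_us[2 + 2 * i], intervals_us[3 + 2 * i]
        if not near(mark, 562.5):
            raise ValueError(f"bad mark in bit {i}")
        if near(space, 1687.5):
            value |= 1 << i                    # LSB first
        elif not near(space, 562.5):
            raise ValueError(f"bad space in bit {i}")
    addr, addr_inv = value & 0xFF, (value >> 8) & 0xFF
    cmd, cmd_inv = (value >> 16) & 0xFF, (value >> 24) & 0xFF
    if addr ^ addr_inv != 0xFF or cmd ^ cmd_inv != 0xFF:
        raise ValueError("inverted address/command mismatch")
    return addr, cmd
```

Each interval would come from the edge interrupt handler reading the system timer; the frame's trailing stop mark is not needed for decoding.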

Display Controllers

Display controllers cover a broad range regarding display type, complexity and resource consumption. Solutions for the following display types are available:

  • Individual LEDs, 7-segment LEDs, LED matrices

  • Monochrome LCD displays, character based or pixel-array

  • TFT color LCD displays

LEDs

Can be driven easily by GPIO pins. For larger numbers of LEDs, multiplex schemes are used to limit the number of GPIO pins required. The multiplex rate can be low, e.g. a 16-channel mux scheme with 100Hz refresh rate requires switching at 1600Hz, which can easily be implemented with timer interrupts and software. In many cases dedicated hardware is not required.

Higher mux rates are required to control the brightness of each LED within an LED matrix individually. This can be done with PWM schemes. Because of the strong non-linearity of brightness vs. average current, a high PWM resolution is required. E.g. for 16-channel mux, 100Hz refresh and 8-bit brightness with 12-bit PWM resolution, a switch rate of ~6.5MHz is required. This is too fast for an interrupt and software solution, so a hardware block makes more sense.
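The switching rates quoted above can be reproduced from channel count, refresh rate and PWM resolution. The gamma table illustrates why the PWM needs more bits than the brightness value; gamma 2.2 is an assumed curve, not a value from the source.

```python
def mux_switch_rate(channels: int, refresh_hz: int, pwm_bits: int = 0) -> int:
    """Switching rate in Hz of a multiplexed LED scheme, optionally with PWM."""
    return channels * refresh_hz * (1 << pwm_bits)

print(mux_switch_rate(16, 100))        # software with timer interrupts is enough
print(mux_switch_rate(16, 100, 12))    # needs a hardware block

# 8-bit brightness -> 12-bit PWM duty cycle via an assumed gamma 2.2 curve
GAMMA_TABLE = [round(4095 * (level / 255) ** 2.2) for level in range(256)]
```

Because the curve is so flat at the bottom, the lowest brightness steps map to very small duty cycles, which only a high PWM resolution can represent.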

Monochrome LCD Displays

Display units include a controller/driver chip with an MCU-like bus interface with 4-bit or 8-bit data width and a couple of control signals. Most popular are character based displays with a built-in character generator, but bit map displays with individually addressable pixels are also available. The controller/driver chip includes the frame buffer RAM and performs the display refresh. The driving processor writes to the frame buffer RAM only to change the display content.

The interface is slow and can be emulated with GPIO pins. But because of the slow timing, using waiting loops between signal transitions would take too much processor time. The implemented solution uses a software state machine and timer interrupts to drive the display with minimal processor load. The time between signal transitions is used by other software tasks.
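A minimal sketch of such a timer-driven driver queues pin states and applies one of them per timer interrupt. It assumes an HD44780-style interface in 4-bit mode (RS, E and 4 data lines, R/W tied low); `set_pins` is a hypothetical GPIO hook, and the tick period must respect the display's timing, with extra idle ticks after slow commands such as clear display.

```python
from collections import deque
from typing import Callable

class LcdWriter:
    """Timer-interrupt driven write sequencer for a character LCD in 4-bit mode.
    Each on_timer_tick() call performs one pin update."""

    def __init__(self, set_pins: Callable[[int, int, int], None]) -> None:
        self.set_pins = set_pins    # hypothetical GPIO hook: (rs, e, nibble)
        self.steps = deque()

    def write(self, byte: int, rs: int) -> None:
        """Queue a command (rs=0) or data byte (rs=1), high nibble first."""
        for nibble in (byte >> 4, byte & 0x0F):
            self.steps.append((rs, 0, nibble))   # set RS and data while E is low
            self.steps.append((rs, 1, nibble))   # E high
            self.steps.append((rs, 0, nibble))   # E falling edge latches the nibble

    def on_timer_tick(self) -> bool:
        """Timer interrupt handler; returns False when nothing is queued."""
        if not self.steps:
            return False
        self.set_pins(*self.steps.popleft())
        return True
```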

TFT color LCD Displays

The standard for high resolution, high quality displays, which have meanwhile widely replaced CRT displays. The available modules have been developed for particular use cases and displays and have fixed video time base parameters. They differ in the following properties:

  • Color formats: 12-bit 4:4:4 RGB, 16-bit 5:6:5 RGB, 16-bit 1:5:5:5 ARGB, 24-bit 8:8:8 RGB

  • Video timing: 640x480, 800x480, 800x600, 1600x1200

  • single plane or dual plane (video/graphics)

  • Video plane with YCrCb to RGB color conversion and programmable, bi-linear re-sizing

  • Pixel clock derived from system clock or separate PLL output
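For the 16-bit 5:6:5 format, converting to and from 24-bit RGB is a matter of bit shifts. The sketch shows one common convention (truncation when packing, bit replication when expanding); the conversion used in the actual modules is not described here.

```python
def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into a 16-bit 5:6:5 pixel by dropping the low bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def rgb565_to_rgb888(p: int) -> tuple[int, int, int]:
    """Expand a 5:6:5 pixel to 8:8:8, replicating the top bits into the low bits."""
    r5, g6, b5 = (p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F
    return (r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2)
```

Bit replication maps full-scale 5- or 6-bit values to 255, so white stays white after expansion.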

Frame buffers are typically mapped into the system's main memory pool, usually external SDRAM or DDR-SDRAM. The controllers have on-chip pre-fetch buffers with two transfer request priorities based on the fullness of the pre-fetch buffer. This is to make the display a nice memory client that generates high priority requests only when the pre-fetch buffer is almost empty and otherwise uses the system's 'left-over' bandwidth to fill its pre-fetch buffer.
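The two-level request scheme can be expressed as a simple function of the buffer fill level. The burst size and the low watermark are hypothetical parameters; in the real modules they would depend on the display timing and the memory latency.

```python
from typing import Optional

def prefetch_request(fill: int, capacity: int, burst: int,
                     low_watermark: int) -> Optional[str]:
    """Request issued by the display pre-fetch buffer (all sizes in words)."""
    if capacity - fill < burst:
        return None            # no room for another burst
    if fill < low_watermark:
        return "high"          # close to running empty: high priority
    return "low"               # otherwise only use left-over bandwidth

print(prefetch_request(fill=10, capacity=256, burst=32, low_watermark=64))
```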

Caches

The available modules are designed for RACORS processors and are optimized for particular use cases with fixed parameters regarding cache-size and main memory size. Derivatives with other sizes can be generated with little effort. Two examples are given. The first is a read-only instruction cache and the second is a data-cache.

Instruction-Cache Example

  • 8kBytes, 4-way set associative for 8MBytes main memory

  • 4 x 512x32-bit cache RAMs

  • 32x64-bit tag RAM with 4x16-bit entries

  • 16x32-bit cache lines, 32 lines per way

  • LRU (Least Recently Used) replacement algorithm

  • FPGA size ~850 LEs + memories

Data-Cache Example

  • 32kBytes, 4-way set associative for 32MBytes main memory

  • 4 x 2kx32-bit cache RAMs

  • 256x64-bit tag RAM with 4x16-bit entries

  • 8x32-bit cache lines, 256 lines per way

  • Up to three open write lines

  • LRU (Least Recently Used) replacement algorithm

  • modified write-through concept

  • FPGA size ~1600 LEs + memories
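The tag and index widths of both examples follow from cache size, associativity, line size and main memory size. The sketch computes them (power-of-two sizes assumed); in both cases the tag is 12 bits wide, which fits in the 16-bit tag entries with 4 bits to spare (how those bits are used is not stated here).

```python
from typing import NamedTuple

class CacheGeometry(NamedTuple):
    offset_bits: int
    index_bits: int
    tag_bits: int

def geometry(mem_bytes: int, cache_bytes: int, ways: int, line_bytes: int) -> CacheGeometry:
    """Address field widths of a set-associative cache."""
    sets = cache_bytes // (ways * line_bytes)          # = lines per way
    offset_bits = (line_bytes - 1).bit_length()
    index_bits = (sets - 1).bit_length()
    tag_bits = (mem_bytes - 1).bit_length() - index_bits - offset_bits
    return CacheGeometry(offset_bits, index_bits, tag_bits)

def split(addr: int, g: CacheGeometry) -> tuple[int, int, int]:
    """Split a byte address into (tag, index, offset)."""
    offset = addr & ((1 << g.offset_bits) - 1)
    index = (addr >> g.offset_bits) & ((1 << g.index_bits) - 1)
    tag = addr >> (g.offset_bits + g.index_bits)
    return tag, index, offset

icache = geometry(8 << 20, 8 << 10, ways=4, line_bytes=16 * 4)    # 16x32-bit lines
dcache = geometry(32 << 20, 32 << 10, ways=4, line_bytes=8 * 4)   # 8x32-bit lines
print(icache, dcache)
```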

Debug Modules

These are debug modules for the RACORS processors which all have the same debug concept. The cores have a debug interface to stop and restart instruction execution and to inject and execute individual instructions. At a minimum the debug modules have interfaces to connect to the processor core and to the debug Host PC.

Most of the available variants are for small FPGA processor systems with on-chip instruction and data memories. These variants have a UART interface to connect to the Host PC and an extra interface to the processor's instruction RAM to download program code, to set breakpoints and for single-step instruction execution. Here is an example feature list of a module for sf32 processors:

  • UART based Debug Host interface with auto baud rate recognition

  • Baud rates up to 230,400 baud supported (limited by the Host RS232 interface)

  • Interface for up to 256kBytes instruction RAM (64k x 32-bit)

  • FPGA size ~380 LEs

To read or modify processor registers, data memory locations or peripheral registers the processor must be stopped first by a command from the debug host. The actual accesses to memory or registers are then done by instructions injected by the debug Host and executed by the core. Due to the dedicated interface the instruction RAM can be accessed (read and write) at any time also while the processor is running.

Variants with similar features are available for all RACORS processors. Depending on the target system variants of UART based debug modules have these additional features:

  • Interface to access external memory (SDRAM, DDR-SDRAM). The interface is connected to a dedicated port of the multi-port external memory arbiter of the system. See also here

  • The UART can be used as a peripheral of the processor, e.g. as terminal/console interface. See also here

System Integration

These modules are peripherals of processor systems that contain some frequently used standard functions or functions that are too small and simple to become a separate peripheral module. They can also be considered glue logic of processor systems. Typical functions are:

  • Periodic interrupt timer

  • Timer/counter for period measurement

  • Interrupt controller

  • PS/2 mouse/keyboard interface

  • IR remote interface

  • GPIO port(s)

  • LED port

A periodic interrupt timer is used as OS tick or to schedule other periodic system events. Typical tick intervals are in the range of 1ms to 10ms.

The RACORS processors all have 16 interrupts and each interrupt has a separate, software defined start address. The system interrupt controller takes all interrupt signals as inputs and selects the next interrupt to be forwarded to the processor based on a fixed or programmable priority scheme.
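The selection step of the interrupt controller can be sketched as a priority encoder over the pending and enabled interrupts. The "lowest number wins" rule for the fixed scheme and the priority-list format for the programmable scheme are assumptions for illustration.

```python
from typing import Optional, Sequence

def next_interrupt(pending: int, enabled: int,
                   priority: Optional[Sequence[int]] = None) -> Optional[int]:
    """Select the interrupt (0..15) to forward to the processor.

    Fixed scheme (priority=None): the lowest-numbered active interrupt wins.
    Programmable scheme: `priority` lists interrupt numbers, highest first."""
    active = pending & enabled & 0xFFFF           # 16 interrupt inputs
    order = range(16) if priority is None else priority
    for irq in order:
        if (active >> irq) & 1:
            return irq
    return None

print(next_interrupt(pending=0b1010_0000, enabled=0xFFFF))
```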

MP3 Decoder

This is mainly software IP. Only a bit string reader hardware block is used to accelerate reading of arbitrary-length bit strings from the compressed audio stream. The decoders are implemented as hand optimized assembly code for the eco32bl and eco32dl processors. More information can be found here
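Reading arbitrary-length bit strings is the operation the bit string reader accelerates. The software model below (MSB first, as in MPEG audio bitstreams) is only a functional reference; a hardware block would return the requested bits in a single access instead of looping bit by bit.

```python
class BitReader:
    """Read arbitrary-length bit strings, MSB first, from a byte buffer."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0                      # current position in bits

    def read(self, n: int) -> int:
        if self.pos + n > 8 * len(self.data):
            raise EOFError("not enough bits left")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

# First fields of an MP3 frame header (bytes FF FB 90 64)
r = BitReader(bytes([0xFF, 0xFB, 0x90, 0x64]))
print(hex(r.read(11)), r.read(2), r.read(2), r.read(1), r.read(4))
```

For these header bytes the reader returns the 11-bit sync word 0x7FF, version 3 (MPEG-1), layer 1 (Layer III), protection bit 1 and bitrate index 9.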

MPEG4-ASP Decoder

This IP is a combination of hand optimized assembly code for the eco16il processor and a number of hardware acceleration blocks. More details can be found here