IP for FPGAs

Focus on resource efficiency rather than ultimate flexibility

Most of the RACORS non-processor IP blocks have been developed for FPGAs. Compared to standard-cell SoCs produced in high volume, the hardware resources of FPGAs are quite expensive. At the same time, highly flexible hardware blocks are not required, because the hardware can easily be changed by reconfiguring the FPGA, even in the field.

Flexibility is required only for those features and parameters that need to be changed at run time. Providing programmability for static parameters that are set only once, when the system is initialized, would consume unnecessary resources.
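As a minimal sketch of this trade-off, consider a hypothetical tick generator whose divisor never changes after configuration. The module name, port names and divisor value below are illustrative assumptions and are not taken from any RACORS IP.

```verilog
// Hypothetical example: tick generator with a static divisor.
// DIVISOR is fixed at synthesis time, so no divisor register, no write
// decode and no read-back path are needed. Assumes DIVISOR >= 2.
module tick_gen #(
    parameter DIVISOR = 868            // assumed: 100 MHz / 115200 baud
) (
    input  wire CLK,
    input  wire RST,                   // synchronous reset, active high
    output reg  TICK                   // one-cycle pulse every DIVISOR clocks
);
    // Counter width is derived from the constant.
    localparam WIDTH = $clog2(DIVISOR);
    reg [WIDTH-1:0] cnt;

    always @(posedge CLK) begin
        if (RST) begin
            cnt  <= 0;
            TICK <= 1'b0;
        end else if (cnt == DIVISOR - 1) begin
            cnt  <= 0;                 // wrap and emit the tick
            TICK <= 1'b1;
        end else begin
            cnt  <= cnt + 1'b1;
            TICK <= 1'b0;
        end
    end
endmodule
```

A run-time programmable variant would add a divisor register, its bus write logic and a comparator against a variable instead of a constant. Changing the parameter and reconfiguring the FPGA achieves the same result without that overhead.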

The RACORS non-processor IPs are resource-optimized for use in one particular application. In most cases, multiple variants of the same IP function exist, each optimized for a different application. DMA controllers are a good example: the number of channels, the required channel parameters and features, the memory interfaces and the client interfaces all depend on the use case. A generic architecture designed to cover all potential requirements would be significantly larger than an architecture optimized for a single use case.

DMA instead of local buffers

Peripherals with data I/O need buffers to store receive and transmit data. The required buffer sizes depend on the data rates and real-time constraints of the I/O function. With low data rates and relaxed real-time requirements, as for PS/2, a few registers in the peripheral block are sufficient. With high data rates and tight real-time constraints, as for Ethernet MACs or SD card host controllers, large buffers mapped into RAMs are required.

RACORS IPs with RAM buffer requirements use DMA instead of local buffer RAMs. In MCU-like FPGA systems with no external memory, DMA to the data RAM of the processor has major advantages over local buffer RAMs connected to the peripheral modules:

  • The number and sizes of buffers are assigned to peripheral functions by software and can be adapted dynamically at run time.

  • While a peripheral function is not in use, its buffer space can be used for other purposes.

  • The processor has random access to any buffer location at any time.

Peripheral Slave Interface

IP blocks that connect to the peripheral bus of a processor have simple 16-bit or 32-bit peripheral slave interfaces with pipelined, effectively single-cycle transactions. For example, the 32-bit version has the following signals:

input  [31:0] PRDI;       // Peripheral Data In
input  [15:0] PRAD;       // Peripheral Address
input  [ 3:0] PRBS;       // Peripheral Byte Strobes
input         PRWE;       // Peripheral Write Enable
input         PRSL;       // Peripheral Select
output [31:0] PRDO;       // Peripheral Data Out
output        PRRDY;      // Peripheral Ready
output        PRIR;       // Peripheral Interrupt

The timing is the same as for synchronous memories. With no wait states, data out is valid in the cycle after PRSL is asserted. The optional PRRDY output is used to insert wait states for peripherals that cannot complete bus transactions in a single cycle. The optional PRIR output is present for peripherals that generate interrupts.
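The zero-wait-state timing described above can be sketched with a single read/write register. This is an illustration under stated assumptions, not a RACORS module: the clock name CLK, the module name, the register at address 0, byte addressing on PRAD, and driving PRDO to zero when not selected are all assumptions. The optional PRRDY and PRIR outputs are omitted.

```verilog
// Hypothetical zero-wait-state peripheral slave with one 32-bit register.
module pr_scratch (
    input  wire        CLK,       // assumed system clock name
    input  wire [31:0] PRDI,      // Peripheral Data In
    input  wire [15:0] PRAD,      // Peripheral Address
    input  wire [ 3:0] PRBS,      // Peripheral Byte Strobes
    input  wire        PRWE,      // Peripheral Write Enable
    input  wire        PRSL,      // Peripheral Select
    output reg  [31:0] PRDO       // Peripheral Data Out
);
    reg [31:0] scratch;           // hypothetical register at address 0

    // Assumes PRAD is a byte address, so bits [1:0] select the byte lane.
    wire hit = PRSL && (PRAD[15:2] == 14'd0);

    always @(posedge CLK) begin
        // Write: update only the byte lanes selected by PRBS.
        if (hit && PRWE) begin
            if (PRBS[0]) scratch[ 7: 0] <= PRDI[ 7: 0];
            if (PRBS[1]) scratch[15: 8] <= PRDI[15: 8];
            if (PRBS[2]) scratch[23:16] <= PRDI[23:16];
            if (PRBS[3]) scratch[31:24] <= PRDI[31:24];
        end
        // Read: PRDO is registered, so it is valid in the cycle after PRSL,
        // like the read data of a synchronous memory.
        PRDO <= (hit && !PRWE) ? scratch : 32'd0;
    end
endmodule
```

Because the read data is registered and the next access can be presented while the previous result is returned, one transaction can be issued per clock cycle.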

Memory Master Interface

For IP blocks that need access to the system's main memory pool (typically external DRAM), point-to-point connections to a memory arbiter module are used. This is more resource-efficient and performs better than a wide shared bus with sufficient bandwidth for all connected clients.

The IPs with memory master interfaces that exist so far have 32-bit point-to-point connections to a dedicated port of a central arbiter module. For maximum external DRAM bandwidth efficiency, transactions must be aligned on a 16-byte boundary and must have burst lengths of n * 16 bytes. To enable back-to-back reading and writing, split transactions with separate address and data phases are used.
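The alignment rule can be expressed as a small combinational check. The sketch below works on plain byte addresses and byte lengths; the module name, port names and widths are illustrative assumptions and do not describe how MMAD or MMBCT are actually encoded.

```verilog
// Hypothetical helper: checks the 16-byte alignment and n * 16 length rule.
module burst_check (
    input  wire [31:0] byte_addr,  // assumed: transfer start as a byte address
    input  wire [15:0] byte_len,   // assumed: transfer length in bytes
    output wire        ok,         // high if the transfer obeys both rules
    output wire [11:0] units16,    // n, the number of 16-byte units
    output wire [13:0] words32     // number of 32-bit data words
);
    // Aligned start, length a non-zero multiple of 16 bytes.
    assign ok = (byte_addr[3:0] == 4'd0) &&
                (byte_len[3:0]  == 4'd0) &&
                (byte_len       != 16'd0);

    assign units16 = byte_len[15:4];   // byte_len / 16
    assign words32 = byte_len[15:2];   // byte_len / 4
endmodule
```

For example, a transfer starting at byte address 0x1230 with a length of 48 bytes passes the check with n = 3, which corresponds to twelve 32-bit words on the data channel.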

For most client types, the buffering of DRAM bursts is done in the arbiter module and not in the client modules. In FPGA designs, this enables the use of a single block RAM per direction (read or write) for the burst buffers of all clients. Display controllers are an exception: they typically have a large prefetch buffer and at least two read-request priorities that depend on how full the prefetch buffer is.

Depending on the requirements of the IP block, memory master interfaces can be read/write, read-only or write-only. An SD card host controller, for example, has a single read/write interface because an SD card access is either a read or a write. An Ethernet MAC controller with full-duplex support requires independent interfaces for receive and transmit packet transfers.

Below is a list of the signals of a read/write memory master interface:

// Address phase
output [23:0] MMAD;       // Main Memory Address
output  [3:0] MMBCT;      // Main Memory Burst Count
output        MMTRQ;      // Main Memory Transfer Request
input         MMTAC;      // Main Memory Transfer Acknowledge
// Write data phase
output [31:0] MMDO;       // Main Memory Data Out
output        MMDOV;      // Main Memory Data Out Valid
input         MMDOC;      // Main Memory Data Out Consumed
// Read data phase
input  [31:0] MMDI;       // Main Memory Data In
input         MMDIV;      // Main Memory Data In Valid
output        MMDIC;      // Main Memory Data In Consumed

The first signal group (MMAD, MMBCT, MMTRQ, MMTAC) is for the address phase of split transactions. The second group (MMDO, MMDOV, MMDOC) is for the data phase of write transactions, and the last group (MMDI, MMDIV, MMDIC) is for the data phase of read transactions. In some cases, write data channels require an extra signal to distinguish between buffered and unbuffered writes. Unbuffered writes are used at the end of longer data transfers to make sure that the data has actually been written into main memory, and can be read back by another client, by the time the transfer is acknowledged.
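The address phase of a split transaction might look like the following sketch. The exact handshake rules are not specified here, so this assumes that a request is accepted in the cycle where MMTRQ and MMTAC are both high and that MMAD and MMBCT must be held stable until then. The clock and reset names, the module name and the start, start_addr and start_bct inputs are hypothetical.

```verilog
// Sketch of the address phase only, under assumed handshake semantics.
module mm_req (
    input  wire        CLK,        // assumed system clock name
    input  wire        RST,        // assumed synchronous reset, active high
    input  wire        start,      // hypothetical: pulse to launch one request
    input  wire [23:0] start_addr, // hypothetical: address of the transfer
    input  wire [ 3:0] start_bct,  // hypothetical: burst count of the transfer
    output reg  [23:0] MMAD,       // Main Memory Address
    output reg  [ 3:0] MMBCT,      // Main Memory Burst Count
    output reg         MMTRQ,      // Main Memory Transfer Request
    input  wire        MMTAC       // Main Memory Transfer Acknowledge
);
    always @(posedge CLK) begin
        if (RST) begin
            MMTRQ <= 1'b0;
        end else if (MMTRQ && MMTAC) begin
            // Assumed: request accepted, address phase is complete.
            MMTRQ <= 1'b0;
        end else if (start && !MMTRQ) begin
            // Latch address and burst count, then hold them while MMTRQ is high.
            MMAD  <= start_addr;
            MMBCT <= start_bct;
            MMTRQ <= 1'b1;
        end
    end
endmodule
```

Since the address phase finishes independently of the data phase, a client built this way could queue its next request while the data of the previous one is still being transferred. In this simplified sketch a start pulse that arrives in the acceptance cycle itself is ignored.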

Synchronous Design where possible

It is generally acknowledged that synchronous designs have major advantages. They are resource-efficient because no extra synchronization logic is required, and verification is easier because cycle-based simulation can be used.

Most of the RACORS non-processor IPs are fully synchronous designs in which all flip-flops are triggered by the positive edge of the same clock. Combinatorial paths are kept short enough that a clock rate of at least 100 MHz can be achieved on low-end FPGAs. The goal is that processors and their peripherals can be connected together to form a fully synchronous circuit.

Exceptions are IPs that require specific clock rates to be compatible with a standardized interface, where these clock rates cannot be generated synchronously from, or sampled with, the system clock. Examples are:

  • Gbit Ethernet MAC with RGMII PHY interfaces @125MHz DDR

  • Display controllers where the pixel clock is generally not related to the system clock

These IPs are split into sub-modules such that the parts that require special clocking are kept as separate modules. The remainder of the design, for example the peripheral and memory interfaces, can still be connected to the system clock with no synchronization logic.
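Inside the separately clocked sub-module, signals that cross into the system clock domain have to be synchronized. How the RACORS IPs do this is not described here. The following is a generic two-flip-flop synchronizer for a single-bit level signal, shown only to illustrate the kind of logic that such a sub-module encapsulates. The module and port names are assumptions.

```verilog
// Generic two-flip-flop synchronizer (illustrative, not a RACORS module).
// Suitable only for single-bit signals that change slowly compared to CLK.
module sync_2ff (
    input  wire CLK,       // destination clock, e.g. the system clock
    input  wire d_async,   // level signal from the other clock domain
    output wire q_sync     // d_async, synchronized to CLK
);
    reg [1:0] sh;          // sh[0] may go metastable, sh[1] is used downstream

    always @(posedge CLK)
        sh <= {sh[0], d_async};

    assign q_sync = sh[1];
endmodule
```

Multi-bit data such as pixel or packet streams cannot be transferred this way and typically crosses the clock boundary through a dual-clock FIFO or a handshake scheme instead.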