Introduction

The eco16 family of 16-bit microprocessors is targeted at embedded control and DSP applications. The family defines three ISAs:

  • eco16b: (base ISA) for general purpose control and computing with light DSP enhancements

  • eco16d: (DSP extension ISA) for 16-bit DSP applications

  • eco16i: (imaging extension) 4-way SIMD DSP extension for imaging and video applications

Resources

MPEG4-ASP_decoder

Details

ISAs (Instruction Set Architectures)

Base ISA

The eco16b is a 16-bit microprocessor architecture for embedded control and compute applications. The main targets are applications that need high performance but only a small data address space.

The eco16b is a load/store architecture. All operands of computation instructions are either constants or contained in registers. Load/store instructions are used to transfer operands between registers and memory.

The eco16b base ISA defines a generic and complete instruction set that high-level language compilers can target efficiently.

Features

  • Harvard architecture with separate instruction and data buses

  • 4 MBytes instruction address space

  • 64 KBytes data address space

  • Variable length instruction coding with 16-bit and 32-bit opcodes

  • 16 interrupts (4 non-maskable, 12 maskable) with programmable start addresses

  • 16 x 16-bit general purpose registers

  • Separate 16-bit Stack Pointer

  • Native support for 8-bit and 16-bit signed and unsigned integer data types

  • 32-bit load/store instructions

  • Higher precision integer and float data types supported by multi-instruction sequences

  • Rich set of load/store addressing modes, including indirect with scaled index and update addressing

  • Little endian byte ordering

  • Load/store multiple instructions for code-efficient copying and function prologue/epilogue

  • Bit manipulation & test instructions: set, clear, toggle & test

  • Bit field instructions: make, insert, signed/unsigned extract, 1-16 bits width, 0-15 bits offset

  • 16*16 multiply with high or low word result

  • Flexible debug concept with application specific debug modules
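
The bit field and multiply features listed above can be illustrated with a small behavioural model in Python. This is only a sketch: the function names are hypothetical, "make" is interpreted here as building a field mask, and the operand order and the counting of the offset from the least significant bit are assumptions, not the documented eco16b instruction semantics.

```python
MASK16 = 0xFFFF  # all values are modelled as 16-bit words


def bf_make(width, offset):
    """Build a mask of `width` bits (1-16) starting at bit `offset` (0-15)."""
    return (((1 << width) - 1) << offset) & MASK16


def bf_extract_u(value, width, offset):
    """Unsigned extract: right-align the field and zero-extend it."""
    return (value >> offset) & ((1 << width) - 1)


def bf_extract_s(value, width, offset):
    """Signed extract: right-align the field and sign-extend from its top bit."""
    field = bf_extract_u(value, width, offset)
    sign = 1 << (width - 1)
    return (field ^ sign) - sign


def bf_insert(target, field, width, offset):
    """Replace `width` bits of `target` at `offset` with the low bits of `field`."""
    mask = bf_make(width, offset)
    return (target & ~mask & MASK16) | ((field << offset) & mask)


def mul16(a, b, high=False):
    """Signed 16 x 16 multiply returning the low or high 16-bit word of the product."""
    product = a * b  # a and b are signed Python ints in -32768..32767
    return ((product >> 16) if high else product) & MASK16
```

For example, extracting the 4-bit field at offset 8 from 0xABCD gives 0xB as an unsigned value and -5 as a signed value, and 300 x 300 = 90000 splits into a high word of 1 and a low word of 0x5F90.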

DSP enhancements

  • MAC (Multiply & Accumulate) and MAS (Multiply & Subtract) instructions

  • MAC/MAS instructions with 1-bit or 2-bit left-shift before accumulation

  • Direct and indirect auto update addressing modes for load/store instructions

  • 2 x 16-bit loop counters
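
As a rough illustration of MAC/MAS with a left-shift before accumulation, the following Python sketch models the arithmetic only. The function names are hypothetical, and the accumulator width (`acc_bits`) is an assumption made for the example, since it is not stated here. A 1-bit shift is the usual way to bring the Q30 product of two Q15 fractional operands into Q31 format.

```python
def wrap(value, bits):
    """Wrap an integer to a signed two's-complement value of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc, a, b, shift=0, subtract=False, acc_bits=32):
    """Return acc +/- ((a * b) << shift), wrapped to acc_bits (assumed width)."""
    if shift not in (0, 1, 2):
        raise ValueError("shift must be 0, 1 or 2")
    term = (a * b) << shift
    return wrap(acc - term if subtract else acc + term, acc_bits)


def dot_q15(xs, hs):
    """Q15 dot product: the 1-bit shift turns each Q30 product into Q31."""
    acc = 0
    for x, h in zip(xs, hs):
        acc = mac(acc, x, h, shift=1)
    return acc
```

With this model, `dot_q15([16384], [16384])` (0.5 x 0.5 in Q15) returns 2**29, which is 0.25 in Q31.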

DSP ISA extension

The eco16d is an extension of the eco16b base ISA and is fully backward compatible with the eco16b. The main targets are 16-bit DSP applications, including low-cost audio. The DSP extension adds only a few special registers, and no general purpose registers, to the programming model of the eco16b ISA. The main additions are addressing modes with memory source operands and special add/subtract instructions that improve the performance and precision of DSP algorithms. A 2-entry accumulation extension cache (patented) enables 32-bit accumulation precision. Because the pipeline architecture and computation resources are the same, eco16d implementations are only slightly larger than base ISA implementations.
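
To show why 32-bit accumulation precision matters for sum-of-products calculations, the following Python sketch compares accumulating in 16 bits with accumulating in 32 bits. It does not model the accumulation extension cache itself, whose mechanism is not described here. The function names and the Q15 scaling are assumptions chosen for illustration.

```python
def wrap(value, bits):
    """Wrap an integer to a signed two's-complement value of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sum_products_16(xs, hs):
    """Scale each Q15 product down to 16 bits first, then accumulate in 16 bits."""
    acc = 0
    for x, h in zip(xs, hs):
        acc = wrap(acc + ((x * h) >> 15), 16)
    return acc


def sum_products_32(xs, hs):
    """Accumulate full products in 32 bits and scale down once at the end."""
    acc = 0
    for x, h in zip(xs, hs):
        acc = wrap(acc + x * h, 32)
    return wrap(acc >> 15, 16)


samples = [100] * 8
coeffs = [200] * 8
print(sum_products_16(samples, coeffs))  # 0: each 20000 >> 15 term truncates to zero
print(sum_products_32(samples, coeffs))  # 4: 160000 >> 15
```

With small operands, every per-term truncation in the 16-bit variant discards the whole product, while the 32-bit variant keeps the low-order bits until the final scaling step.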

The eco16d deviates from the pure load/store architecture of the eco16b: some performance-critical DSP instructions have addressing modes with one source operand in memory.

The instructions with additional addressing modes and the special add/subtract instructions cannot easily be used from high-level languages. The intended use model is hand-optimized assembler routines for performance-critical DSP functions, using the special addressing modes and instructions. The less performance-critical higher layers and control code are written in C and compiled to the eco16b base ISA instruction set.

Extension Features

  • Special add/subtract instructions with 1-bit left-shift of one source operand before the add/subtract operation

  • Additional addressing modes with one memory source operand for MAC and special add/subtract instructions

  • Three additional memory addressing modes: indirect with offset, direct update and indirect update

  • Dual-entry accumulation extension cache (patented) for sum-of-products calculation with 32-bit precision

  • Clip instruction with programmable low/high boundaries

  • 32-bit iterative divide instruction
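
The clip and special add/subtract features listed above can be sketched as a behavioural model in Python. The function names are hypothetical, and which source operand is shifted and how overflow is handled (16-bit wrap-around here) are assumptions, not documented eco16d behaviour.

```python
def wrap16(value):
    """Wrap an integer to a signed 16-bit two's-complement value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def clip(value, low, high):
    """Limit `value` to the programmable range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(high, value))


def add_shift1(a, b, subtract=False):
    """Return a +/- (b << 1), wrapped to 16 bits (overflow handling is assumed)."""
    shifted = b << 1
    return wrap16(a - shifted if subtract else a + shifted)
```

A typical use of a clip operation is saturating an intermediate result to a sample range, e.g. `clip(300, 0, 255)` for 8-bit unsigned pixel or audio data.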

Imaging ISA extension

The eco16i is an extension of the eco16b base ISA and is fully backward compatible with the eco16b. The main target applications are image processing, digital video and graphics. The ISA extension is a 4-way SIMD DSP extension with a separate register file. In typical implementations most of the hardware resources are spent on the extension. The overhead of the base ISA (used for control code) is relatively low, so the overall resources are used efficiently for video and image processing.

The eco16i deviates from the pure load/store architecture of the eco16b: some performance-critical extension instructions have one or two source operands in memory.

The extension instructions use operand types and perform operations that are difficult to support from high level languages. The targeted use model is to implement libraries with hand optimized assembler routines for the video and image processing functions. The control layers above can be written in C and compiled to the eco16b instruction set.

Extension Features

  • 4-way SIMD DSP extension with 4 x 32-bit accumulation

  • 12 x 64-bit physical vector registers used as 4 x 16-bit vectors

  • Individual component addressing of vector registers

  • Vector permutation instruction

  • Instructions with dual memory source operands, so no extra cycles are needed to load sample or coefficient operands

  • 8 x 4-bit vector component enable flags

  • Support of 8-bit/16-bit scalar and 4x8-bit/4x16-bit vector memory operands

  • Support for byte-vector access in interleaved CrCb chroma buffers

  • 4 x 4 operand array transpose instructions

  • 32-bit iterative divide instruction
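
Several of the features listed above (4-way MAC with 32-bit accumulation, component enable flags, 4 x 4 transpose, interleaved CrCb access) can be illustrated with a small Python model. This is a sketch under assumptions: the function names are hypothetical, the mapping of enable bit i to vector component i is assumed, and the model says nothing about the actual instruction encodings.

```python
def wrap32(value):
    """Wrap an integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def simd_mac(acc, a, b, enable=0b1111):
    """Per-lane acc[i] += a[i] * b[i]; lanes with a cleared enable bit keep acc[i]."""
    return [
        wrap32(acc[i] + a[i] * b[i]) if (enable >> i) & 1 else acc[i]
        for i in range(4)
    ]


def transpose4x4(rows):
    """Transpose a 4 x 4 operand array given as four 4-element rows."""
    return [list(col) for col in zip(*rows)]


def deinterleave_crcb(buf):
    """Split an interleaved Cr, Cb, Cr, Cb, ... byte buffer into two planes."""
    return buf[0::2], buf[1::2]
```

For example, `simd_mac([0, 0, 0, 0], [1, 2, 3, 4], [5, 6, 7, 8], enable=0b0101)` updates only components 0 and 2 and returns `[5, 0, 21, 0]`.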

Implementations

eco16bl

The eco16bl light implementation focuses on high performance with moderate resource consumption. A register-file with 3 read ports allows almost all instructions to execute in one effective cycle. A decoupled unit for instruction fetch and flow-instruction execution, together with separate execution units for computation and for load/store instructions, enables high pipeline throughput.

With 16/32-bit variable length instruction coding and a 32-bit instruction bus, the average instruction fetch rate is > 1 instruction per cycle, which fills the decoupling buffer between the instruction fetch unit and the computation and load/store execution units. A filled buffer enables zero-cycle effective execution time for flow change instructions (branch, jump, return).
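
The fetch rate claim follows from simple arithmetic, sketched here in Python. The model assumes an ideal fetch of one full bus word every cycle with no alignment or memory stalls. The function name and the instruction mix values are illustrative assumptions, not measured figures.

```python
def fetch_rate(frac_16bit, bus_bits=32):
    """Average instructions fetched per cycle for a given share of 16-bit opcodes."""
    if not 0.0 <= frac_16bit <= 1.0:
        raise ValueError("frac_16bit must be between 0 and 1")
    # Average opcode length in bits for a mix of 16-bit and 32-bit instructions
    avg_len = 16 * frac_16bit + 32 * (1.0 - frac_16bit)
    return bus_bits / avg_len
```

With only 32-bit opcodes the rate is exactly 1, with only 16-bit opcodes it is 2, and any mix in between (for example a 50/50 mix giving about 1.33) stays above 1, which is what allows the decoupling buffer to fill.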

Branch speculation, pre-fetching and conditional instructions minimize the performance penalties of program flow changes. Average IPC depends strongly on the instruction sequence, e.g. on branches and operand dependencies. Performance-optimized sequences can get close to an IPC of 1; highly optimized sequences with effective zero-cycle flow change instructions can achieve IPCs > 1.
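
A deliberately simple cycle-count model in Python shows how zero-cycle flow change instructions can push the IPC above 1. The function name, the one-cycle cost per non-flow instruction and the lumped stall count are all simplifying assumptions, not a description of the actual pipeline.

```python
def ipc(n_other, n_flow, stall_cycles=0, flow_cost=0):
    """Instructions per cycle for a sequence of flow and non-flow instructions.

    n_other: instructions assumed to take one effective cycle each
    n_flow: flow change instructions costing `flow_cost` cycles each
    stall_cycles: extra cycles lost to dependencies or mispredictions
    """
    cycles = n_other + n_flow * flow_cost + stall_cycles
    if cycles <= 0:
        raise ValueError("sequence must take at least one cycle")
    return (n_other + n_flow) / cycles
```

In this model a loop body of 8 computation instructions and 2 zero-cycle branches reaches an IPC of 1.25, while 2 stall cycles bring the same sequence back to an IPC of 1.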

Features

  • Focus on performance

  • Size: ~5400 LEs on FPGAs

  • 32-bit/16-bit instruction/data buses

  • Register-file with 3/2 read/write ports

  • Max clock ~110 MHz on low end FPGAs

  • Average IPC (Instr. per Cycle) ~0.8

  • Single cycle effective MAC instructions with two register source operands

  • Barrel-Shifter, single cycle effective shift execution

eco16dl

The eco16dl light implementation has the same basic pipeline architecture as the eco16bl. The only difference is an extra execution unit for multiply/MAC instructions. To support this unit, the register-file is upgraded to 4/3 read/write ports.

Features

  • Size: ~6500 LEs on FPGAs

  • 32-bit/16-bit instruction/data buses

  • Register-file with 4/3 read/write ports

  • Max clock ~100 MHz on low end FPGAs

  • Average IPC (Instr. per Cycle) ~0.9

  • Single cycle effective MAC instructions with one register and one memory source operand

  • Barrel-Shifter, single cycle effective shift execution

  • Non-blocking, iterative hardware divide
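
An iterative divide produces the quotient step by step rather than in a single cycle. The Python sketch below uses textbook unsigned restoring division, one quotient bit per iteration, purely to illustrate the idea. Whether the eco16 divider uses this algorithm, and its operand widths and signedness, are not stated here and are assumptions of the example.

```python
def divide_restoring(dividend, divisor, bits=32):
    """Unsigned restoring division producing one quotient bit per iteration."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend out of range")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Subtract only if the divisor fits, otherwise keep the remainder as it is
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

A non-blocking divider lets independent instructions continue while these iterations run in the background.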

eco16il

The eco16il light implementation has the same basic pipeline architecture as the eco16bl. Because of the dual data bus architecture and auto-update addressing for both memory source operands, a register-file with 5/5 read/write ports is required.

Features

  • Size: ~13700 LEs on FPGAs

  • 32-bit/2x64-bit instruction/data buses

  • Register-file with 5/5 read/write ports

  • Max clock ~100 MHz on low end FPGAs

  • Average IPC (Instr. per Cycle) ~0.9

  • Single cycle effective MAC instructions with both source operands in memory, giving 4 x MAC per cycle

  • Barrel-Shifter, single cycle effective shift execution

  • Non-blocking, iterative hardware divide
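
The approximate figures from the three feature lists can be combined into a rough upper bound on MAC throughput, sketched here in Python. The numbers are the "~" values quoted above. The calculation assumes one MAC issue every cycle at maximum clock and ignores operand loads, stalls and FPGA differences, so it is an optimistic estimate rather than a benchmark. The dictionary and function names are hypothetical.

```python
# Approximate values taken from the feature lists of the three implementations
CORES = {
    "eco16bl": {"les": 5400, "mhz": 110, "macs_per_cycle": 1},
    "eco16dl": {"les": 6500, "mhz": 100, "macs_per_cycle": 1},
    "eco16il": {"les": 13700, "mhz": 100, "macs_per_cycle": 4},
}


def peak_mmacs(core):
    """Upper bound on millions of MACs per second at maximum clock."""
    spec = CORES[core]
    return spec["mhz"] * spec["macs_per_cycle"]


def mmacs_per_kle(core):
    """Peak MMAC/s per thousand logic elements."""
    return peak_mmacs(core) / (CORES[core]["les"] / 1000)


for name in CORES:
    print(name, peak_mmacs(name), round(mmacs_per_kle(name), 1))
```

Note that the eco16bl MAC takes two register source operands, so sustaining its peak rate in a real loop would also need cycles for loads, whereas the eco16dl and eco16il MAC instructions can take one or both operands directly from memory.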