Introduction

The sf20 is a 16-bit microprocessor architecture for embedded control and computing applications with limited code size requirements. The ISA definition focuses on high clock rates and small core implementations, and the sf20 is specifically targeted at FPGAs. With fixed-length 20-bit instruction coding, it provides excellent code density for an architecture with 16 general purpose registers, and it exploits the fact that efficient 20-bit wide block RAMs are available in most FPGA architectures. Two ISAs are available:

  • sf20b: Base ISA for general purpose control & computing

  • sf20d: DSP ISA with extensions for DSP applications

Resources

Quick_Reference_Guide

Evaluation_Package (contains free sf20bu)

Details

  • ISAs: Base, DSP

  • Implementations: sf20bu, sf20bl, sf20dl

ISAs (Instruction Set Architectures)

Base ISA

The sf20b is a load/store architecture. All operands of computation instructions are either constants or contained in registers. Load/store instructions are used to transfer operands between registers and memory.

The sf20b base ISA defines a generic and complete instruction set for efficient high level language compiler implementations.

Features

  • Harvard architecture with separate instruction and data buses

  • 64k x 20-bit instruction address space

  • 64kBytes data address space

  • Fixed length 20-bit instruction coding

  • 16 interrupts with programmable start addresses

  • 16 x 16-bit general purpose registers and 8 special registers

  • Native support for 8-bit and 16-bit signed and unsigned integer data types

  • Higher precision integer and float data types supported by multi-instruction sequences

  • Rich set of load/store addressing modes, including indirect with index and update addressing

  • Little endian byte ordering

  • Load/store multiple instructions for code efficient copying and function prologue/epilogue

  • Bit manipulation & test instructions: set, clear, toggle & test

  • 16*16 multiply with either 16-bit high word or 16-bit low word results

  • Flexible debug concept with application specific debug modules
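The high-word/low-word multiply listed above can be illustrated with a small Python model. This is only a sketch: the function names are hypothetical, and whether a given multiply instruction treats its operands as signed or unsigned is an assumption here. The ISA_Reference_Manual is the authority on the actual semantics.

```python
from typing import Tuple


def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mul16(a: int, b: int, signed: bool = True) -> Tuple[int, int]:
    """Return (high_word, low_word) of the 32-bit product of two 16-bit operands.

    ASSUMPTION: the signed/unsigned selection is a modelling choice,
    not taken from the sf20b instruction definitions.
    """
    if signed:
        a, b = to_signed16(a), to_signed16(b)
    else:
        a, b = a & 0xFFFF, b & 0xFFFF
    product = (a * b) & 0xFFFFFFFF  # keep the 32-bit product
    return (product >> 16) & 0xFFFF, product & 0xFFFF


if __name__ == "__main__":
    high, low = mul16(0x1234, 0x0100)
    print(hex(high), hex(low))
```

In this model a "multiply high" instruction would deliver the first element of the tuple and a "multiply low" instruction the second, so a full 32-bit product would take two instructions.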

Resources

ISA_Reference_Manual

DSP extension ISA

The sf20d deviates from the pure load/store architecture of the sf20b: some performance-critical DSP instructions have addressing modes with one source operand in memory.

Extension Features

  • Multiply high instructions with optional left-shift and with one source operand in memory

  • Multiply and accumulate instruction with optional left-shift and with one source operand in memory

  • Multiply and subtract instruction with optional left-shift and with one source operand in memory

  • Clip, clip with left-shift and clip to unsigned byte instructions

  • Eight additional special registers

  • Four addressing modes for memory source operands of DSP instructions

  • Four registers with accumulation extension (patented) for sum-of-products calculations with 32-bit precision
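To make the extension features more concrete, the following Python sketch models a clip-to-unsigned-byte operation and a multiply-accumulate with 32-bit precision. All function names are hypothetical. The wrap-around behaviour on 32-bit overflow and the point at which the optional left-shift is applied are assumptions, not statements about the sf20d instruction definitions.

```python
def clip(value: int, lower: int, upper: int) -> int:
    """Saturate value to the closed range [lower, upper]."""
    return max(lower, min(upper, value))


def clip_to_unsigned_byte(value: int) -> int:
    """Saturate value to the unsigned 8-bit range 0..255."""
    return clip(value, 0, 255)


def mac32(acc: int, a: int, b: int, shift: int = 0) -> int:
    """Return acc + ((a * b) << shift), wrapped to a signed 32-bit value.

    ASSUMPTION: overflow wraps around; the real hardware may behave
    differently (for example by saturating).
    """
    total = (acc + ((a * b) << shift)) & 0xFFFFFFFF
    return total - 0x100000000 if total & 0x80000000 else total


def dot_product(xs, ys) -> int:
    """Sum of products accumulated with 32-bit precision."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac32(acc, x, y)
    return acc


if __name__ == "__main__":
    samples = [1000, -2000, 3000]
    coeffs = [30000, 30000, 30000]
    print(dot_product(samples, coeffs))  # far outside the 16-bit range
    print(clip_to_unsigned_byte(300), clip_to_unsigned_byte(-5))
```

The dot product in the example does not fit into a 16-bit register, which is the situation the 32-bit accumulation extension is meant to cover.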

Implementations

sf20bu

The sf20bu ultra light implementation focuses on low resource consumption. A dual-ported RAM implements the general-purpose registers and most special registers. Instructions with two register source operands have an effective execution time of 2 cycles. A combined ALU/load/store unit shares resources between computation and load/store instructions. Load/store instructions have an effective execution time of 2 cycles. Shift instructions are executed iteratively at 1 bit per cycle. Together this leads to an average IPC of ~0.5, which is still sufficient for many embedded control applications.

Features

  • Focus on low resource consumption

  • Size: ~1500 LEs + 1 block RAM on FPGAs

  • 20-bit/16-bit instruction/data buses

  • Register-file with 1/1 read/write ports, can be implemented as dual-ported RAM

  • Average IPC (Instr. per Cycle) of ~0.5

  • Max clock ~120 MHz on low-end FPGAs

  • Iterative shift execution, 1 bit per clock
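The relation between the per-instruction cycle counts and the average IPC can be sketched with a simple weighted model. The 2-cycle figures and the 1-bit-per-clock shift are taken from the description above. The 1-cycle cost for the remaining ALU instructions, the minimum shift cost of one cycle, and the instruction mix are hypothetical values chosen for illustration only. Branch penalties are ignored.

```python
# Effective cycles per instruction class on the sf20bu.
CYCLES = {
    "single_source_alu": 1,  # ASSUMPTION: not stated in the text
    "two_source_alu": 2,     # two register source operands
    "load_store": 2,
}

# HYPOTHETICAL instruction mix (fractions sum to 1.0).
EXAMPLE_MIX = {
    "single_source_alu": 0.30,
    "two_source_alu": 0.35,
    "load_store": 0.30,
    "shift": 0.05,
}


def shift_cycles(bits: int) -> int:
    """Iterative shifter: 1 bit per clock (ASSUMPTION: at least one cycle)."""
    return max(1, bits)


def average_ipc(mix: dict, cycles: dict) -> float:
    """Average instructions per cycle for a mix of instruction classes."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1.0")
    cpi = sum(fraction * cycles[kind] for kind, fraction in mix.items())
    return 1.0 / cpi


if __name__ == "__main__":
    # Assume shifts move 4 bits on average in this example.
    cycles = dict(CYCLES, shift=shift_cycles(4))
    print(f"estimated IPC: {average_ipc(EXAMPLE_MIX, cycles):.2f}")
```

With this example mix the model gives 1.8 cycles per instruction, which is in the same range as the quoted average IPC of ~0.5.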

Resources

IMA_Reference_Manual

sf20bl

The sf20bl light implementation focuses on high performance at moderate resource consumption. A register file with 3 read ports ensures that all instructions can potentially execute in one effective cycle. A decoupled unit for instruction fetch and flow-instruction execution, together with separate execution units for computation and for load/store instructions, enables high pipeline throughput. Branch speculation, a loop cache and conditional instructions minimize the performance penalties of program flow changes. Average IPC depends strongly on the instruction sequence, e.g. on branches and operand dependencies. Performance-optimized sequences can get close to an IPC of 1, and with the loop cache, loops can execute with an IPC > 1.

Features

  • Focus on performance

  • Size: ~2750 LEs on FPGAs

  • 20-bit/16-bit instruction/data buses

  • Register-file with 3/2 read/write ports

  • Max clock ~130 MHz on low-end FPGAs

  • Average IPC (Instr. per Cycle) ~0.8

  • Decoupled pipeline for instruction fetch and flow-instruction execution

  • Branch speculation

  • Separate execution pipelines for computation and load/store instructions

  • Barrel-Shifter, single cycle effective shift execution

  • Loop Cache, zero-cycle loop-back branch from 2nd iteration
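The following Python sketch illustrates why a zero-cycle loop-back branch can yield an IPC above 1: the branch still counts as an executed instruction but no longer consumes a cycle. The model is a simplification under stated assumptions. Every body instruction takes exactly one cycle, an uncached loop-back branch costs one cycle, there are no stalls, and only the branch at the end of the first iteration pays the full cost. The function name and parameters are hypothetical.

```python
def loop_ipc(body_instructions: int, iterations: int,
             loop_cache: bool, branch_cycles: int = 1) -> float:
    """Estimate the IPC of a loop of single-cycle instructions.

    ASSUMPTION: with the loop cache, the loop-back branch costs zero
    cycles in every iteration except the first one.
    """
    # Each iteration executes the body plus one loop-back branch.
    instructions = (body_instructions + 1) * iterations
    cycles = 0
    for i in range(iterations):
        cycles += body_instructions
        zero_cycle_branch = loop_cache and i >= 1
        cycles += 0 if zero_cycle_branch else branch_cycles
    return instructions / cycles


if __name__ == "__main__":
    for cached in (False, True):
        ipc = loop_ipc(body_instructions=4, iterations=100, loop_cache=cached)
        print(f"loop cache {'on' if cached else 'off'}: IPC = {ipc:.2f}")
```

In this model the benefit grows as the loop body gets shorter, since the branch makes up a larger share of the executed instructions.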

Resources

IMA_Reference_Manual

sf20dl

The sf20dl light implementation has the same basic pipeline architecture as the sf20bl. The only difference is an extra execution unit for multiply/MAC instructions. To support this unit, the register file is upgraded to 4/3 read/write ports.

Features

  • Focus on DSP performance and precision

  • Size: ~3300 LEs on FPGAs

  • 20-bit/16-bit instruction/data buses

  • Register-file with 4/3 read/write ports

  • Max clock ~130 MHz on low-end FPGAs

  • Average IPC (Instr. per Cycle) ~0.9

  • Decoupled pipeline for instruction fetch and flow-instruction execution

  • Branch speculation

  • Separate execution pipelines for computation, load/store and multiply instructions

  • Single cycle effective MAC instructions with one register and one memory source operand

  • Barrel-Shifter, single cycle effective shift execution

  • Loop Cache, zero-cycle loop-back branch from 2nd iteration
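For a rough comparison of the three implementations, the approximate figures quoted in the feature lists can be combined into an instruction throughput estimate (average IPC times maximum clock). The sketch below does only that arithmetic. The input numbers are the "~" values from the lists above, so the results are estimates rather than measurements, and the names used in the code are hypothetical.

```python
# Approximate figures taken from the feature lists above.
CORES = {
    "sf20bu": {"ipc": 0.5, "fmax_mhz": 120, "les": 1500},  # plus 1 block RAM
    "sf20bl": {"ipc": 0.8, "fmax_mhz": 130, "les": 2750},
    "sf20dl": {"ipc": 0.9, "fmax_mhz": 130, "les": 3300},
}


def mips(core: str) -> float:
    """Estimated throughput in million instructions per second."""
    data = CORES[core]
    return data["ipc"] * data["fmax_mhz"]


def mips_per_kle(core: str) -> float:
    """Estimated throughput per 1000 logic elements."""
    return mips(core) / (CORES[core]["les"] / 1000.0)


if __name__ == "__main__":
    for name in CORES:
        print(f"{name}: ~{mips(name):.0f} MIPS, "
              f"~{mips_per_kle(name):.1f} MIPS per 1000 LEs")
```

This gives roughly 60, 104 and 117 million instructions per second for the sf20bu, sf20bl and sf20dl respectively. The estimate does not account for the work done per instruction, so it understates the benefit of the sf20dl on DSP code that uses single-cycle MAC instructions.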