The sf32 family of 32-bit microprocessors targets applications where high performance and small core size matter most. Fixed-length 32-bit instruction coding keeps decoding complexity low, which results in high clock rates and small core footprints. Multiple ISAs are available to address control & computing as well as DSP applications with optimized solutions.

  • sf32b: Base ISA for general purpose control & computing

  • sf32d: DSP ISA with extensions for high precision DSP & audio applications



Evaluation Package (contains the free sf32bu)



Implementations by ISA

  • Base (sf32b): sf32bu, sf32bl

  • DSP (sf32d): sf32dl

ISAs (Instruction Set Architectures)

Base ISA features

The sf32b is a 32-bit microprocessor architecture for embedded control & computing applications. The main focus of the ISA definition is on high clock rates and small core implementations.

The sf32b is a load/store architecture. All operands of computation instructions are either constants or contained in registers. Load/store instructions are used to transfer operands between registers and memory.

The sf32b defines a generic and complete instruction set for efficient high level language compiler implementations.


  • Harvard architecture with separate instruction and data buses

  • 4GBytes instruction address space

  • 4GBytes data address space

  • Fixed length 32-bit instruction coding

  • 16 interrupts with programmable start addresses

  • 24 x 32-bit general purpose registers plus 7 special registers

  • System (protected) and application operation modes

  • Native support for 8-bit, 16-bit and 32-bit signed and unsigned integer data types

  • Higher precision integer and float data types supported by multi-instruction sequences

  • Rich set of load/store addressing modes, including indirect with index and update

  • Little endian byte ordering

  • Load/store multiple instructions for code efficient copying and function prologue/epilogue

  • Bit manipulation & test instructions: set, clear, toggle & test

  • 32 x 32-bit multiply with either 32-bit high word or 32-bit low word results

  • Instructions for endianness conversion

  • Flexible debug concept with application specific debug modules
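As a reference for what several of the computation features above produce, the following C sketch models the 32 x 32 multiply results, the bit manipulation operations and endianness conversion. The function names are hypothetical and are not sf32b mnemonics; the sketch describes results only, not instruction encodings or timing. The text does not say whether the high-word multiply is signed or unsigned, so both variants are shown as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the 64-bit product (same bits for signed and unsigned). */
uint32_t mul_lo(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b);
}

/* High 32 bits of the unsigned 64-bit product. */
uint32_t mul_hi_unsigned(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed 64-bit product.
   Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but holds on mainstream compilers. */
int32_t mul_hi_signed(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Bit manipulation & test: set, clear, toggle and test bit n (0..31). */
uint32_t bit_set(uint32_t v, unsigned n) {
    assert(n < 32);
    return v | (UINT32_C(1) << n);
}

uint32_t bit_clear(uint32_t v, unsigned n) {
    assert(n < 32);
    return v & ~(UINT32_C(1) << n);
}

uint32_t bit_toggle(uint32_t v, unsigned n) {
    assert(n < 32);
    return v ^ (UINT32_C(1) << n);
}

int bit_test(uint32_t v, unsigned n) {
    assert(n < 32);
    return (int)((v >> n) & 1u);
}

/* Endianness conversion: reverse the byte order of a 32-bit word. */
uint32_t byte_swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

For example, byte_swap32(0x12345678) yields 0x78563412, which is the conversion needed when the little endian sf32b exchanges 32-bit words with a big endian peer.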



DSP extension ISA

The sf32d is an extension of the sf32b base ISA and is fully backward compatible with the sf32b. Its main targets are 32-bit DSP in general and audio applications specifically. The DSP extension adds only a few special registers but no general purpose registers to the programming model of the sf32b ISA. The main additions are addressing modes with memory source operands and special add/subtract instructions that improve the performance of audio and general DSP algorithms. With the same pipeline architecture and computation resources, implementations of sf32d processors are only slightly larger than base ISA implementations.

The sf32d deviates from the pure load/store architecture of the sf32b. Some performance critical extension instructions have one source operand in memory.

The instructions with additional addressing modes and the additional, special add/subtract instructions can't be used easily from high level languages. The targeted use model suggests hand optimized assembler routines for performance critical DSP functions using the special addressing modes and instructions. The less performance critical higher layers and control code are written in C and compiled to the sf32b base ISA instruction set.

Extension Features

  • Multiply-high and MAC (Multiply & Accumulate) instructions with one source operand in memory

  • Optional 1-bit or 2-bit left-shift before accumulation

  • Multiple indirect addressing modes for memory source operands with offset, index and auto-update

  • Add/sub instructions with preceding left-shift of one source operand

  • 32-bit iterative Divide instruction

  • Clip to signed 16-bit, clip to signed maximum and clip to unsigned byte instructions

  • Dual entries accumulation extension cache (patented) for sum-of-products calculation with 64-bit precision
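The arithmetic behind some of the extension features above can be sketched in C as a reference model for the hand optimized assembler routines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the accumulator is modeled as a plain 64-bit integer rather than the accumulation extension cache, overflow is assumed to wrap, and "clip to signed maximum" is left out because the text does not define its range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply & accumulate with 64-bit precision and an optional
   1-bit or 2-bit left shift of the product before accumulation.
   The shift and add are done in unsigned arithmetic so that overflow
   wraps instead of being undefined (wrap-around is an assumption;
   the source text does not describe overflow behaviour). */
int64_t mac64(int64_t acc, int32_t a, int32_t b, unsigned shift) {
    assert(shift <= 2);
    int64_t product = (int64_t)a * (int64_t)b;  /* always fits in 64 bits */
    uint64_t sum = (uint64_t)acc + ((uint64_t)product << shift);
    return (int64_t)sum;
}

/* Sum of products over two arrays; on the sf32d one of the two
   operands of each MAC would come directly from memory. */
int64_t sum_of_products(const int32_t *x, const int32_t *h, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = mac64(acc, x[i], h[i], 0);
    }
    return acc;
}

/* Clip to the signed 16-bit range [-32768, 32767]. */
int32_t clip_s16(int32_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return v;
}

/* Clip to the unsigned byte range [0, 255]. */
int32_t clip_u8(int32_t v) {
    if (v > 255) return 255;
    if (v < 0) return 0;
    return v;
}
```

A C model like this is mainly useful as a bit-exact reference to test an assembler routine against; it is not expected to compile to the extension instructions.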



The sf32bu ultra light implementation is focused on low resource consumption. A dual-ported RAM implements the general-purpose registers and most special registers. Instructions with two register source operands have an effective execution time of 2 cycles. A combined ALU/load/store unit has shared resources for computation and load/store instructions. Load/store instructions have an effective execution time of 2 cycles. Shift instructions are executed iteratively with 1 bit per cycle. Together this leads to an average IPC of ~0.5, which is still good enough for many embedded control & compute applications.


  • Focus on low resource consumption

  • Size: ~2300 LEs + 1 block RAM on FPGAs

  • 32-bit/32-bit instruction/data buses

  • Register-file with 1/1 read/write ports, can be implemented as dual-ported RAM

  • Average IPC (Instr. per Cycle) of ~0.5

  • Max clock ~110MHz on low end FPGAs

  • Iterative shift execution, 1 bit per clock
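The cycle counts quoted for the sf32bu can be combined into a rough estimate for a given instruction mix. The sketch below is only a back-of-the-envelope model: the struct and function names are hypothetical, all instructions not mentioned in the text are assumed to take 1 cycle, a shift is assumed to cost exactly as many cycles as bit positions shifted, and stalls or fetch effects are ignored.

```c
#include <assert.h>

/* Instruction mix of a code sequence (hypothetical model, not a vendor tool). */
typedef struct {
    unsigned two_reg_ops;  /* instructions with two register source operands */
    unsigned load_stores;  /* load/store instructions */
    unsigned shift_ops;    /* number of shift instructions */
    unsigned shift_bits;   /* total bit positions shifted by those instructions */
    unsigned other_ops;    /* everything else, ASSUMED to take 1 cycle each */
} InstrMix;

/* 2 cycles per two-register op and per load/store (from the text),
   1 cycle per shifted bit (from the text), 1 cycle otherwise (assumption). */
unsigned estimate_cycles(const InstrMix *m) {
    return 2u * m->two_reg_ops + 2u * m->load_stores
         + m->shift_bits + m->other_ops;
}

unsigned count_instructions(const InstrMix *m) {
    return m->two_reg_ops + m->load_stores + m->shift_ops + m->other_ops;
}

/* Estimated instructions per cycle for the mix. */
double estimate_ipc(const InstrMix *m) {
    unsigned cycles = estimate_cycles(m);
    assert(cycles > 0);
    return (double)count_instructions(m) / (double)cycles;
}
```

Under these assumptions, a mix of 4 two-register operations, 3 load/stores, one 4-bit shift and 2 other instructions comes to 10 instructions in 20 cycles, i.e. an IPC of 0.5, which is in line with the quoted average.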




The sf32bl light implementation is focused on high performance and moderate resource consumption. A register-file with 3 read ports ensures that all instructions can potentially be executed in one effective cycle. A decoupled unit for instruction fetch and flow-instruction execution, together with separate execution units for computation and for load/store instructions, enables high pipeline throughput. Branch speculation, a loop cache and conditional instructions minimize the performance penalties of program flow changes. Average IPC depends strongly on the instruction sequence, e.g. on branches and operand dependencies. Performance-optimized sequences can get close to an IPC of 1, and with the loop cache, loops can execute with an IPC > 1.


  • Focus on performance

  • Size: ~5600 LEs on FPGAs

  • 32-bit/32-bit instruction/data buses

  • Register-file with 3/2 read/write ports

  • Max clock ~110MHz on low end FPGAs

  • Average IPC (Instr. per Cycle) ~0.8

  • Decoupled unit for instruction fetch and flow-instruction execution

  • Branch speculation

  • Separate execution pipelines for computation and load/store instructions

  • Barrel-Shifter, single cycle effective shift execution

  • Loop Cache, zero-cycle loop branch from 2nd iteration
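The zero-cycle loop branch explains how loops can reach an IPC above 1: the loop branch still counts as an executed instruction but consumes no cycle from the 2nd iteration on. The following C sketch illustrates the arithmetic only. It assumes that every body instruction takes 1 effective cycle, that the loop branch costs 1 cycle in the first iteration, and that there are no stalls; the function names are hypothetical.

```c
#include <assert.h>

/* Executed instructions: body plus one loop branch per iteration. */
unsigned long loop_instructions(unsigned body, unsigned iterations) {
    return (unsigned long)(body + 1u) * iterations;
}

/* Cycles: the first iteration pays for the branch (assumed 1 cycle),
   all later iterations only pay for the body instructions. */
unsigned long loop_cycles(unsigned body, unsigned iterations) {
    assert(iterations >= 1);
    return (unsigned long)(body + 1u)
         + (unsigned long)body * (iterations - 1u);
}

/* Resulting instructions per cycle for the whole loop. */
double loop_ipc(unsigned body, unsigned iterations) {
    assert(body >= 1);
    return (double)loop_instructions(body, iterations)
         / (double)loop_cycles(body, iterations);
}
```

With a 4-instruction body and 100 iterations this gives 500 instructions in 401 cycles, an IPC of roughly 1.25. The shorter the loop body, the larger the gain in this idealized model.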




The sf32dl light implementation has the same basic pipeline architecture as the sf32bl. The only difference is an extra execution unit for multiply/MAC instructions. To support this unit, the register-file is upgraded to 4/3 read/write ports.


  • Focus on DSP performance and precision

  • Size: ~7500 LEs on FPGAs (estimated)

  • 32-bit/32-bit instruction/data buses

  • Register-file with 4/3 read/write ports

  • Max clock ~100MHz on low end FPGAs

  • Average IPC (Instr. per Cycle) ~0.9

  • Barrel-Shifter, single cycle effective shift execution

  • Loop Cache, zero-cycle loop-back branch from 2nd iteration

  • Single cycle effective MAC instructions with one register and one memory source operand

  • Non-blocking divide
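To compare the three implementations, the quoted average IPC and maximum clock figures can be multiplied into an approximate instruction throughput. The C sketch below does only that arithmetic; the type and function names are hypothetical, and the inputs are the approximate values listed above for low end FPGAs, so the results are rough estimates rather than benchmark data.

```c
#include <assert.h>

/* Approximate figures as quoted in the feature lists above. */
typedef struct {
    const char *name;
    double avg_ipc;        /* average instructions per cycle */
    double max_clock_mhz;  /* max clock on low end FPGAs */
} Sf32Core;

const Sf32Core SF32_CORES[3] = {
    { "sf32bu", 0.5, 110.0 },
    { "sf32bl", 0.8, 110.0 },
    { "sf32dl", 0.9, 100.0 },
};

/* Average throughput in million instructions per second:
   IPC multiplied by the clock frequency in MHz. */
double average_mips(const Sf32Core *core) {
    assert(core->avg_ipc > 0.0 && core->max_clock_mhz > 0.0);
    return core->avg_ipc * core->max_clock_mhz;
}
```

This works out to about 55 MIPS for the sf32bu, 88 MIPS for the sf32bl and 90 MIPS for the sf32dl. The sf32dl figure does not reflect the additional work done per instruction by its MAC instructions with a memory source operand.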