Responsible for this page: Oscar Gustafsson , oscarg@isy.liu.se
Page last update: 2009-05-20

[ Go to content ] [ Help ] [ Information about accessability ]
På svenska | A to Z Maps Web overview Contact us
Go to LiU.se

Algorithm-hardware co-design for FPGAs

Funded by CENIIT.

Project leader: Oscar Gustafsson.

Technical background

Field-programmable gate arrays (FPGAs) are integrated circuits that can be programmed to perform arbitrary combinatorial or sequential boolean functions. FPGAs have for a long time been used for prototyping application specific integrated circuits (ASICs) and for low volume products. However, there is a clear trend that ASICs (and programmable DSP processors) are often abandoned in favour of the use of FPGAs in high-volume products as well. While ASICs still provides higher performance and lower power consumption, the increased development time and manufacturing cost which comes with smaller CMOS transistor technologies have opened a large market for FPGAs. The advances in CMOS technologies have also had positive effects on FPGAs with increased capacity, higher performance and lower power consumption. In fact, in many aspects FPGA vendors are driving the CMOS technology development. Further advantages for the FPGA compared to ASICs is a shorter turn-around time and the possibility to reconfigure the FPGA even when installed in a product. Compared to programmable DSP processors FPGAs may provide advantages with higher processing capacity and higher level of integration.

The basic building block of an FPGA is a look-up table (LUT) and a flip-flop, which may be connected to the output of the LUT. By configuring the LUT and connecting LUTs and flip-flops the FPGA can be configured to perform the required operation. Many current FPGAs do not only have LUTs and flip-flops, but also complex integrated building blocks, e.g. multipliers. Some FPGAs even have DSP processor cores integrated on to the chip core, apart from the optimized IP-cores, i.e. files to configure the FPGA, supplied from most vendors. With the increasing sizes these FPGAs forms a foundation for implementing systems-on-chips (SoCs) on a single FPGA.

As the basic building blocks of ASICs and FPGAs are different, they have different cost measures in terms of area, speed, and power consumption. Furthermore, for ASICs in modern process technologies the chip area is often determined by the number of input and output connections, and, hence, as long as it is possible to fit the computational core inside the pad frame, the manufacturing cost is independent of the core area. For FPGAs the consumed area is often more critical as excessive area may require a change to the next larger (and more expensive) FPGA model. For speed it is important to be able to place pipelining stages, i.e., registers that divide the critical propagation path, at suitable cuts to efficiently utilize each LUT and flip-flop. Hence, dividing operations into blocks that can be well fitted into an FPGA has been an active research area. For example, it has been shown that bit-serial processing may be advantageous as a bit-serial adder or subtractor fit well in a LUT plus flip-flop. On the other hand, recent FPGAs have an extra block for the fast carry propagation required in bit-parallel arithmetic. This will improve the results for bit-parallel implementation compared with e.g. redundant arithmetic implementation, which is commonly used in ASICs.

Project description

The aim of the project is to develop cost measures for implementation of DSP algorithms in FPGAs in terms of area and speed trade-offs. These cost measures can then be used to optimize DSP algorithms and their corresponding FPGA implementation simultaneously.

The focus of the project will be on algorithms for digital filtering, including sample rate changes, transforms such as the discrete Fourier transform (DFT, often implemented using an FFT algorithm) and discrete cosine transform (DCT), and modulation and demodulation techniques.

It has recently been shown that the use of multiple constant multiplication (MCM) techniques, i.e., expressing the multiplications using shifts and additions/subtractions and utilizing the redundancy between coefficients that are multiplied by the same input data, are efficient when implementing FIR digital filters in FPGAs, often outperforming the vendors tools. This is an area where our research group has a long experience and where we recently have developed one of the best, if not the best, algorithms in terms of complexity. It is expected that it is possible to further optimize the results by considering bit-level optimizations. Furthermore, while most work so far has considered single rate FIR filters, it has been shown that similar approaches can be used for sample rate changes. Some work has also been done on color space converters for digital video using these techniques.

As the previous approaches compute one output sample per clock cycle (for single-rate FIR filters) they are suitable for high sample rate applications. However, for lower sample rates it is often possible to find time-multiplexed architectures that has a lower area complexity. The straightforward approach is to use one of the built-in multipliers if they are available and have suitable wordlength. Another approach related to the MCM problem is to introduce multiplexers among the shifts and additions/subtractions to reduce the circuit area. This may also be advantageous when combining several FIR filters where only one should be used, such as in multi-standard receivers and transmitters.

Another considered area is high-speed implementation of arithmetic functions for use in prototyping ASICs. As FPGAs are inherently slower it is important to find fast enough implementations in FPGAs. For communication systems this may include e.g. sine and cosine functions. Recently, we have proposed a number of methods for elementary function approximation. However, these have been aimed at ASIC implementation and therefore needs to be thoroughly evaluated and possibly modified for FPGA implementation.

Cooperation

Publications

Related publications

  • O. Gustafsson and A. G. Dempster, "On the use of multiple constant multiplication in polyphase FIR filters and filter banks," in Proc. Nordic Signal Processing Symp., Espoo, Finland, June 9-11, 2004, pp. 53-56.
  • K. Johansson, O. Gustafsson, and L. Wanhammar, "A detailed complexity model for multiple constant multiplication and an algorithm to minimize the complexity," in Proc. European Conf. Circuit Theory Design, Cork, Ireland, Aug. 29-Sept. 1, 2005.
  • O. Gustafsson and H. Johansson "Efficient implementation of FIR filter based rational sampling rate converters using constant matrix multiplication," Asilomar Conf. Signals, Syst., Comp., Pacific Grove, CA, Oct. 29-Nov. 1, 2006.
  • K. Holm and O. Gustafsson, "'Low-complexity and low-power color space conversion for digital video," IEEE Norchip Conf., Linköping, Sweden, Nov. 20-21, 2006.
  • O. Gustafsson, K. Johansson, H. Johansson, and L. Wanhammar, "Implementation of polyphase decomposed FIR filters for interpolation and decimation using multiple constant multiplication techniques," Asia-Pacific Conf. Circuits Syst., Singapore, Dec. 4-7, 2006.
  • O. Gustafsson, "A difference based adder graph heuristic for multiple constant multiplication problems," IEEE Int. Symp. Circuits Syst., New Orleans, LA, May 27-30, 2007.