An overview of advanced FPGA architectures for optimized hardware realization of computation intensive algorithms

Advanced FPGA Architectures for Efficient Implementation of Computation Intensive Algorithms: A State-of-the-Art Review

MASAUM Journal of Computing, 2009

Algorithms used in signal processing, image processing, and high-performance computing applications are computationally intensive. Implementing such algorithms efficiently, with good utilization of the available resources, requires in-depth knowledge of the targeted field programmable gate array (FPGA) technology. This paper presents a state-of-the-art review of the architectures and technologies used in modern FPGAs. A case study of the most popular and widely used state-of-the-art commercial FPGA technologies from Xilinx and Altera is also presented. The emerging three-dimensional (3D) FPGA architecture is also discussed.

Design Methodologies and Architectures for Digital Signal Processing on FPGAs

Santa Barbara, 2010

By Shahnam Mirzaei. There has been tremendous growth over the past few years in the field of embedded systems, especially in the consumer electronics segment. The trend toward high-performance, low-power systems has forced researchers to come up with innovative design methodologies and architectures that can achieve these objectives and meet stringent system requirements. Many of these systems perform some kind of streaming data processing that requires extensive arithmetic calculations. FPGAs are increasingly used for a variety of computationally intensive applications, especially in the realm of digital signal processing (DSP). Due to rapid advances in fabrication technology, the current generation of FPGAs contains a large number of configurable logic blocks (CLBs). It also includes several other features, such as on-chip memory, DSP blocks, and clock synthesizers, to support a wide range of arithmetic applications. The high non-recurring engineering (NRE) costs and long development time of application-specific integrated circuits (ASICs) make FPGAs attractive for application-specific DSP solutions. Even though the current generation of FPGAs offers a variety of resources such as logic blocks, embedded memories, and DSP blocks, the number of these resources on each device is still limited. Moreover, a mixed DSP/FPGA design flow poses several challenges to designers because of the integration of the design tools and the complexity of the algorithms. Therefore, any attempt to simplify the design flow and to optimize for either area or performance is valuable. This thesis develops innovative architectures and methodologies to exploit FPGA resources effectively.
Specifically, it first introduces an efficient method of implementing FIR filters on FPGAs that can be used as basic building blocks for various types of DSP filters. Second, it introduces a novel implementation of the correlation function (using embedded memory), which is widely used in image processing applications. Furthermore, it introduces an optimal data placement algorithm for reducing power consumption in FPGA embedded memory blocks. These techniques are more efficient in terms of power consumption, performance, and FPGA area, and they are incorporated into a number of signal processing applications. A few real-life case studies are also provided in which these techniques are applied and significant performance gains are achieved over software-based algorithms. The results of these implementations are also compared with competing methods, and the trade-offs are discussed. Finally, the thesis discusses the challenges of integrating such optimization methods into FPGA design tools and offers suggestions for doing so.
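The abstract does not say which FIR structure the thesis uses. The sketch below is only a generic illustration and not the author's method. It models a transposed direct-form FIR filter with integer coefficients standing in for fixed-point hardware. This structure is often mapped onto FPGA DSP blocks because every tap has the same short multiply-add path into a register. The names `fir_transposed` and `fir_direct` are hypothetical.

```python
from typing import List, Sequence


def fir_transposed(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Cycle-by-cycle model of a transposed direct-form FIR filter.

    Computes y[n] = sum_k coeffs[k] * x[n - k] with integer arithmetic,
    standing in for fixed-point hardware. Each loop iteration is one clock:
    the new sample is broadcast to every tap, and each tap adds its product
    to the partial sum arriving from the next register down the chain.
    """
    if not coeffs:
        raise ValueError("at least one coefficient is required")
    regs = [0] * (len(coeffs) - 1)  # pipeline registers between taps
    out: List[int] = []
    for x in samples:
        # Output = first tap's product plus the partial sum nearest the output.
        out.append(coeffs[0] * x + (regs[0] if regs else 0))
        # Update registers in ascending order so regs[k + 1] still holds the
        # previous cycle's value when regs[k] reads it.
        for k in range(len(regs)):
            upstream = regs[k + 1] if k + 1 < len(regs) else 0
            regs[k] = coeffs[k + 1] * x + upstream
    return out


def fir_direct(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Reference convolution used to check the transposed model."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


if __name__ == "__main__":
    # An impulse input reproduces the coefficients: [1, 2, 3, 0]
    print(fir_transposed([1, 0, 0, 0], [1, 2, 3]))
```

In hardware, the per-tap products could map to DSP-block multipliers or, for constant coefficients, to shift-and-add logic. This model only checks the arithmetic and says nothing about area or power.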

Recent Trends in FPGA Architectures and Applications

2008

Since their introduction in 1985, field programmable gate arrays (FPGAs) have become increasingly important to the electronics industry. They have the potential for higher performance and lower power consumption than microprocessors. Compared with application-specific integrated circuits (ASICs), they offer lower non-recurring engineering (NRE) costs, reduced development time, easier debugging, and reduced risk. Because modern FPGAs can meet many of the performance requirements of ASICs, they are increasingly used in their place. In this paper, some recent developments in FPGA devices, platforms, and applications are reviewed, with a focus on high-performance applications of this technology.

Features, Design Tools, and Application Domains of FPGAs

IEEE Transactions on Industrial Electronics, 2007

In the past two decades, advances in programmable device technologies, in both hardware and software, have been extraordinary. The original application of rapid prototyping has been complemented by a large number of new applications that take advantage of the excellent characteristics of the latest devices. High speed, very large numbers of components, many supported protocols, and ready-to-use IP cores make programmable devices a preferred implementation choice, even for deployment in mass-production quantities. This paper surveys the advanced features, design tools, and application domains of field programmable gate arrays (FPGAs). The main characteristics and structure of modern FPGAs are described first, to show their versatility and the abundance of available design resources. Software resources are also discussed, as they are the main enablers for efficiently exploiting the design capabilities of these devices. Current application domains are then described, such as configurable computing, dynamically reconfigurable systems, rapid system prototyping, communication processors and interfaces, and signal processing. The paper also presents the authors' prospective view of how FPGAs will evolve to enter new application domains in the future.

Accelerated image processing on FPGAs

IEEE Transactions on Image Processing, 2003

The Cameron project has developed a language called Single Assignment C (SA-C) and a compiler for mapping image-based applications written in SA-C to field programmable gate arrays (FPGAs). This paper tests this technology by implementing several applications in SA-C and compiling them to an Annapolis Microsystems (AMS) WildStar board with a Xilinx XV2000E FPGA. The performance of these applications on the FPGA is compared with the performance of the same applications written in assembly code or C for an 800 MHz Pentium III. (Although no comparison across processors is perfect, these chips were the first of their respective classes fabricated at 0.18 microns and are therefore of comparable ages.) We find that applications written in SA-C and compiled to FPGAs are between 8 and 800 times faster than the equivalent program run on the Pentium III.

… stream encoding and decoding, real-time biometric (face, retina, and/or fingerprint) recognition, and military aerial and satellite surveillance applications. To meet the demands of these and future applications, we need to develop new techniques for accelerating image-based applications on commercial hardware. Currently, many image processing applications are implemented on general-purpose processors such as Pentiums. In some cases, applications are implemented on digital signal processors (DSPs), and in extreme cases (when economics permit) they can be implemented in application-specific integrated circuits (ASICs). This paper presents another technology, field programmable gate arrays (FPGAs), and shows how compiler technology can be used to map image processing algorithms onto FPGAs, achieving 8- to 800-fold speedups over Pentiums.

2) Field Programmable Gate Arrays

Field-programmable gate arrays (FPGAs) are non-conventional processors built primarily out of logic blocks connected by programmable wires, as shown in Figure 1. Each logic block has one or more lookup tables (LUTs) and several bits of memory.
As a result, logic blocks can implement arbitrary logic functions (of up to a few bits). Logic blocks can be connected into circuits of arbitrary complexity by using the programmable wires to route the outputs of some logic blocks to the inputs of others. FPGAs as a whole can therefore implement circuit diagrams by mapping the gates and registers onto logic blocks. The achievable clock rate of an FPGA configuration depends on two factors. The first is the depth of the computation in terms of logic blocks. The second is their relative placement, which determines the length of the wires needed to connect them.

Modern FPGAs are actually more complex than the discussion above might imply. All FPGAs have special-purpose I/O blocks that communicate with external pins. Many have on-chip memory in the form of RAM blocks. Others have multipliers or even complete RISC processors in addition to general-purpose logic blocks. In general, however, we will stick to the simplified view of an FPGA as a set of logic blocks connected by programmable wires, because this is the model used by the SA-C compiler.

Routine           LUTs (%)  FFs (%)  Slices (%)
AddS                     9        9          16
Prewitt                 18       13          28
Canny                   48       45          87
Wavelet                 54       69          99
Dilates (8)             56       56          97
Probing (chip 1)        33       39          65
Probing (chip 2)        36       41          72
Probing (chip 3)        42       49          85

… to the reconfigurable system's memory, and the results must be returned to the host processor. A typical upload or download time on a PCI bus for a 512×512 8-bit image is about 0.019 seconds. In all of the applications except dilation and probing, the output image is larger than the source image, typically doubling the time to upload the results. For probing, the image must be downloaded three times (once for each FPGA), and since it creates two result images, a total of six uploads are required. When upload and download times are included, the FPGA is slower than a Pentium at scalar addition and wavelet decomposition, as shown in Table 3.
The other applications tested are faster on the FPGA, but the performance ratios are very small except for probing. This suggests that FPGAs should only be used as co-processors for very large applications, although they are almost always faster for image computation if data transfer times can be eliminated.
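The transfer figures above can be turned into a quick overhead estimate. The sketch below rests on two assumptions drawn from the text. First, every download (host to board) or upload (board to host) of a 512×512 8-bit image costs about 0.019 s. Second, a result image that doubles the upload time is modeled as an upload scale of 2. The helper names are hypothetical, and no compute times from Table 3 are reproduced.

```python
IMAGE_BYTES = 512 * 512      # one 512x512 image at 8 bits per pixel
T_PER_IMAGE_S = 0.019        # reported PCI time for one upload or download


def transfer_time_s(downloads: int, uploads: int, upload_scale: float = 1.0) -> float:
    """Total PCI transfer time for one run.

    downloads: images sent from the host to the board.
    uploads: result images returned to the host.
    upload_scale: size of each result image relative to the input
    (2.0 models the typical doubled upload time).
    """
    return T_PER_IMAGE_S * (downloads + uploads * upload_scale)


def effective_bandwidth_mb_s() -> float:
    """Implied PCI throughput in MB/s (1 MB = 10**6 bytes)."""
    return IMAGE_BYTES / T_PER_IMAGE_S / 1e6


def fpga_faster(cpu_s: float, fpga_compute_s: float, transfer_s: float) -> bool:
    """Co-processor break-even: the FPGA wins only if compute plus transfers beat the CPU."""
    return fpga_compute_s + transfer_s < cpu_s


if __name__ == "__main__":
    print(f"implied bandwidth: {effective_bandwidth_mb_s():.1f} MB/s")  # 13.8
    print(f"typical routine:   {transfer_time_s(1, 1, 2.0):.3f} s")     # 0.057
    print(f"dilation:          {transfer_time_s(1, 1):.3f} s")          # 0.038
    print(f"probing:           {transfer_time_s(3, 6):.3f} s")          # 0.171
```

Under these assumptions, probing pays for nine transfers (about 0.171 s) before any computation happens. That overhead is why the FPGA must be substantially faster at the computation itself to come out ahead once transfers are counted.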

FPGAs: Chronological Developments and Challenges

IAEME Publication, 2021

The field programmable gate array (FPGA) industry is expanding in both market share and innovation. Tailored FPGA features make FPGAs a better choice for an increasing number of applications in the coming years. Constant development of FPGA technology has narrowed the performance gap between FPGAs and application-specific integrated circuits (ASICs). Hence, in recent years, FPGA-based platforms have proven more attractive than ASICs, since they offer high performance in addition to low development cost and short time to market. Therefore, FPGAs are now highly attractive for a huge range of applications in communications, computing, avionics, security, automotive, and consumer electronics. The FPGA industry has shown steady growth, with a predicted market value of USD 9 billion by 2023. Currently, FPGA companies have started expanding into research areas such as artificial intelligence …

A scalable multi-FPGA framework for real-time digital signal processing

Proceedings of SPIE - The International Society for Optical Engineering, 2009

FPGAs have emerged as the preferred platform for implementing real-time signal processing applications. At sub-45 nm technology nodes, FPGAs offer significant cost and design-time advantages over application-specific custom chips. They also consume significantly less power than general-purpose processors while maintaining or improving performance. Moreover, FPGAs are more advantageous than GPUs in their support for control-intensive applications, custom bit-precision operations, and diverse system interface protocols. Nonetheless, a significant inhibitor to the widespread adoption of FPGAs has been the expertise required to realize functional designs that maximize application performance. There have been several academic and commercial efforts to improve the usability of FPGAs. However, they have primarily focused on easing the tasks of an expert FPGA designer rather than improving usability for an application developer. In this work, the design of AlgoFLEX, a scalable algorithmic-level design framework for FPGAs, is described. AlgoFLEX offers rapid algorithmic-level composition and exploration while maintaining the performance realizable from a fully custom, albeit difficult and laborious, design effort. The framework masks aspects of accelerator implementation, mapping, and communication while exposing appropriate algorithm-tuning facilities to developers and system integrators. The effectiveness of the AlgoFLEX framework is demonstrated by rapidly mapping a class of image and signal processing applications to a multi-FPGA platform.