A Survey on High-Throughput Non-Binary LDPC Decoders: ASIC, FPGA, and GPU Architectures
Related papers
An Improved Throughput for Non-Binary Low-Density-Parity-Check Decoder
Computer Engineering and Applications, 2020
Low-Density Parity-Check (LDPC) error control decoders are widely used in both storage and communication systems because they parallelize well and deliver excellent error-correction performance. Field-Programmable Gate Arrays (FPGAs) provide a robust platform for implementing non-binary LDPC decoder architectures in terms of parallelism, resource allocation, and operating speed. This paper proposes a high-throughput LDPC decoder based on a fully parallel architecture and a reduced maximum iteration limit for complete error correction. A Galois field of order eight was used together with a non-uniform quantization scheme, resulting in fewer bits per Log-Likelihood Ratio (LLR) in the implementation. The non-binary error control decoder was described in the Verilog Hardware Description Language (HDL). The proposed decoder attained a throughput of 10 Gbps...
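The abstract does not give the quantization table, so the following is only a minimal sketch of what a non-uniform LLR quantizer can look like. The 3-bit level table `LEVELS` and the function names are hypothetical and chosen purely for illustration: levels are packed densely near zero, where LLRs are least reliable, and sparsely at large magnitudes.

```python
from typing import Sequence

# Hypothetical 3-bit (8-level) reconstruction table; NOT taken from the paper.
# Spacing grows with magnitude, which is what makes the scheme non-uniform.
LEVELS: Sequence[float] = (-6.0, -3.0, -1.5, -0.5, 0.5, 1.5, 3.0, 6.0)


def quantize_llr(llr: float, levels: Sequence[float] = LEVELS) -> int:
    """Return the index of the level closest to llr (the stored code word)."""
    # On a tie, min() keeps the first (lower) index.
    return min(range(len(levels)), key=lambda i: abs(levels[i] - llr))


def dequantize_llr(index: int, levels: Sequence[float] = LEVELS) -> float:
    """Map a stored index back to its reconstruction value."""
    return levels[index]
```

With eight levels, each LLR occupies 3 bits; values beyond the outermost levels saturate to the nearest end of the table.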
Analysis and Design of Cost-Effective, High-Throughput LDPC Decoders
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
This paper introduces a new approach to cost-effective, high-throughput hardware designs for Low Density Parity Check (LDPC) decoders. The proposed approach, called Non-Surjective Finite Alphabet Iterative Decoders (NS-FAIDs), exploits the robustness of message-passing LDPC decoders to inaccuracies in the calculation of exchanged messages, and it is shown to provide a unified framework for several designs previously proposed in the literature. NS-FAIDs are optimized by density evolution for regular and irregular LDPC codes, and are shown to provide different trade-offs between hardware complexity and decoding performance. Two hardware architectures targeting high-throughput applications are also proposed, integrating both Min-Sum (MS) and NS-FAID decoding kernels. ASIC post-synthesis implementation results on 65 nm CMOS technology show that NS-FAIDs yield significant improvements in the throughput-to-area ratio, by up to 58.75% with respect to the MS decoder, with even better or only slightly degraded error correction performance.
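For readers unfamiliar with the Min-Sum baseline named above, here is a minimal sketch of the standard Min-Sum check-node update in floating point. It is not the paper's hardware kernel and does not include the NS-FAID framing step; the function name `min_sum_check_node` is an assumption made for this example.

```python
from typing import List


def min_sum_check_node(incoming: List[float]) -> List[float]:
    """Compute one outgoing check-to-variable message per edge.

    Each output uses the sign product and the minimum magnitude of all
    OTHER incoming messages (the extrinsic principle).
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two edges")

    # One pass: track the two smallest magnitudes and the overall sign.
    min1 = min2 = float("inf")
    min_idx = -1
    sign_product = 1
    for i, llr in enumerate(incoming):
        mag = abs(llr)
        if mag < min1:
            min2, min1, min_idx = min1, mag, i
        elif mag < min2:
            min2 = mag
        if llr < 0:
            sign_product = -sign_product

    outgoing = []
    for i, llr in enumerate(incoming):
        # The edge that holds the smallest magnitude receives the second
        # smallest; every other edge receives the smallest.
        mag = min2 if i == min_idx else min1
        own_sign = -1 if llr < 0 else 1
        # Multiplying by own_sign removes this edge's sign from the product.
        outgoing.append(sign_product * own_sign * mag)
    return outgoing
```

Only two magnitudes and one sign bit need to be stored per check node, which is one reason the kernel is popular in hardware.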
Efficient high throughput decoding architecture for non-binary LDPC codes
International journal of engineering and technology, 2018
This article deals with an efficient trellis-based decoding architecture for non-binary Low-Density Parity-Check (LDPC) codes. In this decoder, a bidirectional recursion is embedded to improve the layered scheduling and the decoding latency, which in turn reduces the number of iterations compared to existing techniques. Increasing the throughput is necessary to improve the efficiency of the system. In addition, a compression technique is implemented to reduce the memory and area requirements. A trellis-based decoder is used to reinforce the check node processing. The proposed decoder for LDPC codes yields high throughput compared to similar decoders presented in preceding works. The designed architecture was implemented using Cadence Virtuoso software. This decoder provides a throughput of about 39.21 Mb/s at a clock frequency of 190 MHz.
Area, throughput, and energy-efficiency trade-offs in the VLSI implementation of LDPC decoders
2011
Low-density parity-check (LDPC) codes are key ingredients for improving reliability of modern communication systems and storage devices. On the implementation side, however, the design of energy-efficient and high-speed LDPC decoders with a sufficient degree of reconfigurability to meet the flexibility demands of recent standards remains challenging. This survey paper provides an overview of the state-of-the-art in the design of LDPC decoders using digital integrated circuits. To this end, we summarize available algorithms and characterize the design space. We analyze the different architectures and their connection to different codes and requirements. The advantages and disadvantages of the various choices are illustrated by comparing state-of-the-art LDPC decoder designs.
FLEXIBLE NON-BINARY LDPC DECODING ON FPGAS
Despite their ability to approach channel capacity at shorter codeblock lengths, non-binary LDPC codes have a higher decoding complexity that poses non-trivial barriers to their widespread adoption at algorithmic and compute-intensive levels. In this work, we propose a programmable FFT-SPA decoder that delivers high decoding throughput at low power consumption while retaining a system-level design flexibility that surpasses typical VLSI descriptions. This allows quick retargeting and prototyping of variants of this family of signal processing algorithms, with effective decoding throughputs of up to 1 Mbit/s and potential throughputs of dozens of Mbit/s.
Low-power VLSI decoder architectures for LDPC codes
Proceedings of the 2002 international symposium on Low power electronics and design - ISLPED '02, 2002
Iterative decoding of low-density parity check codes (LDPC) using the message-passing algorithm has proved to be extraordinarily effective compared to conventional maximum-likelihood decoding. However, the lack of any structural regularity in these essentially random codes is a major challenge for building a practical low-power LDPC decoder. In this paper, we jointly design the code and the decoder to induce the structural regularity needed for a reduced-complexity parallel decoder architecture. This interconnect-driven code design approach eliminates the need for a complex interconnection network while still retaining the algorithmic performance promised by random codes. Moreover, we propose a new approach for computing reliability metrics based on the BCJR algorithm that reduces the message switching activity in the decoder compared to existing approaches. Simulations show that the proposed approach results in power savings of up to 85.64% over conventional implementations.
Categories and Subject Descriptors: B.7.1 [Types and Design Styles]: VLSI; E.4 [Coding and Information Theory]: Error control codes
However, in order to achieve desired power and throughputs for current applications (e.g., > 1 Mbps in 3G wireless systems, > 1 Gbps in magnetic recording systems), fully parallel and pipelined iterative decoder architectures are needed. Compared to turbo codes, LDPC codes enjoy a significant advantage in terms of computational complexity and are known to have a large amount of inherent parallelism [3]. However, the randomness of LDPC codes results in stringent memory requirements that amount to an order of magnitude increase in complexity compared to those for turbo codes.
A direct approach to implementing a parallel decoder architecture would be to allocate, for each node or cluster of nodes in the graph defining the LDPC code, a function unit for computing the reliability messages, and employ an interconnection network to route messages between function nodes (see Fig.1). A major problem with this approach is that the interconnection networks require complex wiring to perform global routing of messages and hence must be deeply pipelined (e.g., bidirectional multilayered networks in [4] and 4096-input multiplexers per function unit in [5]). Moreover, the randomness in the pattern of communicating messages leads to routing and congestion problems on the networks which require extensive buffering to resolve.
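To make the cost of this direct mapping concrete, the sketch below counts the function units and point-to-point wires that a fully parallel decoder would need for a given parity-check matrix. The helper name `parallel_resources` and the small (7,4) Hamming matrix are illustrative assumptions, not taken from the paper; practical LDPC matrices are far larger and sparser.

```python
from typing import Dict, List, Tuple


def parallel_resources(H: List[List[int]]) -> Dict[str, int]:
    """Count units and wires for a one-unit-per-node decoder of H."""
    # Every nonzero entry of H is one edge of the code's graph, i.e. one
    # bidirectional link between a check unit and a variable unit.
    edges: List[Tuple[int, int]] = [
        (r, c) for r, row in enumerate(H) for c, v in enumerate(row) if v
    ]
    return {
        "check_units": len(H),
        "variable_units": len(H[0]),
        "wires": len(edges),
        # One message travels in each direction per iteration.
        "messages_per_iteration": 2 * len(edges),
    }


# Parity-check matrix of the (7,4) Hamming code, used only as a toy input.
H_TOY = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
```

Because the wire count equals the number of nonzero entries and their positions are essentially random, the routing cost grows with the code length, which is the congestion problem the text describes.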
Hardware Implementation of LDPC Decoders
IEEE Transactions on Circuits and Systems I: Regular Papers, 2009
Low density parity check (LDPC) codes over GF(2^m) are an extension of binary LDPC codes with significantly higher performance. However, the computational complexity of the encoders/decoders for these codes is also higher. Hence there is a substantial lack of hardware implementations for LDPC codes over GF(2^m). This paper proposes a novel variation of the belief propagation algorithm for GF(2^m) LDPC codes. The new algorithm results in a reduced hardware complexity when implemented in VLSI. The serial architecture of the novel decoding algorithm and two other algorithms for LDPC codes over GF(2^m) are implemented on an FPGA. The results show that the proposed algorithm has substantial advantages over existing methods. We show that the implementation of a decoder for LDPC codes over GF(2^m) is feasible for short to medium length codes. The additional complexity of the decoder is balanced by the superior performance of GF(2^m) LDPC codes.
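Part of the extra decoder complexity comes from the field arithmetic itself: parity checks multiply symbols by nonzero field coefficients instead of simply XOR-ing bits. As a minimal sketch, the function below multiplies two field elements in polynomial-basis representation. The default field GF(8) with primitive polynomial x^3 + x + 1 and the name `gf_mul` are assumptions for illustration; the abstract does not state which field sizes or polynomials were used.

```python
def gf_mul(a: int, b: int, m: int = 3, poly: int = 0b1011) -> int:
    """Multiply two GF(2^m) elements given as m-bit integers.

    poly is the primitive polynomial including its x^m term
    (0b1011 encodes x^3 + x + 1, an assumed choice for GF(8)).
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a       # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1               # multiply a by x
        if a & (1 << m):
            a ^= poly         # reduce modulo the primitive polynomial
    return result


def gf_add(a: int, b: int) -> int:
    """Add two GF(2^m) elements (identical to subtraction)."""
    return a ^ b
```

For example, in this field x^2 times x reduces to x + 1, so `gf_mul(4, 2)` evaluates to 3.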
A REDUCED-COMPLEXITY, SCALABLE IMPLEMENTATION OF LOW DENSITY PARITY CHECK (LDPC) DECODER
In this paper, a reduced-complexity, scalable implementation of an LDPC decoder is presented. The decoder architecture in this paper is an improved version of . The new architecture makes it straightforward to implement an LDPC decoder that supports multiple code rates, multiple block sizes, and multiple standards. As an example, we implemented a parameterized decoder that supports the LDPC code in the IEEE 802.16e standard, which requires code rates of 1/2, 2/3 and 3/4, with block sizes varying from 576 to 2304. The decoder is synthesized with Texas Instruments' 90 nm ASIC process technology. With a target operating frequency of 100 MHz and 15 decoding iterations, the maximum data rate is up to 256 Mbps.
Tradeoff analysis and architecture design of high throughput irregular LDPC decoders
IEEE Transactions on Circuits and Systems I: Regular Papers, 2006
Low density parity check (LDPC) codes have attracted significant research interest thanks to their excellent error-correcting abilities and high level of processing parallelism. Recent architecture designs of LDPC decoders are mostly based on the block-structured parity check matrices (PCMs) composed of horizontal layers or component codes. Irregular block-structured LDPC codes have very good error-correcting performance and they are suitable for modular semi-parallel architecture implementations. In order to achieve high decoding throughput (hundreds of MBits/sec and above) for multiple code rates and moderate codeword lengths, different levels of processing parallelism are possible. The pipelining of multiple horizontal layers of PCM is combined with different levels of memory access parallelism leading to a family of structured decoders. We provide a general trade-off analysis for selection between various decoder architectures. The goal is to find an optimal balance between hardware complexity and decoding throughput while preserving error-correcting performance. As a proof of concept, non-pipelined and pipelined LDPC decoders are prototyped for an FPGA and synthesized for an ASIC design. Both architectures support a broad range of code rates and codeword sizes with small hardware overhead while achieving high decoding throughput.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2003
A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design, namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current-day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple check-to-bit message update bottleneck of the current algorithm. A new merged-schedule message-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low- to moderate-throughput decoders. Moreover, a parallel soft-input soft-output (SISO) message update mechanism is proposed that implements the recursions of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over state-of-the-art, with a reduction of 60.5% in interconnect length.
Index Terms: Low-density parity-check (LDPC) codes, Ramanujan graphs, soft-input soft-output (SISO) decoder, turbo decoding algorithm, VLSI decoder architectures.
I. INTRODUCTION
The phenomenal success of turbo codes [1], powered by the concept of iterative decoding via message-passing, has rekindled the interest in low-density parity-check (LDPC) codes, which were first discovered by Gallager in 1961 [2]. Recent breakthroughs to within 0.0045 dB of AWGN-channel capacity were achieved with the introduction of irregular LDPC codes in [3], [4], putting LDPC codes on par with turbo codes. However, efficient hardware implementation techniques of turbo decoders have given turbo codes a clear advantage