Efficient high throughput decoding architecture for non-binary LDPC codes

A REDUCED-COMPLEXITY, SCALABLE IMPLEMENTATION OF LOW DENSITY PARITY CHECK (LDPC) DECODER

In this paper, a reduced-complexity, scalable implementation of an LDPC decoder is presented. The decoder architecture is an improved version of an earlier design. The new architecture makes it straightforward to implement an LDPC decoder that supports multiple code rates, multiple block sizes and multiple standards. As an example, we implemented a parameterized decoder that supports the LDPC code in the IEEE 802.16e standard, which requires code rates of 1/2, 2/3 and 3/4, with block sizes varying from 576 to 2304. The decoder is synthesized with Texas Instruments' 90 nm ASIC process technology. With a target operating frequency of 100 MHz and 15 decoding iterations, the maximum data rate is up to 256 Mbps.

Performance Analysis and Implementation for Nonbinary Quasi-Cyclic LDPC Decoder Architecture

International Journal of Wireless & Mobile Networks, 2014

Non-binary low-density parity-check (NB-LDPC) codes are an extension of binary LDPC codes with significantly better performance. Although various low-complexity iterative decoding algorithms have been proposed, VLSI implementation of NB-LDPC decoders remains a big challenge because of their high complexity and long latency. In this brief, a highly efficient check node processing scheme that greatly reduces the processing delay is proposed, covering both the Min-Max decoding algorithm and the check node unit. Compared with previous works, the latency of the check node unit is reduced by up to 52%. In addition, the efficiency of the presented techniques is demonstrated with a (620, 310) NB-QC-LDPC decoder design.
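To make the check node processing mentioned in this abstract more concrete, the following is a minimal sketch of one elementary Min-Max combining step, assuming the usual textbook formulation over GF(2^p) in which smaller reliability values mean more likely symbols. The function name `min_max_elementary` is hypothetical, the multiplication by the non-zero parity-check coefficients is left out, and nothing here reproduces the low-latency scheme proposed in the paper.

```python
def min_max_elementary(u, v):
    """Combine two reliability vectors indexed by GF(2^p) symbol label.

    Assumption: u[a] and v[b] are non-negative reliabilities where a smaller
    value means a more likely symbol, as in the textbook Min-Max algorithm.
    """
    q = len(u)
    out = [float("inf")] * q
    for a in range(q):
        for b in range(q):
            s = a ^ b  # addition in GF(2^p) is a bitwise XOR of the labels
            # Keep the best (smallest) worst-case reliability for symbol s.
            out[s] = min(out[s], max(u[a], v[b]))
    return out


if __name__ == "__main__":
    print(min_max_elementary([0, 1, 2, 3], [0, 2, 1, 3]))
```

The double loop costs q^2 comparisons per step, which is one reason check node latency dominates non-binary decoder designs.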

High-throughput LDPC decoders

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2003

A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design, namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current-day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple check-to-bit message update bottleneck of the current algorithm. A new merged-schedule message-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low to moderate-throughput decoders. Moreover, a parallel soft-input soft-output (SISO) message update mechanism is proposed that implements the recursions of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over state-of-the-art, with a reduction of 60.5% in interconnect length. Index Terms: Low-density parity-check (LDPC) codes, Ramanujan graphs, soft-input soft-output (SISO) decoder, turbo decoding algorithm, VLSI decoder architectures.
I. INTRODUCTION: The phenomenal success of turbo codes [1], powered by the concept of iterative decoding via message-passing, has rekindled the interest in low-density parity-check (LDPC) codes, which were first discovered by Gallager in 1961 [2]. Recent breakthroughs to within 0.0045 dB of AWGN-channel capacity were achieved with the introduction of irregular LDPC codes in [3], [4], putting LDPC codes on par with turbo codes. However, efficient hardware implementation techniques of turbo decoders have given turbo codes a clear advantage...

Analysis and Design of Cost-Effective, High-Throughput LDPC Decoders

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

This paper introduces a new approach to cost-effective, high-throughput hardware designs for Low Density Parity Check (LDPC) decoders. The proposed approach, called Non-Surjective Finite Alphabet Iterative Decoders (NS-FAIDs), exploits the robustness of message-passing LDPC decoders to inaccuracies in the calculation of exchanged messages, and it is shown to provide a unified framework for several designs previously proposed in the literature. NS-FAIDs are optimized by density evolution for regular and irregular LDPC codes, and are shown to provide different trade-offs between hardware complexity and decoding performance. Two hardware architectures targeting high-throughput applications are also proposed, integrating both Min-Sum (MS) and NS-FAID decoding kernels. ASIC post-synthesis implementation results on 65 nm CMOS technology show that NS-FAIDs yield significant improvements in the throughput-to-area ratio, by up to 58.75% with respect to the MS decoder, with even better or only slightly degraded error correction performance.
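For readers unfamiliar with the Min-Sum (MS) kernel used as the baseline in this abstract, here is a minimal sketch of the standard textbook Min-Sum check node update on floating-point log-likelihood ratios. It is not the NS-FAID design and does not model the finite-alphabet message quantization a hardware decoder would use; the function name `min_sum_check_node` is hypothetical.

```python
def min_sum_check_node(llrs):
    """Return one extrinsic check-to-variable message per incoming LLR.

    Assumption: llrs holds the variable-to-check messages of a single check
    node of degree at least 2, with positive values favouring bit 0.
    """
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]  # exclude the target edge itself
        sign = 1
        for x in others:
            if x < 0:
                sign = -sign  # product of the signs of the other messages
        # Magnitude is the smallest magnitude among the other messages.
        out.append(sign * min(abs(x) for x in others))
    return out


if __name__ == "__main__":
    print(min_sum_check_node([2.0, -3.0, 0.5]))
```

Because the update only needs signs and minima, small errors in the message values often leave the output unchanged, which is the kind of robustness to inexact messages the abstract refers to.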

A Survey on Programmable LDPC Decoders

IEEE Access, 2016

Low-density parity-check (LDPC) block codes are popular forward error correction schemes due to their capacity-approaching characteristics. However, the realization of LDPC decoders that meet both low latency and high throughput is not a trivial challenge. Usually, this has been solved with ASIC and FPGA technology, which enables meeting the decoder design constraints. But the rise of parallel architectures, such as graphics processing units, and the scaling of CPU streaming extensions have shown that multicore and many-core technology can provide a flexible alternative to the development of dedicated LDPC decoders for the compute-intensive prototyping phase of the design of new codes. In this light, this paper surveys the most relevant publications of the past decade on programmable LDPC decoders. It looks at the advantages and disadvantages of parallel architectures and data-parallel programming models, and assesses how the design space exploration is pursued regarding key characteristics of the underlying code and decoding algorithm features. This paper concludes with a set of open problems in the field of communication systems on parallel programmable and reconfigurable architectures. Index Terms: LDPC codes, LDPC decoders, parallel computing, CPU, GPU, reconfigurable computing, high-level synthesis.

Flexible, Cost-Efficient, High-Throughput Architecture for Layered LDPC Decoders with Fully-Parallel Processing Units

2016 Euromicro Conference on Digital System Design (DSD), 2016

In this paper, we propose a layered LDPC decoder architecture targeting flexibility, high-throughput, low cost, and efficient use of the hardware resources. The proposed architecture provides full design time flexibility, i.e., it can accommodate any Quasi-Cyclic (QC) LDPC code, and also allows redefining a number of parameters of the QC-LDPC code at the run time. The main novelty of the paper consists of: (1) a new low-cost processing unit that merges in an efficient way the logical functionalities of the Variable-Node Unit (VNU) and the A Posteriori Log-Likelihood Ratio (AP-LLR) unit, (2) a high speed, low-cost Check-Node Unit (CNU) architecture, which is executed twice in order to complete the computation of the check-node messages at each iteration, (3) a splitting of the iteration processing in two perfectly symmetric stages, executed in two consecutive clock cycles, each one using exactly the same processing resources; the processing load is perfectly balanced between the two clock cycles, thus yielding an optimal clock frequency. Synthesis results targeting a 65nm CMOS technology for a (3, 6)-regular (648, 1296) Quasi-Cyclic LDPC code and for the WiMax (1152, 2304) irregular QC-LDPC code show significant improvements in terms of area and throughput compared to the baseline architecture discussed in this paper, as well as several state of the art implementations.
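The Quasi-Cyclic structure that this architecture relies on can be illustrated with a short sketch that expands a small base matrix of circulant shifts into a full binary parity-check matrix. The convention used here is an assumption: an entry of -1 denotes an all-zero block and an entry s >= 0 denotes a z-by-z identity matrix cyclically shifted right by s. The function name `expand_qc` and the toy base matrix are hypothetical and are not taken from the paper.

```python
def expand_qc(base, z):
    """Expand a base matrix of shift values into a binary parity-check matrix.

    Assumption: base[i][j] == -1 means a z-by-z zero block, and a value
    s >= 0 means the z-by-z identity cyclically shifted right by s columns.
    """
    rows = len(base) * z
    cols = len(base[0]) * z
    H = [[0] * cols for _ in range(rows)]
    for bi, block_row in enumerate(base):
        for bj, s in enumerate(block_row):
            if s < 0:
                continue  # zero block: nothing to set
            for r in range(z):
                # Row r of the shifted identity has its single 1 at (r + s) mod z.
                H[bi * z + r][bj * z + (r + s) % z] = 1
    return H


if __name__ == "__main__":
    for row in expand_qc([[0, 1], [-1, 2]], 3):
        print(row)
```

In a layered decoder each block row of the base matrix is typically treated as one layer, and since every column has at most one 1 within a block row, the z checks of a layer can be processed in parallel without conflicts.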

An Improved Throughput for Non-Binary Low-Density-Parity-Check Decoder

Computer Engineering and Applications, 2020

Low-Density-Parity-Check (LDPC) based error control decoders find a wide range of applications in both storage and communication systems because of their merits, which include high suitability for parallelization and excellent error correction performance. The Field-Programmable Gate Array (FPGA) provides a robust platform in terms of parallelism, resource allocation and speed for implementing non-binary LDPC decoder architectures. This paper proposes a high-throughput LDPC decoder through the implementation of a fully parallel architecture and a reduction in the maximum iteration limit needed for complete error correction. A Galois field of order eight was utilized alongside a non-uniform quantization scheme, resulting in fewer bits per Log Likelihood Ratio (LLR) for the implementation. Verilog Hardware Description Language (HDL) was used to describe the non-binary error control decoder. The proposed decoder attained a throughput of 10 Gbps...

Memory-efficient turbo decoder architectures for LDPC codes

IEEE Workshop on Signal Processing Systems

In this paper, we propose a turbo decoding message-passing (TDMP) algorithm to decode regular and irregular low-density parity-check (LDPC) codes. The TDMP algorithm has two main advantages over the commonly employed two-phase message-passing algorithm. First, it exhibits a faster convergence behavior (up to 50% fewer iterations), and improvement in coding gain (up to an order of magnitude for moderate-to-high SNR and small number of iterations). Second, the corresponding decoder architecture has a significantly reduced memory requirement that amounts to a savings of (75 + 25n/ C node-degrees)% > 75% for code-length n. A decoder architecture featuring the TDMP algorithm is also presented. Furthermore, we propose a new structure on the parity-check matrix of an LDPC code based on permutation matrices aimed at reducing interconnect complexity and improving decoding throughput. In addition, we construct a wide range of LDPC codes based on Ramanujan graphs which possess this structure.

Architecture of a low-complexity non-binary LDPC decoder

2008 Second International Conference on Electrical Engineering, 2008

In this paper, we propose a hardware implementation of the EMS decoding algorithm for non-binary LDPC codes, presented in [10]. To the knowledge of the authors this is the first implementation of a GF(q) LDPC decoder for high order fields (q ≥ 64). The originality of the proposed architecture is that it takes into account the memory problem of non-binary LDPC decoders, together with a significant complexity reduction per decoding iteration, which becomes independent of the field order. We present an estimate of the non-binary decoder implementation and key metrics including throughput and hardware complexity. The error decoding performance of the low-complexity algorithm with proper compensation has been obtained through computer simulations. The frame error rate results are quite good given the substantial complexity reduction. The results also show that an implementation of a non-binary LDPC decoder is now feasible and the extra complexity of the decoder is balanced by the superior performance of this class of codes. With their foreseen simple architectures and good error-correcting performance, non-binary LDPC codes provide a promising vehicle for real-life efficient coding system implementations.

Enhancing the Error-Correcting Performance of LDPC Codes through an Efficient Use of Decoding Iterations

2013

The decoding of Low-Density Parity-Check (LDPC) codes operates over a redundant structure known as the bipartite graph, meaning that the full set of bit nodes is not absolutely necessary for decoder convergence. In 2008, Soyjaudah and Catherine designed a recovery algorithm for LDPC codes based on this assumption and showed that their codes outperformed conventional LDPC codes in error-correcting performance. In this work, the use of the recovery algorithm is further explored to test the performance of LDPC codes while the number of iterations is progressively increased. For experiments conducted with small block lengths of up to 800 bits and up to 2000 iterations, the results demonstrate that, contrary to conventional wisdom, the error-correcting performance keeps improving as the number of iterations increases.