Implementation of a High Throughput Soft MIMO Detector on GPU

A GPU implementation of a real-time MIMO detector

2009 IEEE Workshop on Signal Processing Systems, 2009

Multiple-input multiple-output (MIMO) is an existing technique that can significantly increase system throughput by employing multiple antennas at the transmitter and the receiver. Realizing the maximum benefit from this technique requires computationally intensive detectors, which pose significant challenges to receiver design. Furthermore, a flexible detector or multiple detectors are needed to handle different configurations. The graphics processing unit (GPU), a highly parallel, programmable commodity co-processor, can deliver extremely high computation throughput and is well suited for signal processing applications. However, careful architecture-aware design is needed to leverage the performance offered by the GPU. We show that we can achieve good performance while maintaining flexibility by employing an optimized trellis-based MIMO detector on a GPU.

Flexible N-Way MIMO Detector on GPU

2012 IEEE Workshop on Signal Processing Systems, 2012

This paper proposes a flexible multiple-input multiple-output (MIMO) detector on graphics processing units (GPUs). MIMO detection is a key technology in broadband wireless systems such as LTE, WiMAX, and 802.11n. Existing detectors either use costly sorting for better performance or sacrifice sorting for high throughput. To achieve good performance with high throughput, our detector runs multiple search passes in parallel, where each search pass detects the transmit streams in a different permuted detection order. We show that this flexible detector, including QR decomposition preprocessing, outperforms existing GPU MIMO detectors while maintaining good bit error rate (BER) performance. In addition, this detector can achieve different tradeoffs between throughput and accuracy by changing the number of parallel search passes.
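The idea of running several search passes with different detection orders and keeping the best result can be sketched in a few lines of Python. This is not the detector from the paper: the sketch assumes a real-valued channel, a 4-PAM alphabet, cyclic-shift permutations, and a simple hard-decision successive interference cancellation (SIC) pass standing in for the actual search. The helper names (`qr`, `sic_pass`, `n_way_detect`) and all numeric values are hypothetical. On a GPU the passes would run concurrently; here they run in a plain loop.

```python
import math

ALPHABET = (-3.0, -1.0, 1.0, 3.0)  # real 4-PAM levels (assumed constellation)


def qr(cols):
    """Classical Gram-Schmidt on a list of columns; returns (Q columns, upper-triangular R)."""
    n = len(cols)
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            w = [wi - R[i][j] * qi for wi, qi in zip(w, q)]
        R[j][j] = math.sqrt(sum(x * x for x in w))
        q_cols.append([x / R[j][j] for x in w])
    return q_cols, R


def sic_pass(H_cols, y, order):
    """One search pass: QR on the permuted channel, then back-substitution
    with nearest-symbol slicing, starting from the last stream in `order`."""
    q_cols, R = qr([H_cols[k] for k in order])
    z = [sum(a * b for a, b in zip(q, y)) for q in q_cols]  # z = Q^T y
    n = len(order)
    s_perm = [0.0] * n
    for i in reversed(range(n)):
        resid = z[i] - sum(R[i][j] * s_perm[j] for j in range(i + 1, n))
        s_perm[i] = min(ALPHABET, key=lambda a: abs(resid - R[i][i] * a))
    s = [0.0] * n
    for pos, k in enumerate(order):  # undo the permutation
        s[k] = s_perm[pos]
    return s


def distance(H_cols, y, s):
    """Squared Euclidean distance ||y - H s||^2 for a column-stored H."""
    return sum((y_i - sum(col[i] * s_k for col, s_k in zip(H_cols, s))) ** 2
               for i, y_i in enumerate(y))


def n_way_detect(H_cols, y, n_passes):
    """Run n_passes SIC passes with cyclically shifted orders; keep the closest candidate."""
    n = len(H_cols)
    orders = [[(k + shift) % n for k in range(n)] for shift in range(n_passes)]
    candidates = [sic_pass(H_cols, y, order) for order in orders]
    return min(candidates, key=lambda s: distance(H_cols, y, s))


# Made-up 3x3 channel (stored column by column) and transmit vector.
H_cols = [[1.0, 0.2, 0.1], [0.3, 0.9, -0.2], [0.1, -0.4, 1.1]]
s_true = [3.0, -1.0, 1.0]
y = [sum(col[i] * s for col, s in zip(H_cols, s_true)) for i in range(3)]
print(n_way_detect(H_cols, y, n_passes=3))
```

Because the orders used for a smaller `n_passes` are a prefix of those used for a larger one, adding passes can only keep or reduce the distance of the selected candidate, at the cost of more work per received vector. That is the throughput/accuracy knob the abstract describes, in a much simplified form.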

Reconfigurable real-time MIMO detector on GPU

2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, 2009

In a high-performance multiple-input multiple-output (MIMO) system, a soft-output MIMO detector combined with a channel decoder is often used at the receiver to maximize performance gain. The graphics processing unit (GPU) is a low-cost parallel programmable co-processor that can deliver extremely high computation throughput and is well suited for signal processing applications. We propose and implement a novel soft MIMO detection algorithm and show that it meets real-time performance requirements while maintaining flexibility on a GPU.

IJERT-FPGA Implementation of High Throughput MIMO Detectors Based on Path Preserving Trellis Search Algorithm

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/fpga-implementation-of-high-throughput-mimo-detectors-based-on-path-preserving-trellis-search-algorithm
https://www.ijert.org/research/fpga-implementation-of-high-throughput-mimo-detectors-based-on-path-preserving-trellis-search-algorithm-IJERTV3IS20578.pdf
In this paper, we present a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture (FPGA) for soft-output multiple-input multiple-output (MIMO) detection. In MIMO receiver design, the computational complexity increases exponentially with the number of antennas; the PPTS algorithm is introduced to overcome this complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna, so that the log-likelihood ratio (LLR) for each transmitted data bit can be formed more accurately. Simulation results show that the proposed PPTS algorithm achieves low search complexity and high throughput.
Keywords: FPGA, MIMO, soft-output MIMO detectors, shortest-path algorithms.
I. INTRODUCTION
Multiple-input multiple-output (MIMO) systems have the potential to increase spectral efficiency by transmitting independent data streams on multiple antennas. MIMO technologies have been adopted in many new wireless standards such as WiMAX and WLAN. Soft-output MIMO detection poses significant challenges to MIMO receiver design because the computational complexity increases exponentially with the number of antennas. The optimal soft-decision detector, the maximum a posteriori (MAP) detector, consumes enormous computing power and requires tremendous computational resources, which makes it infeasible in a practical MIMO receiver. Efficient algorithms are therefore needed to reduce the MIMO detection complexity.
The MIMO detection problem is usually tackled with tree-search algorithms, which can be broadly categorized into depth-first and breadth-first search algorithms. The sphere detection algorithm is a depth-first tree-search algorithm that finds the closest lattice point. To provide soft information for outer channel decoders, a modified version of the sphere detection algorithm, the soft sphere detection algorithm, was introduced. There are many implementations of the sphere detector, but the sphere detector suffers from non-deterministic complexity and variable throughput. The sequential nature of the depth-first tree-search process significantly limits the throughput of the sphere detector, especially when the SNR is low. The K-best algorithm is a fixed-complexity algorithm based on breadth-first tree search. However, it tends to have a high sorting complexity to find and retain the best candidates, which limits the throughput of the detector, especially when K is large. A further problem with tree-search algorithms is that the counter-hypotheses for certain bits are missing due to tree pruning. As a consequence, the magnitude of the LLRs for those bits cannot be determined, which leads to performance degradation. To avoid the missing counter-hypothesis problem and to reduce the search complexity, we investigate high-performance MIMO detection algorithms and high-speed VLSI architectures.
II. SYSTEM MODEL
We consider a spatial-multiplexing MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas ($N_r > N_t$). The MIMO transmission can be modeled as
$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} \qquad (1)$$
where $\mathbf{H}$ is an $N_r \times N_t$ complex matrix that is assumed to be known perfectly at the receiver, $\mathbf{s} = [s_0\ s_1\ \ldots\ s_{N_t-1}]^T$ is an $N_t \times 1$ transmit symbol vector, $\mathbf{y} = [y_0\ y_1\ \ldots\ y_{N_r-1}]^T$ is the received vector, and $\mathbf{n}$ is a vector of independent zero-mean complex Gaussian noise entries with variance $\sigma^2$ per real component.
A real bit-level vector $\mathbf{x}_k = [x_{k,0}\ x_{k,1}\ \ldots\ x_{k,M_c-1}]^T$ is mapped to the complex symbol $s_k$, where the $b$th bit of $\mathbf{x}_k$ is denoted as $x_{k,b}$ and $M_c$ is the number of bits per constellation point. Throughout this paper, the complex symbol $s_k$ and its associated bit vector $\mathbf{x}_k$ will be used interchangeably. The optimal MAP detector computes the log-likelihood ratio (LLR) value for the a posteriori probability (APP) of each transmitted bit. Assuming there is no a priori information for the transmitted bits, the LLR of each bit $x_{k,b}$ can be computed as
$$\mathrm{LLR}(x_{k,b}) = \ln \frac{\sum_{\mathbf{s}:\, x_{k,b}=+1} \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2\right)}{\sum_{\mathbf{s}:\, x_{k,b}=-1} \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2\right)} \qquad (2)$$
With max-log approximation,
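As a rough numerical illustration of the LLR in Eq. (2), the following Python sketch evaluates it by exhaustive enumeration for a toy system and compares it with the standard max-log approximation, which keeps only the dominant (minimum-distance) term of each sum. The setup is an assumption made for illustration and is not taken from the paper: 2 × 2 antennas, BPSK (so each symbol carries a single bit), a made-up channel matrix, and a made-up noise variance.

```python
import itertools
import math

# Hypothetical toy setup: 2 transmit / 2 receive antennas, BPSK, so the
# symbol s_k equals the bit x_{k,0} in {-1, +1}. All values are made up.
H = [[1.0 + 0.2j, 0.4 - 0.3j],
     [0.3 + 0.5j, 0.9 - 0.1j]]
SIGMA2 = 0.5  # noise variance per real component (assumed)


def distance(y, H, s):
    """Squared Euclidean distance ||y - H s||^2."""
    return sum(abs(y_i - sum(h * s_k for h, s_k in zip(row, s))) ** 2
               for y_i, row in zip(y, H))


def llrs(y, H, sigma2):
    """Return (exact, max_log) LLR lists, one entry per transmitted bit."""
    n_t = len(H[0])
    candidates = list(itertools.product((-1.0, 1.0), repeat=n_t))
    dists = {s: distance(y, H, s) for s in candidates}
    exact, max_log = [], []
    for k in range(n_t):
        plus = [d for s, d in dists.items() if s[k] > 0]   # hypotheses with bit = +1
        minus = [d for s, d in dists.items() if s[k] < 0]  # hypotheses with bit = -1
        num = sum(math.exp(-d / (2 * sigma2)) for d in plus)
        den = sum(math.exp(-d / (2 * sigma2)) for d in minus)
        exact.append(math.log(num / den))
        # Max-log: replace each sum of exponentials by its largest term.
        max_log.append((min(minus) - min(plus)) / (2 * sigma2))
    return exact, max_log


# Noiseless reception of s = (+1, -1), so the LLR signs should match the bits.
s_true = (1.0, -1.0)
y = [sum(h * s for h, s in zip(row, s_true)) for row in H]
exact, max_log = llrs(y, H, SIGMA2)
print("exact  :", exact)
print("max-log:", max_log)
```

The enumeration visits every candidate vector, which is exactly the exponential cost that motivates reduced-complexity searches. Note that the max-log value for a bit needs a minimum distance for both hypotheses; if a pruned search drops all candidates for one of them, that LLR magnitude is undefined, which is the missing counter-hypothesis problem described above.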

A GPU implementation for two MIMO-OFDM detectors

2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 2010

Two real-valued signal models based on the selective spanning with fast enumeration (SSFE) and layered orthogonal lattice detector (LORD) algorithms are implemented on an Nvidia graphics processing unit (GPU). A 2 × 2 multiple-input multiple-output (MIMO) antenna system with 16-quadrature amplitude modulation (16-QAM) is assumed. The chosen level update vector for SSFE is based on computer simulations carried out in the MATLAB environment. We implemented the algorithms on an Nvidia Quadro FX 1700 GPU and achieved a throughput of 36.06 Mbps for SSFE and 16.8 Mbps for LORD. The results show that the general-purpose graphics processing unit (GPGPU) has the potential to achieve high throughput, provided the detection algorithm allows efficient parallel processing. The latency of the control code and the partial Euclidean distance (PED) calculations is very small, but the latency of memory loads and stores to the GPU's global memory is significant. We also compare our results with those of the trellis-based detector implementation for GPU, where a more powerful GPU and a different detection algorithm are used. GPUs offer superior computing power and programmability compared to the application-specific software-defined radio (SDR) designs implemented so far.

High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000

In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in stage $k$ of the trellis maps to a possible complex-valued symbol transmitted by antenna $k$. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple shortest paths problem subject to the constraint that every trellis node must be covered in this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna so that the log-likelihood ratio (LLR) for each transmitted data bit can be more accurately formed. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully parallel systolic-array detector and two folded detectors for a 4 × 4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm² core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm² core area, the fully parallel systolic-array detector can achieve a throughput of 6.4 Gbps.
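The constraint that every trellis node must be covered by the set of retained paths can be made concrete with a small Python sketch. It does not implement the PPTS search itself: an exhaustive enumeration stands in for the trellis search, and it only shows what the covering set of shortest paths looks like and why a short best-path list can leave nodes uncovered. The QPSK alphabet, the 2 × 2 channel values, and the helper names (`covering_paths`, `uncovered_nodes`) are assumptions made for illustration.

```python
import itertools

SYMBOLS = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)  # QPSK nodes per trellis stage (assumed)


def metric(y, H, s):
    """Path metric: squared Euclidean distance ||y - H s||^2."""
    return sum(abs(y_i - sum(h * s_k for h, s_k in zip(row, s))) ** 2
               for y_i, row in zip(y, H))


def covering_paths(y, H):
    """For every trellis node (stage k, symbol q), find the shortest path
    through that node. Exhaustive search, used here only as a reference."""
    n_t = len(H[0])
    best = {}
    for s in itertools.product(SYMBOLS, repeat=n_t):
        d = metric(y, H, s)
        for k, q in enumerate(s):
            if (k, q) not in best or d < best[(k, q)][0]:
                best[(k, q)] = (d, s)
    return best


def uncovered_nodes(paths, n_t):
    """List the trellis nodes that none of the given paths passes through."""
    covered = {(k, q) for s in paths for k, q in enumerate(s)}
    return [(k, q) for k in range(n_t) for q in SYMBOLS if (k, q) not in covered]


# Made-up 2x2 channel and a slightly perturbed received vector.
H = [[1.0 + 0.2j, 0.4 - 0.3j],
     [0.3 + 0.5j, 0.9 - 0.1j]]
s_sent = (1 + 1j, -1 - 1j)
y = [sum(h * s for h, s in zip(row, s_sent)) + 0.1 for row in H]

best = covering_paths(y, H)
ml_path = min(best.values(), key=lambda t: t[0])[1]  # overall shortest path
print("nodes covered by the path set:", len(best))
print("nodes missed by the single best path:", len(uncovered_nodes([ml_path], 2)))
```

A single path passes through one node per stage, so with two stages of four nodes each it leaves six nodes, and therefore some bit hypotheses, without a metric. The covering set has an entry for all eight nodes, which is what allows an LLR to be formed for every bit.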

Parallel SUMIS soft detector for large MIMO systems on multicore and GPU

The Journal of Supercomputing, 2018

The number of transmit and receive antennas is an important factor that affects the performance and complexity of a MIMO system. MIMO systems with a very large number of antennas are a promising candidate technology for the next generations of wireless systems. However, the vast majority of the methods proposed for conventional MIMO systems are not suitable for large dimensions. In this context, the use of High Performance Computing (HPC) systems, such as multicore CPUs and Graphics Processing Units (GPUs), has become attractive for the efficient implementation of parallel signal processing algorithms with high computational requirements. In the present work, two practical parallel approaches to the Subspace Marginalization with Interference Suppression (SUMIS) detector for large MIMO systems are proposed. Both approaches have been evaluated and compared in terms of performance and complexity with other detectors for different system parameters.

A new MIMO detector architecture based on a Forward-Backward trellis algorithm

2008 42nd Asilomar Conference on Signals, Systems and Computers, 2008

In this paper, a recursive Forward-Backward (F-B) trellis algorithm is proposed for soft-output MIMO detection. Instead of using the traditional tree topology, we represent the search space of the MIMO signals with a fully connected trellis, and a Forward-Backward recursion is applied to compute the a posteriori probability (APP) for each coded data bit. The proposed detector has the following advantages: a) it keeps a fixed throughput and has a regular datapath structure, which makes it amenable to VLSI implementation, and b) it attempts to maximize the a posteriori probability by tracing both forward and backward on the trellis, and it always ensures that at least one candidate exists for every possible transmitted bit $x_k \in \{-1, +1\}$. Compared with the soft K-best detector, the proposed detector significantly reduces the complexity because sorting is not required, while still maintaining good performance. A maximum throughput of 533 Mbps is achievable at a cost of 576K gates for a 4 × 4 16-QAM system.

MIMOPack: a high-performance computing library for MIMO communication systems

The Journal of Supercomputing, 2014

Nowadays, several communication standards are emerging and evolving in search of higher transmission rates, reliability, and coverage. This expansion is primarily driven by the continued increase in consumption of mobile multimedia services due to the emergence of new handheld devices such as smartphones and tablets. One of the most significant techniques employed to meet these demands is the use of multiple transmit and receive antennas, known as MIMO (Multiple Input Multiple Output) systems. This technology makes it possible to increase both the transmission rate and the quality of the transmission. MIMO technologies have become essential in several wireless and broadband standards such as Wireless Local Area Network (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE), and Next Generation Handheld (DVB-NGH) for the reception of Digital Terrestrial Television (DTT) in handheld devices. These technologies will also be incorporated in future standards, so a great deal of research in this field is expected in the coming years. Clearly, the study of MIMO systems is critical to current research; however, the problems that arise from this technology are very complex. High Performance Computing (HPC) systems, and specifically modern hardware architectures such as multi-core and many-core processors (e.g., Graphics Processing Units (GPUs)), are playing a key role in the development of efficient and low-complexity algorithms for MIMO transmissions. Proof of this is that the number of scientific contributions and research projects related to their use has increased in recent years. Also, some high-performance libraries have been implemented as tools for researchers or companies involved in the development of future communication standards.
Two of the most popular libraries are IT++, a library based on optimized libraries for multi-core processors, and the Communications System Toolbox, designed for use with MATLAB and Simulink, which uses GPU computing. However, there is no library able to run on a heterogeneous platform using all the available resources. In view of the high computational requirements of MIMO application research and the shortage of tools able to satisfy them, we have made a special effort to develop a library that eases the development of adaptable parallel applications in accordance with the different architectures of the executing platform. The library, called MIMOPack, aims to implement efficiently, using parallel computing, a set of functions that perform some of the critical stages of MIMO communication system simulation. The main contribution of the thesis is the implementation of efficient hard- and soft-output detectors, since the detection stage is considered the most complex part of the communication process. These detectors are highly configurable, and many of them include preprocessing techniques that reduce the computational cost and increase the performance. The proposed library shows three important features: portability, efficiency, and ease of use. The library can run on the latest generation of machine architectures (the current release supports GPU and multi-core computation), or even on both simultaneously, since it is designed for heterogeneous machines, exploiting the whole computational capacity and thus reducing the response time for the most complex problems. The interfaces of the functions are common to all environments in order to simplify the use of the library, regardless of the machine where the functions will be executed. Moreover, some of the functions are callable from MATLAB, increasing the portability of developed codes between different computing environments.
Based on the library design and the performance assessment, we consider that MIMOPack may help industrial and academic researchers implement scientific codes without having to know different programming languages and machine architectures. This will allow them to include more complex algorithms in their simulations and obtain their results faster. This is particularly important in industry, since manufacturers work to analyze and propose their own technologies with the aim of having them approved as a standard. Doing so allows them to enforce their intellectual property rights over their competitors, who must obtain the corresponding licenses to include these technologies in their products.

A Survey of VLSI Implementations of Tree Search Algorithms for MIMO Detection

Multiple-input multiple-output (MIMO) detection algorithms have received considerable research interest in recent years, as a result of the increasing need for high data-rate communications. Detection techniques range from the low-complexity linear detectors to the maximum likelihood detector, which scales exponentially with the number of transmit antennas. In between these two extremes are the tree search (TS) algorithms, such as the popular sphere decoder, which have emerged as attractive choices for implementing MIMO detection, due to their excellent performance-complexity trade-offs. In this paper, we survey some of the state-of-the-art VLSI implementations of TS algorithms and compare their results using various metrics such as the throughput and power consumption. We also present notable contributions that have been made in the last three decades in implementing TS algorithms for MIMO detection, especially with respect to achieving low complexity, high throughput designs. Finally, a number of design considerations and trade-offs for implementing MIMO detectors in hardware are presented. Keywords MIMO detection algorithms · sphere decoding · survey · wireless communications · very large scale integration (VLSI)