Sparse Matrices Research Papers - Academia.edu

2025

Previous studies in numerical analysis have shown how the calculation of Jacobians, Hessians, and their factorizations can be accelerated when their sparsity pattern is known. However, accurate Jacobian and Hessian sparsity patterns cannot be computed numerically, leaving the burden on the user to provide them. In this manuscript we develop a method for the accurate and efficient construction of sparsity patterns by transforming an input program into one that computes the sparsity pattern of its Jacobian or Hessian. Our implementation, which we demonstrate on partial differential equations, is a scalable technique for accelerating automatic differentiation on arbitrarily complex multivariate programs. This work also demonstrates that the potential of dynamic program analysis applied to differentiable programming has yet to be fully realized.
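
A minimal sketch of the idea, assuming an operator-overloading tracer (the `Tracer` class and `stencil` example below are hypothetical, not the authors' implementation): each value carries the set of input indices it depends on, so one run of the program yields a Jacobian sparsity pattern.

```python
# Hypothetical sketch: propagate index sets through a program to obtain
# a Jacobian sparsity pattern (not the authors' implementation).

class Tracer:
    """Carries the set of input indices a value depends on."""
    def __init__(self, deps):
        self.deps = frozenset(deps)

    def _join(self, other):
        other_deps = other.deps if isinstance(other, Tracer) else frozenset()
        return Tracer(self.deps | other_deps)

    __add__ = __radd__ = __mul__ = __rmul__ = __sub__ = __rsub__ = _join

def jacobian_sparsity(f, n):
    """Row i lists the input indices that output i depends on."""
    xs = [Tracer({j}) for j in range(n)]
    return [sorted(y.deps) for y in f(xs)]

# A tridiagonal-style stencil: output i touches inputs i-1, i, i+1.
def stencil(x):
    n = len(x)
    return [x[max(i - 1, 0)] + 2.0 * x[i] + x[min(i + 1, n - 1)]
            for i in range(n)]

print(jacobian_sparsity(stencil, 5))
# [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4]]
```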

2025

For the Simultaneous Localization and Mapping problem several efficient algorithms have been proposed that make use of a sparse information matrix representation (e.g. SEIF, TJTF, treemap). Since the exact SLAM information matrix is dense, these algorithms have to approximate it (sparsification). It has been empirically observed that this approximation is adequate because entries in the matrix corresponding to distant landmarks are extremely small. This paper provides a theoretical proof of this observation, specifically showing that the off-diagonal entries corresponding to two landmarks decay exponentially with the distance traveled between the observations of the first and second landmarks.
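
In our notation (not the authors'), the claimed decay can be stated as an exponential off-diagonal bound on the information matrix:

```latex
% Hedged restatement of the paper's claim; notation ours.
\[
  \bigl\lVert [\Lambda]_{ij} \bigr\rVert \;\le\; C\,\rho^{\,d_{ij}},
  \qquad 0 < \rho < 1,
\]
% where d_{ij} is the distance traveled between the observations of
% landmarks i and j, and C, \rho depend on the motion and sensor noise.
```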

2025, Neural Information Processing

Developing energy consumption models for smart buildings is important for studying demand response, home energy management, and distribution network simulation. In this work, we develop parsimonious Markovian models of smart buildings for different periods in a day for predicting electricity consumption. To develop these models, we collect two data sets with widely different load profiles over a period of seven months and one year, respectively. We validate the accuracy of our models for load prediction and compare our results with neural networks.
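
A minimal sketch of one ingredient, under assumed details (equal-width load bins, first-order chain; the paper's data sets and model structure may differ): estimate a transition matrix from a load trace for one daily period, then predict the expected next consumption.

```python
# Sketch: fit a first-order Markov chain over discretized load levels
# (binning scheme and toy data are our assumptions).
import numpy as np

def fit_transition_matrix(loads, n_states):
    """loads: 1-D array of consumption readings for one daily period."""
    bins = np.linspace(loads.min(), loads.max(), n_states + 1)
    states = np.clip(np.digitize(loads, bins) - 1, 0, n_states - 1)
    counts = np.zeros((n_states, n_states))
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1
    counts += 1e-6                 # small prior keeps empty rows valid
    return counts / counts.sum(axis=1, keepdims=True), bins

rng = np.random.default_rng(0)
loads = np.abs(np.cumsum(rng.normal(size=500))) + 1.0   # toy load trace
P, bins = fit_transition_matrix(loads, n_states=8)
centers = 0.5 * (bins[:-1] + bins[1:])
print("expected next load from state 3:", P[3] @ centers)
```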

2025, IEEE Transactions on Antennas and Propagation

2025, 2011 12th International Symposium on Quality Electronic Design

With the increase in circuit frequency and supply voltage scaling, a robust power network design is essential to ensure that the circuits on a chip operate reliably at the guaranteed level of performance. Traditionally, power network analysis has focused mainly on IR-drop effects. However, IR-drop analysis approaches depend strongly on the input vectors and may require tremendously long execution times. In this paper, we propose a novel and fast power network analysis method which calculates the effective resistance between all power pads and power grids. This method explores huge parasitic power networks and detects hot spots with abnormal effective resistance values resulting from gross errors in the post-layout power network. We currently use the proposed method for our memory and DDI circuits to validate the post-layout power network quickly. We developed our method using multi-thread and multi-process techniques, resulting in up to a 50-times speed improvement.
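
The core quantity can be illustrated with the graph Laplacian: the effective resistance between nodes u and v is (e_u − e_v)ᵀ L⁺ (e_u − e_v). A small dense sketch (the paper's multi-threaded, large-scale implementation is not shown):

```python
# Sketch: effective resistance of a resistor network via the
# pseudoinverse of its Laplacian (dense pinv is fine at toy scale).
import numpy as np

def effective_resistance(edges, conductances, n, u, v):
    """edges: list of (i, j); conductances: 1/R per edge; n: node count."""
    L = np.zeros((n, n))
    for (i, j), g in zip(edges, conductances):
        L[i, i] += g; L[j, j] += g
        L[i, j] -= g; L[j, i] -= g
    Lp = np.linalg.pinv(L)
    e = np.zeros(n); e[u], e[v] = 1.0, -1.0
    return e @ Lp @ e

# 3-node chain of 1-ohm resistors: R_eff(0, 2) should be 2 ohms.
print(effective_resistance([(0, 1), (1, 2)], [1.0, 1.0], 3, 0, 2))  # ≈ 2.0
```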

2025

This paper presents a hardware solution to the design of general low-density parity-check (LDPC) decoders, which simplifies the delivery network required by the message passing algorithm. While many designs of LDPC decoders for specific classes of codes exist in the literature, the design of a general LDPC decoder capable of supporting random LDPC codes is still challenging. The method proposed in this paper packs different check node (CN) and variable node (VN) messages in the Tanner graph representation of the LDPC code, and is therefore called message packing. This method takes advantage of the fact that for high-rate LDPC codes the CN degree is much larger than the VN degree, and two distinct methods for delivering the messages to the CNs and VNs are proposed. Using the proposed interconnection network (IN) results in lower-complexity decoding of LDPC codes when compared to other designs.

2025, 2010 IEEE 11th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

We consider in this paper the problem of blind frame synchronization of systems using Reed-Solomon (RS) codes and other related families. We first present three techniques of blind frame synchronization based on the non-binary parity check matrix of RS codes. While the first two techniques involve the calculation of hard and soft values of the syndrome elements, respectively, the third one performs an adaptation step on the parity check matrix before applying the soft criterion. Although RS codes are constructed from non-binary symbols, we show in this paper that it is also possible to synchronize them using the binary image expansion of their parity check matrix. Simulation results show that the synchronization algorithm based on the adaptation of the binary parity check matrix of RS codes has the best synchronization performance among all the techniques. Furthermore, the Frame Error Rate (FER) curves obtained after synchronization and decoding are very close to the perfect-synchronization curves.
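
A sketch of the hard-decision variant only, with assumed details (window alignment, a toy single parity-check code): slide over the received stream and choose the offset that minimizes the total syndrome weight.

```python
# Sketch of hard-syndrome blind frame synchronization (toy code, not
# the paper's RS setup): pick the offset minimizing syndrome weight.
import numpy as np

def sync_offset(bits, H):
    """bits: hard decisions; H: binary parity-check matrix (r x n)."""
    n = H.shape[1]
    best_t, best_w = 0, np.inf
    for t in range(n):                          # candidate frame starts
        w = 0
        for s in range(t, len(bits) - n + 1, n):
            w += int(((H @ bits[s:s + n]) % 2).sum())
        if w < best_w:
            best_t, best_w = t, w
    return best_t

rng = np.random.default_rng(1)
H = np.array([[1, 1, 1, 1]])                    # single parity-check code
frames = rng.integers(0, 2, (50, 4))
frames[:, 3] = frames[:, :3].sum(axis=1) % 2    # enforce even parity
stream = np.concatenate([[1, 0], frames.ravel()])   # true offset: 2
print(sync_offset(stream, H))                   # -> 2
```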

2025

In engineering and computing, the finite element approximation is one of the most well-known computational solution techniques. It is a great tool to find solutions for mechanical, fluid mechanical and ecological problems. Whoever works with the finite element method will need to solve a large system of linear equations. There are different ways to find a solution. One way is to use a matrix decomposition technique such as LU or QR. The other possibility is to use an iterative solution algorithm like Conjugate Gradients, Gauss-Seidel, Multigrid Methods, etc. This paper will focus on iterative solvers and the needed storage techniques...
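
As a small illustration of the pairing the paper discusses (a sparse storage scheme driving an iterative solver), here is Conjugate Gradients on a CSR-stored 1-D Poisson matrix via SciPy:

```python
# Sketch: CG on a sparse SPD system; the CSR format stores only the
# nonzeros plus row pointers, which is what makes the matvec cheap.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 1000
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)                    # info == 0 signals convergence
print(info, np.linalg.norm(A @ x - b))
```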

2025, Independent Research Submission

Matrix multiplication is a cornerstone of computational mathematics. Standard algorithms for 2x2 matrices require 8 scalar multiplications, while Strassen's algorithm reduces this to 7. This paper introduces and details Surya Matrix Multiplication, a specialized method tailored for 2x2 symmetric matrices of the circulant form [[a, b], [b, a]]. We provide a direct algebraic derivation showing that the product can be achieved using only 2 scalar multiplications. This offers a significant computational advantage for this specific matrix structure. The method is compared against standard and Strassen's multiplication, including algorithmic descriptions and Python implementations, to highlight its distinct characteristics and efficiency gains. The resulting product matrix also retains the symmetric circulant form.
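
A worked sketch of the 2-multiplication count, assuming the circulant form [[a, b], [b, a]] (variable names ours; this is one algebraic route to the result, and the paper's exact derivation may differ):

```python
# Product of two 2x2 symmetric circulants with 2 scalar multiplications.
def surya_mult(a, b, c, d):
    """[[a,b],[b,a]] @ [[c,d],[d,c]] using 2 scalar multiplications."""
    s = (a + b) * (c + d)              # = ac + ad + bc + bd
    t = (a - b) * (c - d)              # = ac - ad - bc + bd
    p, q = (s + t) / 2, (s - t) / 2    # p = ac + bd, q = ad + bc
    return [[p, q], [q, p]]            # product is circulant again

print(surya_mult(1, 2, 3, 4))          # [[11.0, 10.0], [10.0, 11.0]]
```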

2025

Frequently, to determine the resistivity distribution in Electrical Impedance Tomography (EIT), a sequence of direct Finite Element problems must be solved. This is what happens in Newton-Raphson based algorithms and Kalman Filter based algorithms for EIT. Only a small fraction of the nodal potentials can be measured, and only this small number of potentials is taken into account in the error index that is minimized to estimate the resistivity distribution. This work investigates the effects of static condensation, iterative refinement, and routines for sparse matrices on solving the direct problem in EIT from the point of view of computational time and numerical error propagation. Results indicate that computational time and numerical error propagation can be diminished under certain conditions.

2025, Numerical Algorithms

In this paper we analyze how to update incomplete Cholesky preconditioners to solve least squares problems using iterative methods when the set of linear relations is updated with some new information, a new variable is added or, contrarily, some information or variable is removed from the set. Our proposed method computes a low-rank update of the preconditioner using a bordering method which is inexpensive compared with the cost of computing a new preconditioner. Moreover, the numerical experiments presented show that this strategy gives, in many cases, a better preconditioner than other choices, including the computation of a new preconditioner from scratch or reusing an existing one.

2025

Fluorescence imaging in diffusive media is an emerging imaging modality for medical applications which uses injected fluorescent markers (several may be simultaneously injected) that bind to specific targets, such as tumors. The region of interest is illuminated with near-infrared light and the fluorescence emitted back is analyzed to localize the fluorescence sources. When investigating thick media, since the fluorescence signal decreases with the light travel distance, any disturbing signal, such as the intrinsic fluorescence of biological tissues (autofluorescence), is a limiting factor. To remove autofluorescence and isolate each specific fluorescent signal from the others, a spectroscopic approach based on Non-negative Matrix Factorization is explored. We ran an NMF algorithm with sparsity constraints on experimental data, and successfully obtained separated in vivo fluorescence spectra.
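
A hedged sketch of the unmixing step using scikit-learn's NMF with an L1 (sparsity) penalty on the weights; the toy data and penalty choice are ours, and the authors' algorithm and constraints may differ:

```python
# Sketch: spectral unmixing via sparse NMF on synthetic spectra.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy data: 200 measured spectra over 64 wavelengths, mixing 3 sources
# (e.g., two markers plus autofluorescence), all non-negative.
true_spectra = np.abs(rng.normal(size=(3, 64)))
weights = np.abs(rng.normal(size=(200, 3)))
X = weights @ true_spectra + 0.01 * np.abs(rng.normal(size=(200, 64)))

model = NMF(n_components=3, init="nndsvda", l1_ratio=1.0,
            alpha_W=0.01, max_iter=500, random_state=0)
W = model.fit_transform(X)      # per-measurement source weights
H = model.components_           # separated spectra (rows)
print(W.shape, H.shape)         # (200, 3) (3, 64)
```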

2025, Applied Mathematics and Computation

The present paper argues that it suffices for an algorithmic time complexity measure to be system invariant rather than system independent (that is, predictable from the desk, without reference to a particular machine).

2025, ArXiv

This letter addresses the estimation of directions-of-arrival (DoA) by a sensor array using a sparse model in the presence of array calibration errors and off-grid directions. The received-signal model combines previously used models for unknown calibration errors with a structured linear representation of the off-grid effect. A convex optimization problem is formulated with an objective function that promotes two-layer joint block-sparsity, together with its second-order cone programming (SOCP) representation. The performance of the proposed method is demonstrated by numerical simulations and compared with the Cramer-Rao Bound (CRB) and several previously proposed methods.

2025

The internet is a huge collection of websites in the order of 10 bytes. Around 90% of the world's population uses search engines for getting relevant information. According to Wikipedia, more than 200 million Indians use the Internet every day. Thus retrieving the correct data in the least time is the most important task; hence the need for an efficient and parallel PageRank algorithm. All the existing implementations are cluster based, and processing huge lists of data takes an awful lot of time. The difficulty in the cluster-based approach is the latency among the different nodes participating in the computation. Since the internet has large distributions of weblinks, collating partial results after processing is a major issue. Thus the latency factor undermines the performance gains of parallel cluster computation. As the complete list can be hosted on one data server, a PCI-based communication mechanism can be used as a solution, in addition to the high parallel computation power of GPUs. So our approach a...
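
The underlying computation is the PageRank power iteration on a sparse link matrix; a minimal sketch (the paper's GPU and PCI-communication specifics are not shown):

```python
# Sketch: PageRank power iteration on a sparse column-stochastic matrix.
import numpy as np
from scipy.sparse import csr_matrix

def pagerank(A, d=0.85, tol=1e-10):
    """A[i, j] = 1 if page j links to page i."""
    n = A.shape[0]
    out = np.asarray(A.sum(axis=0)).ravel()
    out[out == 0] = 1.0                   # dangling-node guard
    M = A.multiply(1.0 / out)             # column-normalize
    r = np.full(n, 1.0 / n)
    while True:
        r_new = d * (M @ r) + (1 - d) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# 4-page toy web: 0->1, 0->2, 1->2, 2->0, 3->2
rows, cols = [1, 2, 2, 0, 2], [0, 0, 1, 2, 3]
A = csr_matrix((np.ones(5), (rows, cols)), shape=(4, 4))
print(pagerank(A))
```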

2025, arXiv (Cornell University)

We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to min_{x∈R^n} ‖Ax − b‖₂, where A ∈ R^{m×n} with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is only involved in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size γ min(m, n) × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results demonstrate that on a shared-memory machine, LSRN outperforms LAPACK's DGELSD on large dense problems, and MATLAB's backslash (SuiteSparseQR) on sparse problems. Further experiments demonstrate that LSRN scales well on an Amazon Elastic Compute Cloud cluster.
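
A condensed sketch of the preconditioning idea for the m ≫ n case (our simplification; the paper additionally covers m ≪ n, parallel implementation, and the Chebyshev semi-iterative method):

```python
# Sketch of LSRN-style right preconditioning: random normal projection,
# small SVD, then LSQR on the well-conditioned operator A @ N.
import numpy as np
from scipy.sparse.linalg import aslinearoperator, lsqr

def lsrn_like(A, b, gamma=2.0, seed=0):
    m, n = A.shape
    s = int(np.ceil(gamma * n))
    G = np.random.default_rng(seed).normal(size=(s, m))   # projection
    _, sig, Vt = np.linalg.svd(G @ A, full_matrices=False)
    N = Vt.T / sig                     # right preconditioner
    y = lsqr(aslinearoperator(A) * aslinearoperator(N), b)[0]
    return N @ y

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 40))
b = rng.normal(size=500)
x = lsrn_like(A, b)
print(np.linalg.norm(A.T @ (A @ x - b)))   # small, up to solver tolerance
```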

2025, Annals

Sparse storage formats are techniques for storing and processing sparse matrix data efficiently. The performance of these storage formats depends upon the distribution of non-zeros within the matrix in different dimensions. In order to have better results we ...
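
As a concrete illustration of what a format choice means, here is the same matrix in COO triplets and in CSR, where the row index is compressed into row pointers for fast row-wise traversal:

```python
# Sketch: converting COO triplets to CSR by hand.
import numpy as np

# COO triplets for [[5, 0, 0], [0, 0, 8], [3, 0, 6]]
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 2, 0, 2])
vals = np.array([5.0, 8.0, 3.0, 6.0])

# CSR: indptr[i]:indptr[i+1] slices the entries of row i.
order = np.lexsort((cols, rows))
indices, data = cols[order], vals[order]
indptr = np.zeros(3 + 1, dtype=int)
np.add.at(indptr, rows + 1, 1)          # count entries per row
indptr = np.cumsum(indptr)
print(indptr, indices, data)            # [0 1 2 4] [0 2 0 2] [5. 8. 3. 6.]
```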

2025, 2016 IEEE High Performance Extreme Computing Conference (HPEC)

The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
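
A tiny illustration of the matrix viewpoint: one breadth-first-search step is a matrix-vector product over a Boolean (OR-AND) semiring. The dense NumPy sketch below is ours, not a GraphBLAS implementation:

```python
# Sketch: BFS on an adjacency matrix via Boolean "matvec" steps.
import numpy as np

# Adjacency matrix of a 4-vertex directed graph: 0->1, 1->2, 1->3.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=bool)

frontier = np.array([True, False, False, False])   # start at vertex 0
visited = frontier.copy()
while frontier.any():
    # OR-AND matvec: vertices one hop from the current frontier.
    frontier = np.any(A[frontier], axis=0) & ~visited
    visited |= frontier
print(visited)    # all four vertices reached
```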

2025

Chapter 6 is in part a reprint of material that will appear in the IEEE Intelligent Systems Magazine, 2010, by Brendan T. Morris and Mohan M. Trivedi. The dissertation author was the primary investigator and author of these papers. Chapter 7 is in part a reprint of material that appears in the Proceedings of the IEEE International Conference on Vehicular Electronics and Safety (ICVES), 2009, by Brendan T.

2025, Proceedings of the 48th International Conference on Parallel Processing

Parallel File Systems (PFSs) are frequently deployed on leadership High Performance Computing (HPC) systems to ensure efficient I/O, persistent storage and scalable performance. Emerging Deep Learning (DL) applications incur new I/O and storage requirements to HPC systems with batched input of small random files. This mandates PFSs to have commensurate features that can meet the needs of DL applications. BeeGFS is a recently emerging PFS that has grabbed the attention of the research and industry world because of its performance, scalability and ease of use. While emphasizing a systematic performance analysis of BeeGFS, in this paper, we present the architectural and system features of BeeGFS, and perform an experimental evaluation using cutting-edge I/O, Metadata and DL application benchmarks. Particularly, we have utilized AlexNet and ResNet-50 models for the classification of ImageNet dataset using the Livermore Big Artificial Neural Network Toolkit (LBANN), and ImageNet data reader pipeline atop TensorFlow and Horovod. Through extensive performance characterization of BeeGFS, our study provides a useful documentation on how to leverage BeeGFS for the emerging DL applications.

2025

Sparse LU factorization offers some potential for parallelism, but at a level of very fine granularity. However, most current distributed memory MIMD architectures have communication latencies too high to exploit all the parallelism available. To cope with this, latencies must be avoided by coarsening the granularity and by message fusion. However, both techniques limit the concurrency, thereby reducing scalability. In this paper, an implementation of a parallel LU decomposition algorithm for linear programming bases is presented for distributed memory parallel computers with noticeable communication latencies. Several design decisions due to latencies, including data distribution and load balancing techniques, are discussed. An approximate performance model is set up for the algorithm, which allows one to quantify the impact of latencies on its performance. Finally, experimental results for an Intel iPSC/860 parallel computer are reported and discussed.

2025

We are concerned with the efficient numerical solution of minimization problems in Hilbert spaces involving sparsity constraints. These optimizations arise, e.g., in the context of inverse problems. In this work we analyze a very efficient variant of the well-known iterative soft-shrinkage algorithm for large or even infinite dimensional problems. This algorithm is modified in the following way. Instead of prescribing a fixed thresholding parameter, we use a decreasing thresholding strategy. Moreover, we use suitable variants of the adaptive schemes derived by Cohen, Dahmen and DeVore [11, 12] for the approximation of the infinite matrix-vector products. We derive a block multiscale preconditioning technique which allows for local well-conditioning of the underlying matrices and for extending the concept of restricted isometry property to infinitely labelled matrices. The careful combination of these ingredients gives rise to a numerical scheme that is guaranteed to converge with exponential rate, and which allows for a controlled inflation of the support size of the iterations. We also present numerical experiments that confirm the applicability of our approach, which extends concepts from compressed sensing to large scale simulation.
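
A finite-dimensional sketch of the modified iteration, with an assumed geometric threshold schedule (the paper works in infinite dimensions with adaptive approximate matrix-vector products):

```python
# Sketch: iterative soft-shrinkage with a decreasing threshold.
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_decreasing(A, b, lam0=1.0, decay=0.95, iters=300):
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||^2
    x, lam = np.zeros(A.shape[1]), lam0
    for _ in range(iters):
        x = soft(x - step * A.T @ (A @ x - b), step * lam)
        lam *= decay                             # decreasing threshold
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 200))
x_true = np.zeros(200); x_true[[3, 50, 120]] = [2.0, -1.5, 1.0]
x = ista_decreasing(A, A @ x_true)
print(np.round(x[[3, 50, 120]], 2))   # should be near (2.0, -1.5, 1.0)
```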

2025, Advances in Computational Mathematics

This paper is concerned with the development of adaptive numerical methods for elliptic operator equations. We are especially interested in discretization schemes based on frames. The central objective is to derive an adaptive frame algorithm which is guaranteed to converge for a wide range of cases. As a core ingredient we use the concept of Gelfand frames which induces equivalences between smoothness norms and weighted sequence norms of frame coefficients. It turns out that this Gelfand characteristic of frames is closely related to their localization properties. We also give constructive examples of Gelfand wavelet frames on bounded domains. Finally, an application to the efficient adaptive computation of canonical dual frames is presented.

2025, 2006 IEEE International Conference on Field Programmable Technology

This paper presents a new real time programmable irregular Low Density Parity Check (LDPC) Encoder as specified in the IEEE P802.16E/D7 standard. The encoder is programmable for frame sizes from 576 to 2304 and for five different code rates. The H matrix is efficiently generated and stored for a particular frame size and code rate. The encoder is implemented on a Reconfigurable Instruction Cell based Architecture, which has recently emerged as an ultra-low-power, high-performance, ANSI-C programmable embedded core. Different general and technology-specific optimization techniques are applied in order to achieve a throughput ranging from 10 to 19 Mbps.

2025

Parallel computing using accelerators has gained widespread research attention in the past few years. In particular, using GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power, and other metrics. However, accelerator based computing has significantly relegated the role of CPUs in computation. As CPUs evolve and also offer matching computational resources, it is important to also include CPUs in the computation. We call this the hybrid computing model. Indeed, most computer systems of the present age offer a degree of heterogeneity, and therefore such a model is quite natural. We reevaluate the claim of a recent paper by Lee et al. (ISCA 2010). We argue that the right question arising out of Lee et al. (ISCA 2010) should be how to use a CPU+GPU platform efficiently, instead of whether one should use a CPU or a GPU exclusively. To this end, we experiment with a set of 13 diverse workloads ranging over databases, image processing, sparse matrix kernels, and graphs. We experiment with two different hybrid platforms: one consisting of a 6-core Intel i7-980X CPU and an NVidia Tesla T10 GPU, and another consisting of an Intel E7400 dual core CPU with an NVidia GT520 GPU. On both these platforms, we show that hybrid solutions offer a good advantage over CPU-only or GPU-only solutions. On both these platforms, we also show that our solutions are 90% resource efficient on average. Our work therefore suggests that hybrid computing can offer tremendous advantages not only on research-scale platforms but also on more realistic systems, with significant performance gains and resource efficiency for the large-scale user community.

2025, Bulletin of Electrical Engineering and Informatics

Low-density parity-check (LDPC) codes are widely recognized for their excellent forward error correction, near-Shannon-limit performance, and support for high data rates with effective hardware parallelization. Their convolutional counterpart, LDPC convolutional codes (LDPC-CCs), offer additional advantages such as variable codeword lengths, unlimited parity-check matrices, and simpler encoding and decoding. These features make LDPC-CCs particularly suitable for practical implementations with varying channel conditions and data frame sizes. This paper investigates the performance of LDPC-CCs using the extrinsic information transfer (EXIT) chart, a graphical tool for analyzing iterative decoding. EXIT charts visualize mutual information exchange and help predict convergence behavior, estimate performance thresholds, and optimize code design. Starting with the EXIT chart principles for LDPC codes, we derived the mutual information functions for variable and check nodes in regular and irregular LDPC-CC Tanner graphs. This involved adapting existing EXIT functions to the periodic parity-check matrix of LDPC-CCs. We compare regular and irregular LDPC-CC constructions, examining the impact of degree distributions and the number of periods in the parity-check matrix on convergence behavior. Our simulations show that irregular LDPC-CCs consistently outperform regular ones, and the EXIT chart analysis confirms that LDPC-CCs demonstrate superior bit error rate (BER) performance compared to equivalent LDPC block codes.
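
One standard ingredient the paper adapts is ten Brink's J-function form of the variable-node EXIT curve over the AWGN channel; a hedged restatement (the paper extends this to the periodic parity-check structure of LDPC-CCs):

```latex
% Standard variable-node EXIT function for degree d_v over AWGN
% (ten Brink's J-function formulation; notation ours).
\[
  I_{E,V}\bigl(I_{A,V}\bigr)
  = J\!\left(\sqrt{(d_v - 1)\,\bigl[J^{-1}(I_{A,V})\bigr]^{2}
                   + \sigma_{\mathrm{ch}}^{2}}\right),
  \qquad \sigma_{\mathrm{ch}}^{2} = 8R\,\frac{E_b}{N_0},
\]
% with the check-node curve from the usual dual approximation
% I_{E,C} \approx 1 - J(\sqrt{d_c - 1}\, J^{-1}(1 - I_{A,C})).
```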

2025, HAL (Le Centre pour la Communication Scientifique Directe)

We study the enumeration of inversion sequences that avoid pattern 021 and another pattern of length four. We determine the generating trees for all possible pattern pairs and compute the corresponding generating functions. We introduce the concept of d-regular generating trees and conjecture that for any 021-avoiding pattern τ, the generating tree T({021, τ}) is d-regular for some integer d.
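
A brute-force sketch for cross-checking small cases, under the usual conventions (an inversion sequence e of length n satisfies 0 ≤ e[k] ≤ k; containment means some subsequence is order-isomorphic to the pattern word). The helper names are ours:

```python
# Brute-force enumeration of pattern-avoiding inversion sequences.
from itertools import combinations, product

def avoids(e, pat):
    for idx in combinations(range(len(e)), len(pat)):
        vals = [e[t] for t in idx]
        # order-isomorphic: all pairwise < relations match the pattern's
        if all((vals[a] < vals[b]) == (pat[a] < pat[b])
               for a in range(len(pat)) for b in range(len(pat))):
            return False
    return True

def count_avoiders(n, pats):
    seqs = product(*[range(k + 1) for k in range(n)])
    return sum(all(avoids(e, p) for p in pats) for e in seqs)

print([count_avoiders(n, [(0, 2, 1)]) for n in range(1, 7)])
print([count_avoiders(n, [(0, 2, 1), (1, 0, 3, 2)]) for n in range(1, 7)])
```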

2025

We have made experiments with reordering algorithms on sparse very large graphs (VLGs); we have considered only undirected, unweighted, sparse and huge graphs, i.e. G = (V, E) with n = |V| from millions to billions of nodes and with |E| = O(|V|). The problem of reordering a matrix to enhance computation time (and sometimes memory) is traditional in numerical algorithms, but we focus in this short paper on results obtained for the approximate computation of the diameter of a sparse VLG (with some graphs on various different computers). The problem of reordering a graph has already been raised, explicitly or implicitly, by many people, from the numerical community but also from the graph community, like the authors of the Louvain algorithm when they write that choosing an order is thus worth studying since it could accelerate the Louvain algorithm. Our experimental results show clearly that it can be worthwhile (and simple) to preprocess a sparse VLG with a reordering algorithm.
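
A hedged illustration of the preprocessing step with one candidate ordering, reverse Cuthill-McKee from SciPy (the paper benchmarks diameter computation on far larger graphs; the shuffled-path example is ours):

```python
# Sketch: RCM reordering recovers locality in a label-scrambled path graph.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

rng = np.random.default_rng(0)
n = 2000
order = rng.permutation(n)            # scramble vertex labels of a path
i, j = order[:-1], order[1:]
A = csr_matrix((np.ones(n - 1), (i, j)), shape=(n, n))
A = ((A + A.T) > 0).astype(np.int8)   # undirected, unweighted

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm][:, perm]                  # relabeled graph

def bandwidth(M):
    coo = M.tocoo()
    return np.abs(coo.row - coo.col).max()

print(bandwidth(A), bandwidth(B))     # large vs. 1: RCM recovers the path
```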

2025, IEEE Transactions on Circuits and Systems I: Regular Papers

Modern communication standards, such as 5G new radio (5G NR), require a high speed decoder for highly irregular quasi-cyclic low density parity check (QC-LDPC) codes. A widely used approach in QC-LDPC decoders is a layered decoding schedule which processes the parity check matrix in parts, thus providing faster convergence. However, a pipelined layered decoding architecture suffers from data hazards that reduce the throughput. This paper presents a novel architecture which can facilitate any QC-LDPC decoding without stall cycles caused by pipeline hazards. The decoder conveniently incorporates both the layered and the flooding schedules in cases when hazards occur. The paper also presents a genetic algorithm based optimization of the decoding schedule for better signal-to-noise ratio (SNR) performance. The proposed architecture enables insertion of a large number of pipeline stages, thus providing high operating frequency. As a case study, the FPGA implementation for WiMAX, DVB-S2X, and 5G NR provided coded throughput of up to 1.77 Gbps, 4.32 Gbps, and 4.92 Gbps at 10 iterations, respectively. The results show a strong throughput increase of 30%-109% compared with the conventional layered decoder for 5G NR for the same SNR performance. The decoder provides highly efficient utilization of resources when compared with the state-of-the-art solutions. Index Terms: 5G new radio, genetic algorithm optimization, high throughput, layered decoding, low density parity check (LDPC) codes, pipeline, quasi-cyclic (QC) LDPC. I. INTRODUCTION: Due to their excellent error correcting performance, low density parity check (LDPC) codes [1] are increasingly used in many applications, e.g. in storage devices [2] and in many wired [3] and wireless communication standards [4]-[8]. An LDPC code is completely defined by its parity-check matrix (PCM), but can also be represented using the Tanner graph. An LDPC code is sparse, i.e. of low density, so both the encoding and decoding processes can be of low computational complexity. The decoding process is usually based on the iterative message-passing algorithm [10].

2025, Future Generation Computer Systems

Writing efficient iterative solvers for irregular sparse matrices in High Performance Fortran (HPF) is hard. The locality in the computations is unclear, and for efficiency we use storage schemes that obscure any structure in the matrix. Moreover, the limited capabilities of HPF to distribute and align data structures make it hard to implement the desired distributions, or to express them in such a way that the compiler recognizes the efficient implementation. We propose techniques to handle these problems. We combine strategies that have become popular in message-passing parallel programming, like mesh partitioning and splitting the matrix into local submatrices, with the functionality of HPF and HPF compilers, like the implicit handling of communication and distribution. The implementation of these techniques in HPF is not trivial, and we describe in detail how we propose to solve the problems. Our results demonstrate that very efficient implementations are possible. We indicate how some of the 'approved extensions' of HPF-2.0 can be used, but they do not solve all problems. For comparison we show the results for regular sparse matrices.

2025, Advances in Pure Mathematics

The goal of this study is to propose a method for estimating bounds for the roots of polynomials with complex coefficients. A well-known and easy tool for obtaining such information is the standard Gershgorin theorem; however, it does not take the structure of the matrix into account. The modified disks of Gershgorin give the opportunity, through geometrical figures called Ovals of Cassini, to consider the form of the matrix in order to determine appropriate bounds for the roots. Furthermore, we have seen that Hessenberg matrices are well suited to estimating good bounds for the roots of polynomials, as we obtain improved bounds for large values of the polynomial's coefficients, though the bounds are better for small values. The aim of the work was to take advantage of this, after introducing Dehmer's bound, to find an appropriate property of the Hessenberg form. Illustrative examples are given to compare the obtained bounds to those obtained through classical methods like Cauchy's bounds, Montel's bounds and Carmichael-Mason's bounds.
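
As a baseline for comparison (the paper refines these with Ovals of Cassini and the Hessenberg structure), here are Cauchy's bound and a Gershgorin bound applied to the companion matrix of a polynomial:

```python
# Sketch: classical root bounds via the companion matrix.
import numpy as np

def cauchy_bound(coeffs):
    """coeffs: [a_n, ..., a_1, a_0], a_n != 0. All roots satisfy
    |z| <= 1 + max_i |a_i / a_n| (Cauchy's classical bound)."""
    a = np.asarray(coeffs, dtype=float)
    return 1.0 + np.max(np.abs(a[1:] / a[0]))

def gershgorin_bound(coeffs):
    """Every root is an eigenvalue of the companion matrix, so
    |z| <= max row absolute sum (center + Gershgorin radius)."""
    a = np.asarray(coeffs, dtype=float)
    n = len(a) - 1
    C = np.zeros((n, n))
    C[1:, :n - 1] = np.eye(n - 1)       # subdiagonal of ones
    C[:, -1] = -a[::-1][:-1] / a[0]     # last column: -a_0..a_{n-1} / a_n
    return np.abs(C).sum(axis=1).max()

p = [1, -6, 11, -6]                     # (z-1)(z-2)(z-3): roots 1, 2, 3
print(cauchy_bound(p), gershgorin_bound(p), np.roots(p))
```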

2024, IEEE Transactions on Antennas and Propagation

Various methods for efficiently solving electromagnetic problems are presented. Electromagnetic scattering problems can be roughly classified into surface and volume problems, while fast methods are either differential or integral equation based. The resultant systems of linear equations are either solved directly or iteratively. A review of various differential equation solvers, their complexities, and memory requirements is given. The issues of grid dispersion and hybridization with integral equation solvers are discussed. Several fast integral equation solvers for surface and volume scatterers are presented. These solvers have reduced computational complexities and memory requirements.

2024, Probabilistic Graphical Models

Existing score-based causal model search algorithms such as GES (and a speeded up version, FGS) are asymptotically correct, fast, and reliable, but make the unrealistic assumption that the true causal graph does not contain any unmeasured confounders. There are several constraint-based causal search algorithms (e.g., RFCI, FCI, or FCI+) that are asymptotically correct without assuming that there are no unmeasured confounders, but often perform poorly on small samples. We describe a combined score and constraint-based algorithm, GFCI, that we prove is asymptotically correct. On synthetic data, GFCI is only slightly slower than RFCI but more accurate than FCI, RFCI and FCI+.

2024

The solid transportation problem (STP) is a particular type of linear programming problem. This paper presents an approach for solving the STP highly efficiently, in a few iterations, until an optimal solution is found. The proposed method provides an initial solution to the solid transportation problem that is close to the optimal solution. Using Lingo software, a numerical example was used to compare the solution from the proposed method to the optimal solution.

2024, arXiv (Cornell University)

2024, Computing

Circulant Block-Factorization Preconditioners for Elliptic Problems. New circulant block-factorization preconditioners are introduced and studied. The general approach is first formulated for the case of block tridiagonal sparse matrices. Then estimates of the relative condition number for a model Dirichlet boundary value problem are derived. In the case of y-periodic problems the circulant block-factorization preconditioner is shown to give an optimal convergence rate. Finally, using a proper imbedding of the original Dirichlet boundary value problem into a y-periodic one, a preconditioner of optimal convergence rate for the general case is obtained. The total computational cost of the preconditioner is O(N log N) (based on FFT), where N is the number of unknowns; that is, the algorithm is nearly optimal. Various numerical tests that demonstrate the features of the circulant block-factorization preconditioners are presented.
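
A small illustration of why circulant preconditioning costs O(N log N): a circulant system diagonalizes under the FFT, so applying C⁻¹ is two FFTs and a pointwise division. The sketch below shows only this primitive, not the paper's block factorization:

```python
# Sketch: solving a circulant system via FFT diagonalization.
import numpy as np

def circulant_solve(c, b):
    """Solve C x = b where C is circulant with first column c."""
    eig = np.fft.fft(c)                 # eigenvalues of C
    return np.real(np.fft.ifft(np.fft.fft(b) / eig))

c = np.array([4.0, -1.0, 0.0, -1.0])    # periodic 1-D Laplacian-like stencil
b = np.array([1.0, 2.0, 3.0, 4.0])
x = circulant_solve(c, b)

# Check against a dense circulant build (columns are rolls of c).
C = np.array([np.roll(c, k) for k in range(4)]).T
print(np.allclose(C @ x, b))            # True
```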

2024, HAL (Le Centre pour la Communication Scientifique Directe)

[Figure residue; recoverable caption: example with a memory of size 2 data, showing the graph of input data dependencies (left) and the partition and schedule produced by the Deque Model Data Aware Ready (DMDAR) scheduler (right).]

2024, Reports of the Department of Applied Mathematical Analysis

IDR(s) [9] and BiCGstab(ℓ) [5] are two of the most efficient short-recurrence iterative methods for solving large nonsymmetric linear systems of equations. Which of the two is best depends on the specific problem class. In this paper we derive a new method, which we call IDRstab, that combines the strengths of IDR(s) and BiCGstab(ℓ). To derive IDRstab we extend the results that we reported in [7], where we considered Bi-CGSTAB as an IDR method. We analyse the structure of IDR in detail and introduce the new concept of the Sonneveld space. Through numerical experiments we show that IDRstab can outperform both IDR(s) and BiCGstab(ℓ).

2024, SIAM Journal on Scientific Computing

2024, ArXiv

The goal of this paper is to introduce a new method in computer-aided geometry of solid modeling. We put forth a novel algebraic technique to evaluate any variadic expression between polyhedral d-solids (d = 2, 3) with regularized operators of union, intersection, and difference, i.e., any CSG tree. The result is obtained in three steps: first, by computing an independent set of generators for the d-space partition induced by the input; then, by reducing the solid expression to an equivalent logical formula between Boolean terms made by zeros and ones; and, finally, by evaluating this expression using bitwise operators. This method is implemented in Julia using sparse arrays. The computational evaluation of every possible solid expression, usually denoted as CSG (Constructive Solid Geometry), is reduced to an equivalent logical expression of a finite set algebra over the cells of a space partition, and solved by native bitwise operators.
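
A toy sketch of the final evaluation step described above, with made-up membership data (the paper works on polyhedral partitions, in Julia, with sparse arrays): once each cell of the space partition carries a membership bit per input solid, any CSG formula reduces to bitwise operations.

```python
# Sketch: evaluating CSG formulas as bitwise operations on cell bitmaps.
import numpy as np

# Membership of 6 partition cells in solids A and B (made-up example).
in_A = np.array([1, 1, 1, 0, 0, 0], dtype=np.uint8)
in_B = np.array([0, 0, 1, 1, 1, 0], dtype=np.uint8)

union        = in_A | in_B
intersection = in_A & in_B
difference   = in_A & ~in_B & 1      # mask keeps the result 0/1

print(union, intersection, difference)
# [1 1 1 1 1 0] [0 0 1 0 0 0] [1 1 0 0 0 0]
```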

2024, IEEE Transactions on Automation Science and Engineering

In this paper we show that the (co)chain complex associated with a decomposition of the computational domain, commonly called a mesh in computational science and engineering, can be represented by a block-bidiagonal matrix that we call the Hasse matrix. Moreover, we show that topology-preserving mesh refinements, produced by the action of (the simplest) Euler operators, can be reduced to multilinear transformations of the Hasse matrix representing the complex. Our main result is a new representation of the (co)chain complex underlying field computations, a representation that provides new insights into the transformations induced by local mesh refinements. Our approach is based on first principles and is general in that it applies to most representational domains that can be characterized as cell complexes, without any restrictions on their type, dimension, codimension, orientability, manifoldness, connectedness.
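
A minimal example of the block-bidiagonal structure for a single triangle, in our notation (the paper's Hasse matrix covers general cell complexes): the boundary matrices sit in the superdiagonal blocks, and the chain-complex identity ∂₁∂₂ = 0 holds.

```python
# Sketch: boundary matrices of one triangle assembled block-bidiagonally.
import numpy as np

# Triangle {v0, v1, v2} with edges e0=(v0,v1), e1=(v1,v2), e2=(v0,v2).
d1 = np.array([[-1,  0, -1],     # vertex-edge boundary matrix
               [ 1, -1,  0],
               [ 0,  1,  1]])
d2 = np.array([[ 1],             # edge-triangle boundary matrix:
               [ 1],             # boundary(t) = e0 + e1 - e2
               [-1]])
print((d1 @ d2 == 0).all())      # boundary of a boundary vanishes

# Block-bidiagonal assembly (vertices | edges | triangles ordering).
n0, n1, n2 = 3, 3, 1
H = np.zeros((n0 + n1 + n2, n0 + n1 + n2), dtype=int)
H[:n0, n0:n0 + n1] = d1
H[n0:n0 + n1, n0 + n1:] = d2
print(H)
```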

2024, WCSMO4, Fourth World …

In the classical approach for Engineering Design Optimization, a Mathematical Program that depends on the design variables is solved. That is, the objective function and the constraints depend exclusively on the design variables. Thus, the state equation that ...

2024, HAL (Le Centre pour la Communication Scientifique Directe)