Residue Number System (RNS) Research Papers

We consider a software-hardware implementation of a digital FIR filter using the residue number system, with scaling of the output samples of each filter channel. It is shown that scaling significantly reduces hardware and time costs.
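As a rough illustration of channel-wise RNS filtering, the Python sketch below evaluates the same FIR convolution independently in each residue channel and recombines each output sample with the Chinese Remainder Theorem (CRT). The moduli (7, 11, 13, 15), the helper names `to_rns`, `from_rns` and `rns_fir`, and the signed-range convention are our own assumptions; the per-channel output scaling studied in the paper is not modelled.

```python
from math import prod

# Assumed pairwise co-prime moduli; dynamic range M = 15015.
MODULI = (7, 11, 13, 15)
M = prod(MODULI)


def to_rns(x: int) -> tuple[int, ...]:
    """Residues of a (possibly negative) integer; Python's % is non-negative."""
    return tuple(x % m for m in MODULI)


def from_rns(residues: tuple[int, ...]) -> int:
    """CRT reconstruction, mapped to the signed range [-(M-1)/2, (M-1)/2]."""
    x = 0
    for r, m in zip(residues, MODULI):
        mi = M // m
        x += r * mi * pow(mi, -1, m)
    x %= M
    return x - M if x > M // 2 else x


def rns_fir(samples: list[int], taps: list[int]) -> list[int]:
    """y[n] = sum_k taps[k] * samples[n-k], computed separately in every channel."""
    x_rns = [to_rns(s) for s in samples]
    h_rns = [to_rns(h) for h in taps]
    outputs = []
    for n in range(len(samples)):
        channels = []
        for i, m in enumerate(MODULI):
            acc = 0
            for k in range(min(len(taps), n + 1)):
                # Only small modular products; no carries cross between channels.
                acc = (acc + h_rns[k][i] * x_rns[n - k][i]) % m
            channels.append(acc)
        outputs.append(from_rns(tuple(channels)))
    return outputs
```

Results are exact only while every true output stays within the signed dynamic range (about ±7507 here); beyond that the outputs wrap modulo M, which is presumably one motivation for scaling.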

The paper discusses an algorithm for correcting errors in residue number system codes using a positional characteristic. Computing this positional characteristic in a parallel-pipeline fashion reduces hardware costs by 7.2% when processing 2-byte data represented in residue number system code. The properties of these codes provide the fault tolerance required by multi-rate DSP devices. The results of the paper can be applied to hydroacoustic monitoring tasks.
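The abstract does not spell out the positional-characteristic algorithm, so the sketch below illustrates single-error correction with a different, standard technique: projections in a redundant RNS. Two redundant moduli, each larger than every information modulus, are appended; a word with one corrupted residue then decodes outside the legitimate range, and the single projection (reconstruction with one residue dropped) that lands back inside that range identifies and repairs the faulty digit. The moduli sets and function names are assumptions for illustration only.

```python
from math import prod

INFO_MODULI = (3, 5, 7)          # assumed information moduli, legitimate range 105
REDUNDANT_MODULI = (11, 13)      # assumed redundant moduli, each > every info modulus
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGITIMATE_RANGE = prod(INFO_MODULI)


def crt(residues, moduli):
    """Chinese Remainder Theorem reconstruction into [0, prod(moduli))."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        x += r * mi * pow(mi, -1, m)
    return x % big_m


def encode(x):
    """Redundant RNS code word (list of residues) for 0 <= x < LEGITIMATE_RANGE."""
    return [x % m for m in MODULI]


def correct_single_error(residues):
    """Return the decoded value, assuming at most one residue digit is wrong."""
    x = crt(residues, MODULI)
    if x < LEGITIMATE_RANGE:
        return x                  # legitimate value: no error detected
    for i in range(len(MODULI)):
        # Projection: drop residue i and reconstruct from the remaining four.
        proj = crt(residues[:i] + residues[i + 1:], MODULI[:i] + MODULI[i + 1:])
        if proj < LEGITIMATE_RANGE:
            return proj           # residue i was the faulty one
    raise ValueError("uncorrectable: more than one residue appears to be wrong")
```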

The cache ping-pong problem often arises in parallel processing systems in which each processor has its own local cache and uses a copy-back protocol for cache coherence. To avoid moving large amounts of data back and forth between the caches of different processors, parallel-compiler techniques need to be developed. Building on the relations between array element accesses and enclosing loop indices in nested parallel loops described in [Fang, J. Z., Proc. International Conference on Parallel Processing, Aug. 1990, pp. II-271–II-275], this paper presents an algorithm that reduces unnecessary data movement between caches for parallel loops with multiple array subscript expressions. By analyzing the array subscript expressions in nested parallel loop constructs, a compiler can use the algorithm to prepare information at compile time so that each processor executes the iterations of the parallel loops that correspond to the data in its cache. The algorithm benefits parallel programs in which parallel loops are enclosed by a sequential loop and use multiple different subscript expressions for the same array, whose elements are reused across iterations of the outermost sequential loop.

Data compression minimizes data size by changing its representation and reducing redundancy. Most compression methods favor a lossless approach in order to preserve the integrity of the file content. The benefits of compression include saving storage space, faster data transmission, and high data quality. This paper examines the effectiveness of a Chinese Remainder Theorem (CRT) enhancement to the Lempel-Ziv-Welch (LZW) and Huffman coding algorithms for compressing large images. Ten images from the Yale database were used for testing. The results showed that CRT-LZW saved more space and removed redundancy from the original images faster than CRT-Huffman coding (29.78% versus 14.00%, respectively). In compression time, CRT-LZW outperformed CRT-Huffman (9.95 s versus 19.15 s). In compression ratio, CRT-LZW also outperformed CRT-Huffman coding (0.39 dB versus 4.38 dB), which is connected to the low quality and imperceptibility of the former. On the other hand, CRT-Huffman coding offered better Peak Signal-to-Noise Ratio (PSNR) for the reconstructed images (28.13 dB) than CRT-LZW (3.54 dB) and than the 25.59 dB obtained in another study. (Original Research Article: Ibrahim and Gbolagade; AJRCOS, 3(3): 1-9, 2019; Article no. AJRCOS.49732.)
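For readers unfamiliar with the LZW stage, the sketch below is plain byte-oriented LZW compression and decompression with an unbounded dictionary. It is a textbook version, not the CRT-enhanced variant evaluated in the paper; how the CRT step is combined with LZW is not described in the abstract and is not modelled here.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit dictionary codes; codes 0-255 are the single bytes."""
    table = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                       # keep extending the current match
        else:
            out.append(table[w])         # emit longest known prefix
            table[wc] = len(table)       # learn the new string
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly from the code stream."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]            # the cScSc case: code not yet in table
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)
```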

In this paper, we present an algorithm for a Residue Number System (RNS) implementation of RSA cryptography based on an existing RNS division algorithm. The proposed algorithm and the state-of-the-art algorithm were both written in C++ to compare their execution times. Experimental results show that our algorithm encrypts and decrypts text without loss of inherent information, and does so faster than the state of the art. It also offers firm resistance to brute-force and key-sensitivity attacks. For the moduli set {2, 3, 5}, our algorithm is 7.29% and 15.51% faster than the state-of-the-art algorithm for integer and non-integer quotients, respectively. For the moduli set {7, 9, 11}, it is 11.29% and 10.36% faster for integer and non-integer quotients, respectively. We carried out an error analysis of the experimental results at a 95% confidence level.
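To show where division enters an RNS version of RSA, the toy sketch below performs square-and-multiply exponentiation with channel-wise RNS products over the moduli set {7, 9, 11} from the abstract. The reduction modulo n after each product is where an RNS division algorithm would be used; here it is replaced by a CRT round trip purely as a stand-in, and it is not the paper's algorithm. The key n = 22, e = 3, d = 7 is a hypothetical, insecure toy chosen so that every product of two residues mod n fits in the dynamic range M = 693.

```python
from math import prod

MODULI = (7, 9, 11)   # moduli set quoted in the abstract; pairwise co-prime
M = prod(MODULI)      # dynamic range: 693


def to_rns(x: int) -> tuple[int, ...]:
    return tuple(x % m for m in MODULI)


def from_rns(r: tuple[int, ...]) -> int:
    """CRT reconstruction into [0, M)."""
    x = 0
    for ri, m in zip(r, MODULI):
        mi = M // m
        x += ri * mi * pow(mi, -1, m)
    return x % M


def rns_mul(a, b):
    """Channel-wise multiplication: no carries propagate between residues."""
    return tuple(x * y % m for x, y, m in zip(a, b, MODULI))


def reduce_mod_n(a, n):
    """Stand-in for an RNS division/remainder step (hypothetical, NOT the
    paper's method): reconstruct with the CRT, reduce mod n, re-encode."""
    return to_rns(from_rns(a) % n)


def rns_modexp(base: int, exp: int, n: int) -> int:
    """Right-to-left square-and-multiply using RNS products."""
    assert (n - 1) ** 2 < M, "products must fit in the dynamic range"
    result = to_rns(1)
    b = to_rns(base % n)
    while exp:
        if exp & 1:
            result = reduce_mod_n(rns_mul(result, b), n)
        b = reduce_mod_n(rns_mul(b, b), n)
        exp >>= 1
    return from_rns(result)
```

With the toy key, `rns_modexp(5, 3, 22)` encrypts 5 to 15, and `rns_modexp(15, 7, 22)` decrypts it back to 5.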

We investigate whether arithmetic operations based on Residue Number Systems (RNS) are cost-effective solutions for implementing DSP applications in reconfigurable hardware. We simulated several RNS addition and multiplication implementations while varying the RNS parameters. For RNS addition, our results show that it can be implemented as a 3-stage pipeline running at 80.6–92.5 MHz using about 22 to 33 FPGA logic cells. For RNS multiplication, the attainable speed ranged from 78.1 to 87.7 MHz for operand lengths between 5 and 8 bits. Overall, a hybrid solution that combines logic elements and RAM blocks is the best option, producing better average performance across the whole range of operand lengths.
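A software model of the hybrid split described above might look like the sketch below: each residue channel adds with a simple logic-style modular adder (add, then one conditional subtraction), while multiplication is a table lookup standing in for an FPGA RAM block. The moduli (7, 11, 13) and all names are assumptions; the model says nothing about timing or logic-cell counts.

```python
from math import prod

MODULI = (7, 11, 13)   # assumed 3-channel moduli set, M = 1001
M = prod(MODULI)

# One m x m lookup table per channel, standing in for an FPGA RAM block.
MUL_TABLES = [[[(a * b) % m for b in range(m)] for a in range(m)] for m in MODULI]


def to_rns(x: int) -> tuple[int, ...]:
    return tuple(x % m for m in MODULI)


def mod_add(a: int, b: int, m: int) -> int:
    """Logic-style modular adder: add, then at most one subtraction of m."""
    s = a + b
    return s - m if s >= m else s


def rns_add(x, y):
    return tuple(mod_add(a, b, m) for a, b, m in zip(x, y, MODULI))


def rns_mul(x, y):
    """Table-based modular multiply, one lookup per channel."""
    return tuple(table[a][b] for a, b, table in zip(x, y, MUL_TABLES))
```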

Implementation of RNS addition and RNS multiplication into FPGAs (Extended Abstract) Luiz Maltar CB, Felipe MG França, Vladmir C. Alves and Cláudio L. Amorim COPPE - Universidade Federal do Rio de Janeiro Caixa Postal 68511, Postal Code 21945-970, Rio de Janeiro ...

A new compact and highly regular redundant binary multiplier employing signed-digit (SD) number representation has been developed. The n-bit multiplication time of the multiplier is proportional to log₂ n. A modified redundant-to-binary converter has also been proposed, which reduces the delay time and size of the multiplier. The multiplier adopts a hybrid V-L adder tree structure. The modified Booth algorithm is employed to halve the number of partial products. For an 8 × 8 multiplier employing the modified converter, the chip size is 2096 × 3256 µm². The measured multiplication time is 48.9 ns using a 2 µm design rule. We estimate the multiplication time of a 16 × 16 multiplier to be 70 ns using a 2 µm design rule, and about 35 ns using a 1 µm design rule with double-layer metal wiring. The estimated delay time of a 32 × 32 multiplier is about 70 ns using a 1 µm design rule and double-layer metal wiring.
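The modified Booth step mentioned above can be modelled in a few lines. The sketch below is a generic radix-4 recoding, not the paper's circuit; it assumes an even operand width n and checks that the n/2 signed-digit partial products, each 0, ±x or ±2x shifted by a power of 4, reproduce the two's-complement product.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 (modified) Booth recoding of an n-bit two's-complement multiplier.

    Each digit is -2*y[i+1] + y[i] + y[i-1] for i = 0, 2, 4, ... (with y[-1] = 0),
    so every digit lies in {-2, -1, 0, 1, 2} and there are only n/2 of them.
    """
    if n % 2:
        raise ValueError("this sketch assumes an even operand width")
    bits = y & ((1 << n) - 1)        # two's-complement bit pattern of y
    digits = []
    prev = 0                          # y[i-1], starting with y[-1] = 0
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1          # y[i]
        b1 = (bits >> (i + 1)) & 1    # y[i+1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum of the n/2 partial products d_j * x * 4**j."""
    return sum(d * x * 4 ** j for j, d in enumerate(booth_radix4_digits(y, n)))
```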

Residue Number Systems generally use pairwise co-prime moduli sets. Non-coprime moduli sets have received little study in RNS, and the resources that discuss them are very limited; this work is therefore devoted to them. This paper suggests a new non-coprime moduli set, analyses RNS conversion using it, and investigates its performance. The suggested moduli set has the general form {2^n - 2, 2^n, 2^n + 2}, where n ≥ 2, so all three moduli are computed from the same n. The moduli are spaced 2 apart on the number line, which simplifies the calculations in the algorithm. A binary-to-residue conversion algorithm is developed for the proposed set and implemented, and its correctness was verified with a simulation program.
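Because the moduli {2^n - 2, 2^n, 2^n + 2} all share the factor 2, residues identify a number uniquely only modulo their least common multiple, not their product. For this set the lcm works out to 2^n(2^(2n-2) - 1), since (2^n - 2)/2 and (2^n + 2)/2 are odd, differ by 2, and are therefore co-prime. The sketch below shows plain forward (binary-to-residue) conversion and this dynamic range; it does not reproduce the paper's conversion algorithm, and the function names are hypothetical.

```python
from math import lcm  # Python 3.9+


def moduli_set(n: int) -> tuple[int, int, int]:
    """The set {2^n - 2, 2^n, 2^n + 2} from the abstract, for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    p = 1 << n
    return (p - 2, p, p + 2)


def to_residues(x: int, moduli: tuple[int, ...]) -> tuple[int, ...]:
    """Binary-to-residue (forward) conversion: one remainder per modulus."""
    return tuple(x % m for m in moduli)


def dynamic_range(moduli: tuple[int, ...]) -> int:
    """With non-coprime moduli, residues repeat with period lcm, not product."""
    return lcm(*moduli)
```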

Ever since Adleman [1] solved the Hamilton Path problem using a combinatorial molecular method, many other hard computational problems have been investigated with the proposed DNA computer [3] [25] [9] [12] [19] [22] [24] [27] [29] [30]. However, these computation methods all work toward a single destination through a sequence of steps based on the initial conditions. If a single one of these conditions changes, the entire procedure must be repeated, no matter how complicated the procedure is or how simple the change. The method proposed in this paper addresses this problem: only a few extra steps are needed when the initial conditions change, which saves considerable time and cost.

In this paper, two reverse converters for the four-moduli set {2^n, 2^n - 1, 2^n + 1, 2^(n-1) - 1} are described. One is based on Mixed Radix Conversion (MRC). The other is based on two-stage MRC, in which two pairs of moduli are considered and intermediate results are obtained using MRC; a second stage then uses MRC to obtain the final decoded number from these intermediate results. Both converters are compared with the previously reported converter for this moduli set in terms of hardware resources and conversion time. Synthesis results on FPGA and ASIC are also presented.
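For readers new to MRC, the sketch below is a generic single-stage mixed radix conversion, not either of the paper's converters (in particular, the two-stage variant is not modelled). It recovers the mixed-radix digits one modulus at a time using modular inverses, then evaluates X = a1 + a2·m1 + a3·m1·m2 + ... by Horner's rule. The test uses n = 4, giving the pairwise co-prime set {16, 15, 17, 7}; for odd n, 2^n + 1 and 2^(n-1) - 1 are both divisible by 3, so this sketch assumes even n.

```python
def mixed_radix_convert(residues, moduli) -> int:
    """Generic MRC for pairwise co-prime moduli: X = a1 + a2*m1 + a3*m1*m2 + ..."""
    digits = []
    for i, (r, m) in enumerate(zip(residues, moduli)):
        a = r
        for j in range(i):
            # Remove digit j, then divide by m_j (multiply by its inverse mod m).
            a = (a - digits[j]) * pow(moduli[j], -1, m) % m
        digits.append(a)
    # Horner evaluation starting from the most significant mixed-radix digit.
    x = 0
    for a, m in zip(reversed(digits), reversed(moduli)):
        x = x * m + a
    return x
```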

In this paper, an area- and power-efficient modulo 2^n + 1 multiplier is proposed. In the new modulo multipliers, the result and one operand use the weighted representation, while the other operand uses the diminished-1 representation. By using radix-4 Booth recoding, the new multipliers reduce the number of partial products to n/2 for even n and (n+1)/2 for odd n, plus one correction term. The resulting partial products are reduced to two operands by an inverted end-around-carry carry-save adder, and these are finally added by a two-stage n-bit adder containing a 2:1 multiplexer. By using the proposed adder, the new multipliers reduce area and power. The analytical and experimental results indicate that the new modulo 2^n + 1 multipliers offer lower power and a more compact area than the existing structures.
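The sketch below models only the number representations involved, not the Booth recoding or the inverted end-around-carry adder: operand A is weighted, operand B is in diminished-1 form (B* = B - 1 plus a zero flag), and the product is reduced using 2^n ≡ -1 (mod 2^n + 1), so the high part of a value is subtracted from the low part. Writing A·B = A·B* + A shows one way an extra additive correction term can arise; whether this matches the paper's correction term is our assumption. All function names are hypothetical.

```python
def reduce_mod_2n_plus_1(x: int, n: int) -> int:
    """Reduce x modulo 2**n + 1 using 2**n ≡ -1: x = hi*2**n + lo ≡ lo - hi."""
    m = (1 << n) + 1
    mask = (1 << n) - 1
    while not 0 <= x < m:
        if x < 0:
            x += m
        else:
            x = (x & mask) - (x >> n)   # fold the high part back with a minus sign
    return x


def to_diminished_one(b: int, n: int) -> tuple[int, bool]:
    """Diminished-1 encoding of b in [0, 2**n]: (b - 1, zero_flag)."""
    if b == 0:
        return 0, True
    return b - 1, False


def mul_weighted_by_diminished(a: int, b_dim: int, b_zero: bool, n: int) -> int:
    """A (weighted) times B (diminished-1), weighted result mod 2**n + 1."""
    if b_zero:
        return 0
    # B = B* + 1, so A*B = A*B* + A: the "+ A" acts as an extra correction term.
    return reduce_mod_2n_plus_1(a * b_dim + a, n)
```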

In this paper, Mixed Radix Conversion (MRC)-based Residue Number System (RNS)-to-binary converters for the three-moduli set {2^m - 1, 2^m, 2^m + 1} are presented. The proposed reverse converters are evaluated in terms of hardware requirements and conversion time, and compared with reverse converters previously proposed in the literature that use the Chinese Remainder Theorem (CRT) and New CRT for this moduli set, as well as with converters for the two four-moduli sets {2^n - 1, 2^n, 2^n + 1, 2^(n+1) - 1} and {2^n - 1, 2^n, 2^n + 1, 2^(n+1) + 1}.
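Part of the appeal of the set {2^m - 1, 2^m, 2^m + 1} for MRC is that the modular inverses become constants. Ordering the moduli as 2^m, 2^m - 1, 2^m + 1: 2^m ≡ 1 (mod 2^m - 1), so its inverse is 1; and modulo 2^m + 1, 2^m ≡ -1 (inverse -1) and 2^m - 1 ≡ -2 (inverse 2^(m-1)). The sketch below is our own derivation from these identities, not necessarily the formulation used in the paper; the function name is hypothetical.

```python
def mrc_three_moduli(r_p: int, r_minus: int, r_plus: int, m: int) -> int:
    """Reverse conversion for {2^m, 2^m - 1, 2^m + 1} (residues in that order).

    X = a1 + 2^m * a2 + 2^m * (2^m - 1) * a3, with mixed-radix digits
      a1 = r_p
      a2 = (r_minus - a1) mod (2^m - 1)             because 2^m ≡ 1 (mod 2^m - 1)
      a3 = (a1 - r_plus - a2) * 2^(m-1) mod (2^m + 1)
    """
    p = 1 << m
    a1 = r_p
    a2 = (r_minus - a1) % (p - 1)
    a3 = ((a1 - r_plus - a2) << (m - 1)) % (p + 1)   # shift = multiply by 2^(m-1)
    return a1 + p * a2 + p * (p - 1) * a3
```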

In this paper, a novel architecture for an RNS-based 1D lifting Integer Wavelet Transform (IWT) is introduced. The advantages of a Residue Number System (RNS)-based Lifting Scheme over RNS-based filter-bank and non-binary IWT implementations are discussed. The predict and update stages of the traditional binary Lifting Scheme (LS) for the Discrete Wavelet Transform (DWT) incur large carry-propagation delay, power consumption, and complexity. As a result, non-binary number systems are becoming popular in Digital Signal Processing (DSP) because of their efficient performance. The paper also proposes a new ROM-based RNS division circuit for a fixed number. The proposed architecture has been validated on the Xilinx Virtex-5 FPGA platform, and the corresponding results and reports are presented.
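As an illustration of the predict and update stages, the sketch below implements one level of an integer lifting transform in ordinary binary integers. The abstract does not name the wavelet, so the LeGall 5/3 lifting steps with symmetric boundary extension are assumed, and the RNS datapath is not modelled. The steps contain floor divisions by the constants 2 and 4; such divisions are not channel-wise operations in RNS, which is presumably where a division circuit for a fixed number becomes relevant.

```python
def lift_53_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the integer 5/3 lifting transform (symmetric extension)."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length signal")
    even, odd = x[0::2], x[1::2]
    h = len(odd)
    # Predict: detail = odd sample minus the floor-average of its even neighbours.
    d = [odd[i] - (even[i] + even[min(i + 1, h - 1)]) // 2 for i in range(h)]
    # Update: smooth = even sample plus a rounded quarter of neighbouring details.
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(h)]
    return s, d


def lift_53_inverse(s: list[int], d: list[int]) -> list[int]:
    """Undo the update step, then the predict step; reconstruction is exact."""
    h = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(h)]
    odd = [d[i] + (even[i] + even[min(i + 1, h - 1)]) // 2 for i in range(h)]
    x = []
    for e, o in zip(even, odd):
        x.extend((e, o))
    return x
```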

Floating-point summation is one of the most important operations in scientific and numerical computing applications and is also a basic subroutine (SUM) in the BLAS (Basic Linear Algebra Subprograms) library. However, summation algorithms based on standard floating-point arithmetic may not always produce accurate results because of possible catastrophic cancellation. Worse, the order of consecutive additions affects the final result, making it impossible to produce a unique result for the same input dataset on different computer platforms with different compilers. The emergence of high-density reconfigurable hardware devices makes it possible to customize high-performance arithmetic units for specific computing problems. In this paper, we design an FPGA-based hardware algorithm for accurate floating-point summation using a group-alignment technique. The corresponding fully pipelined summation unit is proven to provide numerical errors similar to or even smaller than those of standard floating-point arithmetic. Moreover, it consumes far fewer reconfigurable computing (RC) resources and pipeline stages than existing designs, yet achieves the optimal speed of one summation per clock cycle with only moderate start-up latency. The technique can also be used to accelerate other linear algebra subroutines on FPGAs, yielding more efficient and compact implementations without a negative impact on computational performance or numerical accuracy.
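A software sketch of the group-alignment idea, under our own assumptions: every value in the group is shifted to the exponent of the largest-magnitude value, truncated to a fixed number of fraction bits (`frac_bits = 60` is an arbitrary choice, not the paper's datapath width), and accumulated as an exact integer, so the result is independent of summation order and is rounded only once at the end. Pipelining and FPGA resource use are not modelled.

```python
import math


def group_aligned_sum(values: list[float], frac_bits: int = 60) -> float:
    """Sum finite floats by aligning all mantissas to the group's largest exponent.

    Each value becomes a fixed-point integer whose least significant bit weighs
    2**(max_exp - frac_bits); bits below that are truncated. The integer
    accumulation is exact, so the result does not depend on input order.
    """
    exponents = [math.frexp(v)[1] for v in values if v != 0.0]
    if not exponents:
        return 0.0
    scale_exp = max(exponents) - frac_bits
    acc = 0
    for v in values:
        acc += int(math.ldexp(v, -scale_exp))   # align, truncate toward zero
    return math.ldexp(float(acc), scale_exp)     # single final rounding
```

For example, `[1e16, 1.0, -1e16]` sums to `1.0` in either order, whereas a naive left-to-right floating-point loop over that list returns `0.0`.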

We present in this paper a novel general algorithm for signed number division in Residue Number Systems (RNS). A parity checking technique is used for sign and overflow detection. Compared with conventional methods of sign and overflow detection, the parity checking method is more efficient and practical. Sign-magnitude division is implemented using binary search. There is no restriction on the dividend or divisor (except a zero divisor), and no quotient estimation is needed before the division is executed. Only simple operations are needed to accomplish the RNS division. These characteristics make the algorithm simple, efficient, and practical to implement in a real RNS divider.
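The principle that links parity to sign detection can be shown with a small sketch. With all moduli odd, M is odd; a value X in [0, M) lies in the upper (negative) half exactly when 2X mod M is odd, because doubling a lower-half value gives an even number below M, while doubling an upper-half value wraps once and subtracts the odd M. In the sketch the parity is read off a full CRT reconstruction purely for illustration; the paper's parity checking technique, its overflow detection, and its binary-search division are not reproduced. The moduli and names are assumptions.

```python
from math import prod

MODULI = (3, 5, 7, 11)   # assumed all-odd moduli, so M is odd
M = prod(MODULI)         # 1155; signed range is [-(M-1)/2, (M-1)/2]


def to_rns(x: int) -> tuple[int, ...]:
    return tuple(x % m for m in MODULI)


def crt(r: tuple[int, ...]) -> int:
    x = 0
    for ri, m in zip(r, MODULI):
        mi = M // m
        x += ri * mi * pow(mi, -1, m)
    return x % M


def parity(r: tuple[int, ...]) -> int:
    """Parity of the represented value in [0, M). Illustration only: obtained
    from a full CRT reconstruction, not the paper's parity checking method."""
    return crt(r) % 2


def is_negative(r: tuple[int, ...]) -> bool:
    """Doubling is channel-wise; X is in the upper half iff 2X mod M is odd."""
    doubled = tuple(2 * ri % m for ri, m in zip(r, MODULI))
    return parity(doubled) == 1
```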

In this paper, mixed radix conversion (MRC)-based residue number system (RNS)-to-binary converters for two new three-moduli sets {2[Formula: see text], 2[Formula: see text], 2[Formula: see text]} and {2[Formula: see text], 2[Formula: see text], 2[Formula: see text]} which are derived from the moduli set {2[Formula: see text], 2[Formula: see text], 2[Formula: see text]} are presented. These have the advantage of having one modulus of the form 2[Formula: see text] or 2[Formula: see text] simplifying computations in one residue channel. The proposed reverse converters are evaluated and compared with state-of-the-art reverse converters proposed in literature for other three-moduli sets regarding hardware requirement and conversion time.

Ever since Adleman solved the Hamilton Path problem using a combinatorial molecular method, many other hard computational problems have been investigated using DNA computing. However, simple computer operations, such as Boolean and basic arithmetic operations, have received little attention. We propose a new method for these operations based on insertion, deletion, and substitution. The main advantage of this method is that the total number of biological manipulation steps is linear and the output can easily be used as the input to further operations.