High speed self-timed pipelined datapath for square rooting
Related papers
Advances in Pure Mathematics, 2015
A novel high-speed, energy-efficient square root architecture is reported in this paper. The architecture blends ancient Indian Vedic mathematics and Bakhshali mathematics to achieve a significant degree of accuracy in the square root operation. Specifically, the Vedic Duplex method and the iterative division method reported in the Bakhshali Manuscript are used for the computation. The proposed technique is compared with the well-known Newton-Raphson (N-R) technique for square root computation. The algorithm has been implemented and tested using the ModelSim simulator, and performance parameters such as the number of lookup tables, propagation delay and power consumption have been estimated using Xilinx ISE. The functionality of the circuitry has been checked on a Xilinx Virtex-5 FPGA board.
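As a rough illustration of the comparison this abstract mentions, the sketch below places the textbook Bakhshali iteration next to the Newton-Raphson iteration in floating point. It is not the paper's architecture: the Vedic Duplex stage and the hardware datapath are omitted, and the function names and starting guess are assumptions made for the example.

```python
import math


def newton_sqrt(s, x0, steps):
    """Newton-Raphson iteration for sqrt(s), starting from the guess x0."""
    x = x0
    for _ in range(steps):
        x = 0.5 * (x + s / x)
    return x


def bakhshali_sqrt(s, x0, steps):
    """Textbook Bakhshali iteration for sqrt(s), starting from the guess x0."""
    x = x0
    for _ in range(steps):
        a = (s - x * x) / (2.0 * x)   # correction term
        b = x + a                     # same value as one Newton-Raphson step
        x = b - (a * a) / (2.0 * b)   # second correction, folded into the same pass
    return x


if __name__ == "__main__":
    print(bakhshali_sqrt(2.0, 1.0, 3), math.sqrt(2.0))
```

Algebraically, one Bakhshali pass equals two Newton-Raphson steps, so the two methods trade iteration count against work per iteration rather than differing in the limit they reach.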
The square root is a basic arithmetic operation in image and signal processing. We present a novel pipelined architecture that implements an N-bit fixed-point square root operation on an FPGA using a non-restoring pipelined algorithm that does not require floating-point hardware. Pipelining hazards in the hardware realization are avoided by modifying the classic non-restoring algorithm, resulting in a 13% improvement in latency. Furthermore, the proposed architecture is flexible, allowing modification as per individual application needs. It is demonstrated that the proposed architecture is approximately four times faster than its popular counterparts while consuming 50% less energy for envelope detection at a 268 MHz sampling rate.
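For readers unfamiliar with the classic non-restoring algorithm that this abstract takes as its starting point, here is a minimal software sketch of it for unsigned integers. It shows only the textbook recurrence; the paper's hazard-avoiding modification and its pipelining are not reproduced, and the function name and `width` parameter are assumptions.

```python
def nonrestoring_isqrt(radicand, width):
    """Classic non-restoring integer square root (software model).

    radicand: unsigned integer below 2**width
    width:    even number of radicand bits (assumed parameter)
    Returns (root, remainder) with radicand == root*root + remainder.
    """
    root = 0
    rem = 0  # the partial remainder may go negative between steps
    for i in reversed(range(width // 2)):
        pair = (radicand >> (2 * i)) & 0b11   # next two radicand bits
        if rem >= 0:
            rem = rem * 4 + pair - ((root << 2) | 0b01)  # subtract, append 01
        else:
            rem = rem * 4 + pair + ((root << 2) | 0b11)  # add, append 11
        root = (root << 1) | (1 if rem >= 0 else 0)
    if rem < 0:
        rem += (root << 1) | 1  # single final correction of the remainder
    return root, rem


if __name__ == "__main__":
    print(nonrestoring_isqrt(10, 4))
```

Each iteration performs exactly one addition or subtraction and never restores the previous remainder, which is what makes the recurrence attractive for a pipeline stage.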
IEE Proceedings - Circuits, Devices and Systems, 2000
Pipelined cellular array implementations of arithmetic circuits are usually adopted to obtain high throughput at reasonable cost. The circuit design style used to implement the array greatly influences both performance and cost. The designer has to move in a varied and complex scenario, since nowadays scores of logic styles are known among CMOS families. Static logic styles are easy to use and they allow low power consumption, while dynamic logic styles have some potential advantages. These circuits tend to be faster and, at least for the implementation of simple logic functions, they require fewer transistors. Often the choice of the circuit design style is done by means of qualitative analysis. Referring to the creation of a pipelined square-rooting circuit, both static and dynamic implementations are quantitatively compared for several operand wordlengths. Using 0.5 µm technology parameters, a pre-layout comparison is performed in terms of net transistor area, number of transistors, propagation delay and average power dissipation. Results indicate that the DOMINO logic implementation shows the best area-time-power trade-off. Then a set of standard cells has been designed to lay out the DOMINO logic array. Post-layout data shows that a 32-bit array designed in this way and realised using a 0.5 µm 3.3 V CMOS process reaches a maximum throughput rate up to 175 MHz, requires a silicon area of 1.4 × 1.4 mm² and dissipates 1.50 mW/MHz. The proposed RCA-based circuit reaches a throughput comparable to that of CLA-based square-rooting arrays implemented using conventional static CMOS circuitry, thereby saving area and power.
FPGA and ASIC Square Root Designs for High Performance and Power Efficiency
Floating-point square root is a fundamental operation in signal processing and various HPC applications. Since this operation is expensive in resource and energy consumption, its efficient implementation should be a priority in future multicores that will face dark silicon issues. This paper presents a low-cost, low-power design to calculate the square root using the IEEE 754 single-precision floating-point format. Two versions of the design are investigated, with and without clock gating (CG). Evaluation involves FPGA and ASIC technologies at 40 and 65 nm. Substantially higher performance and reduced power consumption are obtained compared to a popular iterative solution. The ASIC design demonstrates much lower power consumption, which at 40 nm is lower than that at 65 nm by about a factor of three. At 40 nm, CG for the ASIC realization is justified primarily for low activity rates.
ETF Journal of Electrical …, 2004
Three algorithm implementations for square root computation are considered in this paper. Newton-Raphson's, iterative, and binary search algorithm implementations are completely designed, verified and compared. The algorithms, entire system-on-chip realisations and their functioning are described.
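Of the three approaches named above, binary search is the simplest to show in a few lines. The sketch below is one common bit-serial formulation (successive approximation of the root bits); whether the paper uses this exact form is not stated in the abstract, so treat the function name and the `root_bits` parameter as assumptions.

```python
def binary_search_isqrt(radicand, root_bits):
    """Integer square root by binary search over the root bits.

    radicand:  unsigned integer below 2**(2 * root_bits)
    root_bits: number of bits in the result (assumed parameter)
    """
    root = 0
    for bit in reversed(range(root_bits)):
        trial = root | (1 << bit)        # tentatively set the next root bit
        if trial * trial <= radicand:    # keep it if the square still fits
            root = trial
    return root


if __name__ == "__main__":
    print(binary_search_isqrt(1000, 8))
```

Every step needs a squaring (or an equivalent incremental update), which is the main cost that digit-recurrence methods avoid.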
A multi-cycle fixed point square root module for FPGAs
IEICE Electronics Express, 2012
This paper presents a module that computes the square root by first obtaining a number of the more significant bits from a look-up table as an approximate root. A set of possible roots is then appended and squared for comparison with the original radicand, finely tuning the calculation. The module stops as soon as it finds an exact root, so not all inputs take the same number of cycles, which reduces the number of iterations required for full resolution. The proposed FPGA module outperforms a Xilinx LogiCORE IP in terms of resource utilization and, in several cases, latency, owing to its flexible structure configuration.
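The look-up-then-refine idea can be sketched in software as follows. This is only an interpretation of the abstract: the table size, the 16-bit radicand, the one-candidate-per-cycle refinement and all names (`LUT`, `lut_isqrt`, `ROOT_BITS`, `LUT_BITS`) are assumptions, and the paper's module may test several candidate roots per cycle.

```python
import math

ROOT_BITS = 8   # result width for a 16-bit radicand (assumed)
LUT_BITS = 4    # root bits supplied by the table (assumed)

# Table indexed by the top 2*LUT_BITS radicand bits, holding the top root bits.
LUT = [math.isqrt(i) for i in range(1 << (2 * LUT_BITS))]


def lut_isqrt(radicand):
    """Return (root, cycles): LUT seed plus bit-by-bit refinement with early exit."""
    low_bits = ROOT_BITS - LUT_BITS
    root = LUT[radicand >> (2 * low_bits)] << low_bits  # approximate root
    cycles = 0
    if root * root == radicand:
        return root, cycles              # the table alone gave an exact root
    for bit in reversed(range(low_bits)):
        cycles += 1
        trial = root | (1 << bit)        # candidate root with the next bit set
        square = trial * trial
        if square <= radicand:
            root = trial
            if square == radicand:
                break                    # exact root found, stop early
    return root, cycles


if __name__ == "__main__":
    print(lut_isqrt(400), lut_isqrt(399))
```

Because of the early exit, perfect squares finish in fewer refinement cycles than other inputs, which mirrors the variable latency described in the abstract.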
IEE Proceedings - Circuits, Devices and Systems, 1998
The authors describe the design, integration and characterisation of a bit-level pipelined self-timed multiplier architecture. The differential structure SODS (switched-output differential structure) has been used for computation blocks and the PLCAR structure (protocol and latching controlled by acknowledge and request) for the interface blocks, introduced in an array-based architecture. A 4 x 4-bit multiplier has been integrated in a 1.0 µm CMOS technology and the proposed architecture has been compared with other asynchronous approaches, showing a considerable improvement, up to 50%, in terms of area, speed and power consumption. Compared with a synchronous approach, the main advantage of the proposed architecture is a lower power consumption below a certain incoming input data rate, but at the expense of area and speed.
An Efficient Implementation of the Non Restoring Square Root Algorithm in Gate Level
This paper proposes an efficient strategy for implementing a modified non-restoring algorithm on an FPGA at the gate-level abstraction of VHDL, adopting a fully pipelined architecture. A new basic building block, called controlled subtract-multiplex (CSM), is introduced. The main principle of the proposed method is similar to that of the conventional non-restoring algorithm, but it uses only the subtract operation with append 01; the add operation with append 11 is not used.
New algorithms and VLSI architectures for SRT division and square root
Proceedings of IEEE 11th Symposium on Computer Arithmetic
In real-time digital signal processing, high performance modules for division and square root are essential if many powerful algorithms are to be implemented. In this paper, new radix 2 algorithms for SRT division and square root are developed. For these new schemes, the result digits and the residuals are computed concurrently and the computations in adjacent rows are overlapped. Consequently, their performance should exceed that of the radix 2 SRT methods. VLSI array architectures to implement the new division and square root schemes are also presented.
An Optimized Square Root Algorithm for Implementation in FPGA Hardware
2010
This paper presents an optimized digit-recurrence method for square root calculation in hardware, proposed as a simple algorithm for implementation in a field programmable gate array (FPGA). The main principle of the proposed method is two-bit shifting combined with subtract-multiplex operations, which gives a simpler implementation and faster calculation. The proposed algorithm has been used to implement FPGA-based unsigned 32-bit and 64-bit binary square root successfully.
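The two-bit shift and subtract-multiplex principle can be modelled in software roughly as below. This is a generic digit-recurrence sketch under stated assumptions, not the paper's FPGA design: the function name and `width` parameter are hypothetical, and the `if` branch stands in for the hardware multiplexer.

```python
def shift_subtract_isqrt(radicand, width):
    """Digit-recurrence integer square root: two-bit shift, subtract, multiplex.

    radicand: unsigned integer below 2**width
    width:    even number of radicand bits, e.g. 32 or 64 (assumed parameter)
    Returns (root, remainder) with radicand == root*root + remainder.
    """
    root = 0
    rem = 0
    for i in reversed(range(width // 2)):
        # Shift the next two radicand bits into the partial remainder.
        rem = (rem << 2) | ((radicand >> (2 * i)) & 0b11)
        # Trial subtrahend: current root with 01 appended.
        diff = rem - ((root << 2) | 0b01)
        if diff >= 0:
            rem = diff                 # multiplexer selects the difference
            root = (root << 1) | 1
        else:
            root = root << 1           # multiplexer keeps the old remainder
    return root, rem


if __name__ == "__main__":
    print(shift_subtract_isqrt(0xFFFFFFFF, 32))
```

Only a subtractor and a selector are needed per root bit, and the partial remainder never goes negative, so no add path or final correction step is required.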