Marwan Jaber | Université du Québec à Trois-Rivières
Papers by Marwan Jaber
10th IEEE International NEWCAS Conference, 2012
The Discrete Fourier Transform (DFT) is a mathematical procedure at the core of processing inside a Digital Signal Processor. Speed and low complexity are crucial in the FFT process; they can be achieved by avoiding trivial multiplications through proper handling of the input/output data and the twiddle factors. Accordingly, this paper presents an approach for handling the input/output data efficiently while avoiding trivial multiplications. The approach consists of a simple mapping of the three indices (FFT stage, butterfly and element) to the addresses of the input/output data and their corresponding coefficient multipliers. A self-sorting algorithm that reduces the number of accesses to the coefficient multipliers' memory can also reduce the computational load by avoiding all trivial multiplications. Compared with the most recent work [5], performance evaluation in terms of the number of cycles on the general-purpose TMS320C6416 DSP shows a reduction of 29% (FFT of size 4096) and a 50% reduction in the memory needed to store twiddle factors. The algorithm has also shown a speed gain of 24% on the FFTW platform for an FFT of size 4096.
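To make the notion of a "trivial multiplication" in the abstract above concrete, the following is a minimal sketch that counts how many butterflies of a textbook radix-2 decimation-in-time FFT use a twiddle factor equal to 1 or -j, which need no real multiplier. It is not the paper's indexing scheme; the function name `count_trivial_twiddles` is hypothetical and the radix-2 structure is an assumption chosen for illustration.

```python
def count_trivial_twiddles(n):
    """Count butterflies with a trivial twiddle factor in a radix-2 DIT FFT.

    Returns (trivial, total). Assumes a textbook radix-2 structure, which is
    an illustration only and not the mapping proposed in the paper.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    trivial = 0
    total = 0
    half = 1  # half the butterfly group size at the current stage
    while half < n:
        groups = n // (2 * half)  # number of butterfly groups in this stage
        for k in range(half):
            total += groups
            # The twiddle W_{2*half}^k equals 1 when k == 0
            # and -j when k == half / 2; both are multiplier-free.
            if k == 0 or 2 * k == half:
                trivial += groups
        half *= 2
    return trivial, total
```

Calling `count_trivial_twiddles(8)` returns the pair of trivial and total butterfly counts for an 8-point transform; the share of trivial butterflies shrinks as the transform grows, since each later stage has only two trivial twiddles per group.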
VLSI design (Print), 2014
This paper describes an embedded FFT processor in which the higher-radix butterflies maintain one complex multiplier in their critical path. Building on radix-r fast Fourier factorization and on parallel FFT processing, we introduce a radix-r Fast Fourier Transform in which the radix-r butterfly computation is formulated as a combination of radix-2/4 butterflies implemented in parallel. As a result, a VLSI butterfly implementation for higher radices becomes feasible, since it keeps approximately the same complexity as the radix-2/4 butterfly from which it is built. The block-building process duplicates the block circuit diagram of the radix-2/4 module, which is realized by means of a feedback network that reuses that block circuit diagram.
2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2019
Over the last few years, the application of Digital Signal Processing (DSP) techniques to genomic sequence analysis has received great interest. In particular, it has been demonstrated that DSP can be used to detect protein-coding regions (exons) among non-coding regions in a DNA sequence. The period-3 behavior exhibited by exons is one feature that has been exploited in several algorithms for exon prediction. This periodicity can be identified in genomic sequences with methods such as the well-known Fast Fourier Transform (FFT) or, to reduce complexity, the Goertzel algorithm; reducing computational time is a major challenge in genomic analysis. Therefore, this paper presents a novel single-frequency analysis for gene prediction that uses half of the arithmetic complexity of the Goertzel algorithm. Compared to Intel®’s FFT (MKL) optimized function, the Goertzel’s (IPP) and the dedic...
IEEE Transactions on Signal Processing, 2021
The Discrete Fourier Transform (DFT) is a mathematical procedure that stands at the center of the processing inside a digital signal processor. It has been widely argued in the relevant literature that the Fast Fourier Transform (FFT) is useless for detecting specific frequencies in a monitored signal of length N, because most of the computed results are ignored. In this paper, we present an efficient FFT-based method to detect specific frequencies in a monitored signal and compare it with the most frequently used method, the recursive Goertzel algorithm, which detects and analyses one selectable frequency component of a discrete signal. The proposed JM-Filter algorithm reduces the number of iterations compared to the first- and second-order Goertzel algorithms by a factor of r, where r is the radix of the JM-Filter. The obtained results are significant in terms of computational reduction and accuracy in fixed-point implementation. Gains of 15 dB and ...
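For readers unfamiliar with the baseline named in the abstract above, here is a minimal sketch of the standard second-order Goertzel recursion, which evaluates a single DFT bin in N iterations. It illustrates only the reference method, not the JM-Filter; the function names `goertzel_bin` and `dft_bin` are hypothetical, and floating-point arithmetic is assumed rather than the fixed-point setting the paper discusses.

```python
import cmath
import math


def goertzel_bin(x, k):
    """Return DFT bin X[k] of the real or complex sequence x.

    Uses the standard second-order Goertzel recursion
    s[n] = x[n] + 2*cos(w)*s[n-1] - s[n-2], with w = 2*pi*k/N.
    """
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # s[n-1]
    s2 = 0.0  # s[n-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # After the loop s1 = s[N-1] and s2 = s[N-2];
    # X[k] = exp(j*w) * s[N-1] - s[N-2].
    return cmath.exp(1j * w) * s1 - s2


def dft_bin(x, k):
    """Direct evaluation of the same bin, used only as a cross-check."""
    n = len(x)
    return sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
```

The recursion costs one real multiplication per sample for real input, which is why it is the usual choice when only one or a few bins are needed.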
Solving complex problems coupled with intensive workloads necessitates access to massively parallel computational power. To date, Graphics Processing Units (GPUs) are the only architecture that can handle the most complex computationally intensive workloads. In light of this rapid advancement in computational technologies, this paper proposes a high-performance parallel radix-23 FFT suitable for such GPU and CPU systems. When implemented in parallel, the proposed algorithm could reduce the computational complexity by a factor approaching pr, where pr is the number of cores/threads, plus the combination phase needed to complete the required FFT.
The Fast Fourier Transform (FFT) is one of the fundamental processing blocks used in many signal processing applications (e.g. orthogonal frequency division multiplexing in wireless telecommunication). Therefore, every proposal to reduce the latency, resources or accuracy errors of an FFT implementation counts. This paper proposes an implementation of butterfly processing elements (BPE) in which the radix-r butterfly computation is formulated as a combination of α radix-2 butterflies implemented in parallel. An efficient FFT implementation is feasible using the proposed multiplexed and pipelined BPE. Compared to a state-of-the-art reference based on pipelined and parallel structure FFTs, an FPGA-based implementation reveals that the maximum throughput is improved by a factor of 1.3 for a 256-point FFT, reaching 2680 MSps on Virtex-7. The analysis extends to key performance metrics such as throughput, latency and resource utili...
Proceedings of 2010 IEEE International Symposium on Circuits and Systems, 2010
The Fast Fourier Transform (FFT) plays a key role in signal processing applications, where it is used for the frequency-domain analysis of signals. The FFT computation requires an indexing scheme at each stage to address input/output data and coefficient multipliers properly. Most of these indexing schemes are based on bit-reversal techniques that are boosted by a look-up table requiring extra
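As a point of reference for the bit-reversal indexing mentioned in the abstract above, the following minimal sketch builds the kind of look-up table that conventional schemes rely on. It shows the conventional technique only, not the indexing scheme the paper proposes; the names `bit_reverse` and `bit_reversal_table` are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of a non-negative integer index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversal_table(n):
    """Return the bit-reversed address for every index of an n-point FFT.

    n must be a power of two. The table holds n entries, which is the
    extra storage that table-driven bit-reversal schemes pay for.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]
```

For an 8-point transform the table is `[0, 4, 2, 6, 1, 5, 3, 7]`: input sample 1 (binary 001) is read from address 4 (binary 100), and so on.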
Acoustics Speech and Signal Processing 1988 Icassp 88 1988 International Conference on, 2008
The FFT is an operation that can be performed in successive stages. In each stage, the butterfly operation is computed: the accessed data is multiplied by a certain twiddle factor W^α, added or subtracted, and finally stored or held for further processing. This process is repeated at each stage until the final stage, where the processed data is
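The stage-by-stage butterfly process described in the abstract above can be sketched with a textbook in-place radix-2 decimation-in-time FFT. This is a generic illustration under that assumption, not the structure proposed in the paper; the function name `fft_radix2` is hypothetical.

```python
import cmath


def fft_radix2(x):
    """Textbook in-place radix-2 decimation-in-time FFT (power-of-two length)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = [complex(v) for v in x]

    # Reorder the input into bit-reversed order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            data[i], data[j] = data[j], data[i]

    # One pass per stage; each pass applies n/2 butterflies.
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                # Twiddle factor W^alpha for this butterfly.
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                upper = data[start + k]
                lower = data[start + k + half] * w  # multiply
                data[start + k] = upper + lower         # add
                data[start + k + half] = upper - lower  # subtract
        half *= 2
    return data
```

Each inner iteration is one butterfly: a read of two values, one complex multiplication by the twiddle factor, one addition and one subtraction, and a write back to the same two locations for the next stage.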