Fast computing of discrete cosine and sine transforms of types VI and VII

Discrete cosine and sine transforms—regular algorithms and pipeline architectures

Signal Processing, 2006

In this paper, regular fast algorithms for the discrete cosine transform (DCT) and discrete sine transform (DST) of types II-IV are proposed and mapped onto pipeline architectures. The algorithms are based on the factorization of transform matrices described earlier by Wang. The regular structures of the algorithms are advantageous when mapping them onto hardware, although such algorithms do not reach the theoretical lower bound on multiplicative complexity. Instead, the algorithms lend themselves to vertical mapping, resulting in area-efficient pipeline structures. A unified pipeline architecture supporting both the DCT-II and its inverse is implemented with data path synthesis to prove the feasibility and estimate the performance. The latency of an ASIC implementation is 94 cycles when operating at a frequency of 250 MHz.

Low-complexity approximation of 8-point discrete cosine transform for image compression

Journal of Applied Computer Science, 2012

In this paper the authors propose a new low-complexity approximation of the 8-point discrete cosine transform (DCT) that requires 18 additions and two bit-shift operations. It is shown that the proposed transform significantly outperforms the known transform of the same computational complexity when applied to a JPEG compression stream in practical cases of encoding and decoding of still images. As such, the proposed transform can be used effectively in practical applications where the computational capabilities of the coding and/or decoding devices are significantly limited, e.g. mobile devices or industrial imaging devices.

Design of fast transforms for high-resolution image and video coding

Applications of Digital Image Processing XXXII, 2009

We study factorization techniques and performance of Discrete Cosine Transforms of various sizes (including nondyadic and odd numbers). In our construction we utilize an array of known techniques (such as Heideman's mapping between odd-sized DCT and DFT, Winograd fast DFT algorithms, prime-factoring, etc.), and also propose a new decimation strategy for construction of even-sized scaled transforms. We then analyze the complexity and coding gain of such transforms with sizes 2-64 and identify the ones that show the best complexity/performance tradeoffs.

FPGA implementation of Integer DCT for HEVC

2016

In this paper, area-efficient architectures for the implementation of the integer discrete cosine transform (DCT) of different lengths, to be used in High Efficiency Video Coding (HEVC), are proposed. An efficient constant matrix multiplication scheme can be used to derive parallel architectures for the 1-D integer DCT of different lengths such as 4, 8, 16, and 32. Power-efficient structures for folded and full-parallel implementations of the 2-D DCT are also implemented with the proposed architecture. The proposed architecture with 32-point length is 29.2% and 9.2% more area-efficient, and results in 13.1% and 2.8% lower area-delay product, respectively, when compared to the basic and existing models. Pruning is also applied to the proposed architecture to improve performance, which results in a 50.78% decrease in area-delay product for the 32-point integer DCT.

DCT-like Transform for Image and Video Compression Requires 10 Additions Only

A multiplierless pruned approximate 8-point discrete cosine transform (DCT) requiring only 10 additions is introduced. The proposed algorithm was assessed in image and video compression, showing competitive performance with state-of-the-art methods. Digital implementation in 45 nm CMOS technology up to place-and-route level indicates a clock speed of 255 MHz at a 1.1 V supply. The 8×8 block rate is 31.875 MHz. The DCT approximation was embedded into HEVC reference software; resulting video frames, at up to 327 Hz for 8-bit RGB HEVC, presented negligible image degradation.
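
The two rates quoted in this abstract differ by exactly a factor of eight. One reading, which is an inference from the numbers and not something the abstract states, is that the hardware processes one 8-point row or column per clock cycle, so an 8×8 block occupies eight cycles:

```latex
\[
  f_{\text{block}} = \frac{f_{\text{clk}}}{8}
                   = \frac{255\ \text{MHz}}{8}
                   = 31.875\ \text{MHz}
\]
```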

Small-Size Algorithms for Type-IV Discrete Cosine Transform with Reduced Multiplicative Complexity

Radioelectronics and Communications Systems, 2020

Discrete cosine transforms are widely used in smart radioelectronic systems for processing and analysis of incoming information. The popularity of these transforms is explained by the existence of fast algorithms that minimize the computational and hardware complexity of their implementation. The type-IV discrete cosine transform occupies a special place among these transforms. This article proposes several algorithmic solutions for implementing the type-IV discrete cosine transform. The effectiveness of the proposed solutions is explained by the possibility of factorizing the DCT-IV matrix, which leads to a decrease in computational and implementation complexity. A set of completely parallel type-IV DCT algorithms for small lengths of signal sequences (N
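
For orientation, the transform this abstract refers to can be written down directly. The following is a minimal reference sketch in Python, assuming the common orthonormal scaling sqrt(2/N); it is the plain O(N^2) matrix-vector product, not any of the factorized small-size algorithms the article proposes. The function names `dct4_matrix` and `dct4` are illustrative only.

```python
import math

def dct4_matrix(n):
    """Return the orthonormal N x N DCT-IV matrix as a list of rows."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos(math.pi / n * (r + 0.5) * (c + 0.5))
             for c in range(n)] for r in range(n)]

def dct4(x):
    """Direct O(N^2) DCT-IV of a sequence x (reference only, not a fast algorithm)."""
    n = len(x)
    c = dct4_matrix(n)
    return [sum(c[k][i] * x[i] for i in range(n)) for k in range(n)]

signal = [1.0, 2.0, 3.0, 4.0]
spectrum = dct4(signal)
# With this scaling the DCT-IV matrix is symmetric and orthogonal,
# so applying the transform a second time recovers the input.
recovered = dct4(spectrum)
```

Because the matrix is its own inverse under this scaling, a single fast DCT-IV routine serves for both the forward and the inverse transform.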

Practical fast 1-D DCT algorithms with 11 multiplications

International Conference on Acoustics, Speech, and Signal Processing,

A new class of practical fast algorithms is introduced for the Discrete Cosine Transform (DCT), an important transform that is of particular interest in image compression. For an 8-point DCT only 11 multiplications and 29 additions are required. A systematic approach is presented to generate the different members in this class, all having the same minimum arithmetic complexity. The structure of many of the published algorithms can be found in members of this class. An extension of the algorithm for longer transformations is presented. As a result, the 16-point DCT requires only 31 multiplications and 81 additions, which is, to our knowledge, less than the currently published algorithms.

A refined fast 2-D discrete cosine transform algorithm

IEEE Transactions on Signal Processing, 1999

In this correspondence, an index permutation-based fast two-dimensional discrete cosine transform (2-D DCT) algorithm is presented. It is shown that the $N \times N$ 2-D DCT, where $N = 2^m$, can be computed using only $N$ 1-D DCTs and some post additions.
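
To put the count of N 1-D DCTs in context, the conventional separable (row-column) method needs 2N of them: N along the rows and N along the columns. The Python sketch below shows that baseline only. It is not the index-permutation algorithm of the correspondence, it uses an unnormalized direct DCT-II for brevity, and the names `dct2_1d` and `dct2_2d_row_column` are illustrative.

```python
import math

def dct2_1d(x):
    """Direct unnormalized 1-D DCT-II: X[k] = sum_i x[i] * cos(pi/N * (i + 1/2) * k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def dct2_2d_row_column(block):
    """Separable 2-D DCT-II of an N x N block; returns (coefficients, 1-D DCT count)."""
    n = len(block)
    # Pass 1: N one-dimensional transforms, one per row.
    rows = [dct2_1d(row) for row in block]
    # Pass 2: N one-dimensional transforms, one per column of the row-transformed data.
    cols = [dct2_1d([rows[r][c] for r in range(n)]) for c in range(n)]
    # cols[c][k] holds coefficient (k, c); transpose back to row-major order.
    coeffs = [[cols[c][k] for c in range(n)] for k in range(n)]
    return coeffs, 2 * n

block = [[1.0] * 4 for _ in range(4)]   # constant 4 x 4 test block
coeffs, calls = dct2_2d_row_column(block)
```

For the constant block, all of the energy lands in the single coefficient `coeffs[0][0]`, and `calls` reports the 2N = 8 one-dimensional transforms used by this baseline.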