David Barina | Brno University of Technology
Papers by David Barina
Computer Science Research Notes, 2021
In recent years, there has been a flood of new image and video compression formats. However, most of them focus on lossy compression and support the lossless mode only marginally. In this paper, I focus on lossless formats and the critical question: "Which one is the most efficient?" It turned out that FLIF is currently the most efficient format for lossless image compression. This finding is at odds with the fact that the FLIF developers discontinued it in favor of JPEG XL.
IEEE Conference Proceedings, 2016
Microprocessors and Microsystems, Apr 1, 2023
x3 is a lossless optimizing dictionary-based data compressor. The algorithm combines a dictionary, context modeling, and arithmetic coding. Optimization adds the ability to find the most appropriate parameters for each file. Even without optimization, x3 achieves a compression ratio comparable to the best dictionary compression methods, such as LZMA, zstd, or Brotli.
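The abstract does not describe x3's models in detail. As a rough, generic illustration of how context modeling feeds an arithmetic coder, the following Python sketch computes the ideal code length in bits, which an arithmetic coder approaches, for data coded under a simple adaptive order-k byte context model. The function name and the add-one (Laplace) estimator are assumptions for illustration, not x3's design, and neither the dictionary stage nor the coder itself is implemented.

```python
import math
from collections import defaultdict


def adaptive_code_length(data: bytes, order: int = 1) -> float:
    """Ideal code length (bits) of `data` under an adaptive order-k context model.

    Hypothetical helper: each context (the previous `order` bytes) keeps its own
    symbol counts, initialized to 1 (add-one estimator over 256 byte values).
    """
    counts = defaultdict(lambda: [1] * 256)   # per-context symbol counts
    totals = defaultdict(lambda: 256)         # per-context sum of counts
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]       # the preceding `order` bytes
        c = counts[ctx]
        # Cost of coding `sym` with its current adaptive probability.
        bits -= math.log2(c[sym] / totals[ctx])
        # Update the model after coding, exactly as a decoder could.
        c[sym] += 1
        totals[ctx] += 1
    return bits
```

On data such as alternating bytes, an order-1 model quickly learns that each byte predicts the next, so it needs far fewer bits than an order-0 model; practical compressors often combine several such models.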
Theory of Computing Systems, May 15, 2020
This article presents a new multiplication algorithm based on the Collatz function. Assuming the validity of the Collatz conjecture, the time complexity of multiplying two n-digit numbers is O(kn), where k is the number of odd steps in the Collatz trajectory of the first multiplicand. Most likely, the algorithm is only of theoretical interest.
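The running time above depends on k, the number of odd steps in the Collatz trajectory of the first multiplicand. The multiplication algorithm itself is not reproduced here; this Python sketch (a hypothetical helper) only computes k for a given starting value, using the standard map n → n/2 for even n and n → 3n + 1 for odd n. The count is the same with the shortcut form (3n + 1)/2, which merely merges each odd step with the even step that follows it.

```python
def collatz_odd_steps(n: int) -> int:
    """Count the odd steps (n -> 3n + 1) on the Collatz trajectory from n down to 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = 0
    while n != 1:
        if n % 2:          # odd: apply 3n + 1 and count it
            n = 3 * n + 1
            k += 1
        else:              # even: halve
            n //= 2
    return k
```

For example, collatz_odd_steps(27) returns 41, so k can be large even for a small multiplicand.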
Journal of Real-Time Image Processing, Jul 8, 2020
This article presents a single-loop approach to the 2-D discrete wavelet transform that allows processing of infinitely high image strip-maps. The paper gradually compares several computational strategies and finally shows how to handle a multi-scale wavelet transform of infinite image streams. In addition, the transform is followed by a bit-plane encoder that also processes data in a single loop. The whole pipeline is part of a CCSDS 122.0 image codec that processes a single pixel in about 33 ns on a contemporary desktop computer, without any parallel computing or SIMD vectorization.
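The abstract does not detail the encoder. As a generic illustration of bit-plane coding only (not the CCSDS 122.0 bit-plane encoder, whose coding details are beyond this sketch), the following Python functions, with hypothetical names, split integer wavelet coefficients into sign flags and magnitude bit planes ordered from the most significant plane down, and reassemble them.

```python
def to_bit_planes(coeffs):
    """Split integer coefficients into sign flags and magnitude bit planes.

    Planes are ordered from the most significant to the least significant,
    i.e. the order in which a bit-plane encoder would emit them.
    """
    mags = [abs(c) for c in coeffs]
    n_planes = max(mags, default=0).bit_length()
    signs = [c < 0 for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return signs, planes


def from_bit_planes(signs, planes):
    """Rebuild the coefficients by shifting in one bit per plane, MSB first."""
    mags = [0] * len(signs)
    for plane in planes:
        mags = [(m << 1) | b for m, b in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]
```

Emitting planes in this order lets a decoder stop early and still recover the most significant bits of every coefficient.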
Journal of Real-Time Image Processing, Jan 24, 2015
With the widespread use of the discrete wavelet transform in image processing, the need for its efficient implementation is becoming increasingly important. This work presents several novel SIMD-vectorized algorithms for the 2-D discrete wavelet transform using a lifting scheme. First, a stand-alone core of an already known single-loop approach is extracted. This core is further simplified by an appropriate reorganization of operations. Next, the influence of the CPU cache on the 2-D processing order is examined. Finally, SIMD vectorizations and parallelizations of the proposed approaches are evaluated. The best of the proposed algorithms scale almost linearly with the number of threads. On all platforms used in the tests, these algorithms are significantly faster than other known methods, as shown in the experimental sections of the paper.
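To illustrate a lifting scheme applied to whole arrays at once, here is a NumPy sketch of one level of the 1-D CDF 9/7 lifting transform: two predict and two update steps followed by scaling. NumPy's element-wise array operations stand in for the explicit SIMD vectorization studied in the paper. The lifting constants are the standard CDF 9/7 values; the scaling convention (low-pass divided by K, high-pass multiplied by K) is one of several in the literature. Even-length input and whole-sample symmetric extension are assumed, and the function names are hypothetical.

```python
import numpy as np

# Standard CDF 9/7 lifting constants and scaling factor.
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def _right(v):
    """v[n+1], mirroring at the right edge (whole-sample symmetric extension)."""
    return np.concatenate([v[1:], v[-1:]])


def _left(v):
    """v[n-1], mirroring at the left edge."""
    return np.concatenate([v[:1], v[:-1]])


def fwd97(x):
    """One level of the forward CDF 9/7 lifting transform; returns (low, high)."""
    x = np.asarray(x, dtype=np.float64)
    if x.size < 2 or x.size % 2:
        raise ValueError("this sketch expects an even length >= 2")
    s, d = x[0::2].copy(), x[1::2].copy()
    d += ALPHA * (s + _right(s))   # predict 1: whole array in one operation
    s += BETA * (_left(d) + d)     # update 1
    d += GAMMA * (s + _right(s))   # predict 2
    s += DELTA * (_left(d) + d)    # update 2
    return s / K, d * K            # scaling


def inv97(s, d):
    """Inverse transform: undo the steps of fwd97 in reverse order."""
    s = np.asarray(s, dtype=np.float64) * K
    d = np.asarray(d, dtype=np.float64) / K
    s -= DELTA * (_left(d) + d)
    d -= GAMMA * (s + _right(s))
    s -= BETA * (_left(d) + d)
    d -= ALPHA * (s + _right(s))
    x = np.empty(s.size + d.size)
    x[0::2], x[1::2] = s, d
    return x
```

Each lifting step touches every coefficient independently, which is what makes the scheme amenable to SIMD; the dependency between consecutive steps is what limits it.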
The Journal of Supercomputing, Jul 1, 2020
With the widespread use of the discrete wavelet transform, the need for its efficient implementation is becoming increasingly important. This work presents an improved version of an algorithm for computing the 2-D discrete wavelet transform on GPUs. Depending on the GPU platform, it may be advantageous to split the 2-D transform computation into separate horizontal and vertical passes. For the horizontal passes, we examined the already known methods and chose the best-performing one. Furthermore, we adapted this method for an existing algorithm that computes the vertical transform pass. This step eliminates several synchronizations and arithmetic operations in the computation scheme. For large data, the proposed vertical method achieves a speedup of about 30% compared to current state-of-the-art methods. In contrast to previously published works, the presented approach is built on the OpenCL parallel programming framework.
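A minimal NumPy sketch of the horizontal/vertical split, assuming the reversible integer CDF 5/3 lifting from JPEG 2000 as the 1-D transform (the abstract does not say which wavelet the GPU method targets), even image dimensions, and whole-sample symmetric extension. Both passes call the same 1-D routine; the vertical pass is simply the row routine applied to the transpose. GPU scheduling and OpenCL are not modeled, and the function names are hypothetical.

```python
import numpy as np


def fwd53_rows(a):
    """One level of reversible CDF 5/3 lifting along each row (axis 1)."""
    a = np.asarray(a, dtype=np.int64)
    if a.shape[1] % 2:
        raise ValueError("this sketch expects an even number of columns")
    even, odd = a[:, 0::2], a[:, 1::2]
    # x[2n+2], with symmetric extension x[N] = x[N-2] at the right edge
    even_right = np.concatenate([even[:, 1:], even[:, -1:]], axis=1)
    d = odd - (even + even_right) // 2           # predict (high-pass)
    # d[n-1], with symmetric extension d[-1] = d[0] at the left edge
    d_left = np.concatenate([d[:, :1], d[:, :-1]], axis=1)
    s = even + (d_left + d + 2) // 4             # update (low-pass)
    return np.concatenate([s, d], axis=1)        # [low | high]


def inv53_rows(c):
    """Exact inverse of fwd53_rows: undo the update, then the predict."""
    c = np.asarray(c, dtype=np.int64)
    h = c.shape[1] // 2
    s, d = c[:, :h], c[:, h:]
    d_left = np.concatenate([d[:, :1], d[:, :-1]], axis=1)
    even = s - (d_left + d + 2) // 4
    even_right = np.concatenate([even[:, 1:], even[:, -1:]], axis=1)
    odd = d + (even + even_right) // 2
    out = np.empty_like(c)
    out[:, 0::2], out[:, 1::2] = even, odd
    return out


def dwt2_53(img):
    """Separable 2-D transform: a horizontal pass followed by a vertical pass."""
    tmp = fwd53_rows(img)            # horizontal pass over rows
    return fwd53_rows(tmp.T).T       # vertical pass = row pass on the transpose


def idwt2_53(coeffs):
    """Inverse: undo the vertical pass first, then the horizontal one."""
    tmp = inv53_rows(np.asarray(coeffs).T).T
    return inv53_rows(tmp)
```

The intermediate array tmp between the two passes is the point where a parallel implementation has to synchronize, which is why the choice of pass strategy matters on GPUs.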
arXiv (Cornell University), Jul 2, 2018
We present an iterated function whose iterates oscillate wildly and grow at a dizzying pace. We conjecture that the orbit of any positive integer always returns to 1, as in the case of the Collatz function. The conjecture is supported by a heuristic argument and computational results.
arXiv (Cornell University), Jun 25, 2021
In recent years, there has been a flood of new image and video compression formats. However, most of them focus on lossy compression and support the lossless mode only marginally. In this paper, I focus on lossless formats and the critical question: "Which one is the most efficient?" It turned out that FLIF is currently the most efficient format for lossless image compression. This finding is at odds with the fact that the FLIF developers discontinued it in favor of JPEG XL.
arXiv (Cornell University), Feb 10, 2016
This work shows the use of two-dimensional Gabor wavelets in image processing. Convolution with such a two-dimensional wavelet can be separated into two series of one-dimensional convolutions. The key idea of this work is to use a Gabor wavelet as a multiscale partial differential operator of a given order. Gabor wavelets are used here to detect edges, corners, and blobs. The performance of such an interest point detector is compared with detectors based on a Haar wavelet and on a derivative of a Gaussian function. The proposed approach may be useful when a fast implementation of the Gabor transform is available or when the transform is already precomputed.
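The separability claim can be checked numerically. This NumPy sketch assumes an isotropic Gaussian envelope and a complex carrier with frequencies omega_x and omega_y (hypothetical parameter names). Under these assumptions the 2-D Gabor kernel is the outer product of two 1-D Gabor kernels, so filtering can run row-wise and then column-wise. It does not implement the paper's differential-operator construction or the interest point detector.

```python
import numpy as np


def gabor_1d(sigma, omega, radius):
    """Sampled 1-D complex Gabor kernel: Gaussian envelope times a complex carrier."""
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    return np.exp(-t**2 / (2.0 * sigma**2)) * np.exp(1j * omega * t)


def gabor_2d(sigma, omega_x, omega_y, radius):
    """Sampled 2-D complex Gabor kernel with an isotropic Gaussian envelope."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(np.float64)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.exp(1j * (omega_x * x + omega_y * y))


def gabor_filter_separable(img, sigma, omega_x, omega_y, radius):
    """Convolve rows with the x-kernel, then columns with the y-kernel.

    Output has the input size ('same' mode, zero padding at the borders);
    both image dimensions are assumed to be at least 2 * radius + 1.
    """
    kx = gabor_1d(sigma, omega_x, radius)
    ky = gabor_1d(sigma, omega_y, radius)
    rows = np.array([np.convolve(r, kx, mode="same") for r in img])
    return np.array([np.convolve(c, ky, mode="same") for c in rows.T]).T
```

For a (2r+1)×(2r+1) kernel, the separable form needs about 2(2r+1) multiplications per pixel instead of (2r+1)².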
arXiv (Cornell University), Aug 25, 2017
The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) has mostly been computed using a separable lifting scheme. Because the lifting scheme consists of a small number of operations, it is preferred on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data must be exchanged. Consequently, these points often form a performance bottleneck. Our approach rearranges the calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. In evaluations on multi-core CPUs, our scheme consistently outperforms the original lifting scheme. The evaluation was performed on a 61-core Intel Xeon Phi and an 8-core Intel Xeon processor.
arXiv (Cornell University), Apr 27, 2017
The 2-D discrete wavelet transform (DWT) can be found at the heart of many image-processing algorithms. Several studies have compared the performance of this transform on various shared-memory parallel architectures, especially on graphics processing units (GPUs). Until recently, however, all of these studies considered only separable calculation schemes. We show that the corresponding separable parts can be merged into non-separable units, which halves the number of steps. In addition, we introduce an optional optimization that reduces the number of arithmetic operations. The discussed schemes were implemented in the OpenCL framework and in pixel shaders, and then evaluated on GPUs from the two biggest vendors. We demonstrate the performance of the proposed non-separable methods by comparison with existing separable schemes. The non-separable schemes outperform their separable counterparts in numerous setups, especially with pixel shaders.
We have reduced the number of lifting steps in the calculation of the two-dimensional discrete wavelet transform by factoring the underlying lifting scheme into a new spatial form. Compared with a recently proposed non-separable structure, we have also reduced the number of operations. Our scheme is primarily designed for the CDF 5/3 and CDF 9/7 wavelets employed in the JPEG 2000 image compression standard. As a result, our scheme requires only two steps for the 2-D CDF 5/3 transform, compared to four steps in the original separable form or three steps in the recent non-separable scheme.
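For reference, the reversible CDF 5/3 lifting used in JPEG 2000 consists of one predict step and one update step per dimension, where x is the input signal, d the high-pass and s the low-pass coefficients:

```latex
\begin{aligned}
d[n] &= x[2n+1] - \left\lfloor \frac{x[2n] + x[2n+2]}{2} \right\rfloor && \text{(predict)} \\
s[n] &= x[2n] + \left\lfloor \frac{d[n-1] + d[n] + 2}{4} \right\rfloor && \text{(update)}
\end{aligned}
```

Applying both steps horizontally and then vertically gives 2 + 2 = 4 steps, which is the count quoted for the original separable form; the two-step spatial factorization itself is not reproduced here.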
A novel approach to 2-D single-loop wavelet lifting compatible with JPEG 2000 is presented in this paper. A newly developed 2-D core of the CDF 5/3 wavelet filter simplifies the design by using a new sequence of operations. Moreover, the proposed approach, which uses a single pass for the 2-D transform, directly produces the final output and significantly reduces the need to store intermediate results in memory. All the proposed structures can be efficiently pipelined in hardware. This paper describes the proposed approach, its FPGA implementation, and the cost of that implementation, and provides an experimental evaluation as well as a discussion of the features of the approach.
The discrete wavelet transform of finite-length signals must necessarily handle the signal boundaries. State-of-the-art approaches treat such boundaries in a complicated and inflexible way, using special prologue or epilogue phases. This holds true in particular for images decomposed into a number of scales, as in the JPEG 2000 coding system. In this paper, the state-of-the-art approaches are extended to perform the treatment using a compact streaming core, possibly in a multiscale fashion. We present the core focused on the CDF 5/3 wavelet and the symmetric border extension method, both employed in JPEG 2000. As a result of our work, every input sample is visited only once, and the results are produced immediately, i.e., without buffering.
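A minimal Python sketch of the streaming idea for a single level of the 1-D reversible CDF 5/3 transform with whole-sample symmetric extension. Each input sample is read once from an iterator, and each (s[n], d[n]) pair is yielded as soon as x[2n+2] arrives, so only a constant amount of state is kept and the boundaries need no separate prologue or epilogue loop. This is an illustrative single-scale core assuming even-length input, not the authors' multiscale implementation; the function name is hypothetical.

```python
def stream_cdf53_forward(samples):
    """Single-pass forward CDF 5/3 lifting (reversible, JPEG 2000 style).

    Reads every input sample exactly once and yields (s[n], d[n]) pairs with a
    lag of one sample. Symmetric extension is applied at both ends.
    Sketch limitation: the input length must be even.
    """
    it = iter(samples)
    even = next(it, None)            # x[2n]; starts as x[0]
    if even is None:
        return                       # empty input: nothing to emit
    d_prev = None                    # d[n-1]; None before the first detail
    while True:
        odd = next(it, None)         # x[2n+1]
        if odd is None:
            raise ValueError("this sketch supports even-length input only")
        nxt = next(it, None)         # x[2n+2], or None at the right edge
        last = nxt is None
        if last:
            nxt = even               # symmetric extension: x[N] = x[N-2]
        d = odd - (even + nxt) // 2                  # predict
        left = d if d_prev is None else d_prev       # symmetric extension: d[-1] = d[0]
        s = even + (left + d + 2) // 4               # update
        yield s, d
        if last:
            return
        even, d_prev = nxt, d        # slide the window by two samples
```

For example, list(stream_cdf53_forward([1, 2, 3, 4])) yields [(1, 0), (3, 1)], the same coefficients a batch implementation with symmetric extension produces.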
With the widespread use of the discrete wavelet transform, the need for its efficient implementation is becoming increasingly important. This work presents a general approach to vectorizing discrete wavelet transform schemes, evaluated on an FPGA-based Application-Specific Vector Processor (ASVP). This unit can be classified as a SIMD computer in Flynn's taxonomy. The presented approach is compared with two other non-vectorized approaches. Using the frequently employed CDF 9/7 wavelet, the achieved speedup is about 2.6× compared to a naive implementation.