David Barina | Brno University of Technology

Papers by David Barina

Comparison of Lossless Image Formats

Computer Science Research Notes, 2021

Recent years have seen a flood of new image and video compression formats. However, most of them focus on lossy compression and support the lossless mode only marginally. In this paper, I focus on lossless formats and the critical question: "Which one is the most efficient?" It turned out that FLIF is currently the most efficient format for lossless image compression. This finding contrasts with the fact that the FLIF developers stopped its development in favor of JPEG XL.

Single-Loop Software Architecture for JPEG 2000

IEEE Conference Proceedings, 2016

Vectorisation of Wavelet Lifting

Diagonal vectorisation of 2-D wavelet lifting

Experimental lossless data compressor

Microprocessors and Microsystems, Apr 1, 2023

x3: Lossless Data Compressor

x3 is a lossless optimizing dictionary-based data compressor. The algorithm uses a combination of a dictionary, context modeling, and arithmetic coding. Optimization adds the ability to find the most appropriate parameters for each file. Even without optimization, x3 can compress data with a compression ratio comparable to the best dictionary compression methods like LZMA, zstd, or Brotli.

Multiplication Algorithm Based on Collatz Function

Theory of computing systems, May 15, 2020

This article presents a new multiplication algorithm based on the Collatz function. Assuming the validity of the Collatz conjecture, the time complexity of multiplying two n-digit numbers is O(kn), where k is the number of odd steps in the Collatz trajectory of the first multiplicand. Most likely, the algorithm is only of theoretical interest.
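The abstract does not spell out the multiplication algorithm itself, so the following is only a minimal sketch of the quantity k it refers to. It assumes that an "odd step" means one application of 3n + 1 on the way from n down to 1; the function name `collatz_odd_steps` is hypothetical and the paper may count steps under a slightly different convention.

```python
def collatz_odd_steps(n: int) -> int:
    """Count the odd steps (applications of 3n + 1) on the Collatz
    trajectory from n down to 1. Terminates only if the trajectory
    reaches 1, which is what the Collatz conjecture asserts."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    odd_steps = 0
    while n != 1:
        if n % 2:            # odd value: apply 3n + 1
            n = 3 * n + 1
            odd_steps += 1
        else:                # even value: halve
            n //= 2
    return odd_steps


if __name__ == "__main__":
    for start in (1, 6, 7):
        print(start, collatz_odd_steps(start))
```

Under this convention, the O(kn) bound grows with the number of odd values visited by the trajectory of the first multiplicand, not with the total trajectory length.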

Real-time wavelet transform for infinite image strips

Journal of Real-time Image Processing, Jul 8, 2020

This article presents a single-loop approach to a 2-D discrete wavelet transform that allows processing infinitely high image strip-maps. The paper gradually compares several computational strategies to finally show how to deal with a multi-scale wavelet transform of infinite image streams. In addition, the transform is followed by a bit-plane encoder which also processes data in a single loop. The whole machinery is part of a CCSDS 122.0 image codec which manages to process a single pixel in about 33 ns on a contemporary desktop computer, without the contribution of any parallel computing or SIMD vectorization.

Vectorization and parallelization of 2-D wavelet lifting

Journal of Real-time Image Processing, Jan 24, 2015

With the start of the widespread use of the discrete wavelet transform in image processing, the need for its efficient implementation is becoming increasingly important. This work presents several novel SIMD-vectorized algorithms of the 2-D discrete wavelet transform, using a lifting scheme. At the beginning, a stand-alone core of an already known single-loop approach is extracted. This core is further simplified by an appropriate reorganization of operations. Furthermore, the influence of the CPU cache on a 2-D processing order is examined. Finally, SIMD-vectorizations and parallelizations of the proposed approaches are evaluated. The best of the proposed algorithms scale almost linearly with the number of threads. For all of the platforms used in the tests, these algorithms are significantly faster than other known methods, as shown in the experimental sections of the paper.

Convergence verification of the Collatz problem

The Journal of Supercomputing, Jul 1, 2020

2-D Discrete Wavelet Transform Using GPU

With the widespread use of the discrete wavelet transform, the need for its efficient implementation becomes increasingly important. This work presents an improved version of an algorithm suitable for computing the 2-D discrete wavelet transform on a GPU. Depending on the GPU platform, it is suitable to split the 2-D transform computation into separate horizontal and vertical passes. For the horizontal passes, we have examined the already known methods and chosen the best performing one. Furthermore, we have adapted this method for an existing algorithm computing the vertical transform pass. This step helps to reduce several synchronizations and arithmetic operations in the utilized computation scheme. For large data, the proposed vertical method achieves a speed-up of about 30% compared to the current state-of-the-art methods. In contrast to previously published works, the presented approach is built on the OpenCL parallel programming framework.

7x±1: Close Relative of Collatz Problem

arXiv (Cornell University), Jul 2, 2018

We present an iterated function whose iterates oscillate wildly and grow at a dizzying pace. We conjecture that the orbit of an arbitrary positive integer always returns to 1, as in the case of the Collatz function. The conjecture is supported by a heuristic argument and computational results.
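The abstract does not define the function, so the sketch below rests on an assumption suggested by the title: even values are halved, odd values congruent to 1 modulo 4 map to 7x + 1, and odd values congruent to 3 modulo 4 map to 7x − 1. The names `step_7x_pm_1` and `orbit`, and the `max_steps` safety cap, are hypothetical and not taken from the paper.

```python
def step_7x_pm_1(x: int) -> int:
    """One step of the assumed 7x±1 map (definition not given in the abstract)."""
    if x % 2 == 0:
        return x // 2
    if x % 4 == 1:
        return 7 * x + 1     # result is divisible by 4
    return 7 * x - 1         # x % 4 == 3; result is divisible by 4


def orbit(x: int, max_steps: int = 10_000) -> list[int]:
    """Follow the orbit of x until it reaches 1 or max_steps is exhausted."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    seq = [x]
    while x != 1 and len(seq) <= max_steps:
        x = step_7x_pm_1(x)
        seq.append(x)
    return seq


if __name__ == "__main__":
    print(orbit(3))
```

With this definition, every odd step produces a multiple of 4, so it is always followed by at least two halvings; the cap on the number of steps is there because termination is only conjectured.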

Comparison of Lossless Image Formats

arXiv (Cornell University), Jun 25, 2021

Recent years have seen a flood of new image and video compression formats. However, most of them focus on lossy compression and support the lossless mode only marginally. In this paper, I focus on lossless formats and the critical question: "Which one is the most efficient?" It turned out that FLIF is currently the most efficient format for lossless image compression. This finding contrasts with the fact that the FLIF developers stopped its development in favor of JPEG XL.

Gabor Wavelets in Image Processing

arXiv (Cornell University), Feb 10, 2016

This work shows the use of two-dimensional Gabor wavelets in image processing. Convolution with such a two-dimensional wavelet can be separated into two series of one-dimensional ones. The key idea of this work is to utilize a Gabor wavelet as a multiscale partial differential operator of a given order. Gabor wavelets are used here to detect edges, corners and blobs. The performance of such an interest point detector is compared to detectors utilizing a Haar wavelet and a derivative of a Gaussian function. The proposed approach may be useful when a fast implementation of the Gabor transform is available or when the transform is already precomputed.
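The separability claim can be illustrated with a small numerical check. This is a sketch under stated assumptions, not the paper's formulation: it assumes an unnormalized complex Gabor kernel with an isotropic Gaussian envelope of width `sigma` and spatial frequencies `u` and `v`, and the function names are hypothetical.

```python
import cmath
import math


def gabor_1d(t: float, sigma: float, freq: float) -> complex:
    """Unnormalized 1-D complex Gabor function: Gaussian envelope times complex carrier."""
    return math.exp(-t * t / (2 * sigma * sigma)) * cmath.exp(1j * freq * t)


def gabor_2d(x: float, y: float, sigma: float, u: float, v: float) -> complex:
    """Unnormalized 2-D complex Gabor function with an isotropic Gaussian envelope."""
    envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
    return envelope * cmath.exp(1j * (u * x + v * y))


if __name__ == "__main__":
    sigma, u, v = 2.0, 0.7, -0.3
    for x, y in [(0.0, 0.0), (1.5, -2.0), (-3.0, 0.25)]:
        direct = gabor_2d(x, y, sigma, u, v)
        product = gabor_1d(x, sigma, u) * gabor_1d(y, sigma, v)
        print(abs(direct - product))  # difference is at floating-point noise level
```

Because the kernel factors into a function of x times a function of y, a 2-D convolution with it can be carried out as a pass of 1-D convolutions along rows followed by a pass along columns.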

The Parallel Algorithm for the 2-D Discrete Wavelet Transform

arXiv (Cornell University), Aug 25, 2017

The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data are exchanged. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluated on multi-core CPUs, our scheme consistently outperforms the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.

Accelerating Discrete Wavelet Transforms on Parallel Architectures

arXiv (Cornell University), Apr 27, 2017

The 2-D discrete wavelet transform (DWT) can be found at the heart of many image-processing algorithms. Several recent studies have compared the performance of this transform on various shared-memory parallel architectures, especially on graphics processing units (GPUs). All these studies, however, considered only separable calculation schemes. We show that corresponding separable parts can be merged into non-separable units, which halves the number of steps. In addition, we introduce an optional optimization approach leading to a reduction in the number of arithmetic operations. The discussed schemes were adapted to the OpenCL framework and pixel shaders, and then evaluated using GPUs from the two biggest vendors. We demonstrate the performance of the proposed non-separable methods by comparison with existing separable schemes. The non-separable schemes outperform their separable counterparts on numerous setups, especially when pixel shaders are used.

New non-separable lifting scheme for images

We have reduced the number of lifting steps in the calculation of the two-dimensional discrete wavelet transform by factoring the underlying lifting scheme into a new spatial form. Compared with a recently proposed non-separable structure, we have also reduced the number of operations. Our scheme is primarily designed for the CDF 5/3 and CDF 9/7 wavelets employed in the JPEG 2000 image compression standard. As a result, our scheme requires only two steps for the 2-D CDF 5/3 transform, compared to four steps in the original separable form or three steps in the recent non-separable scheme.

Single-Loop Approach to 2-D Wavelet Lifting with JPEG 2000 Compatibility

A novel approach to 2-D single-loop wavelet lifting with compatibility to JPEG 2000 is presented in this paper. A newly developed 2-D core of the CDF 5/3 wavelet filter is presented that, using a new sequence of operations, simplifies the design. Moreover, the proposed approach, which uses one pass for the 2-D transform, directly produces the final output and significantly reduces the need for storing intermediate results in memory. All the proposed structures can be efficiently pipelined in hardware. This paper describes the proposed approach, its implementation in an FPGA, and the cost of such an implementation, and provides an experimental evaluation as well as a discussion of the features of the approach.

Simple signal extension method for discrete wavelet transform

The discrete wavelet transform of finite-length signals must necessarily handle the signal boundaries. The state-of-the-art approaches treat such boundaries in a complicated and inflexible way, using special prolog or epilog phases. This holds true in particular for images decomposed into a number of scales, for example in the JPEG 2000 coding system. In this paper, the state-of-the-art approaches are extended to perform the treatment using a compact streaming core, possibly in a multiscale fashion. We present the core focused on the CDF 5/3 wavelet and the symmetric border extension method, both employed in JPEG 2000. As a result of our work, every input sample is visited only once, while the results are produced immediately, i.e. without buffering.
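To make the two ingredients named above concrete, here is a minimal sketch of one level of 1-D CDF 5/3 lifting with whole-sample symmetric border extension. It is not the streaming core described in the abstract: it reads the whole signal at once and recomputes detail coefficients for simplicity. It also uses the unrounded floating-point form of the filter, whereas the reversible JPEG 2000 variant rounds to integers. The function names are hypothetical.

```python
def mirror(i: int, n: int) -> int:
    """Whole-sample symmetric extension: reflect index i into the range [0, n)."""
    period = 2 * (n - 1)
    i %= period                      # Python's % is non-negative for a positive modulus
    return i if i < n else period - i


def cdf53_forward(x: list[float]) -> tuple[list[float], list[float]]:
    """One level of 1-D CDF 5/3 lifting (unrounded), returning (lowpass, highpass)."""
    n = len(x)
    if n < 2:
        raise ValueError("signal must contain at least two samples")

    def ext(i: int) -> float:
        return x[mirror(i, n)]

    def detail(k: int) -> float:
        # Predict step: odd sample minus the average of its even neighbours.
        return ext(2 * k + 1) - 0.5 * (ext(2 * k) + ext(2 * k + 2))

    highpass = [detail(k) for k in range(n // 2)]
    # Update step: even sample plus a quarter of the two neighbouring details.
    lowpass = [x[2 * k] + 0.25 * (detail(k - 1) + detail(k))
               for k in range((n + 1) // 2)]
    return lowpass, highpass


if __name__ == "__main__":
    print(cdf53_forward([0, 1, 2, 3, 4, 5, 6, 7]))
```

On a linear ramp, the interior detail coefficients vanish and only the last one is non-zero, which shows the effect of the mirrored border on an otherwise perfectly predicted signal.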

Wavelet Lifting on Application Specific Vector Processor

With the start of the widespread use of the discrete wavelet transform, the need for its efficient implementation is becoming increasingly important. This work presents a general approach to vectorising the discrete wavelet transform scheme, evaluated on an FPGA-based Application-Specific Vector Processor (ASVP). This unit can be classified as a SIMD computer in Flynn's taxonomy. The presented approach is compared with two other non-vectorised approaches. Using the frequently exploited CDF 9/7 wavelet, the achieved speedup is about 2.6× compared to a naive implementation.
