Mirek Bober - Academia.edu
Papers by Mirek Bober
4th International Workshop on Mobile and Wireless Communications Network
An effective and computationally efficient method for the recovery of lost motion vectors in video codecs is proposed. The method clusters motion vectors into groups exhibiting coherent motion. The cluster containing the majority of motion vectors is selected, and a statistical method is applied to that group to estimate the lost motion vector. Simulation results are presented, including a comparison with existing methods such as taking the average or median of the motion vectors in the neighbouring blocks.
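A minimal sketch of the clustering idea, under assumptions not taken from the abstract: motion vectors are (dx, dy) tuples from the neighbouring blocks, clustering is a greedy single pass with a hypothetical distance `threshold`, and the "statistical method" applied to the majority cluster is taken to be the component-wise median. The function names are illustrative only.

```python
from math import hypot
from statistics import median


def cluster_vectors(mvs, threshold=2.0):
    """Greedy single-pass clustering (illustrative assumption): each vector
    joins the first cluster whose centroid lies within `threshold`,
    otherwise it starts a new cluster."""
    clusters = []
    for v in mvs:
        for c in clusters:
            cx = sum(p[0] for p in c) / len(c)
            cy = sum(p[1] for p in c) / len(c)
            if hypot(v[0] - cx, v[1] - cy) <= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters


def recover_lost_mv(neighbour_mvs, threshold=2.0):
    """Estimate a lost MV from the largest coherent cluster of neighbours."""
    majority = max(cluster_vectors(neighbour_mvs, threshold), key=len)
    # Component-wise median of the majority cluster (assumed estimator).
    return (median(v[0] for v in majority), median(v[1] for v in majority))


# Three neighbours move coherently; one outlier belongs to another object.
print(recover_lost_mv([(4, 1), (5, 1), (4, 2), (-6, 8)]))  # (4, 1)
```

The outlier (-6, 8) lands in its own cluster, so it does not drag the estimate away from the dominant motion, as a plain average of all four vectors, (1.75, 3.0), would.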
Proceedings of 1st International Conference on Image Processing
Motion prediction and spatial coding are the two main techniques used to construct algorithms for image sequence compression. We present an approach that merges a robust motion estimation technique, based on the Hough transform and robust statistics, with multiple stage vector quantization (MSVQ). MSVQ uses global optimization and multipath searching. The algorithm is capable of segmenting multiple motions and uses line segments to code real motion boundaries. This significantly improves the subjective quality of the coded sequence at motion edges. Experimental results show that the proposed algorithm can achieve better PSNR without an increase in bitrate.
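Multiple stage vector quantization can be illustrated with a small sketch: each stage quantizes the residual left by the previous stage, and the decoder sums the codewords selected at every stage. The codebooks below are made up for illustration, and the search is greedy (a single path through the stages), which is a simplification of the global optimization and multipath search described above.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))


def msvq_encode(v, stages):
    """Greedy multi-stage VQ: each stage quantizes the previous stage's residual."""
    residual = list(v)
    indices = []
    for codebook in stages:
        i = nearest(codebook, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices, stages):
    """Reconstruction is the sum of the selected codeword from every stage."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Toy two-stage codebooks for 2-D vectors; the values are hypothetical.
STAGES = [
    [(0, 0), (4, 4), (8, 0)],            # coarse stage
    [(0, 0), (1, 0), (0, 1), (-1, -1)],  # refinement stage
]
codes = msvq_encode([5, 3], STAGES)
print(codes)                       # [1, 1]
print(msvq_decode(codes, STAGES))  # [5.0, 4.0]
```

A multipath search would keep the M best partial paths per stage instead of one, trading extra encoder work for a lower final error.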
2014 IEEE International Conference on Image Processing (ICIP), 2014
Compact locally aggregated binary features have shown great advantages in image search. Because exhaustive linear search in Hamming space still entails too much computational complexity for large datasets, recent works proposed using binary codes directly as hash indices, yielding a dramatic speedup. However, these methods cannot be directly applied to variable-length binary features. In this paper, we propose a Component Hashing (CoHash) algorithm for indexing variable-length binary aggregated descriptors for fast image search. The main idea is to decompose the distance measure between variable-length descriptors into independent, aligned component-to-component matching problems, and to build multiple hash tables for the visual word components. Given a query, its candidate neighbors are found by using the query's binary sub-vectors as indices into their corresponding hash tables. In particular, a bit selection scheme based on conditional mutual information maximization is proposed to reduce the dimensionality of the visual word components, which keeps index storage light and balances retrieval accuracy against search cost. Extensive experiments on benchmark datasets show that our approach is 20 to 25 times faster than linear search, without any noticeable loss in retrieval performance.
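The component-hashing idea can be sketched as one hash table per visual-word component, keyed by that component's binary sub-vector. In this sketch a descriptor is assumed to be a dict mapping a visual-word id to an integer bit code; only the words present in an image appear, which is what makes the descriptors variable-length. Lookups are exact-match only, candidates are ranked by the number of tables hit and then by Hamming distance over shared components (an assumed ranking rule), and the conditional-mutual-information bit selection is omitted. `ComponentHashIndex` is a hypothetical name.

```python
from collections import defaultdict


class ComponentHashIndex:
    """One hash table per visual-word component (illustrative sketch)."""

    def __init__(self):
        # tables[word_id][code] -> ids of images whose component word_id equals code
        self.tables = defaultdict(lambda: defaultdict(set))
        self.descriptors = {}

    def add(self, image_id, descriptor):
        """descriptor: dict mapping visual-word id -> integer bit code."""
        self.descriptors[image_id] = descriptor
        for word_id, code in descriptor.items():
            self.tables[word_id][code].add(image_id)

    def query(self, descriptor):
        """Candidate ids, most table hits first, then smallest Hamming distance."""
        votes = defaultdict(int)
        for word_id, code in descriptor.items():
            # .get() avoids creating empty tables for unseen words or codes.
            for image_id in self.tables.get(word_id, {}).get(code, ()):
                votes[image_id] += 1

        def hamming(image_id):
            # Hamming distance summed over the components both descriptors share.
            other = self.descriptors[image_id]
            return sum(bin(code ^ other[w]).count("1")
                       for w, code in descriptor.items() if w in other)

        return sorted(votes, key=lambda i: (-votes[i], hamming(i), i))


index = ComponentHashIndex()
index.add("a", {3: 0b1010, 7: 0b0001})
index.add("b", {3: 0b1010})
index.add("c", {5: 0b1111})
print(index.query({3: 0b1010, 7: 0b0001}))  # ['a', 'b']
```

Because only exact sub-vector matches are retrieved, a practical index would also probe codes within a small Hamming radius of each query component.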
Computational Imaging and Vision, 2003
This chapter introduces a multi-scale shape representation, referred to as the Torsion Scale Space Image (TSS), for space curves. It is argued that space curves are useful for representing 3-D surfaces and objects. Experiments show that the representation is robust and suitable for recognition of noisy curves at any scale or orientation.
Computational Imaging and Vision, 2003
The final chapter is concerned with the generalization of the Curvature Scale Space (CSS) representation to free-form 3-D surfaces.
Lecture Notes in Computer Science, 2005
We present a novel yet simple algorithm for clustering large collections of digital images. The method is applicable to consumer digital photo libraries, where it can be used to organise a photo album, enhancing search and browsing and simplifying the interface in the process. The method is based on standard MPEG-7 visual content descriptors, which, when combined with date and time metadata, provide powerful cues to the semantic structure of the photo collection. Experiments show that the proposed method closely matches consensus human judgements of cluster structure.
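As a rough illustration of combining capture time with visual content, the sketch below walks the photos in time order and starts a new cluster whenever the time gap or the colour difference to the previous photo exceeds a threshold. A normalized two-bin colour histogram stands in for an MPEG-7 colour descriptor, and both thresholds are hypothetical; this is not the algorithm from the paper.

```python
from datetime import datetime, timedelta


def colour_distance(h1, h2):
    """Half the L1 distance between normalized histograms (0 = identical, 1 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def cluster_photos(photos, max_gap=timedelta(hours=2), max_colour_dist=0.5):
    """photos: list of (name, taken_at, histogram). Returns lists of names."""
    clusters = []
    for photo in sorted(photos, key=lambda p: p[1]):
        if clusters:
            prev = clusters[-1][-1]
            close_in_time = photo[1] - prev[1] <= max_gap
            similar = colour_distance(photo[2], prev[2]) <= max_colour_dist
            if close_in_time and similar:
                clusters[-1].append(photo)
                continue
        clusters.append([photo])
    return [[name for name, _, _ in c] for c in clusters]


day = datetime(2005, 6, 1)
photos = [
    ("a", day.replace(hour=10), [0.8, 0.2]),
    ("b", day.replace(hour=10, minute=30), [0.7, 0.3]),
    ("c", day.replace(hour=10, minute=45), [0.1, 0.9]),
    ("d", day.replace(hour=18), [0.1, 0.9]),
]
print(cluster_photos(photos))  # [['a', 'b'], ['c'], ['d']]
```

Photo "c" is taken soon after "b" but looks different, and "d" looks like "c" but is hours later; requiring both cues to agree separates them.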
Past decades have seen exponential growth in the use of digital media. Early solutions for managing these massive amounts of digital media fell short of expectations, stimulating intensive research in areas such as Content Based Image Retrieval (CBIR) and, most recently, Visual Search (VS) and Mobile Visual Search (MVS). Visual Search has been researched for more than a decade, leading to recent deployments in the marketplace. Because many companies are developing proprietary solutions to the VS challenges, resulting in a fragmented technological landscape and a plethora of non-interoperable systems, MPEG is introducing a new worldwide standard for VS and MVS technology. MPEG’s Compact Descriptors for Visual Search (CDVS) aims to standardize technologies in order to enable an interoperable, efficient and cross-platform solution for internet-scale visual search applications and services. The forthcoming CDVS standard is particularly important because it w...
Geometric Properties for Incomplete Data
In this paper, we propose a cascade of Dual-LDA (DLDA) operators for face recognition (FR). We show that such an approach results in an efficient, low-dimensional feature space for face representation with enhanced discriminatory power. Comparative results against classical LDA and a cascade of classical LDA algorithms are presented, showing significantly improved performance. A theoretical analysis of Fisher LDA and DLDA is also presented. Experimental evaluation of the proposed FR algorithm, conducted on the MPEG test set of over 8000 images of 929 individuals, shows state-of-the-art performance.
SPIE Proceedings, 2001
The soon-to-be-released MPEG-7 standard provides a Multimedia Content Description Interface. In other words, it provides a rich set of tools for describing content, with a view to facilitating applications such as content-based querying, browsing and searching of multimedia content. In this paper, we describe practical applications of MPEG-7 tools. We use descriptors of features such as color, shape and motion both to index and to analyze the content. These descriptors stem from our previous work and are currently in the draft international MPEG-7 standard. In our previous work, we showed the efficacy of each descriptor individually. In this paper, our first application combines color and motion to browse video effectively, and our second combines shape and color to recognize objects in real time. We will present a demonstration of our system at the conference. We have already successfully demonstrated it to the Japanese press.
Transmission of compressed video over error-prone channels such as mobile networks is a challenging issue. Maintaining an acceptable quality of service in such an environment demands additional post-processing tools to limit the impact of uncorrected transmission errors. Significant visual degradation of a video stream occurs when the motion vector component is corrupted. In this paper, an effective and computationally efficient method for the recovery of lost motion vectors (MVs) is proposed. The key idea is to select the neighbouring block MV that has the minimum distance from an estimated MV. Simulation results are presented, including a comparison with existing methods. Our method comes within approximately 0.1-0.5 dB of the performance of the best existing method, but it has the significant advantage of being 50% computationally simpler. This makes our method ideal for use in mobile handsets and other applications with limited processing power.
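The selection step can be sketched as follows, assuming the estimated MV is the component-wise mean of the neighbouring MVs (the abstract does not say how the estimate is formed, so any estimator could be substituted). `recover_mv_nearest` is a hypothetical name. Returning an actual neighbour's MV rather than the raw estimate means the recovered vector always corresponds to motion observed nearby.

```python
from math import hypot


def recover_mv_nearest(neighbour_mvs):
    """Return the neighbouring MV closest to an estimated MV.

    The estimate here is the component-wise mean of the neighbours
    (an assumption, not the paper's stated estimator)."""
    n = len(neighbour_mvs)
    est_x = sum(v[0] for v in neighbour_mvs) / n
    est_y = sum(v[1] for v in neighbour_mvs) / n
    # Pick the candidate with minimum Euclidean distance to the estimate.
    return min(neighbour_mvs, key=lambda v: hypot(v[0] - est_x, v[1] - est_y))


print(recover_mv_nearest([(4, 1), (5, 1), (4, 2), (-6, 8)]))  # (4, 2)
```

Here the mean is (1.75, 3.0), which matches no real block; the nearest actual neighbour, (4, 2), is returned instead.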
This paper introduces an efficient image identification method designed to be robust to various image modifications such as scaling, rotation, compression, flipping and grey-scale conversion. Our method uses the trace transform to extract a 1D representation of an image, from which a binary string is extracted using a Fourier transform. Multiple component descriptors are extracted and combined to boost the robustness of the identifier. Experimental evaluation was carried out on a set of over 60,000 unique images and one billion image pairs. Results show a detection rate of over 92% at a false-positive rate below 1 per million, with a matching speed exceeding 4 million images per second.
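The step from a 1D function to a binary string can be sketched as below. The trace transform itself is not implemented: the input is assumed to be the 1D function it produced, here replaced by a synthetic signal. Bits are set by comparing the magnitudes of neighbouring Fourier coefficients (an assumption about the exact binarization rule), and identifiers are compared by Hamming distance. DFT magnitudes do not change under a circular shift of the input, and rotating an image circularly shifts a trace-transform function of angle, so the bits below survive such a shift.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes |F(k)| of a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def binary_identifier(signal, n_bits):
    """Bit k is 1 when |F(k)| > |F(k+1)|, for k = 1..n_bits (DC term skipped)."""
    if n_bits + 1 > len(signal) // 2:
        raise ValueError("signal too short for the requested number of bits")
    mags = dft_magnitudes(signal)
    return [1 if mags[k] > mags[k + 1] else 0 for k in range(1, n_bits + 1)]


def hamming(a, b):
    """Number of differing bits between two equal-length identifiers."""
    return sum(x != y for x, y in zip(a, b))


# Synthetic stand-in for a 1D trace-transform function: a sum of cosines.
amplitudes = [1, 3, 2, 4, 1, 5, 2]  # amplitude at frequencies 1..7
n = 16
signal = [sum(a * math.cos(2 * math.pi * (k + 1) * t / n) for k, a in enumerate(amplitudes))
          for t in range(n)]
shifted = signal[5:] + signal[:5]  # circular shift, as a rotation would cause

bits = binary_identifier(signal, 6)
print(bits)                                          # [0, 1, 0, 1, 0, 1]
print(hamming(bits, binary_identifier(shifted, 6)))  # 0
```

The naive DFT keeps the sketch dependency-free; a real implementation would use an FFT.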
A method of representing an object appearing in a still or video image, by processing signals corresponding to the image, comprises deriving the peak values in CSS space for the object outline and applying a non-linear transformation to said peak values to arrive at a representation of the outline.
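A sketch of the final step, with the non-linear transformation taken to be a power law y' = a·y^b applied to each peak height. The form of the transformation and the default values of `a` and `b` are hypothetical illustrations, not the ones specified in the patent. Peaks are assumed to be (contour position, height) pairs, and `transform_peaks` is a hypothetical name.

```python
def transform_peaks(peaks, a=1.0, b=0.5):
    """Apply y -> a * y**b to each CSS peak height (power law assumed).

    `peaks` is a list of (position, height) pairs, with position in [0, 1)
    along the contour; positions are left unchanged."""
    return [(pos, a * height ** b) for pos, height in peaks]


print(transform_peaks([(0.25, 100.0), (0.60, 25.0)]))  # [(0.25, 10.0), (0.6, 5.0)]
```

With 0 < b < 1 the mapping is compressive: the 4:1 height ratio above becomes 2:1, which is one plausible reason to transform peaks before comparing outlines.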
A method of representing an object appearing in a still or video image for use in searching, wherein the object appears in the image with a first two-dimensional outline, by processing signals corresponding to the image, comprises deriving a view descriptor of the first outline of the object and deriving at least one additional view descriptor of the outline of the object in a different view, and associating the two or more view descriptors to form an object descriptor.
Image and Signal Processing for Remote Sensing XII, 2006
In this paper we address the problem of registering images acquired under unknown conditions, including acquisition at different times, from different points of view and possibly with different types of sensor, where conventional approaches based on feature correspondence or area correlation are likely to fail or provide unreliable estimates. The result of image registration can be used as an initial step for many remote sensing applications such as change detection, terrain reconstruction and image-based sensor navigation. The key idea of the proposed method is to estimate a global parametric transformation between images (e.g. a perspective or affine transformation) from a set of local, region-based estimates of rotation-scale-translation (RST) transformations. These RST transformations form a cluster in rotation-scale space. Each RST transformation is registered by matching, in log-polar space, the regions centered at the locations of corresponding interest points. The correspondence between interest points is estimated simultaneously with the registration of the local RST transformations. A subset of corresponding points, or equivalently a subset of local RST transformations, is then selected by a robust estimation method, and a global transformation that is not biased by outliers is computed from it. The method is capable of registering images without any a priori knowledge of the transformation between them. It was tested on many images taken under different conditions by different sensors and on thousands of calibrated image pairs, and in all cases it produced very accurate registration results. We demonstrate the performance of our approach on several datasets and compare it with another state-of-the-art method based on the SIFT descriptor.
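Two pieces of this pipeline can be sketched under simplifying assumptions. First, local (rotation, scale) estimates are filtered by keeping those close to the median in rotation and log-scale, a simple stand-in for the robust estimation step (angle wrap-around is ignored). Second, a global transform is fitted to the surviving point correspondences by least squares; for brevity it is a similarity (rotation, scale, translation) written with complex numbers, not the affine or perspective model mentioned above. The tolerances and function names are hypothetical.

```python
import cmath
import math
from statistics import median


def select_inliers(local_rst, angle_tol=0.05, log_scale_tol=0.05):
    """Indices of local (angle_rad, scale) estimates near the median cluster."""
    med_a = median(a for a, _ in local_rst)
    med_s = median(math.log(s) for _, s in local_rst)
    return [
        i for i, (a, s) in enumerate(local_rst)
        if abs(a - med_a) <= angle_tol and abs(math.log(s) - med_s) <= log_scale_tol
    ]


def fit_similarity(src, dst):
    """Least-squares fit of dst = c * src + t, with points as complex numbers.

    c = scale * exp(i * angle) encodes rotation and scale.
    Returns (scale, angle_rad, (tx, ty))."""
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    mz = sum(zs) / len(zs)
    mw = sum(ws) / len(ws)
    dz = [z - mz for z in zs]  # centred source points
    dw = [w - mw for w in ws]  # centred destination points
    c = sum(w * z.conjugate() for z, w in zip(dz, dw)) / sum(z.real ** 2 + z.imag ** 2 for z in dz)
    t = mw - c * mz
    return abs(c), cmath.phase(c), (t.real, t.imag)


local = [(0.30, 1.50), (0.31, 1.52), (0.29, 1.49), (1.20, 0.70)]
print(select_inliers(local))  # [0, 1, 2]

# Correspondences generated with scale 2, rotation 90 degrees, translation (3, -1).
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(3, -1), (3, 1), (1, -1), (1, 1)]
print(fit_similarity(src, dst))  # (2.0, 1.5707963267948966, (3.0, -1.0))
```

The fourth local estimate disagrees with the others in both rotation and scale, so it is treated as an outlier and would not contribute to the global fit.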