Adrian Bors | University of York
Papers by Adrian Bors
IEEE Transactions on Neural Networks and Learning Systems, 2021
Variational autoencoders (VAEs) are one of the most popular unsupervised generative models which rely on learning latent representations of data. In this paper, we extend the classical concept of Gaussian mixtures into the deep variational framework by proposing a mixture of VAEs (MVAE). Each component in the MVAE model is implemented by a variational encoder and has an associated sub-decoder. The separation between the latent spaces modelled by different encoders is enforced using the d-variable Hilbert-Schmidt Independence Criterion (dHSIC). Each component captures different variational features of the data. We also propose a mechanism for finding the appropriate number of VAE components for a given task, leading to an optimal architecture. The differentiable categorical Gumbel-Softmax distribution is used to generate dropout masking parameters within the end-to-end backpropagation training framework. Extensive experiments show that the proposed MVAE model learns a rich latent data representation and is able to discover additional underlying data factors.
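The Gumbel-Softmax step mentioned above can be illustrated with a minimal PyTorch sketch (not the authors' code; the function name `component_mask`, the batch size and the number of components are placeholders): a relaxed categorical sample acts as a differentiable dropout-style mask over candidate VAE components.

```python
# Minimal sketch: using the Gumbel-Softmax relaxation to sample a differentiable
# mask over K candidate VAE components (names and sizes are assumptions).
import torch
import torch.nn.functional as F

def component_mask(logits: torch.Tensor, tau: float = 1.0, hard: bool = False) -> torch.Tensor:
    """logits: (batch, K) unnormalised component scores; returns a (batch, K) mask
    that is differentiable w.r.t. the logits, so component selection can be
    trained with ordinary backpropagation."""
    return F.gumbel_softmax(logits, tau=tau, hard=hard, dim=-1)

logits = torch.randn(8, 4, requires_grad=True)           # 8 samples, 4 hypothetical components
soft_mask = component_mask(logits, tau=0.5)               # relaxed mask, rows sum to 1
hard_mask = component_mask(logits, tau=0.5, hard=True)    # one-hot forward, soft backward
```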
Information Sciences, 2021
Learning disentangled and interpretable representations is an important aspect of information understanding. In this paper, we propose a novel deep learning model representing both discrete and continuous latent variable spaces, which can be used in either supervised or unsupervised learning. The proposed model is trained using an optimization function employing the mutual information maximization criterion. For the unsupervised learning setting, we define a lower bound on the mutual information between the joint distribution of the latent variables corresponding to the real data and those generated by the model. Maximizing this lower bound during training induces the learning of disentangled and interpretable data representations. Such representations can be used for attribute manipulation and image editing tasks.
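For readers unfamiliar with mutual information maximization in generative models, the sketch below shows the standard variational lower bound I(c; x) >= E[log q(c|x)] + H(c), estimated with an auxiliary recogniser network for a discrete latent code; this is a generic illustration under assumed names, not the paper's exact objective or architecture.

```python
# Generic variational lower bound on mutual information for a discrete code c:
# maximising E[log q(c|x)] maximises I(c; x) up to the constant entropy H(c).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeRecogniser(nn.Module):
    def __init__(self, feat_dim: int, n_codes: int):
        super().__init__()
        self.head = nn.Linear(feat_dim, n_codes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features)            # logits for q(c | x)

def mi_lower_bound(logits: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
    """code: (batch,) integer codes that conditioned the generator.
    Returns E[log q(c|x)]; H(c) is constant for a fixed prior."""
    return -F.cross_entropy(logits, code)
```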
Proc. ICPR Int. Workshop on Human and Vehicle Analysis for Intelligent Urban Computing (IUC), LNCS, 2021
Vehicle re-identification (Re-ID) is the task of identifying the same vehicle from images captured by different cameras. Recent years have seen various appearance-based approaches focusing only on global features or exploring local features to obtain more subtle details, which can alleviate the subtle inter-instance problem. However, few emphasize the spatial geometrical structure relationships among local regions or between the global region and local regions. To explore the above-mentioned spatial structure relationships, this paper proposes a hierarchical spatial structural graph convolutional network (HSS-GCN) for vehicle Re-ID, in which we first construct a hierarchical spatial structural graph with the global region and local regions as nodes and a two-hierarchy relationship as edges, and then learn discriminative structural features with a GCN module under the constraints of metric learning. To boost the performance of the proposed network, we jointly combine the classification loss with the metric learning loss. Extensive experiments conducted on the public VehicleID and VeRi-776 datasets validate the effectiveness of our approach in comparison with recent works.
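As a rough illustration of the two-hierarchy graph described above, the sketch below builds a normalised adjacency matrix with one global node connected to all local-region nodes and with spatially adjacent local regions connected to each other; the actual region layout and edge definitions in the paper may differ.

```python
# Sketch of one plausible way to build a two-hierarchy adjacency matrix
# (node 0 = global region, nodes 1..N = local regions along a strip).
import numpy as np

def build_hss_adjacency(n_local: int) -> np.ndarray:
    n = n_local + 1                          # +1 for the global node
    adj = np.eye(n)                          # self-loops
    adj[0, 1:] = adj[1:, 0] = 1.0            # hierarchy 1: global <-> every local region
    for i in range(1, n_local):              # hierarchy 2: spatially adjacent local regions
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    deg = adj.sum(axis=1)
    return adj / np.sqrt(np.outer(deg, deg)) # symmetric normalisation for a GCN layer

print(build_hss_adjacency(4).round(2))
```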
Proc. International Conference on Pattern Recognition (ICPR), 2020
Convolutional Neural Networks (CNNs) model long-range dependencies by deeply stacking convolution operations with small window sizes, which makes optimization difficult. This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms which can directly capture long-range dependencies without using a deep stack of local operations. Given an intermediate feature map, our method recalibrates the feature at a position by aggregating the information from the neighboring regions of all positions. By combining a channel attention module with the proposed RNL, we design an attention chain which can be integrated into off-the-shelf CNNs for end-to-end training. We evaluate our method on two video classification benchmarks. Our method outperforms other attention mechanisms, and we achieve state-of-the-art performance on the Something-Something V1 dataset.
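A simplified 2-D sketch of a region-based non-local block is given below (assumed embedding sizes; the published RNL operates on spatio-temporal video features and differs in detail): keys and values are average-pooled over a local neighbourhood before global attention, so each position attends to region summaries rather than to individual positions.

```python
# Simplified region-based non-local block for 2-D feature maps (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionNonLocal2d(nn.Module):
    def __init__(self, channels: int, region: int = 3):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)   # query / value embedding
        self.phi = nn.Conv2d(channels, channels // 2, 1)     # key embedding
        self.out = nn.Conv2d(channels // 2, channels, 1)
        self.region = region

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # summarise each position's neighbourhood before computing affinities
        pooled = F.avg_pool2d(x, self.region, stride=1, padding=self.region // 2)
        q = self.theta(x).flatten(2)                          # (b, c/2, hw)
        k = self.phi(pooled).flatten(2)                       # (b, c/2, hw)
        v = self.theta(pooled).flatten(2)                     # reuse theta as value embedding
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (b, hw, hw)
        y = (v @ attn.transpose(1, 2)).view(b, c // 2, h, w)
        return x + self.out(y)                                # residual connection
```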
Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 2020
Existing machine learning systems are trained to adapt to a single database and their ability to acquire additional information is limited. Catastrophic forgetting occurs in all deep learning systems when attempting to train them with additional databases: the information learnt previously is forgotten and no longer recognized when such a learning system is trained using a new database. In this paper, we develop a new image generation approach defined under the lifelong learning framework which prevents forgetting. We employ mutual information maximization between the latent variable space and the outputs of the generator network in order to learn interpretable representations when learning from a series of databases sequentially. We also provide the theoretical framework for the generative replay mechanism under the lifelong learning setting. We perform a series of experiments showing that the proposed approach is able to learn a set of disjoint data distributions in a sequential manner while also capturing meaningful data representations across domains.
Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 2020
In this paper, we develop a new deep mixture learning framework aiming to learn underlying complex data structures. Each component in the mixture model is implemented using a Variational Autoencoder (VAE), a well-known deep learning model which captures a latent data representation on a variational manifold. The mixing parameters are estimated from a Dirichlet distribution modelled by each encoder. In order to train this mixture model, named M-VAE, we derive a mixture evidence lower bound on the sample log-likelihood, which is optimized in order to jointly estimate all mixture components. We further propose to use the d-variable Hilbert-Schmidt Independence Criterion (dHSIC) as a regularization criterion in order to enforce the independence among the encoders' distributions. This criterion encourages the mixture components to learn different data distributions and represent them in the latent space. During the experiments with the proposed M-VAE model, we observe that it can be used for discovering disentangled data representations which cannot be achieved with a single VAE.
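The mixture bound can be pictured with the short sketch below, which simply combines per-component ELBOs using softmax-normalised mixing weights; the paper's exact derivation, the Dirichlet parameterisation and the dHSIC regulariser are not reproduced here.

```python
# Hedged sketch (assumed interfaces): a mixture lower bound sum_k pi_k * ELBO_k.
import torch

def mixture_elbo(component_elbos: torch.Tensor, mix_logits: torch.Tensor) -> torch.Tensor:
    """component_elbos: (batch, K) per-sample ELBO of each VAE component.
    mix_logits:       (batch, K) unnormalised mixing weights, e.g. from a
                      softmax/Dirichlet head of the encoders.
    Returns the mean mixture bound over the batch."""
    pi = torch.softmax(mix_logits, dim=-1)
    return (pi * component_elbos).sum(dim=-1).mean()
```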
Proc. British Machine Vision Conference (BMVC), 2020
Deep learning based feature extraction combined with a visual attention mechanism is shown to provide good results in content-based image retrieval (CBIR). Ideally, CBIR should rely on regions which contain the objects of interest that appear in the query image. However, most existing attention models just predict the most likely region of interest based on the knowledge learned from the training dataset, regardless of the content in the query image. As a result, they may attend to contexts outside the object of interest, especially when there are multiple potential objects of interest in a given image. In this paper, we propose a conditional attention model which is sensitive to the input query image content and can generate more accurate attention maps. A key-point detection and description based method is proposed for training data generation; consequently, our model does not require any additional attention labels for training. The proposed attention model enables the spatial pooling feature extraction method (generalized mean pooling) to improve the image feature representation and leads to better image retrieval performance. The proposed framework is tested on a series of databases where it is shown to perform well in challenging situations.
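Generalized mean (GeM) pooling, the spatial pooling method mentioned above, has a compact reference implementation; the version below is a common formulation and not necessarily the authors' exact configuration.

```python
# Generalised mean (GeM) pooling as commonly used in image retrieval.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))   # pooling exponent, learnable
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, h, w) -> (batch, channels) global descriptor
        x = x.clamp(min=self.eps).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)
```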
ACM Turing Celebration Conference, 2019
In order to improve the detection of hidden information in signals, additional features are considered as inputs for steganalysers. This research study proposes a feature selection method based on the Weighted Inner-Inter class Distance and Dispersion (W2ID) criterion in order to reduce the steganalytic feature dimensionality. The definition of the W2ID criterion and an algorithm determining its weights based on the frequency statistical weighting method are proposed. Then, the W2ID criterion is applied in the decision rough set α-positive domain reduction, producing the W2ID-based feature selection method. Experimental results show that the proposed method can reduce the dimension of the feature space and the memory requirements of the Gabor Filter Residuals (GFR) features while maintaining or improving the detection accuracy. CCS Concepts: Management of Computing and Information Systems → Security and Protection; Pattern Recognition → Applications; Signal Processing.
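The general idea behind inner-inter class distance criteria can be conveyed with the rough sketch below: a Fisher-style per-feature score that rewards large inter-class distance and small intra-class dispersion. This is not the published W2ID definition or its weighting scheme.

```python
# Rough illustration of an inner-inter class distance criterion (not the W2ID formula).
import numpy as np

def interclass_over_dispersion(cover_feats: np.ndarray, stego_feats: np.ndarray) -> np.ndarray:
    """cover_feats, stego_feats: (n_samples, n_features). Returns one score per
    feature; higher scores indicate more discriminative features to keep."""
    inter = (cover_feats.mean(0) - stego_feats.mean(0)) ** 2
    dispersion = cover_feats.var(0) + stego_feats.var(0) + 1e-12
    return inter / dispersion

# keep the top-k features by score, e.g. np.argsort(scores)[::-1][:k]
```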
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2020
Memorability of an image is a characteristic determined by human observers' ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independently of the observer. The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations. We propose a new concept called the Visual Memory Schema (VMS), referring to an organization of image components that human observers share when encoding and recognizing images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for either correctly or incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.
Information Sciences, 2020
3D steganalysis aims to find the information hidden in 3D models and graphical objects, assuming that the information was hidden by 3D steganography or watermarking algorithms. A new set of 3D steganalysis features, derived by using multiresolution 3D wavelet analysis, is proposed in this research study. 3D wavelets relate a given mesh representation to its lower and higher graph resolutions by means of a set of Wavelet Coefficient Vectors (WCVs). The 3D steganalysis features are derived from the transformations between a given mesh and its corresponding higher and lower resolutions, and correspond to ratios and angles between various geometric measures. These features are shown to significantly increase the steganalysis accuracy when detecting watermarks which have been embedded by 3D wavelet-based watermarking algorithms. The proposed features, when used in combination with a previously proposed feature set, are shown to provide the best results in detecting the hidden information embedded by other information hiding algorithms.
Journal of Real-Time Image Processing, 2020
The development of multimedia and deep learning technology brings new challenges to steganography and steganalysis techniques. Meanwhile, robust steganography, a class of new techniques aiming to solve the problem of covert communication over lossy channels, has become a new research hotspot in the field of information hiding. To improve the communication reliability and efficiency of current real-time robust steganography methods, a concatenated code composed of Syndrome-Trellis codes (STC) and cyclic redundancy check (CRC) codes is proposed in this paper. The enhanced robust adaptive steganography framework proposed in this paper is characterized by a strong error detection capability, high coding efficiency, and low embedding costs. On this basis, three adaptive steganographic methods resisting JPEG compression and detection are proposed. Then, the fault tolerance of the proposed steganography methods is analyzed using the residual model of JPEG compression, thus obtaining the appropriate coding parameters. Experimental results show that the proposed methods have significantly stronger robustness against compression and are more difficult to detect by statistics-based steganalytic methods.
British Machine Vision Conference (BMVC), 2019
Visual memory schema (VMS) maps show which regions of an image cause that image to be remembered or falsely remembered. Previous work has succeeded in generating low-resolution VMS maps using convolutional neural networks. We instead approach this problem as an image-to-image translation task making use of a variational autoencoder. This approach allows us to generate higher-resolution dual-channel images that represent visual memory schemas, allowing us to evaluate predicted true memorability and false memorability separately. We also evaluate the relationship between VMS maps, predicted VMS maps, ground truth memorability scores, and predicted memorability scores.
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2020
In this paper, we propose a new video representation learning method, named Temporal Squeeze (TS) pooling, which can extract the essential movement information from a long sequence of video frames and map it into a small set of images, named Squeezed Images. By embedding Temporal Squeeze pooling as a layer into off-the-shelf Convolutional Neural Networks (CNNs), we design a new video classification model, named the Temporal Squeeze Network (TeSNet). The resulting Squeezed Images contain the essential movement information from the video frames, corresponding to the optimization of the video classification task. We evaluate our architecture on two video classification benchmarks, and the results achieved are compared to the state-of-the-art.
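A schematic sketch of the idea of squeezing T frames into S images is given below, using a learnable temporal mixing matrix; the shapes and the layer itself are assumptions and do not reproduce the published TS pooling layer.

```python
# Schematic temporal squeezing: T input frames -> S "squeezed images".
import torch
import torch.nn as nn

class TemporalSqueeze(nn.Module):
    def __init__(self, in_frames: int, out_frames: int):
        super().__init__()
        self.mix = nn.Parameter(torch.randn(out_frames, in_frames) / in_frames)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, T, C, H, W) -> (batch, S, C, H, W)
        return torch.einsum('st,btchw->bschw', torch.softmax(self.mix, dim=1), video)

frames = torch.randn(2, 32, 3, 112, 112)      # hypothetical clip of 32 RGB frames
squeezed = TemporalSqueeze(32, 2)(frames)     # two squeezed images per clip
```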
Proc. European Conference on Computer Vision (ECCV), 2020
The problem of catastrophic forgetting occurs in deep learning models trained on multiple databases in a sequential manner. Recently, generative replay mechanisms (GRM) have been proposed to reproduce previously learned knowledge, aiming to reduce the forgetting. However, such approaches lack an appropriate inference model and therefore cannot provide latent representations of data. In this paper, we propose a novel lifelong learning approach, namely the Lifelong VAEGAN (L-VAEGAN), which not only induces a powerful generative replay network but also learns meaningful latent representations, benefiting representation learning. L-VAEGAN automatically embeds the information associated with different domains into several clusters in the latent space, while also capturing semantically meaningful shared latent variables across different data domains. The proposed model supports many downstream tasks that traditional generative replay methods cannot, including interpolation and inference across different data domains.
IEEE Transactions on Cybernetics, 2020
While 3-D steganography and digital watermarking represent methods for embedding information into 3-D objects, 3-D steganalysis aims to find the hidden information. Previous research studies have shown that by estimating the parameters modeling the statistics of 3-D features and feeding them into a classifier we can identify whether a 3-D object carries secret information. For training the steganalyzer, such features are extracted from cover and stego pairs, representing the original 3-D objects and those carrying hidden information. However, in practical applications, the steganalyzer would have to distinguish stego-objects from cover-objects that most likely have not been used during the training. This represents a significant challenge for existing steganalyzers, known as the cover source mismatch (CSM) problem, which is due to their limited generalization ability. This paper proposes a novel feature selection algorithm taking into account both feature robustness and relevance in order to mitigate the CSM problem in 3-D steganalysis. In the proposed methodology, new shapes are generated by distorting those used in the training. Then a subset of features is selected from a larger given set by assessing their effectiveness in separating cover-objects from stego-objects among the generated sets of objects. Two different measures are used for selecting the appropriate features: 1) the Pearson correlation coefficient and 2) the mutual information criterion. Index Terms: 3-D steganalysis, cover source mismatch, feature selection.
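The two selection measures named above can be estimated with standard tools, as in the hedged sketch below; the actual way robustness and relevance scores are combined in the paper is not reproduced here, and the function and argument names are placeholders.

```python
# Hedged sketch of the two measures: Pearson correlation between features of the
# original and distorted shapes (robustness) and mutual information with the label.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def score_features(train_feats, distorted_feats, labels):
    """train_feats, distorted_feats: (n_samples, n_features) features of the original
    and the distorted 3-D shapes; labels: (n_samples,) cover/stego flag."""
    n_feat = train_feats.shape[1]
    robustness = np.array([np.corrcoef(train_feats[:, j], distorted_feats[:, j])[0, 1]
                           for j in range(n_feat)])            # Pearson correlation
    relevance = mutual_info_classif(train_feats, labels)       # MI with the class label
    return robustness, relevance
```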
Digital Signal Processing, 2018
Keywords: Human group activity, Kullback–Leibler divergence, Kernel Density Estimation, Gaussian Mixture Models. In this research study, we adopt a probabilistic modelling of interactions in groups of people, using video sequences, leading to the recognition of their activities. Firstly, we model short smooth streams of localised movement. Afterwards, we partition the scene into regions of distinct movement by using maximum a posteriori estimation, fitting Gaussian Mixture Models (GMM) to the movement statistics. Interactions between moving regions are modelled using the Kullback–Leibler (KL) divergence between pairs of statistical representations of moving regions. Such interactions are considered with respect to the relative movement, moving region location and relative size, as well as to the dynamics of the movement and location interdependencies, respectively. The proposed methodology is assessed on two different data sets showing different categories of human interactions and group activities.
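The KL divergence between two Gaussian region models has a closed form, shown below as generic code; it is the building block for the pairwise interaction measures described above but is not tied to the paper's specific feature spaces.

```python
# Closed-form KL divergence between two multivariate Gaussians.
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for d-dimensional Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```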
This chapter outlines a content-based image retrieval (CBIR) methodology that takes into account the saliency in images. Natural images are depictions of real-life objects and scenes, usually set in cluttered environments. The performance of image retrieval in these scenarios may suffer because there is no way of knowing which parts of the image are of interest to the user. The human visual system provides a clue to what would be of interest in the image by involuntarily shifting the focus of attention to salient image areas. The application of computational models of selective visual attention to image understanding can produce better, unsupervised retrieval results by identifying perceptually important areas of the image that usually correspond to its semantic meaning, whilst discarding irrelevant information. This chapter explores the construction of a retrieval system incorporating a visual attention model and proposes a new method for selecting salient image regions, as well as embedding an improved representation for salient image edges for determining global image saliency.
This paper proposes an ortho-diffusion decomposition of graphs for estimating motion from image sequences. Orthonormal decompositions of the adjacency matrix representations of image data are alternated with diffusions and data subsampling in order to robustly represent image features using undirected graphs. The Modified Gram-Schmidt with column pivoting algorithm is applied recursively for the orthonormal decompositions at various scales. This processing produces a set of ortho-diffusion bases and residual diffusion wavelets at each image representation scale. The optical flow is estimated using the similarity in the ortho-diffusion bases space extracted from regions of two different image frames.
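The Modified Gram-Schmidt with column pivoting routine named above can be sketched in plain NumPy as follows; the diffusion and subsampling steps of the full ortho-diffusion decomposition are omitted, and the function name is a placeholder.

```python
# Modified Gram-Schmidt QR with column pivoting: A[:, piv] = Q @ R.
import numpy as np

def mgs_pivoted(A: np.ndarray):
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    piv = np.arange(n)
    for k in range(n):
        # pivot: bring the remaining column with the largest norm to position k
        j = k + np.argmax(np.linalg.norm(A[:, k:], axis=0))
        A[:, [k, j]] = A[:, [j, k]]
        R[:, [k, j]] = R[:, [j, k]]
        piv[[k, j]] = piv[[j, k]]
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        for i in range(k + 1, n):          # modified (column-by-column) orthogonalisation
            R[k, i] = Q[:, k] @ A[:, i]
            A[:, i] -= R[k, i] * Q[:, k]
    return Q, R, piv
```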
In this paper, we propose an interpolation algorithm using a mathematical morphology morphing approach. The aim of this algorithm is to reconstruct an n-dimensional object from a group of (n - 1)-dimensional sets representing sections of that object. The morphing transformation modifies pairs of consecutive sets such that they approach each other in shape and size. The interpolated set is obtained when the two consecutive sets are made idempotent by the morphing transformation. We prove the convergence of the morphological morphing. The entire object is modeled by successively interpolating a certain number of intermediary sets between each two consecutive given sets. We apply the interpolation algorithm to three-dimensional tooth reconstruction.
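To make the inter-slice interpolation setting concrete, the sketch below uses a simple signed-distance interpolation between two consecutive binary sections; this is a common alternative technique, not the paper's iterative morphological morphing, which instead modifies the two sets until they become idempotent.

```python
# Simple signed-distance interpolation between two consecutive binary sections
# (an alternative illustration, not the morphological morphing algorithm itself).
import numpy as np
from scipy.ndimage import distance_transform_edt as edt

def signed_distance(mask: np.ndarray) -> np.ndarray:
    # positive inside the set, negative outside
    return edt(mask) - edt(~mask)

def interpolate_sections(a: np.ndarray, b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """a, b: boolean 2-D sections of the same object; returns an intermediate section."""
    return (1 - alpha) * signed_distance(a) + alpha * signed_distance(b) > 0
```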
In this research study, we propose an automatic group activity recognition approach by modelling the interdependencies of group activity features over time. Unlike in simple human activity recognition approaches, the distinguishing characteristics of group activities are often determined by how the movements of people are influenced by one another. We propose to model the group interdependencies in both motion and location spaces. These spaces are extended to time-space and time-movement spaces and modelled using Kernel Density Estimation (KDE). Such representations are then fed into a machine learning classifier which identifies the group activity. Unlike other approaches to group activity recognition, we do not rely on the manual annotation of pedestrian tracks from the video sequence.