Ashkan Panahi - Academia.edu
Papers by Ashkan Panahi
Optical Fiber Communication Conference (OFC) 2023
Machine learning as a service (MLaaS) is introduced in the context of optical networks, and an architecture to take advantage of its potential is proposed. A use case of quality-of-transmission (QoT) classification using MLaaS techniques is benchmarked against state-of-the-art methods.
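A minimal sketch of such a QoT-classification use case, under illustrative assumptions (the lightpath features, the threshold rule, and the scikit-learn classifier below are stand-ins, not the paper's MLaaS pipeline or dataset):

```python
# Minimal QoT-classification sketch (illustrative only; features, labels,
# and the threshold rule are hypothetical assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical lightpath features: length (km), span count, launch power (dBm).
X = np.column_stack([
    rng.uniform(50, 3000, n),      # path length
    rng.integers(1, 40, n),        # number of spans
    rng.uniform(-4, 4, n),         # launch power
])
# Toy rule standing in for a QoT threshold: shorter, fewer-span paths pass.
y = (X[:, 0] / 100 + X[:, 1] + np.abs(X[:, 2]) < 40).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```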
arXiv (Cornell University), Jun 13, 2019
Zero-shot learning (ZSL), which aims to recognize unseen object classes by training only on seen object classes, has attracted increasing interest in machine learning and has registered some successes. Most existing ZSL methods learn a projection map between the visual feature space and the semantic space, and are therefore prone to a projection domain shift caused by the large domain gap between seen and unseen classes. In this paper, we propose a novel inductive ZSL model based on projecting both visual and semantic features into a common distinct latent space with class-specific knowledge, and on reconstructing both visual and semantic features from this common space to narrow the domain shift gap. We show that all these constraints on the latent space, class-specific knowledge, reconstruction of features, and their combinations enhance robustness against the projection domain shift problem and improve the generalization ability to unseen object classes. Comprehensive experiments on four benchmark datasets demonstrate that our proposed method is superior to state-of-the-art algorithms.
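A minimal sketch of the core idea, projecting both modalities into a shared latent space with reconstruction terms; the dimensions, architecture, and loss weights are illustrative assumptions, not the paper's model:

```python
# Shared latent space with reconstruction for ZSL (sketch; dimensions,
# linear encoders/decoders, and equal loss weights are assumptions).
import torch
import torch.nn as nn

d_vis, d_sem, d_lat = 2048, 300, 128

enc_v = nn.Linear(d_vis, d_lat)   # visual features -> latent
enc_s = nn.Linear(d_sem, d_lat)   # semantic features -> latent
dec_v = nn.Linear(d_lat, d_vis)   # latent -> visual (reconstruction)
dec_s = nn.Linear(d_lat, d_sem)   # latent -> semantic (reconstruction)

def zsl_loss(x_vis, x_sem):
    z_v, z_s = enc_v(x_vis), enc_s(x_sem)
    align = ((z_v - z_s) ** 2).mean()             # both modalities meet in one space
    recon = ((dec_v(z_v) - x_vis) ** 2).mean() \
          + ((dec_s(z_s) - x_sem) ** 2).mean()    # reconstruction narrows the shift
    return align + recon

x_vis = torch.randn(32, d_vis)
x_sem = torch.randn(32, d_sem)
print(zsl_loss(x_vis, x_sem).item())
```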
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Parametric approaches to learning, such as deep learning (DL), are highly popular in nonlinear regression, despite training that becomes extremely difficult as their complexity grows (e.g., the number of layers in DL). In this paper, we present an alternative semi-parametric framework which foregoes the ordinarily required feedback by introducing the novel idea of geometric regularization. We show that certain deep learning techniques, such as the residual network (ResNet) architecture, are closely related to our approach; our technique can therefore be used to analyze these types of deep learning. Moreover, we present preliminary results confirming that our approach can be easily trained to obtain complex structures.
Cornell University - arXiv, Oct 2, 2022
In this paper, we address the problem of unsupervised domain adaptation. The need for such adaptation arises when the distribution of the target data differs from the distribution used to develop the model and the ground truth information of the target data is unknown. We propose an algorithm that uses optimal transport theory, with a verifiably efficient and implementable solution, to learn the best latent feature representation. This is achieved by minimizing the cost of transporting the samples from the target domain to the distribution of the source domain.
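The transport step can be sketched with the POT library (an assumption for illustration; the paper derives its own solver): compute a plan between target and source feature clouds, then map each target sample by barycentric projection.

```python
# Optimal-transport sketch for domain adaptation (illustrative; uses
# POT's entropic Sinkhorn plan, not the paper's algorithm).
import numpy as np
import ot  # pip install pot

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (200, 16))        # source features
Xt = rng.normal(0.5, 1.2, (150, 16))        # shifted target features

a = np.full(len(Xs), 1 / len(Xs))           # uniform source weights
b = np.full(len(Xt), 1 / len(Xt))           # uniform target weights
M = ot.dist(Xt, Xs)                         # squared-Euclidean cost matrix
G = ot.sinkhorn(b, a, M, reg=1e-1)          # transport plan (target -> source)

# Barycentric mapping: each target point moves toward its transported mass.
Xt_mapped = (G / G.sum(axis=1, keepdims=True)) @ Xs
print(Xt_mapped.shape)
```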
The neighborhood function measures node centrality in graphs by counting how many nodes a given node can reach in a certain number of steps. It can, for example, be used to find central nodes or the degree of separation. The state-of-the-art algorithm, HyperANF (Hyper Approximate Neighborhood Function), can calculate an approximate neighborhood function for graphs with billions of nodes within hours on a standard workstation [P. Boldi, M. Rosa, and S. Vigna, "HyperANF: Approximating the neighbourhood function of very large graphs on a budget," CoRR, vol. abs/1011.5599, 2010]. However, it only supports static graphs: if the neighborhood function is to be maintained on a dynamic graph, the algorithm has to be re-run after every change. We develop a novel algorithm called Dynamic Approximate Neighborhood Function (DANF) which extends HyperANF to support dynamic graphs. In our algorithm, all relevant nodes are updated when new edges are added to the graph.
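For reference, the exact neighborhood function counts, for each radius t, how many nodes are reachable from a node within t steps; a minimal BFS sketch of that semantics (not DANF or HyperANF, which approximate it at scale with HyperLogLog counters):

```python
# Exact neighborhood function of one node via BFS (reference semantics
# only; HyperANF/DANF approximate this with probabilistic counters).
from collections import deque

def neighborhood_function(adj, src, max_t):
    """counts[t] = number of nodes reachable from src in <= t steps."""
    seen, frontier = {src}, deque([(src, 0)])
    counts = [1] + [0] * max_t
    while frontier:
        u, d = frontier.popleft()
        if d == max_t:
            continue
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                counts[d + 1] += 1
                frontier.append((v, d + 1))
    for t in range(1, max_t + 1):   # make counts cumulative over radius
        counts[t] += counts[t - 1]
    return counts

adj = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(neighborhood_function(adj, 0, 3))   # [1, 3, 4, 5]
```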
We consider the problem of estimating a variable number of parameters with a dynamic nature. A familiar example is finding the positions of moving targets using sensor array observations. The problem is challenging in cases where either the observations are not reliable or the parameters evolve rapidly. Inspired by sparsity-based techniques, we introduce a novel Bayesian model for the problems of interest and study its associated recursive Bayesian filter. We propose an algorithm approximating the Bayesian filter while maintaining a reasonable computational load, and compare the resulting technique by numerical evaluation to state-of-the-art algorithms in different scenarios. In a low-SNR scenario, the proposed method outperforms other, more complex techniques.
Index Terms: Recursive Bayesian filter, target tracking, sparse estimation, compressed sensing.
I. INTRODUCTION. Estimating a dynamic set of parameters is a highly useful and wide area of research with a long and fruitful history [1]. Indeed, noticing the ever-increasing application of the Kalman filter and its variants to many newly developed technologies is enough to appreciate the importance of this topic. In this context, the quest for modified techniques usually concerns cases where the existing methods either fail to meet computational limitations or yield insufficient precision. The latter may be due either to an inconsistent model on which the technique is based, or simply ...
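As background for the recursive Bayesian filter being approximated, the predict/update recursion in its simplest (linear-Gaussian, scalar Kalman) form; this is a generic illustration of the recursion, not the paper's sparsity-based filter:

```python
# Scalar Kalman filter: the simplest instance of a recursive Bayesian
# filter (generic background, not the paper's algorithm).
def kalman_step(m, P, y, F=1.0, Q=0.1, H=1.0, R=0.5):
    # Predict: propagate the posterior mean/variance through the dynamics.
    m_pred = F * m
    P_pred = F * P * F + Q
    # Update: correct the prediction with the new observation y.
    K = P_pred * H / (H * P_pred * H + R)   # Kalman gain
    m_new = m_pred + K * (y - H * m_pred)
    P_new = (1 - K * H) * P_pred
    return m_new, P_new

m, P = 0.0, 1.0
for y in [0.9, 1.1, 1.0, 1.2]:
    m, P = kalman_step(m, P, y)
print(m, P)
```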
Proceedings of the 31st ACM International Conference on Information & Knowledge Management
Knowledge transfer has been shown to be a very successful technique for training neural classifiers: together with the ground truth data, it uses the "privileged information" (PI) obtained by a "teacher" network to train a "student" network. It has been observed that classifiers learn much faster and more reliably via knowledge transfer, yet there has been little or no theoretical analysis of this phenomenon. To bridge this gap, we approach the problem of knowledge transfer by regularizing the fit between the teacher and the student with the PI provided by the teacher. Using tools from dynamical systems theory, we show that when the student is an extremely wide two-layer network, we can analyze it in the kernel regime and show that it is able to interpolate between the PI and the given data. This characterization sheds new light on the relation between the training error and the capacity of the student relative to the teacher. Another contribution of the paper is a quantitative statement on the convergence of the student network: we prove that the teacher reduces the number of iterations required for the student to learn, and consequently improves the generalization power of the student. We give corresponding experimental analysis that validates the theoretical results and yields additional insights. CCS CONCEPTS: • Computing methodologies → Neural networks.
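The regularized fit can be written in the common distillation form as a ground-truth term plus a teacher-matching term; a minimal PyTorch sketch (the weight lam and the squared-error penalty are assumptions for illustration, not the paper's exact objective):

```python
# Knowledge transfer as a regularized fit: ground-truth loss plus a
# penalty tying the student to the teacher's outputs (the PI).
# lam and the MSE penalty are illustrative assumptions.
import torch
import torch.nn.functional as F

def transfer_loss(student_logits, teacher_logits, labels, lam=0.5):
    fit_data = F.cross_entropy(student_logits, labels)    # ground truth
    fit_pi = F.mse_loss(student_logits, teacher_logits)   # privileged information
    return fit_data + lam * fit_pi

s = torch.randn(8, 10, requires_grad=True)   # student logits
t = torch.randn(8, 10)                       # teacher logits
y = torch.randint(0, 10, (8,))
print(transfer_loss(s, t, y).item())
```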
We propose a new regularizer for optimal transport (OT) which is tailored to better preserve the class structure of the data being transported. Accordingly, we provide the first theoretical guarantees for an OT scheme that respects class structure. We derive an accelerated proximal algorithm with a closed-form projection and proximal operator, thereby affording a highly scalable algorithm for computing optimal transport plans. We provide a novel argument for the uniqueness of the optimum even in the absence of strong convexity. Our experiments show that the new regularizer not only results in better preservation of the class structure but also in additional robustness relative to previous regularizers.
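The accelerated proximal scheme follows the standard FISTA pattern: a gradient step on the smooth term followed by a closed-form proximal step on the regularizer. A generic skeleton under those assumptions (the paper's OT-specific operators are not reproduced; the placeholders below solve a toy ℓ1-regularized least-squares problem):

```python
# Generic accelerated proximal-gradient (FISTA) skeleton of the kind the
# paper instantiates; grad_f and prox_g here are toy placeholders.
import numpy as np

def fista(grad_f, prox_g, x0, step, n_iter=200):
    x, z, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox_g(z - step * grad_f(z), step)   # gradient + proximal step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)    # momentum extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(50, 100)), rng.normal(size=50), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - s * lam, 0)  # soft threshold
x = fista(grad_f, prox_g, np.zeros(100), step=1 / np.linalg.norm(A, 2) ** 2)
print(np.count_nonzero(np.round(x, 6)))
```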
arXiv: Learning, Sep 13, 2019
We present a novel adversarial framework for training deep belief networks (DBNs), which includes replacing the generator network in the generative adversarial network (GAN) methodology with a DBN and developing a highly parallelizable numerical algorithm for training the resulting architecture in a stochastic manner. Unlike existing techniques, this framework can be applied to the most general form of DBNs with no requirement for backpropagation. As such, it lays a new foundation for developing DBNs on a par with GANs, with various regularization units such as pooling and normalization. By foregoing backpropagation, our framework also exhibits superior scalability compared to other DBN and GAN learning techniques. We present a number of numerical experiments in computer vision as well as neuroscience to illustrate the main advantages of our approach.
2018 26th European Signal Processing Conference (EUSIPCO), 2018
Target detection embedded in a complex interference background, such as jamming or strong clutter, is an important problem in signal processing. Traditionally, statistical adaptive detection processes are built from a binary hypothesis test performed on a grid of steering vectors. This usually involves estimating the noise-plus-interference covariance matrix using i.i.d. samples assumed to be target-free. Moving away from this paradigm, we exploit the fact that the interference (clutter and/or jammers) lies in a union of low-dimensional subspaces. Hence, the matrix of concatenated samples can be modeled as a sum of low-rank matrices (the union of subspaces containing the interference) plus a sparse matrix times a dictionary of steering vectors (representing the targets' contribution). Recovering this factorization from the observation matrix makes it possible to build detection maps for each sample. To perform such recovery, we propose a generalized version of the robust subspace recovery via bi-sparsity pursuit algorithm [1]. Experimental results on a real data set highlight the interest of the approach.
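A simplified sketch of the underlying decomposition idea, restricted to the single-subspace special case X ≈ L + S with no steering dictionary (the paper's bi-sparsity pursuit generalizes this; thresholds and sizes are illustrative):

```python
# Simplified low-rank + sparse split via alternating singular-value
# thresholding and soft thresholding (illustration of the decomposition
# idea only, not the paper's generalized bi-sparsity pursuit).
import numpy as np

def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def soft(M, tau):
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 60))   # low-rank "interference"
X[rng.random(X.shape) < 0.02] += 10                       # sparse "targets"

L = np.zeros_like(X)
S = np.zeros_like(X)
for _ in range(50):
    L = svt(X - S, tau=1.0)   # proximal step on the nuclear norm
    S = soft(X - L, tau=0.5)  # proximal step on the l1 norm
print(np.linalg.matrix_rank(np.round(L, 3)), np.count_nonzero(np.round(S, 3)))
```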
The SPS-LASSO has recently been introduced as a solution to the problem of regularization parameter selection in the complex-valued LASSO problem. Still, its dependence on the grid size and the polynomial-time convex optimization required in each iteration, in addition to its deficiencies in the low-noise regime, limit its performance for Direction of Arrival (DOA) estimation. This work presents methods to apply the LASSO without grid-size limitations and with lower complexity. As we show by simulations, the proposed methods lose only negligible performance compared to the Maximum Likelihood (ML) estimator, which requires a combinatorial search. We also show by simulations that, compared to practical implementations of ML, the proposed techniques are less sensitive to differences in source power.
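For reference, the grid-based complex LASSO that such methods start from can be solved with a complex soft-thresholding (ISTA) loop; a toy uniform-linear-array sketch, where the array geometry, grid, and parameters are illustrative assumptions:

```python
# Toy grid-based complex LASSO for DOA via ISTA (the baseline setting;
# array geometry and all parameters are illustrative assumptions).
import numpy as np

def steering(m, thetas):
    # Uniform linear array with half-wavelength spacing.
    return np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(thetas)))

rng = np.random.default_rng(0)
m, grid = 16, np.deg2rad(np.linspace(-90, 90, 181))
A = steering(m, grid)                      # dictionary of steering vectors
x_true = np.zeros(len(grid), complex)
x_true[[70, 110]] = [1.0, 0.8]             # two on-grid sources
y = A @ x_true + 0.05 * (rng.normal(size=m) + 1j * rng.normal(size=m))

lam, step = 0.5, 1 / np.linalg.norm(A, 2) ** 2
x = np.zeros(len(grid), complex)
for _ in range(500):
    r = x - step * A.conj().T @ (A @ x - y)          # gradient step
    mag = np.abs(r)                                  # complex soft threshold:
    x = np.where(mag > 0, r / np.maximum(mag, 1e-12), 0) \
        * np.maximum(mag - step * lam, 0)
print(np.flatnonzero(np.abs(x) > 0.1))     # detected grid angles (indices)
```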
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
We propose a deep learning architecture capable of performing up to 8× single-image super-resolution. Our architecture incorporates an adversarial component from super-resolution generative adversarial networks (SRGANs) and a multi-scale learning component from the multiple-scale super-resolution network (MSSRNet); only together can these recover the small structures inherent in satellite images. To further enhance performance, we integrate progressive growing and training into our network. This, aided by feed-forward connections in the network that carry and enrich information from previous inputs, produces super-resolved images at scaling factors of 2, 4, and 8. To ensure and enhance training stability, we employ Wasserstein GANs (WGANs) during training. Experimentally, we find that our architecture can recover small objects in satellite images during super-resolution, whereas previous methods cannot.
2018 26th European Signal Processing Conference (EUSIPCO), 2018
Union of Subspaces (UoS) is a new paradigm for signal modeling and processing which is capable of identifying more complex trends in data sets than simple linear models. Relying on a bi-sparsity pursuit framework and advanced nonsmooth optimization techniques, the Robust Subspace Recovery (RoSuRe) algorithm was introduced in the recent literature as a reliable and numerically efficient way to unfold unions of subspaces. In this study, we apply RoSuRe to probe the structure of a particular data type (vehicle data sensed through passive audio and magnetic observations). Applying RoSuRe to the observation data set, we obtain a new representation of the time series that respects an underlying UoS model. We subsequently employ spectral clustering on the new representations of the data set. The classification performance shows a considerable improvement compared to the direct application of other unsupervised clustering methods.
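The final step, clustering the learned representations, can be sketched with scikit-learn; the random features below stand in for the RoSuRe representations, which is an assumption of the sketch:

```python
# Spectral clustering on a learned representation (sketch; the random
# features stand in for the paper's RoSuRe coefficients).
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(4, 1, (50, 8))])
labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            random_state=0).fit_predict(Z)
print(labels[:5], labels[-5:])
```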
arXiv: Computer Vision and Pattern Recognition, 2019
Zero-shot learning (ZSL), which aims to recognize unseen object classes by training only on seen object classes, has been widely researched and has achieved great success in machine learning. Most existing ZSL methods learn a projection function between the visual feature space and the semantic space, and mainly suffer from a projection domain shift problem, as there is often a large domain gap between seen and unseen classes. In this paper, we propose a novel inductive ZSL model based on projecting both visual and semantic features into a common distinct latent space with class-specific knowledge, and on reconstructing both visual and semantic features from this common space to narrow the domain shift gap. We show that all these constraints on the latent space, class-specific knowledge, reconstruction of features, and their combinations enhance robustness against the projection domain shift problem and improve the generalization ability to unseen object classes. Comprehensive experiments on four benchmark datasets demonstrate that our proposed method is superior to state-of-the-art algorithms.
2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
We consider subspace clustering under sparse noise, for which a non-convex optimization framework based on sparse data representations has recently been developed. This setup is suitable for a large variety of applications with high-dimensional data, such as image processing, where data naturally decompose into a sparse unstructured foreground and a background residing in a union of low-dimensional subspaces. In this framework, we further discuss both the performance and the implementation of the key optimization problem. We provide an analysis of this optimization problem demonstrating that our approach is capable of recovering linear subspaces as a locally optimal solution for sufficiently large data sets and sparse noise vectors. We also propose a sequential algorithmic solution, which is particularly useful for extremely large data sets and online vision applications such as video processing.
ArXiv, 2020
Knowledge distillation (KD), i.e., one classifier being trained on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much faster and more reliably if trained with the outputs of another classifier as soft labels instead of from ground truth data. However, there has been little or no theoretical analysis of this phenomenon. We provide the first theoretical analysis of KD in the setting of extremely wide two-layer non-linear networks, in the model and regime of (Arora et al., 2019; Du & Hu, 2019; Cao & Gu, 2019). We prove results on what the student network learns and on the rate of convergence of the student network. Intriguingly, we also confirm the lottery ticket hypothesis (Frankle & Carbin, 2019) in this model. To prove our results, we extend the repertoire of techniques from linear systems dynamics. We give corresponding experimental analysis that validates the theoretical results.
Standard clustering methods such as K-means, Gaussian mixture models, and hierarchical clustering are beset by local minima, which are sometimes drastically suboptimal; moreover, the number of clusters K must be known in advance. The recently introduced sum-of-norms (SON) or Clusterpath convex relaxation of K-means and hierarchical clustering shrinks cluster centroids toward one another and ensures a unique global minimizer. We give a scalable stochastic incremental algorithm based on proximal iterations to solve the SON problem with convergence guarantees. We also show that the algorithm recovers clusters under quite general conditions which have a similar form to the unifying proximity condition introduced in the approximation-algorithms community (covering paradigm cases such as Gaussian mixtures and planted partition models). We give experimental results confirming that our algorithm scales much better than previous methods while producing clusters of comparable quality.
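The SON objective being solved is min over U of ½ Σᵢ ||xᵢ − uᵢ||² + λ Σᵢ<ⱼ ||uᵢ − uⱼ||, which fuses centroids within a cluster. A plain gradient sketch on a smoothed version, with illustrative λ and step size (the paper's stochastic incremental proximal solver is not reproduced):

```python
# Sum-of-norms (SON) clustering objective, minimized here by plain
# gradient descent on a smoothed surrogate (illustration only).
import numpy as np

def son_grad(X, U, lam, eps=1e-8):
    G = U - X                                   # gradient of 0.5*sum ||x_i - u_i||^2
    diff = U[:, None, :] - U[None, :, :]        # u_i - u_j for all pairs
    norms = np.sqrt((diff ** 2).sum(-1) + eps)  # smoothed pairwise norms
    G += lam * (diff / norms[..., None]).sum(axis=1)
    return G

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
U = X.copy()
for _ in range(300):
    U -= 0.01 * son_grad(X, U, lam=0.05)
# Centroids shrink toward one another; fused groups indicate clusters.
print(len(np.unique(np.round(U, 1), axis=0)))
```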
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
The purpose of this paper is to infer a global (collective) model of the time-varying responses of a set of nodes as a dynamic graph, where an individual time series is observed at each node. The motivation of this work lies in the search for a connectome model which properly captures brain functionality upon observing activities in different regions of the brain, and possibly of individual neurons. We formulate the problem as a quadratic objective functional of the observed node signals over short time intervals, subject to proper regularization reflecting the graph smoothness and other dynamics involving the underlying graph's Laplacian, as well as the smoothness of the graph's time evolution. The resulting joint optimization is solved by a continuous relaxation and a novel gradient-projection scheme. We apply our algorithm to a real-world dataset comprising recorded activities of individual brain cells, and the resulting model is shown to be not only viable but also efficiently computable.
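A gradient-projection sketch in the same spirit, under assumed stand-in terms: a Laplacian smoothness term Σᵢⱼ Wᵢⱼ||xᵢ − xⱼ||², a log-degree barrier, and a temporal term tying the graph to the previous window's graph. The paper's exact objective and solver are not reproduced:

```python
# Gradient-projection sketch of learning a time-varying graph from node
# signals (all terms and constants are illustrative assumptions).
import numpy as np

def learn_graph(X, W_prev, alpha=1.0, beta=0.1, gamma=0.5, step=0.01, iters=300):
    Z = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise distances
    Z /= Z.mean()                                              # scale for stable steps
    W = W_prev.copy()
    for _ in range(iters):
        deg = W.sum(axis=1, keepdims=True) + 1e-8
        grad = Z - alpha / deg + 2 * beta * W + 2 * gamma * (W - W_prev)
        W -= step * grad
        W = np.maximum((W + W.T) / 2, 0)        # project: symmetric, nonnegative
        np.fill_diagonal(W, 0)                  # no self-loops
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))                   # 10 nodes, 50 time samples
W0 = np.ones((10, 10)) - np.eye(10)             # previous window's graph
W1 = learn_graph(X, W0)
print(W1.max(), (W1 > 1e-3).sum())
```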
A problem of recent interest in statistical inference, machine learning, and signal processing is understanding the asymptotic behavior of regularized least-squares solutions under random measurement matrices (or dictionaries). The Least Absolute Shrinkage and Selection Operator (LASSO, or least squares with ℓ1 regularization) is perhaps one of the most interesting examples. Precise expressions for the asymptotic performance of LASSO have been obtained for a number of different cases, in particular when the elements of the dictionary matrix are sampled independently from a Gaussian distribution. It has also been empirically observed that the resulting expressions remain valid when the entries of the dictionary matrix are independently sampled from certain non-Gaussian distributions. In this paper, we confirm these observations theoretically when the distribution is sub-Gaussian. We further generalize the previous expressions to a broader family of regularizers.
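A quick empirical check of this universality claim (a sketch, not the paper's experiments): LASSO error behaves similarly under a Gaussian dictionary and a Rademacher (sub-Gaussian) one. Problem sizes and the regularization weight are illustrative:

```python
# Empirical universality check: compare LASSO recovery error under
# Gaussian vs. Rademacher dictionaries (illustrative sizes and alpha).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 200, 10
x0 = np.zeros(p)
x0[rng.choice(p, k, replace=False)] = rng.normal(size=k)   # sparse ground truth

def lasso_mse(A):
    y = A @ x0 + 0.1 * rng.normal(size=n)
    xh = Lasso(alpha=0.05).fit(A, y).coef_
    return np.mean((xh - x0) ** 2)

A_gauss = rng.normal(size=(n, p)) / np.sqrt(n)
A_rad = rng.choice([-1.0, 1.0], size=(n, p)) / np.sqrt(n)
print(lasso_mse(A_gauss), lasso_mse(A_rad))   # expected: comparable errors
```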
ArXiv, 2019
We introduce a new regularizer for optimal transport (OT) which is tailored to better preserve the class structure. We give the first theoretical guarantees for an OT scheme that respects class structure. We give an accelerated proximal-projection scheme for this formulation with the proximal operator in closed form, yielding a highly scalable algorithm for computing optimal transport plans. We give a novel argument for the uniqueness of the optimum even in the absence of strong convexity. Our experiments show that the new regularizer preserves class structure better and is more robust compared to previous regularizers.