Mehran Khodabandeh - Academia.edu
Papers by Mehran Khodabandeh
arXiv (Cornell University), Jun 7, 2017
We propose a general purpose active learning algorithm for structured prediction: gathering labeled data for training a model that outputs a set of related labels for an image or video. Active learning starts with a limited initial training set, then iterates between querying a user for labels on unlabeled data and retraining the model. We propose a novel algorithm for selecting data for labeling, choosing examples to maximize expected information gain based on belief propagation inference. This general purpose method can be applied to a variety of tasks and models. As a specific example, we demonstrate this framework for learning to recognize human actions and group activities in video sequences. Experiments show that our proposed algorithm outperforms previous active learning methods and can achieve accuracy comparable to fully supervised methods while using significantly less labeled data.
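The abstract's query-retrain loop can be sketched in a few lines. This is a minimal illustration, not the paper's method: the scoring function here uses the entropy of the model's predicted label distribution as a stand-in for expected information gain, and the names `predict`, `query_oracle`, and `retrain` are hypothetical callbacks supplied by the caller.

```python
import math

def entropy(probs):
    # Shannon entropy of a predicted label distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def expected_information_gain(model_probs):
    # Stand-in score: the model's current uncertainty is used as a
    # proxy for the information gained by labeling this example.
    return entropy(model_probs)

def active_learning_loop(unlabeled, predict, query_oracle, retrain, rounds=3):
    # Generic AL loop: score unlabeled examples, query the most
    # informative one, and retrain on the growing labeled set.
    labeled = []
    for _ in range(rounds):
        if not unlabeled:
            break
        best = max(unlabeled, key=lambda x: expected_information_gain(predict(x)))
        unlabeled.remove(best)
        labeled.append((best, query_oracle(best)))
        retrain(labeled)
    return labeled
```

With a toy model whose prediction for one example is maximally uncertain ([0.5, 0.5]) and confident for another ([0.9, 0.1]), the loop queries the uncertain example first.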
ArXiv, 2018
Discriminative learning machines often need a large set of labeled samples for training. Active learning (AL) settings assume that the learner has the freedom to ask an oracle to label its desired samples. Traditional AL algorithms heuristically choose query samples about which the current learner is uncertain. This strategy does not make good use of the structure of the dataset at hand and is prone to being misguided by outliers. To alleviate this problem, we propose to distill the structural information into a probabilistic generative model that acts as a teacher in our model. The active learner uses this information effectively at each cycle of active learning. The proposed method is generic and does not depend on the type of learner and teacher. We then suggest a query criterion for active learning that is aware of the distribution of the data and is more robust against outliers. Our method can be combined readily with several other query criteria for active learning.
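One simple way a distribution-aware criterion can down-rank outliers, in the spirit of the teacher-learner idea above, is to weight the learner's uncertainty by the teacher's density estimate. This is an illustrative sketch only: the 1-D Gaussian teacher and the multiplicative combination are assumptions, not the paper's actual formulation.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density of a 1-D Gaussian, acting as a toy "teacher" model.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def distribution_aware_score(uncertainty, x, mu, sigma):
    # Weight the learner's uncertainty by the teacher's density, so
    # isolated outliers (low density) score lower than inliers even
    # when the learner is equally uncertain about both.
    return uncertainty * gaussian_pdf(x, mu, sigma)
```

For two equally uncertain points, one near the data mean and one far from it, the inlier receives the higher query score.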
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018
The recent successes in applying deep learning techniques to solve standard computer vision problems have inspired researchers to propose new computer vision problems in different domains. As previously established in the field, training data itself plays a significant role in the machine learning process, especially for deep learning approaches, which are data hungry. In order to solve each new problem and achieve decent performance, a large amount of data needs to be captured, which in many cases poses logistical difficulties. Therefore, the ability to generate de novo data or expand an existing data set, however small, in order to satisfy the data requirements of current networks may be invaluable. Herein, we introduce a novel way to partition an action video clip into action, subject and context. Each part is manipulated separately and reassembled with our proposed video generation technique. Furthermore, our novel human skeleton trajectory generation, along with our proposed video generation technique, enables us to generate unlimited action recognition training data. These techniques enable us to generate video action clips from a small set without costly and time-consuming data acquisition. Lastly, we demonstrate through an extensive set of experiments on two small human action recognition data sets that this new data generation technique can improve the performance of current action recognition neural networks.
Background: StrongestPath is a Cytoscape 3 application that enables users to search for one or more cascades of interactions connecting two single proteins or groups of proteins in a collection of protein-protein interaction (PPI) network or signaling network databases. When there are different levels of confidence over the interactions, it can process them and identify the cascade of interactions having the highest total confidence score. Given a set of proteins, StrongestPath can extract and show the network of interactions among them from the given databases, and expand the network by adding new proteins having the most interactions with the highest total confidence to the current proteins. The application can also identify any activation or inhibition regulatory paths between two distinct sets of transcription factors and target genes. The application can be used either with a set of built-in human and mouse PPI or signaling databases, or with any user-provided database for some organism.
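Finding the cascade with the highest total confidence has a standard reduction worth illustrating: the path maximizing the product of edge confidences in (0, 1] is the shortest path under edge weights of -log(confidence). The sketch below uses Dijkstra's algorithm on that transformed graph; it is a generic illustration of the reduction, not StrongestPath's actual implementation, and it assumes undirected edges with a single confidence per interaction.

```python
import heapq
import math

def strongest_path(edges, src, dst):
    # edges: {(u, v): confidence in (0, 1]}. Maximizing the product of
    # confidences equals minimizing the sum of -log(confidence), so we
    # run Dijkstra on the -log weights.
    graph = {}
    for (u, v), c in edges.items():
        w = -math.log(c)
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))  # treat PPI edges as undirected
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    # reconstruct the path and recover the confidence product
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return path[::-1], math.exp(-dist[dst])
```

For example, with edges A-B and B-C at confidence 0.9 each and a direct A-C edge at 0.5, the two-hop cascade wins (0.9 × 0.9 = 0.81 > 0.5).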
2016 23rd International Conference on Pattern Recognition (ICPR), 2016
We present an algorithm for learning a feature representation for video segmentation. Standard video segmentation algorithms utilize similarity measurements in order to group related pixels. The contribution of our paper is an unsupervised method for learning the feature representation used for this similarity. The feature representation is defined over video supervoxels. An embedding framework learns a feature mapping for supervoxels in an unsupervised fashion such that supervoxels with similar context have similar embeddings. Based on the learned representation, we can merge similar supervoxels into spatio-temporal segments. Experimental results demonstrate the effectiveness of this learned supervoxel embedding on standard benchmark data.
2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015
We present a novel approach for discovering human interactions in videos. Activity understanding techniques usually require a large number of labeled examples, which are not available in many practical cases. Here, we focus on recovering semantically meaningful clusters of human-human and human-object interaction in an unsupervised fashion. A new iterative solution is introduced based on Maximum Margin Clustering (MMC), which also accepts user feedback to refine clusters. This is achieved by formulating the whole process as a unified constrained latent max-margin clustering problem. Extensive experiments have been carried out over three challenging datasets: Collective Activity, VIRAT, and UT-Interaction. Empirical results demonstrate that the proposed algorithm can efficiently discover perfect semantic clusters of human interactions with only a small amount of labeling effort.
2015 14th IAPR International Conference on Machine Vision Applications (MVA), 2015
Detecting objects such as humans or vehicles is a central problem in surveillance video. Myriad standard approaches exist for this problem. At their core, approaches consider either the appearance of people, patterns of their motion, or differences from the background. In this paper we build on dense trajectories, a state-of-the-art approach for describing spatiotemporal patterns in video sequences. We demonstrate an application of dense trajectories to object detection in surveillance video, showing that they can be used to both regress estimates of object locations and accurately classify objects.
2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
Domain shift is unavoidable in real-world applications of object detection. For example, in self-driving cars, the target domain consists of unconstrained road environments which cannot all possibly be observed in training data. Similarly, in surveillance applications sufficiently representative training data may be lacking due to privacy regulations. In this paper, we address the domain adaptation problem from the perspective of robust learning and show that the problem may be formulated as training with noisy labels. We propose a robust object detection framework that is resilient to noise in bounding box class labels, locations and size annotations. To adapt to the domain shift, the model is trained on the target domain using a set of noisy object bounding boxes that are obtained by a detection model trained only in the source domain. We evaluate the accuracy of our approach in various source/target domain pairs and demonstrate that the model significantly improves the state-of-the-art on multiple domain adaptation scenarios on the SIM10K, Cityscapes and KITTI datasets.
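The first step of the adaptation scheme described above, generating noisy target-domain labels from a source-trained detector, can be sketched as a pseudo-labeling pass. This is a simplified illustration under assumed interfaces: `source_detect` is a hypothetical callback returning `(box, label, score)` triples, and the confidence threshold is an assumption, not a value from the paper (whose contribution is the robust training that tolerates the remaining noise).

```python
def pseudo_label_target(images, source_detect, threshold=0.8):
    # source_detect(image) -> list of (box, label, score) produced by a
    # model trained only on the source domain. Keep confident detections
    # as (noisy) pseudo-labels for target-domain training.
    labeled = []
    for img in images:
        boxes = [(box, label) for box, label, score in source_detect(img)
                 if score >= threshold]
        if boxes:
            labeled.append((img, boxes))
    return labeled
```

The retained boxes are still noisy in class, location, and size, which is exactly the label noise the robust training objective is designed to absorb.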