pritish sahu - Academia.edu
Papers by pritish sahu
Conventional data visualization methods are very narrow in terms of the data types on which they are applicable. We present a novel way of viewing a multi-attributed dataset by grouping subsets of attributes into facets. The Cube-Maze interface visually represents each data "entity" as a cube in three-dimensional space. Similarity among "data cubes" corresponds to 0-, 1-, and 2-dimensional adjacencies. Our current implementation provides different modes of "EgoNet" navigation and several interaction filters. The graph counterpart for this cube maze representation is a "Labeled Multi Digraph".
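As a rough illustration of the 0-, 1-, and 2-dimensional adjacencies mentioned above, the sketch below assumes that data cubes are unit cubes placed at integer grid coordinates, so that two cubes can share a vertex (0-dimensional), an edge (1-dimensional), or a face (2-dimensional). The grid layout and the function name `adjacency_dimension` are assumptions made for this example and are not taken from the Cube-Maze implementation.

```python
from typing import Optional, Tuple

Coord = Tuple[int, int, int]


def adjacency_dimension(a: Coord, b: Coord) -> Optional[int]:
    """Return the dimension of the contact between two unit cubes.

    Cubes are identified by the integer coordinates of their minimum corner
    (an assumption for this sketch). Returns 2 for a shared face, 1 for a
    shared edge, 0 for a shared vertex, and None when the cubes are identical
    or do not touch.
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if max(diffs) > 1:
        return None  # too far apart along at least one axis
    offset_axes = sum(diffs)  # number of axes along which the cubes differ by 1
    if offset_axes == 0:
        return None  # same cube
    # Each offset axis removes one dimension from the shared boundary.
    return 3 - offset_axes


if __name__ == "__main__":
    print(adjacency_dimension((0, 0, 0), (1, 0, 0)))  # shared face
    print(adjacency_dimension((0, 0, 0), (1, 1, 0)))  # shared edge
    print(adjacency_dimension((0, 0, 0), (1, 1, 1)))  # shared vertex
```

Under this reading, a higher adjacency dimension means a larger shared boundary, which is one plausible way to encode stronger similarity between entities.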
ArXiv, 2021
This paper targets the problem of procedural multimodal machine comprehension (M3C). This task requires an AI to comprehend given steps of multimodal instructions and then answer questions. Compared to vanilla machine comprehension tasks where an AI is required only to understand a textual input, procedural M3C is more challenging as the AI needs to comprehend both the temporal and causal factors along with multimodal inputs. Recently Yagcioglu et al. [35] introduced the RecipeQA dataset to evaluate M3C. Our first contribution is the introduction of two new M3C datasets, WoodworkQA and DecorationQA, with 16K and 10K instructional procedures, respectively. We then evaluate M3C using a textual cloze-style question-answering task and highlight an inherent bias in the question-answer generation method from [35] that enables a naive baseline to cheat by learning from only answer choices. This naive baseline performs similarly to a popular method used in question answering, Impatient Reader [6] ...
ArXiv, 2019
We propose a novel VAE-based deep auto-encoder model that can learn disentangled latent representations in a fully unsupervised manner, endowed with the ability to identify all meaningful sources of variation and their cardinality. Our model, dubbed Relevance-Factor-VAE, leverages the total correlation (TC) in the latent space to achieve the disentanglement goal, but also addresses the key issue of existing approaches which cannot distinguish between meaningful and nuisance factors of latent variation, often the source of considerable degradation in disentanglement performance. We tackle this issue by introducing the so-called relevance indicator variables that can be automatically learned from data, together with the VAE parameters. Our model effectively focuses the TC loss onto the relevant factors only by tolerating large prior KL divergences, a desideratum justified by our semi-parametric theoretical analysis. Using a suite of disentanglement metrics, including a newly proposed ...
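For readers unfamiliar with the total correlation mentioned above, its standard definition is given below. The notation is an assumption for this note: q(z) denotes the aggregate distribution of the d-dimensional latent vector and q(z_j) its j-th marginal. The abstract does not spell out how the relevance indicator variables enter the loss, so only the generic definition is shown.

```latex
\mathrm{TC}(z)
  = \mathrm{KL}\!\left( q(z) \,\Big\|\, \prod_{j=1}^{d} q(z_j) \right)
  = \mathbb{E}_{q(z)}\!\left[ \log q(z) - \sum_{j=1}^{d} \log q(z_j) \right]
```

The quantity is zero exactly when the latent dimensions are mutually independent under q, which is why penalizing it encourages disentanglement.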
ArXiv, 2021
Computational learning approaches to solving visual reasoning tests, such as Raven’s Progressive Matrices (RPM), critically depend on the ability of the computational approach to identify the visual concepts used in the test (i.e., the representation) as well as the latent rules based on those concepts (i.e., the reasoning). However, learning of representation and reasoning is a challenging and ill-posed task, often approached in a stage-wise manner (first representation, then reasoning). In this work, we propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together. Specifically, we propose a general generative graphical model for RPMs, GM-RPM, and apply it to solve the reasoning test. We accomplish this using a novel learning framework, Disentangling based Abstract Reasoning Network (DAReN), based on the principles of GM-RPM. We perform an empirical evaluation of DAReN over several benchmark datas...
Current pre-trained language models have lots of knowledge, but a more limited ability to use that knowledge. Bloom’s Taxonomy helps educators teach children how to use knowledge by categorizing comprehension skills, so we use it to analyze and improve the comprehension skills of large pre-trained language models. Our experiments focus on zero-shot question answering, using the taxonomy to provide proximal context that helps the model answer questions by being relevant to those questions. We show targeting context in this manner improves performance across 4 popular common sense question answer datasets.
We improve zero-shot learning (ZSL) by incorporating common-sense knowledge in DNNs. We propose Common-Sense based Neuro-Symbolic Loss (CSNL) that formulates prior knowledge as novel neuro-symbolic loss functions that regularize visual-semantic embedding. CSNL forces visual features in the VSE to obey common-sense rules relating to hypernyms and attributes. We introduce two key novelties for improved learning: (1) enforcement of rules for a group instead of a single concept to take into account class-wise relationships, and (2) confidence margins inside logical operators that enable implicit curriculum learning and prevent premature overfitting. We evaluate the advantages of incorporating each knowledge source and show consistent gains over prior state-of-the-art methods in both conventional and generalized ZSL, e.g., 11.5%, 5.5%, and 11.6% improvements on AWA2, CUB, and Kinetics, respectively.
We focus on Multimodal Machine Reading Comprehension (M3C) where a model is expected to answer questions based on a given passage (or context), and the context and the questions can be in different modalities. Previous works such as RecipeQA have proposed datasets and cloze-style tasks for evaluation. However, we identify three critical biases stemming from the question-answer generation process and memorization capabilities of large deep models. These biases make it easier for a model to overfit by relying on spurious correlations or naive data patterns. We propose a systematic framework to address these biases through three Control-Knobs that enable us to generate a test bed of datasets of progressive difficulty levels. We believe that our benchmark (referred to as MetaRecipeQA) will provide, for the first time, a fine-grained estimate of a model’s generalization capabilities. We also propose a general MC model that is used to realize several prior SOTA models and motivate a novel ...
2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Oct 1, 2019
Domain Adaptation (DA), the process of effectively adapting task models learned on one domain, the source, to other related but distinct domains, the targets, with no or minimal retraining, is typically accomplished using the process of source-to-target manifold alignment. However, this process often leads to unsatisfactory adaptation performance, in part because it ignores the task-specific structure of the data. In this paper, we improve the performance of DA by introducing a discriminative discrepancy measure which takes advantage of auxiliary information available in the source and the target domains to better align the source and target distributions. Specifically, we leverage the cohesive clustering structure within individual data manifolds, associated with different tasks, to improve the alignment. This structure is explicit in the source, where the task labels are available, but is implicit in the target, making the problem challenging. We address the challenge by devising a deep DA framework, which combines a new task-driven domain alignment discriminator with domain regularizers that encourage the shared features to be task-specific and domain-invariant, and prompt the task model to be data-structure preserving, guiding its decision boundaries through the low-density data regions. We validate our framework on standard benchmarks, including Digits (MNIST, USPS, SVHN, MNIST-M), PACS, and VisDA. Our results show that our proposed model consistently outperforms the state-of-the-art in unsupervised domain adaptation.
The work implemented describes a study of approaches to restore the nonlinear life mixture of images, which occurs when we scan or photograph a page and the back page shows through. This occurs mainly with old documents and low-quality paper. With increased bleed-through, reading and deciphering the text becomes tedious. This project executes algorithms to reduce bleed-through distortion using techniques in digital image processing. We study the algorithm knowing that in images the high-frequency components are sparse and stronger on one side of the paper than on the other. The bleed-through and show-through effects were removed in a single processing pass, with no iteration. Here the sources need not be independent, nor the mixture invariant. Hence it is suitable for separating mixtures such as those produced by bleed-through.
Data Visualisation and Analytics play a key role in providing a complete view and discovering the global/local patterns hidden in the data. Conventional data visualization methods, as well as extensions of some conventional methods, are very narrow in terms of the data types to which they are applicable. We present a novel way of visualising data which can be generalized to any kind of data format. The Data Units Multi Digraph Model can encompass all varieties of data and will be able to give a global/local view, unlike others where data is mapped to nodes in a graph or shown in charts. This research project is a novel way of representing abstract data on the facets of a cube. It involves visualization and navigation of abstract data mapped to the facets of a cube. I. PROJECT DESCRIPTION We view a multimedial data collection as a “labeled multidigraph” over a finite set of ranked “data units”. Each data unit is an ordered set of data components each of which possesses an identifier, a string name, ...
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In unsupervised domain adaptation, it is widely known that the target domain error can be provably reduced by having a shared input representation that makes the source and target domains indistinguishable from each other. Very recently, it has been shown that aligning the output (class) distributions, and not just matching the marginal input distributions, is also critical. The latter can be achieved by minimizing the maximum discrepancy of predictors (classifiers). In this paper, we adopt this principle, but propose a more systematic and effective way to achieve hypothesis consistency via Gaussian processes (GP). The GP allows us to define/induce a hypothesis space of the classifiers from the posterior distribution of the latent random functions, turning the learning into a simple large-margin posterior separation problem, far easier to solve than previous approaches based on adversarial minimax optimization. We formulate a learning objective that effectively pushes the posterior to minimize the maximum discrepancy. This is further shown to be equivalent to maximizing margins and minimizing uncertainty of the class predictions in the target domain, a well-established principle in classical (semi-)supervised learning. Empirical results demonstrate that our approach is comparable or superior to the existing methods on several benchmark domain adaptation datasets.
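To make the notion of "maximum discrepancy of predictors" above more concrete, here is a minimal sketch that measures how much several classifiers disagree on a single target sample. It assumes the discrepancy is the mean absolute difference between class-probability vectors, a common choice in earlier classifier-discrepancy work; the abstract does not state the measure used, and the GP-based method described there is not implemented here. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def discrepancy(p: Sequence[float], q: Sequence[float]) -> float:
    """Mean absolute difference between two class-probability vectors."""
    if len(p) != len(q):
        raise ValueError("probability vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def max_discrepancy(predictions: Sequence[Sequence[float]]) -> float:
    """Largest pairwise discrepancy among the classifiers' outputs.

    `predictions` holds one probability vector per classifier, all for the
    same input. Returns 0.0 when fewer than two classifiers are given.
    """
    return max(
        (discrepancy(p, q) for p, q in combinations(predictions, 2)),
        default=0.0,
    )


if __name__ == "__main__":
    # Three hypothetical classifiers scoring one target-domain sample.
    outputs = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
    print(max_discrepancy(outputs))
```

A small value means the classifiers agree on the target sample, which is the situation the minimization is meant to encourage.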
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
We propose a family of novel hierarchical Bayesian deep auto-encoder models capable of identifying disentangled factors of variability in data. While many recent attempts at factor disentanglement have focused on sophisticated learning objectives within the VAE framework, their choice of a standard normal as the latent factor prior is both suboptimal and detrimental to performance. Our key observation is that the disentangled latent variables responsible for major sources of variability, the relevant factors, can be more appropriately modeled using long-tail distributions. The typical Gaussian priors are, on the other hand, better suited for modeling of nuisance factors. Motivated by this, we extend the VAE to a hierarchical Bayesian model by introducing hyper-priors on the variances of Gaussian latent priors, mimicking an infinite mixture, while maintaining tractable learning and inference of the traditional VAEs. This analysis signifies the importance of partitioning and treating in a different manner the latent dimensions corresponding to relevant factors and nuisances. Our proposed models, dubbed Bayes-Factor-VAEs, are shown to outperform existing methods both quantitatively and qualitatively in terms of latent disentanglement across several challenging benchmark tasks.
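The link between a hyper-prior on the variance and a long-tail prior can be illustrated with a textbook calculation. The worked example below assumes an inverse-gamma hyper-prior with shape a and rate b on the variance of a zero-mean Gaussian; the abstract does not name the hyper-prior family, so this choice and the symbols a, b, and σ² are assumptions made for illustration.

```latex
z \mid \sigma^2 \sim \mathcal{N}(0, \sigma^2), \qquad
\sigma^2 \sim \mathrm{InvGamma}(a, b)

p(z) = \int_0^\infty \mathcal{N}(z; 0, \sigma^2)\,
       \frac{b^{a}}{\Gamma(a)}\, (\sigma^2)^{-a-1} e^{-b/\sigma^2}\, d\sigma^2
     = \frac{\Gamma\!\left(a + \tfrac{1}{2}\right)}{\Gamma(a)\,\sqrt{2\pi b}}
       \left( 1 + \frac{z^2}{2b} \right)^{-\left(a + \frac{1}{2}\right)}
```

The marginal is a Student-t density with 2a degrees of freedom and scale sqrt(b/a): an infinite mixture of Gaussians whose tails decay polynomially rather than exponentially, which matches the long-tail behaviour the abstract associates with relevant factors.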
2016 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2016