Shan JS - Academia.edu

Papers by Shan JS

TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs

Attention modules for Convolutional Neural Networks (CNNs) are an effective method to enhance performance on multiple computer-vision tasks. While existing methods appropriately model channel-, spatial- and self-attention, they primarily operate in a feedforward, bottom-up manner. Consequently, the attention mechanism strongly depends on the local information of a single input feature map and does not incorporate the semantically richer contextual information available at higher layers, which can specify "what and where to look" in lower-level feature maps through top-down information flow. Accordingly, in this work, we propose a lightweight top-down attention module (TDAM) that iteratively generates a "visual searchlight" to perform channel and spatial modulation of its inputs and outputs more contextually relevant feature maps at each computation step. Our experiments indicate that TDAM enhances the performance of CNNs across multiple object-recognition benchmarks and outperforms prominent attention modules while being more parameter- and memory-efficient. Further, TDAM-based models learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision, resulting in a 5% improvement for ResNet50 on weakly-supervised object localization.
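
As a rough illustration of the mechanism described above (a sketch, not the authors' released implementation), a single top-down attention step can be written in PyTorch: a semantically richer higher-layer feature map produces a channel gate and a spatial gate, the "searchlight", which together modulate a lower-layer feature map. The layer sizes, reduction factor, and residual fusion below are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    """Hypothetical sketch of one top-down attention step: a higher-level
    feature map produces channel and spatial gates (a "searchlight") that
    modulate a lower-level feature map. Sizes and fusion are assumptions."""

    def __init__(self, low_channels: int, high_channels: int, reduction: int = 16):
        super().__init__()
        # Channel gate: squeeze the high-level map, predict per-channel weights
        self.channel_gate = nn.Sequential(
            nn.Linear(high_channels, high_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(high_channels // reduction, low_channels),
            nn.Sigmoid(),
        )
        # Spatial gate: 1x1 conv on the high-level map, resized later
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(high_channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        b, c, h, w = low.shape
        # Per-channel weights from globally pooled high-level context
        ch = self.channel_gate(F.adaptive_avg_pool2d(high, 1).flatten(1))
        ch = ch.view(b, c, 1, 1)
        # Spatial map from the high-level features, resized to the low-level grid
        sp = F.interpolate(self.spatial_gate(high), size=(h, w),
                           mode="bilinear", align_corners=False)
        # Modulate the low-level map with both gates; residual keeps an identity path
        return low * ch * sp + low


# Toy usage with assumed tensor shapes
low = torch.randn(2, 64, 56, 56)    # lower-layer feature map
high = torch.randn(2, 256, 14, 14)  # semantically richer higher-layer map
out = TopDownAttention(64, 256)(low, high)
print(out.shape)  # torch.Size([2, 64, 56, 56])
```

The residual connection is a design choice in this sketch: with gates near one the module reduces to the identity path; whether TDAM fuses its gates this way is an assumption.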

What do CNNs gain by imitating the visual development of primate infants?

British Machine Vision Conference, 2020

Deep convolutional neural networks have emerged as strong candidates for a model of human vision, often outperforming competing models on both computer-vision benchmarks and computational-neuroscience benchmarks of neural response correspondence. The design of these models has undergone several refinements in recent years, drawing on both statistical and cognitive insights, and has in the process shown increasing correspondence to primate visual-processing representations. However, their training methodology still stands in contrast to the process of primate visual development, and we believe it can benefit from closer alignment with this natural process. Primate visual development is characterized by low visual acuity and colour sensitivity, as well as high plasticity and neuronal growth, in the first year of infancy, prior to the development of specific visual-cognitive functions such as visual object recognition. In this work, we investigate the synergy between a gradual variation in the distribution of visual input and the concurrent growth of a statistical model of vision on the task of large-scale object classification, and discuss how it may yield better approaches to training deep convolutional neural networks. Our experiments across multiple object-classification benchmarks indicate that a growing statistical model trained with a gradually varying visual input distribution converges to better generalization, at a faster rate, than traditional, more static training setups.
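
Although the paper's exact protocol isn't reproduced here, the developmental recipe it describes can be sketched as an input pipeline that matures over training: heavy blur and desaturation early (low acuity and colour sensitivity), fading out as training progresses. The schedule constants and transform choices below are illustrative assumptions, and the concurrent growth of the network itself is omitted.

```python
from torchvision import transforms

def developmental_transform(progress: float) -> transforms.Compose:
    """Input pipeline that matures with training progress in [0, 1]:
    early inputs are blurry and desaturated (mimicking infant acuity and
    colour sensitivity); later inputs approach the full-fidelity image.
    The schedule constants here are illustrative assumptions."""
    # Acuity: blur strength decays from heavy to (near) none
    sigma = max(1e-3, 4.0 * (1.0 - progress))
    # Colour sensitivity: interpolate from grayscale toward full colour
    saturation = progress
    return transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.GaussianBlur(kernel_size=21, sigma=sigma),
        transforms.ColorJitter(saturation=(saturation, saturation)),
        transforms.ToTensor(),
    ])

# Early vs. late training: rebuild the transform as the schedule advances
early = developmental_transform(progress=0.05)  # blurry, mostly grayscale
late = developmental_transform(progress=0.95)   # sharp, full colour
```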

TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs

Attention modules for Convolutional Neural Networks (CNNs) are an effective method to enhance the performance of networks on multiple computer-vision tasks. While many works focus on building more effective modules through appropriate modelling of channel-, spatial- and self-attention, these modules primarily operate in a feedforward manner. Consequently, the attention mechanism strongly depends on the representational capacity of a single input feature activation, and can benefit from the incorporation of semantically richer higher-level activations that can specify "what and where to look" through top-down information flow. Such feedback connections are also prevalent in the primate visual cortex and are recognized by neuroscientists as a key component of primate visual attention. Accordingly, in this work, we propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs and consequently outputs more s...
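
The iterative, multi-step character of the module could be unrolled as below, with a feed-forward block and a top-down attention step (such as the TopDownAttention sketch earlier) alternating for a fixed number of steps. The step count and feedback wiring are assumptions rather than the paper's exact architecture.

```python
import torch.nn as nn

class IterativeTopDownBlock(nn.Module):
    """Hypothetical unrolling of top-down feedback over a fixed number of
    steps: a feed-forward block computes a higher-level activation, which
    is fed back to refine that block's own input via the searchlight."""

    def __init__(self, feedforward: nn.Module, attention: nn.Module, steps: int = 2):
        super().__init__()
        self.feedforward = feedforward  # e.g. a ResNet stage
        self.attention = attention      # e.g. the TopDownAttention sketch above
        self.steps = steps

    def forward(self, x):
        out = self.feedforward(x)
        for _ in range(self.steps):
            x = self.attention(x, out)   # searchlight modulates the input...
            out = self.feedforward(x)    # ...and the block recomputes on it
        return out
```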

How does simulating aspects of primate infant visual development inform training of CNNs?

graph2vec: Learning Distributed Representations of Graphs

arXiv, 2017

Recent works on representation learning for graph-structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph-analytics tasks, such as graph classification and clustering, require representing entire graphs as fixed-length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrarily sized graphs. graph2vec's embeddings are learnt in an unsupervised manner and are task-agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and...
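
The recipe graph2vec describes, treating each graph as a "document" whose "words" are rooted-subgraph labels and embedding it doc2vec-style, can be approximated with networkx and gensim. This is a simplified sketch, not the authors' code: it uses Weisfeiler-Lehman subgraph hashes as the vocabulary, and the hyperparameters are placeholders.

```python
import networkx as nx
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def wl_words(graph: nx.Graph, iterations: int = 3) -> list[str]:
    """Treat each node's Weisfeiler-Lehman subgraph hash (one per WL
    iteration) as a 'word'; the whole graph becomes a 'document'."""
    hashes = nx.weisfeiler_lehman_subgraph_hashes(graph, iterations=iterations)
    return [h for node_hashes in hashes.values() for h in node_hashes]

# Toy corpus of graphs; real use would load a graph-classification dataset
graphs = [nx.cycle_graph(6), nx.path_graph(6), nx.complete_graph(5)]
corpus = [TaggedDocument(words=wl_words(g), tags=[str(i)])
          for i, g in enumerate(graphs)]

# PV-DBOW (dm=0) is in the spirit of graph2vec's skip-gram-style objective
model = Doc2Vec(corpus, vector_size=32, dm=0, min_count=1, epochs=50)
embedding = model.dv["0"]  # fixed-length, task-agnostic graph embedding
print(embedding.shape)     # (32,)
```

Because the embeddings are unsupervised and task-agnostic, they can be fed to any downstream classifier or clustering algorithm operating on fixed-length vectors.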
