Thusitha Chandrapala | Hong Kong University of Science and Technology
Papers by Thusitha Chandrapala
Neural development in the visual cortex depends on visual experience during the so-called critical period. Recent experiments have shown that under normal conditions rodents develop binocular receptive fields with similar orientation preferences for the left and right eyes. In contrast, under monocular deprivation during the critical period, this orientation alignment does not occur. Here we propose a computational model to explain the process of orientation alignment, its underlying mechanisms, and its failure in the case of monocular deprivation or uncorrelated binocular inputs. Our model is based on the recently proposed Active Efficient Coding framework, which jointly develops eye movement control and sensory representations. Our model suggests that the active maintenance of a binocular visual field, which leads to correlated visual inputs from the two eyes, is essential for orientation alignment. This behavior is analogous to vergence control in primates. However, because rodents have large receptive fields with low spatial frequency tuning, the coordination of the eyes need not be very precise. The model also suggests that coordinated binocular vision need not be maintained continuously for orientation alignment to develop.
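To make the joint development concrete, here is a minimal toy sketch of the Active Efficient Coding idea (all names and numbers are hypothetical, not from the paper): an agent adjusts a vergence-like parameter by greedy search, keeping changes that make the binocular input cheaper to encode. A smooth, low-spatial-frequency scene, echoing the rodent regime, gives a graded error landscape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy: a 1D scene viewed by two eyes; the right eye's view is
# offset by an unknown disparity that a vergence command must compensate.
# Smoothing gives the scene low spatial frequency content, as in rodents.
scene = np.convolve(rng.standard_normal(400), np.ones(15) / 15, mode="same")
TRUE_DISPARITY, PATCH = 7, 16

def binocular_input(vergence, pos):
    """Left/right patches; they coincide when vergence == TRUE_DISPARITY."""
    off = TRUE_DISPARITY - vergence
    return scene[pos:pos + PATCH], scene[pos + off:pos + off + PATCH]

def coding_error(vergence):
    """Residual of a binocular code built from matched features [g; g]:
    projecting [l; r] onto that subspace leaves ||l - r||^2 / 2, so
    well-aligned eyes make the input cheap to encode."""
    return np.mean([0.5 * np.sum((l - r) ** 2)
                    for l, r in (binocular_input(vergence, p)
                                 for p in range(50, 250, 10))])

vergence = 0
for _ in range(60):  # greedy "policy": keep steps that reduce coding error
    step = rng.choice([-1, 1])
    if coding_error(vergence + step) < coding_error(vergence):
        vergence += step

print(f"learned vergence = {vergence}, true disparity = {TRUE_DISPARITY}")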
A number of unsupervised learning algorithms have been proposed to account for the receptive field properties of simple cells in the mammalian primary visual cortex, among them principal component analysis (PCA) and sparse coding. While the receptive field properties learned by sparse coding appear to match those measured in cortical cells better than those learned by PCA, it is still not clear why biological neural systems might prefer sparse codes. In this paper we explore another reason why sparse representations might be preferred over PCA by studying the utility of different coding schemes in an adaptive behaving agent. We suggest that the qualitative properties of representations based on sparse coding are more stable under changes in the input statistics than those of representations based on PCA. We demonstrate this by examining representations learned on binocular visual input with different disparity distributions. Our results show that in encoding retinal disparity, the properties of sparse codes are more stable, which has important implications for adaptive agents, where the statistics change over time. In particular, in an agent that jointly learns a representation for binocular visual inputs along with a vergence control policy, the learned behavior is unstable when actions are driven by PCA-based representations, but stable and self-calibrating when driven by sparse-coding-based representations.
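The stability argument can be illustrated with a small toy experiment (hypothetical code, not the paper's simulation): data generated by two sparse, non-orthogonal causes whose relative prevalence, standing in for the disparity distribution, differs between two regimes. PCA's leading axis rotates with the statistics, while a 1-sparse dictionary keeps recovering the same two cause directions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two non-orthogonal "cause" directions, standing in for disparity tunings.
d1 = np.array([1.0, 0.2]); d1 /= np.linalg.norm(d1)
d2 = np.array([0.3, 1.0]); d2 /= np.linalg.norm(d2)

def sample(n, p1):
    """Sparse generative data: each sample activates exactly one cause."""
    which = rng.random(n) < p1
    amp = rng.laplace(size=(n, 1))
    return np.where(which[:, None], amp * d1, amp * d2)

def pca_axis(X):
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return Vt[0]  # leading principal axis

def sparse_atoms(X, iters=20):
    """1-sparse dictionary learning (winner-take-all matching pursuit)."""
    D = np.eye(2)  # two unit-norm atoms (rows)
    for _ in range(iters):
        win = np.argmax(np.abs(X @ D.T), axis=1)  # best atom per sample
        for k in range(2):  # power-iteration update toward winning samples
            Xk = X[win == k]
            v = Xk.T @ (Xk @ D[k])
            D[k] = v / np.linalg.norm(v)
    return D

for p1, tag in [(0.7, "regime A"), (0.3, "regime B")]:
    X = sample(5000, p1)
    print(tag, "| PCA axis:", np.round(pca_axis(X), 2),
          "| sparse atoms:", np.round(sparse_atoms(X), 2))
```

Between the two regimes the printed PCA axis rotates toward whichever cause dominates, while the sparse atoms stay pinned (up to sign and order) to the fixed cause directions.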
Primary visual cortical complex cells are thought to serve as invariant feature detectors and to provide input to higher cortical areas. We propose a single model for learning the connectivity required by complex cells that integrates two factors that have been hypothesized to play a role in the development of invariant feature detectors: temporal slowness and sparsity. This model, the generative adaptive subspace self-organizing map (GASSOM), extends Kohonen’s adaptive subspace self-organizing map (ASSOM) with a generative model of the input. Each observation is assumed to be generated by one among many nodes in the network, each being associated with a different subspace in the space of all observations. The generating nodes evolve according to a first-order Markov chain and generate inputs that lie close to the associated subspace. This model differs from prior approaches in that temporal slowness is not an externally imposed criterion to be maximized during learning but, rather, an emergent property of the model structure as it seeks a good model of the input statistics. Unlike the ASSOM, the GASSOM does not require an explicit segmentation of the input training vectors into separate episodes. This enables us to apply this model to an unlabeled naturalistic image sequence generated by a realistic eye movement model. We show that the emergence of temporal slowness within the model improves the invariance of feature detectors trained on this input.
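The generative assumption can be captured in a few lines (a toy sketch with hypothetical sizes and noise levels, not the paper's implementation): each node owns a subspace, the generating node follows a sticky Markov chain, and an HMM-style forward pass recovers node responsibilities from the observations alone.

```python
import numpy as np

rng = np.random.default_rng(2)

N, DIM, SUBDIM = 4, 16, 2  # nodes, input dim, subspace dim (toy sizes)

# Each node owns an orthonormal basis of a low-dimensional subspace.
bases = []
for _ in range(N):
    Q, _ = np.linalg.qr(rng.standard_normal((DIM, SUBDIM)))
    bases.append(Q)

# "Sticky" first-order Markov chain over generating nodes: staying in the
# same node is likely, which is where temporal slowness comes from.
stay = 0.9
T = np.full((N, N), (1 - stay) / (N - 1))
np.fill_diagonal(T, stay)

def log_lik(x):
    """Unnormalized log-likelihood score per node: higher when x lies
    close to that node's subspace (projection energy)."""
    e = np.array([np.linalg.norm(B.T @ x) ** 2 for B in bases])
    return e / 0.1  # temperature-like scale (hypothetical value)

# Generate a sequence from the model itself, then infer responsibilities
# with an HMM forward pass -- no explicit episode boundaries needed.
z = 0
alpha = np.full(N, 1 / N)
correct, steps = 0, 200
for _ in range(steps):
    z = rng.choice(N, p=T[z])                     # hidden generating node
    x = bases[z] @ rng.standard_normal(SUBDIM)    # point near its subspace
    x += 0.05 * rng.standard_normal(DIM)          # observation noise
    w = np.log(T.T @ alpha) + log_lik(x)          # forward update
    alpha = np.exp(w - w.max()); alpha /= alpha.sum()
    correct += (np.argmax(alpha) == z)

print(f"forward-pass accuracy: {correct / steps:.2f}")
```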
The Adaptive Subspace Self-Organizing Map (ASSOM) is a model that incorporates sparsity, nonlinear pooling, topological organization, and temporal continuity to learn invariant feature detectors, each corresponding to one node of the network. Temporal continuity is implemented by grouping inputs into “training episodes”. Each episode contains samples from one invariance class and is mapped to a particular node during training. However, this explicit grouping makes applying the algorithm to natural image sequences difficult, since the grouping is generally not known a priori. This work proposes a probabilistic generative model of the ASSOM that addresses this problem. Each node of the ASSOM generates input vectors from one invariance class. Training sequences are generated by nodes that are chosen according to a Markov process. We demonstrate that this model can learn invariant feature detectors similar to those found in the primary visual cortex from an unlabeled sequence of input images generated by a realistic model of eye movements. Performance is comparable to the original ASSOM algorithm, but without the need for explicit grouping into training episodes.
GitHub: thusithaC/gassom
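For the learning side, here is a hypothetical simplification of the subspace update (a sketch of the idea, not lifted from the repository above): each node's basis is pulled toward observations in proportion to its responsibility and then re-orthonormalized, so episode labels never enter.

```python
import numpy as np

rng = np.random.default_rng(3)

DIM, SUBDIM, LR = 16, 2, 0.05  # toy sizes and learning rate (hypothetical)

def update_basis(B, x, resp):
    """Oja-style gradient update of one node's subspace basis.
    resp: this node's posterior responsibility for x (from the forward pass).
    """
    coeff = B.T @ x                           # coordinates of x in subspace
    B = B + LR * resp * np.outer(x, coeff)    # pull subspace toward x
    Q, _ = np.linalg.qr(B)                    # keep the basis orthonormal
    return Q

# Usage: learn a target subspace from samples drawn on it (resp fixed at 1).
target, _ = np.linalg.qr(rng.standard_normal((DIM, SUBDIM)))
B, _ = np.linalg.qr(rng.standard_normal((DIM, SUBDIM)))
for _ in range(500):
    x = target @ rng.standard_normal(SUBDIM)
    B = update_basis(B, x, resp=1.0)

# Fraction of a target vector's energy captured by the learned subspace
# (1.0 means the subspace was recovered).
x = target @ rng.standard_normal(SUBDIM)
print(np.linalg.norm(B.T @ x) ** 2 / np.linalg.norm(x) ** 2)
```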
Motion blur caused by relative motion between the camera and the object can seriously degrade image quality. We present an FPGA-based blur detection and correction algorithm implemented on top of a configurable soft-processor-based architecture. The system consists of two main modules: the blur detection module identifies the blur length and angle, and the restoration module uses regularized inverse filtering to remove the blur. The processing algorithms are implemented as separate cores on the FPGA fabric, where the soft processor core is used only for managing system configuration. The system achieves a frame rate of 15 fps for a 720p HD video stream.
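Regularized inverse filtering itself is a standard technique; here is a minimal NumPy sketch of the restoration step (the kernel length would come from the detection module; the angle is fixed at zero for brevity and the regularization weight is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(4)

def motion_psf(shape, length):
    """Horizontal motion-blur PSF (angle 0 for brevity; the detected angle
    would rotate this kernel in the full system)."""
    psf = np.zeros(shape)
    psf[0, :length] = 1.0 / length
    return psf

def regularized_inverse(blurred, psf, lam=1e-2):
    """Tikhonov-regularized inverse filter:
    F_hat = conj(H) * G / (|H|^2 + lam)."""
    H = np.fft.fft2(psf)
    G = np.fft.fft2(blurred)
    F = np.conj(H) * G / (np.abs(H) ** 2 + lam)
    return np.real(np.fft.ifft2(F))

img = rng.random((64, 64))                 # stand-in for a video frame
psf = motion_psf(img.shape, length=9)      # length as if from the detector
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf)))
restored = regularized_inverse(blurred, psf)

print("blur error:   ", np.mean((blurred - img) ** 2).round(5))
print("restore error:", np.mean((restored - img) ** 2).round(5))
```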
Although many recent stereo vision algorithms can create disparity maps with high accuracy, their sequential nature makes them difficult to adopt for real-time applications. Biologically motivated algorithms based on Gabor filters exhibit inherent parallelism and can be implemented effectively on parallel hardware such as Graphics Processing Units (GPUs). We present a real-time stereo vision algorithm based on Gabor filters that makes effective use of the memory hierarchy and the threading resources of the GPU. Since 2D filtering is a critical step that takes up to 50% of the total time to create the disparity map, we evaluate GPU implementations of three filtering methods. Using the best of these, we achieve a frame rate of 76 fps for a 512x512 image stream on an NVIDIA GTX 480 GPU, a 170x speed-up over a conventional CPU-based implementation.
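The core principle behind Gabor-filter stereo can be shown on a single scanline (toy code; the actual system is 2D and runs on the GPU, and all parameter values here are hypothetical): the phase difference between the left and right band-pass responses encodes local disparity.

```python
import numpy as np

rng = np.random.default_rng(5)

# 1D complex Gabor filter: Gaussian envelope times a complex carrier.
N, f0, sigma = 256, 0.08, 12.0          # length, filter frequency, envelope
t = np.arange(-48, 49)
gabor = np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * f0 * t)

# A smoothed random scanline viewed from two horizontally offset eyes.
scene = np.convolve(rng.standard_normal(N + 32), np.ones(5) / 5, "same")
true_disp = 4
left = scene[16:16 + N]
right = scene[16 - true_disp:16 - true_disp + N]  # shifted view

rl = np.convolve(left, gabor, "same")
rr = np.convolve(right, gabor, "same")

# Phase difference -> disparity, kept where both responses are strong.
dphi = np.angle(rl * np.conj(rr))
est = dphi / (2 * np.pi * f0)
mask = (np.abs(rl) > np.median(np.abs(rl))) & \
       (np.abs(rr) > np.median(np.abs(rr)))
print("median disparity estimate:", np.round(np.median(est[mask]), 2),
      "| true:", true_disp)
```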
Thesis Chapters by Thusitha Chandrapala
In this thesis, we propose a novel framework, the Generative Adaptive Subspace Self-Organizing Map (GASSOM), which utilizes sparsity and temporal slowness to learn invariant feature detectors. Sparsity and temporal slowness have been identified as two critical components in shaping the visual receptive fields of neurons in the primary visual cortex of animals with a developed visual processing system, such as primates. Sparsity is inspired by Barlow’s efficient coding hypothesis, which posits that neural population responses represent sensory data using as few active neurons as possible. The principle of temporal slowness assumes that neurons adapt to encode information about the environment, which is relatively stable in comparison to the raw sensory signals. Using the GASSOM framework, we show that temporal slowness can emerge in the model as it tries to learn a better representation of sensory signals, and that incorporating slowness results in representations that exhibit better invariance.

We validate the applicability of the GASSOM framework in tasks that require learning invariant visual representations. We incorporate the GASSOM in a framework that jointly learns a neural representation and a behavior, and use it to analyze the functional utility of sparsity. We also use this joint learning framework to explain neurophysiological findings about binocular neurons and coordinated eye movements in rodents. We propose the GASSOM as a generic learning algorithm that could be used to form hierarchical organizations of feature extractors modeling the information flow in the visual cortex; specifically, we study the development of motion-in-depth-sensitive units. Finally, we extend the GASSOM to the event domain by constructing a framework for learning invariant feature detectors from stimuli generated by event-driven neuromorphic vision sensors.
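As a concrete handle on temporal slowness, one common way to quantify it is the SFA-style objective: the mean squared temporal derivative of a unit-variance response (a toy illustration with synthetic signals, not from the thesis). A phase-sensitive, simple-cell-like unit responding to a drifting grating fluctuates quickly, while an energy-model, complex-cell-like unit tracks only the slow contrast envelope.

```python
import numpy as np

def slowness(r):
    """SFA-style objective: mean squared temporal derivative of the
    unit-variance response (lower = slower = more temporally invariant)."""
    r = (r - r.mean()) / r.std()
    return np.mean(np.diff(r) ** 2)

# Synthetic responses to a drifting grating with a slow contrast envelope.
t = np.linspace(0, 10, 500)
envelope = 1 + 0.5 * np.sin(2 * np.pi * 0.2 * t)   # slow contrast drift
carrier = 2 * np.pi * 3 * t                         # fast grating phase
simple = envelope * np.cos(carrier)                 # phase-sensitive unit
complex_energy = envelope**2 * (np.cos(carrier)**2 + np.sin(carrier)**2)

print("slowness, simple-like: ", round(slowness(simple), 4))
print("slowness, complex-like:", round(slowness(complex_energy), 4))
```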