Aritra Mukherjee - Profile on Academia.edu

Papers by Aritra Mukherjee

Object Mapping from Disparity Map by Fast Clustering

2020 IEEE Calcutta Conference (CALCON), 2020

3D object bounding box detection is one of the most important aspects of robot vision for autonomous navigation. In this work, we propose a stereo-vision-based methodology for this purpose. The work relies on the disparity map. First, spatially contiguous pixels with the same disparity form the components. Detected components are then filtered based on size and density criteria. Finally, the filtered components are combined based on adjacency, connectivity strength and depth proximity. Thus, 2D object proposals are obtained and mapped to 3D bounding boxes. A dataset has been prepared to test the methodology. Performance has been compared with another system developed by the Computer Vision Lab at INHA University, Incheon, South Korea. It is observed that the detection capability of the proposed system is superior. Furthermore, the computational speed makes the work suitable for robotic applications such as SLAM.
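The first two steps of the abstract above (forming components from equal-disparity pixels, then filtering by size and density) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the 4-connectivity rule, the thresholds `min_size` and `min_density`, and the definition of density as component size divided by bounding-box area are all assumptions. The merging step is omitted. The depth conversion uses the standard stereo relation Z = f * B / d.

```python
from collections import deque


def disparity_components(disp, min_size=3, min_density=0.5):
    """Group 4-connected pixels of equal, positive disparity (hypothetical helper).

    disp is a 2D list of integer disparities; 0 or less means "no match".
    Returns components that pass the assumed size and density filters.
    """
    rows, cols = len(disp), len(disp[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = []
    for r in range(rows):
        for c in range(cols):
            d = disp[r][c]
            if seen[r][c] or d <= 0:
                continue
            # Breadth-first flood fill over neighbours with the same disparity.
            seen[r][c] = True
            queue = deque([(r, c)])
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and disp[ny][nx] == d):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Bounding box as (x_min, y_min, x_max, y_max), inclusive.
            box = (min(xs), min(ys), max(xs), max(ys))
            area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
            size = len(ys)
            if size >= min_size and size / area >= min_density:
                kept.append({"disparity": d, "size": size, "box": box})
    return kept


def depth_from_disparity(d, focal_px, baseline_m):
    """Standard stereo relation: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / d
```

Each surviving 2D box, together with the depth of its disparity level, gives the ingredients for a 3D bounding box once the camera intrinsics are known.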

Segmentation of Natural Image Based on Colour Cohesion and Spatial Criteria

2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR), 2017

Segmenting a natural image is a complex task. Different semantic units may share similar visual features. On the other hand, such features can vary even within a single unit. The proposed methodology relies on colour cohesion and the spatial relationship between components with cohesive colour. First, the image colour space is clustered to map the original colours to a reduced set. The number of clusters is detected automatically by analyzing the intensity histograms of the colour channels. Pixels are then grouped by the similarity of their mapped colours. Subsequently, a spatial inclusiveness criterion is applied to merge pixel groups where one group is contained within another. Finally, an attempt is made to merge adjacent regions based on colour gradient. Colour cohesion is conceptualized by the process of colour space clustering, grouping of pixels in terms of colour similarity and region merging based on colour gradient. The spatial criteria are taken into accou...

Two Stage Semantic Segmentation by SEEDS and Fork Net

2020 IEEE Calcutta Conference (CALCON), 2020

Semantic segmentation of images is one of the most challenging and researched topics in the field of computer vision. Statistical methods can be employed for the task with low computational resources, but in a diverse natural environment they fail to label many complicated objects. Deep learning methods are now popular for their high accuracy, but dense semantic segmentation at pixel-level accuracy is very resource-intensive and not suitable for robot vision. The proposed methodology merges the best of both worlds by using a deep net to semantically label superpixels computed by a statistical method. The deep convolution network is novel in its use of superpixels in different fields of vision. The methodology is tested on the Pascal VOC dataset and compared with recent popular approaches. The results show that the proposed methodology is on par with the best results.

Lecture Notes in Computer Science, 2017

Closing loops for pose graph optimization by recognising previously mapped places is an essential step in Simultaneous Localisation and Mapping (SLAM). Traditional approaches for recognising known places follow a feature-based bag-of-words model while discarding certain geometric and structural information. To improve real-time query performance, we take a slightly different approach by learning low-dimensional global representation vectors using a deconvolution net. The proposed 12-layer deconvolution net encodes and decodes an image to itself and in the process learns a representation of the image in a reduced feature space. This representation is then used to compare one image with another to identify loop closures. Sequences from the KITTI Visual Odometry dataset are used for evaluation, and performance is compared with state-of-the-art techniques. Perceptual aliasing, common in most place recognition approaches, is considerably lower in ours.
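The comparison step described above, where a reduced representation of the current image is matched against those of earlier images, might look like the sketch below. The abstract does not state which similarity measure is used, so cosine similarity, the `threshold` value, and the `skip_recent` window (which excludes the most recent frames so that a frame does not trivially match its neighbours) are assumptions made for illustration. The vectors stand in for the learned representations; the network itself is not modelled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_loop_closure(query, history, threshold=0.9, skip_recent=2):
    """Return the index of the best-matching earlier frame, or None.

    history holds one representation vector per previously seen frame,
    oldest first. The last skip_recent entries are ignored (assumed rule).
    """
    candidates = history[:max(0, len(history) - skip_recent)]
    best_index, best_score = None, threshold
    for index, vector in enumerate(candidates):
        score = cosine_similarity(query, vector)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index
```

Because each frame is reduced to a short vector, the cost of a query grows only linearly with the number of stored frames, which is the property the abstract relies on for real-time use.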

Lecture Notes in Computer Science, 2019

Mapping the environment has been an important task for robot navigation and Simultaneous Localization And Mapping (SLAM). LIDAR provides a fast and accurate 3D point cloud map of the environment, which helps in map building. However, processing millions of points in the point cloud is computationally expensive. In this paper, a methodology is presented to generate segmented surfaces in real time, which can then be used in modeling 3D objects. First, an algorithm is proposed for efficient map building from single-shot data of a spinning LIDAR. It is based on fast meshing and sub-sampling, and it exploits the physical design and working principle of the spinning LIDAR sensor. The generated mesh surfaces are then segmented by estimating their normals and considering their homogeneity. The segmented surfaces can be used as proposals for predicting geometrically accurate models of objects in the robot's activity environment. The proposed methodology is compared with some popular point cloud segmentation methods to highlight its efficacy in terms of accuracy and speed.
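The normal-estimation and homogeneity idea mentioned above can be illustrated with a minimal sketch. It assumes the mesh is made of triangles and that two faces belong to the same surface when their unit normals differ by less than an angle threshold; the function names and the 10-degree default are hypothetical, and the paper's actual meshing and segmentation rules are not reproduced here.

```python
import math


def triangle_normal(p0, p1, p2):
    """Unit normal of a triangle from the cross product of two edges.

    Returns None for a degenerate (zero-area) triangle.
    """
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        return None
    return (nx / length, ny / length, nz / length)


def normals_agree(n1, n2, max_angle_deg=10.0):
    """True if two unit normals are within max_angle_deg of each other.

    The absolute value of the dot product is used so that opposite
    triangle winding does not split one surface (an assumed choice).
    """
    dot = abs(sum(a * b for a, b in zip(n1, n2)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(math.acos(dot)) <= max_angle_deg
```

Neighbouring mesh faces for which `normals_agree` holds could then be grown into one surface segment, for example with a flood fill over face adjacency.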

Applied Soft Computing, 2019

Highlights
• The work presents a deconvnet-based methodology to detect loop closures in SLAM.
• The network utilizes LCA in between conv and deconv to reduce the size of the image vector.
• LCA retains contextual and structural information essential for reconstruction.
• The reduced vector enables fast comparison to detect loop closure.
• Experiments show the proposed methodology outperforms state-of-the-art techniques.

Segmentation of natural images based on super pixel and graph merging

IET Computer Vision, 2020

Multimedia Tools and Applications

In the field of SLAM (Simultaneous Localization And Mapping) for robot navigation, mapping the environment is an important task. In this regard, the Lidar sensor can produce a nearly accurate 3D map of the environment in the form of a point cloud, in real time. Though the data is adequate for extracting information related to SLAM, processing millions of points in the point cloud is computationally quite expensive. The methodology presented proposes a fast algorithm that can be used to extract semantically labelled surface segments from the cloud, in real time, for direct navigational use or higher-level contextual scene reconstruction. First, a single scan from a spinning Lidar is used to generate a mesh of subsampled cloud points online. The generated mesh is then used to compute surface normals at those points, on the basis of which surface segments are estimated. A novel descriptor to represent the surface segments is proposed and used, with the help of a classifier, to determine the surface class (semantic label) of the segments. These semantic surface segments can be further utilized for geometric reconstruction of objects in the scene, or for optimized trajectory planning by a robot. The proposed methodology is compared with a number of point cloud segmentation methods and state-of-the-art semantic segmentation methods to emphasize its efficacy in terms of speed and accuracy.

Presenting a brand new idea for the tech-greedy masses who wish to keep up with the world at a humble price - ADD ON TOUCH SCREEN SYSTEM. The basic idea is to use a frame with an invisible grid of IR beams. When the user touches the screen, the beams (vertical and horizontal) at that position are eclipsed by the finger, triggering a click event for the area in question. The frame has an array of LEDs and LDRs, aligned in pairs, for both the X and Y axes. The number of LEDs that can be fitted in the frame determines the resolution of the system. The overall setup is strap-on in nature and highly scalable.
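The beam-grid idea above can be sketched in a few lines. This is an illustration under stated assumptions, not the described hardware's firmware: each axis is assumed to be read as a list of booleans (True when the LDR still receives its LED's beam), and the touch position is assumed to be the centre of the eclipsed beams on each axis. The function name `locate_touch` is hypothetical.

```python
def locate_touch(x_beams, y_beams):
    """Return the (x, y) beam-index position of a touch, or None.

    x_beams and y_beams are lists of booleans, one per LED/LDR pair;
    False means the beam is eclipsed. A touch is reported only when at
    least one beam is blocked on both axes (assumed rule).
    """
    blocked_x = [i for i, lit in enumerate(x_beams) if not lit]
    blocked_y = [i for i, lit in enumerate(y_beams) if not lit]
    if not blocked_x or not blocked_y:
        return None
    # A finger may eclipse several adjacent beams; use their centre.
    return (sum(blocked_x) / len(blocked_x), sum(blocked_y) / len(blocked_y))
```

With this scheme, the number of beam pairs per axis bounds the number of distinguishable positions, which matches the remark that the LED count sets the resolution.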

This paper describes an approach for object localization using a fuzzy mask of thresholded difference images of training data normalized by a Gaussian. The mask is convolved at different scales in a sliding-window approach, with sectional selectivity trained according to the variance of values across different classes; the area that returns a likelihood above a trained threshold is reported as the object's location in the image. The approach is based on the idea that humans detect objects based on their edges and their relative scale.
