Asish Bera - Academia.edu (original) (raw)

Papers by Asish Bera

Artificial Intelligence, 2022

Lecture notes in electrical engineering, 2023

Advances in Intelligent Systems and Computing, 2014

The use of general descriptive names, registered names, trademarks, service marks, etc. in this p... more The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

IEEE transactions on image processing, 2022

Over the past few years, a significant progress has been made in deep convolutional neural networ... more Over the past few years, a significant progress has been made in deep convolutional neural networks (CNNs)-based image recognition. This is mainly due to the strong ability of such networks in mining discriminative object pose and parts information from texture and shape. This is often inappropriate for fine-grained visual classification (FGVC) since it exhibits high intra-class and low inter-class variances due to occlusions, deformation, illuminations, etc. Thus, an expressive feature representation describing global structural information is a key to characterize an object/ scene. To this end, we propose a method that effectively captures subtle changes by aggregating contextaware features from most relevant image-regions and their importance in discriminating fine-grained categories avoiding the bounding-box and/or distinguishable part annotations. Our approach is inspired by the recent advancement in self-attention and graph neural networks (GNNs) approaches to include a simple yet effective relation-aware feature transformation and its refinement using a context-aware attention mechanism to boost the discriminability of the transformed feature in an end-to-end learning process. Our model is evaluated on eight benchmark datasets consisting of fine-grained objects and human-object interactions. It outperforms the state-of-the-art approaches by a significant margin in recognition accuracy.

arXiv (Cornell University), Oct 23, 2021

Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. Thi... more Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to construct highly expressive representations for decision making. However, the convolution operation is unable to capture long-range dependencies such as arbitrary relations between pixels since it operates on a fixed-size window. Therefore, it may not be suitable for discriminating subtle changes (e.g. fine-grained visual recognition). To this end, our proposed method captures the high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs), which aggregate information by establishing relationships among multiscale hierarchical regions. These regions consist of smaller (closer look) to larger (far look), and the dependency between regions is modeled by an innovative attention-driven message propagation, guided by the graph structure to emphasize the neighborhoods of a given region. Our approach is simple yet extremely effective in solving both the finegrained and generic visual classification problems. It outperforms the state-of-the-arts with a significant margin on three and is very competitive on other two datasets.

IEEE Transactions on Image Processing

Proceedings of the AAAI Conference on Artificial Intelligence

Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative ob... more Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits a significant variance in the same subcategory and subtle variance among different subcategories. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients, and learns to attend informative integral regions and their importance in discriminating different subcategories without requiring the bounding-box and/or distinguishable part annotations. We also introduce a novel feature encoding by considering the intrinsic consistency between the informativeness of the integral regions and their spatial structures to capture the semantic correlatio...

2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021

This paper integrates human driver factors with a model-based Collision Avoidance System (CAS) to... more This paper integrates human driver factors with a model-based Collision Avoidance System (CAS) to enhance the safety of semi-autonomous vehicles. Driver Activity Recognition (DAR) through Driver Distraction States (DDS) has been used as the key component to trigger the CAS so that collisions can be averted. DDS has been generated using realistic normal driving scenarios and suitably integrated with a Full State Feedback (FSF) controller-based CAS. The integrated algorithm has been tested using a Hardware in Loop (HiL) setup, which is interfaced with the vehicle dynamics software IPG TruckMaker ®. The performance of the algorithm has been evaluated for various on-road scenarios and found to be effective in avoiding rear-end collisions.

IEEE Transactions on Image Processing, 2021

This paper presents a novel keypoints-based attention mechanism for visual recognition in still i... more This paper presents a novel keypoints-based attention mechanism for visual recognition in still images. Deep Convolutional Neural Networks (CNNs) for recognizing images with distinctive classes have shown great success, but their performance in discriminating fine-grained changes is not at the same level. We address this by proposing an end-to-end CNN model, which learns meaningful features linking fine-grained changes using our novel attention mechanism. It captures the spatial structures in images by identifying semantic regions (SRs) and their spatial distributions, and is proved to be the key to modelling subtle changes in images. We automatically identify these SRs by grouping the detected keypoints in a given image. The "usefulness" of these SRs for image recognition is measured using our innovative attentional mechanism focusing on parts of the image that are most relevant to a given task. This framework applies to traditional and fine-grained image recognition tasks and does not require manually annotated regions (e.g. boundingbox of body parts, objects, etc.) for learning and prediction. Moreover, the proposed keypoints-driven attention mechanism can be easily integrated into the existing CNN models. The framework is evaluated on six diverse benchmark datasets. The model outperforms the state-of-the-art approaches by a considerable margin using Distracted Driver V1 (Acc: 3.39%), Distracted Driver V2 (Acc: 6.58%), Stanford-40 Actions (mAP: 2.15%), People Playing Musical Instruments (mAP: 16.05%), Food-101 (Acc: 6.30%) and Caltech-256 (Acc: 2.59%) datasets.

Multimedia Systems, Mar 30, 2023

Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is widely us... more Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) is widely used to prevent malicious automated attacks on various online services. Text-and image-CAPTCHAs have shown broader acceptability due to usability and security factors. However, recent progress in deep learning implies that text-CAPTCHAs can easily be exposed to various fraudulent attacks. Thus, image-CAPTCHAs are getting research attention to enhance usability and security. In this work, the Neural Style Transfer (NST) is adapted for designing an image-CAPTCHA algorithm to enhance security while maintaining human performance. In NST-rendered image-CAPTCHAs, existing methods inquire a user to identify or localize the salient object (e.g., content) which is solvable effortlessly by off-the-shelf intelligent tools. Contrarily, we propose a Style Matching CAPTCHA (SMC) that asks a user to select the style image which is applied in the NST method. A user can solve a random SMC challenge by understanding the semantic correlation between the content and style output as a cue. The performance in solving SMC is evaluated based on the 1368 responses collected from 152 participants through a web-application. The average solving accuracy in three sessions is 95.61%; and the average response time for each challenge per user is 6.52 seconds, respectively. Likewise, a Smartphone Application (SMC-App) is devised using the proposed method. The average solving accuracy through SMC-App is 96.33%, and the average solving time is 5.13 seconds. To evaluate the vulnerability of SMC, deep learning-based attack schemes using Convolutional Neural Networks (CNN), such as ResNet-50 and Inception-v3 are simulated. The average accuracy of attacks considering various studies on SMC using ResNet-50 and Inception-v3 is 37%, which is improved over

Lecture notes in networks and systems, 2023

IEEE Transactions on Instrumentation and Measurement

Multimedia Systems

Hand biometrics is globally deployed for automated human identification based on the discriminati... more Hand biometrics is globally deployed for automated human identification based on the discriminative geometric characteristics of hand. Advancements in hand biometric technologies are accomplished over several decades. The key objectives of this paper are two-fold. Firstly, it presents a comprehensive study on the state-of-the-art methods based on the hand images collected in an unconstraint environment. Secondly, a pose-invariant hand geometry system is excogitated. The experiments are conducted with the weighted geometric features computed from the fingers. The feature weighted k-nearest neighbor (fwk-NN) classifier is applied on the right- and left-hand images of the 500 subjects of the Bosphorus database for performance evaluation. The classification accuracy of 97% has been achieved for both of the hands using the fwk-NN classifier. Equal error rates (EER) of 5.94% and 6.08% are achieved for the right- and left-hand 500 subjects, respectively.

Expert Systems with Applications, 2021

Multimedia Tools and Applications, 2021

Recent research on biometrics focuses on achieving a high success rate of authentication and addr... more Recent research on biometrics focuses on achieving a high success rate of authentication and addressing the concern of various spoofing attacks. Although hand geometry recognition provides adequate security over unauthorized access, it is susceptible to presentation attack. This paper presents an anti-spoofing method toward hand biometrics. A presentation attack detection approach is addressed by assessing the visual quality of genuine and fake hand images. A thresholdbased gradient magnitude similarity quality metric is proposed to discriminate between the real and spoofed hand samples. The visual hand images of 255 subjects from the Bogazici University hand database are considered as original samples. Correspondingly, from each genuine sample, we acquire a forged image using a Canon EOS 700D camera. Such fake hand images with natural degradation are considered for elec

Advances in Intelligent Systems and Computing, 2018

An approach for hand biometric recognition with the hand image-based CAPTCHA verification is pres... more An approach for hand biometric recognition with the hand image-based CAPTCHA verification is presented in this paper. A new method for CAPTCHA generation is implemented based on the genuine and fake hand images which are embedded in a complex textured color background image. The HandCaptcha is a useful application to differentiate between the human and automated scripts. The first level of security is achieved by the HandCaptcha against the malicious threats and attacks. After solving the HandCaptcha correctly, the identity of a person is authenticated based on the contact-less hand geometric verification approach in the second level. A set of 300 unique HandCaptcha is created randomly and solved by at least 100 persons with the accuracy of 98.34%. Next, the left-hand images of the legitimate users are normalized, and sixteen geometric features are computed from every normalized hand. Experiments are conducted on the 200 subjects of the Bosporus left-hand database. Classification accuracy of 99.5% has been achieved using the kNN classifier, and the equal error rate is 3.93%.

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017

A finger biometric system at an unconstrained environment is presented in this paper. A technique... more A finger biometric system at an unconstrained environment is presented in this paper. A technique for hand image normalization is implemented at the preprocessing stage that decomposes the main hand contour into finger-level shape representation. This normalization technique follows subtraction of transformed binary image from binary hand contour image to generate the left-side of finger profiles (LSFPs). Then, XOR is applied to LSFP image and hand contour image to produce the right side of finger profiles. During feature extraction, initially, 30 geometric features are computed from every normalized finger. The rank-based forward-backward greedy algorithm is followed to select relevant features and to enhance classification accuracy. Two different subsets of features containing 9 and 12 discriminative features per finger are selected for two separate experimentations those use the k-nearest neighbor and the random forest (RF) for classification on the Bosphorus hand database. The experiments with the selected features of four fingers except the thumb have obtained improved performances compared to features extracted from five fingers and also other existing methods evaluated on the Bosphorus database. The best identification accuracies of 96.56% and 95.92% using the RF classifier have been achieved for the right-and left-hand images of 638 subjects, respectively. An equal error rate of 0.078 is obtained for both types of the hand images.

Multimedia Tools and Applications, 2016

This paper presents a contactless hand biometric system at unrestricted hand pose environment. A ... more This paper presents a contactless hand biometric system at unrestricted hand pose environment. A new preprocessing technique is proposed for defining the finger contour profiles (FCP). It mainly consists of simple grayscale image transformation, subtraction, and logical XOR operation. This hand prototyping method logically decomposes global hand contour into the left and right contour profiles of each finger. A set of twenty pose-invariant geometric features is extracted from the FCP and normalized global hand shape. Experiments are conducted on two publicly available hand databases namely, the Bosphorus and IIT Delhi (IITD) databases to validate the system using the kNN, minimum distance, and random forest (RF) classifiers. Satisfactory identification accuracy of 97.82 % using the RF classifier has been achieved for the Bosphorus database with 320 subjects; and in verification, 3.28 % equal error rate (EER) is reported. The kNN classifier has been found to produce good identification success of 95.22 % for the IITD database of 230 subjects; and 4.76 % EER is obtained in verification. The average execution time of this approach is lesser than 2 s, that implies its suitability in real-world applications.

International Journal of Pattern Recognition and Artificial Intelligence, 2015

This paper presents a new technique for user identification and recognition based on the fusion o... more This paper presents a new technique for user identification and recognition based on the fusion of hand geometric features of both hands without any pose restrictions. All the features are extracted from normalized left and right hand images. Fusion is applied at feature and also at decision level. Two probability-based algorithms are proposed for classification. The first algorithm computes the maximum probability for nearest three neighbors. The second algorithm determines the maximum probability of the number of matched features with respect to a thresholding on distances. Based on these two highest probabilities initial decisions are made. The final decision is considered according to the highest probability as calculated by the Dempster–Shafer theory of evidence. Depending on the various combinations of the initial decisions, three schemes are experimented with 201 subjects for identification and verification. The correct identification rate is found to be 99.5%, and the false ...