Tapan Gandhi - Academia.edu
Papers by Tapan Gandhi
ArXiv, 2021
This paper proposes an efficient video summarization framework that will give a gist of the entire video in a few key-frames or video skims. Existing video summarization frameworks are based on algorithms that utilize computer vision low-level feature extraction or high-level domain level extraction. However, being the ultimate user of the summarized video, humans remain the most neglected aspect. Therefore, the proposed paper considers the human's role in summarization and introduces human visual attention-based summarization techniques. To understand human attention behavior, we have designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology. The EEG and eye-tracking data obtained from the experimentation are processed simultaneously and used to segment frames containing useful information from a considerable video volume. Thus, the frame segmentation primarily relies on the cognitive judgments of human beings. Using our app...
2019 16th International Conference on Machine Vision Applications (MVA), 2019
Plant phenomics based on imaging-based techniques can be used to monitor the health and the diseases of plants and crops. The use of 3D data for plant phenomics is a recent phenomenon. However, since a 3D point cloud contains more information than plant images, in this paper, we compare the performance of different combinations of keypoint detectors and local feature descriptors for classifying the plant growth stage and its growth condition based on 3D point clouds of the plants. We have also implemented a modified form of the 3D SIFT descriptor that is invariant to rotation and is computationally less intense than most of the 3D SIFT descriptors reported in the existing literature. The performance is evaluated in terms of the classification accuracy and the results are presented in terms of accuracy tables. We find the ISS-SHOT and the SIFT-SIFT combinations consistently perform better and Fisher Vector (FV) is a better encoder than Vector of Locally Aggregated Descriptors (VLAD) for such applications. It can serve as a better modality.
ArXiv, 2021
Facial Expression Recognition from static images is a challenging problem in computer vision applications. Convolutional Neural Network (CNN), the state-of-the-art method for various computer vision tasks, has had limited success in predicting expressions from faces having extreme poses, illumination, and occlusion conditions. To mitigate this issue, CNNs are often accompanied by techniques like transfer, multi-task, or ensemble learning that often provide high accuracy at the cost of high computational complexity. In this work, we propose a Part-based Ensemble Transfer Learning network, which models how humans recognize facial expressions by correlating the spatial orientation pattern of the facial features with a specific expression. It consists of 5 sub-networks, in which each sub-network performs transfer learning from one of the five subsets of facial landmarks: eyebrows, eyes, nose, mouth, or jaw to expression classification. We test the proposed network on the CK+, JAFFE, and...
ArXiv, 2019
Plant phenomics can be used to monitor the health and the growth of plants. Computer vision applications like stereo reconstruction, image retrieval, object tracking, and object recognition play an important role in imaging-based plant phenotyping. This paper offers a comparative evaluation of some popular 3D correspondence grouping algorithms, motivated by the important role that they can play in tasks such as model creation, plant recognition and identifying plant parts. Another contribution of this paper is the extension of 2D maximum likelihood matching to 3D Maximum Likelihood Estimation Sample Consensus (MLESAC). MLESAC is efficient and is computationally less intense than 3D random sample consensus (RANSAC). We test these algorithms on 3D point clouds of plants along with two standard benchmarks addressing shape retrieval and point cloud registration scenarios. The performance is evaluated in terms of precision and recall.
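The difference between the two consensus schemes named above can be illustrated with a minimal Python sketch. It is not the paper's 3D implementation: it assumes scalar residuals, a Gaussian inlier model mixed with a uniform outlier model, and a fixed mixing weight (MLESAC normally estimates that weight, for example with EM). The function names and the parameters `sigma`, `inlier_fraction` and `outlier_range` are illustrative choices.

```python
import math


def ransac_score(residuals, threshold):
    """RANSAC-style score: number of residuals inside the inlier threshold."""
    return sum(1 for r in residuals if abs(r) <= threshold)


def mlesac_cost(residuals, sigma, inlier_fraction, outlier_range):
    """MLESAC-style cost: negative log-likelihood under a mixture of a
    zero-mean Gaussian (inliers) and a uniform density (outliers).
    The mixing weight is fixed here as a simplifying assumption."""
    cost = 0.0
    for r in residuals:
        gauss = math.exp(-r * r / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        uniform = 1.0 / outlier_range
        cost -= math.log(inlier_fraction * gauss + (1.0 - inlier_fraction) * uniform)
    return cost


# Two hypothetical model hypotheses with the same inlier count but
# different fit quality among the inliers.
tight_fit = [0.01, -0.02, 0.015, 0.9]
loose_fit = [0.09, -0.095, 0.098, 0.9]

same_count = ransac_score(tight_fit, 0.1) == ransac_score(loose_fit, 0.1)
tight_cost = mlesac_cost(tight_fit, sigma=0.05, inlier_fraction=0.5, outlier_range=2.0)
loose_cost = mlesac_cost(loose_fit, sigma=0.05, inlier_fraction=0.5, outlier_range=2.0)
```

Inlier counting cannot separate the two hypotheses, whereas the likelihood-based cost prefers the tighter fit; this is the general motivation for MLESAC-style scoring, independent of the 3D extension described in the abstract.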
ArXiv, 2021
To meet the needs of a growing world population, we need to increase the global agricultural yields by employing modern, precision, and automated farming methods. In the past decade, high-throughput plant phenotyping techniques, which combine non-invasive image analysis and machine learning, have been successfully applied to identify and quantify plant health and diseases. However, these image-based machine learning methods usually do not consider plant stress’s progressive or temporal nature. This time-invariant approach also requires images showing severe signs of stress to ensure high confidence detections, thereby reducing this approach’s feasibility for early detection and recovery of plants under stress. In order to overcome the problem mentioned above, we propose a temporal analysis of the visual changes induced in the plant due to stress and apply it for the specific case of water stress identification in Chickpea plant shoot images. For this, we have considered an image dataset o...
Vision Research, 2021
Early visual deprivation is known to have profound consequences on the subsequent development of spatial visual processing. However, its impact on temporal processing is not well characterized. We have examined spatial and temporal contrast sensitivity functions following treatment for early and extended bilateral visual deprivation in fifteen children born with congenital cataracts in rural India. The results reveal a marked difference in post-treatment spatial and temporal sensitivities. Whereas spatial processing in newly sighted children is significantly impaired relative to age-matched controls, temporal processing exhibits remarkable resilience and is comparable to that in the control group. This difference in spatial and temporal outcomes is especially surprising given our computational analyses of video sequences which indicate a strong linkage between the spatial and temporal spectral content of natural visual inputs. We consider possible explanations for this discrepancy.
IEEE Journal of Translational Engineering in Health and Medicine
Background: Accurate and fast diagnosis of COVID-19 is very important to manage the medical conditions of affected persons. The task is challenging owing to the shortage and ineffectiveness of clinical testing kits. However, the existing problems can be mitigated by employing computational intelligence techniques on radiological images like CT (Computed Tomography) scans of lungs. Extensive research has been reported using deep learning models to diagnose the severity of COVID-19 from CT images. This has undoubtedly minimized the manual involvement in abnormality identification, but the reported detection accuracy is limited. Methods: The present work proposes an expert model based on deep features and a Parameter Free BAT (PF-BAT) optimized Fuzzy K-nearest neighbor (PF-FKNN) classifier to diagnose novel coronavirus. In this proposed model, features are extracted from the fully connected layer of transfer learned MobileNetv2 followed by FKNN training. The hyperparameters of FKNN are fine-tuned using PF-BAT. Results: The experimental results on the benchmark COVID CT scan data reveal that the proposed algorithm attains a validation accuracy of 99.38%, which is better than the existing state-of-the-art methods proposed in the past. Conclusion: The proposed model will help in timely and accurate identification of the coronavirus at the various phases. Such rapid diagnosis will assist clinicians in managing the healthcare condition of patients well and will help in speedy recovery from the disease. Index Terms: COVID-19, diagnosis, deep features, parameter free BAT optimization. Clinical and Translational Impact Statement: The proposed automated system can provide accurate and fast detection of COVID-19 signature from lung radiographs. Also, the usage of the lighter MobileNetv2 architecture makes it practical for deployment in real-time.
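As a rough illustration of the fuzzy K-nearest neighbor step mentioned in the Methods, the sketch below computes class memberships with the commonly used inverse-distance weighting (weights proportional to 1 / d^(2/(m-1))). It is not the paper's PF-FKNN: the feature vectors are toy 2D points standing in for MobileNetv2 features, training labels are treated as crisp, `k` and the fuzzifier `m` are fixed by hand rather than tuned by PF-BAT, and the function and label names are hypothetical.

```python
import math


def fuzzy_knn_memberships(train_x, train_y, query, k=3, m=2.0, eps=1e-9):
    """Return a dict mapping each class among the k nearest neighbours to a
    fuzzy membership in [0, 1]; memberships sum to 1."""
    # Distances from the query to every training point, nearest first.
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    exponent = 2.0 / (m - 1.0)
    # Closer neighbours get larger weights; eps guards against d == 0.
    weights = [(1.0 / (d ** exponent + eps), y) for d, y in neighbours]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, y in weights:
        memberships[y] = memberships.get(y, 0.0) + w / total
    return memberships


# Toy stand-ins for deep features and their labels (illustrative only).
features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["covid", "covid", "normal", "normal"]

result = fuzzy_knn_memberships(features, labels, query=(0.2, 0.1), k=3)
predicted = max(result, key=result.get)
```

Unlike a plain majority vote, the output is a graded membership per class, so a borderline sample can be flagged by a membership close to 0.5 instead of receiving a hard label.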
2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)
Computer Vision and Machine Intelligence in Medical Image Analysis
Translational Vision Science & Technology
Frontiers in Neuroscience
Standard automated perimetry (SAP) is the gold standard for evaluating the presence of visual field defects (VFDs). Nevertheless, it has requirements such as prolonged attention, stable fixation, and a need for a motor response that limit application in various patient groups. Therefore, a novel approach using eye movements (EMs) – as a complementary technique to SAP – was developed and tested in clinical settings by our group. However, the original method uses a screen-based eye-tracker which still requires participants to keep their chin and head stable. Virtual reality (VR) has shown much promise in ophthalmic diagnostics – especially in terms of freedom of head movement and precise control over experimental settings, besides being portable. In this study, we set out to see if patients can be screened for VFDs based on their EM in a VR-based framework and if they are comparable to the screen-based eye-tracker. Moreover, we wanted to know if this framework can provide an effective ...
2020 International Conference on Contemporary Computing and Applications (IC3A)
IEEE Transactions on Instrumentation and Measurement
Applied Intelligence
Lung abnormality is one of the common diseases in humans of all age groups, and it may arise due to various reasons. Recently, the lung infection due to SARS-CoV-2 has affected a large human community globally, and due to its rapid spread, the World Health Organization (WHO) declared it a pandemic. COVID-19 has adverse effects on the respiratory system, and the infection severity can be detected using a chosen imaging modality. In the proposed research work, COVID-19 is detected using transfer learning from CT scan images decomposed to three levels using a stationary wavelet. A three-phase detection model is proposed to improve the detection accuracy, and the procedures are as follows: Phase 1, data augmentation using stationary wavelets; Phase 2, COVID-19 detection using a pre-trained CNN model; and Phase 3, abnormality localization in CT scan images. This work has considered well-known pre-trained architectures, such as ResNet18, ResNet50, ResNet101, and SqueezeNet, for the experimental evaluation. In this work, 70% of the images are used to train the network and 30% are used to validate it. The performance of the considered architectures is evaluated by computing the common performance measures. The result of the experimental evaluation confirms that the ResNet18 pre-trained transfer learning-based model offered better classification accuracy (training=99.82%, validation=97.32%, and testing=99.4%) on the considered image dataset compared with the alternatives.
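The three-level stationary wavelet decomposition used in Phase 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a 1D signal instead of a 2D CT image, the Haar filter, and periodic boundary handling, none of which is specified in the abstract. The function name `haar_swt` is hypothetical.

```python
import math


def haar_swt(signal, levels=3):
    """Undecimated (stationary) Haar wavelet transform with periodic extension.

    Returns (approx, details), where details[j] holds the detail coefficients
    of level j + 1. Every band has the same length as the input, because the
    transform skips the downsampling step of the ordinary wavelet transform."""
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # filter taps are spread further apart at each level
        shifted = [approx[(i + step) % n] for i in range(n)]
        detail = [(a - b) / math.sqrt(2.0) for a, b in zip(approx, shifted)]
        approx = [(a + b) / math.sqrt(2.0) for a, b in zip(approx, shifted)]
        details.append(detail)
    return approx, details


row = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0, 7.0]  # toy intensity profile
approx3, details3 = haar_swt(row, levels=3)

# One level is exactly invertible: x[i] = (approx[i] + detail[i]) / sqrt(2).
approx1, details1 = haar_swt(row, levels=1)
rebuilt = [(a + d) / math.sqrt(2.0) for a, d in zip(approx1, details1[0])]
```

Because each sub-band keeps the input size, every band can be treated as an additional image-sized sample, which is one plausible reading of how a stationary wavelet serves as data augmentation here.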
This paper introduces a novel shallow self-supervised tensor neural network for volumetric segmentation of brain MR images obviating training or supervision. The proposed network is a 3D version of the Quantum-Inspired Self Supervised Neural Network (QIS-Net) architecture and is referred to as 3D Quantum-inspired Self-supervised Tensor Neural Network (3D-QNet). The underlying architecture of 3D-QNet is composed of a trinity of volumetric layers viz. input, intermediate and output layers inter-connected using a 26-connected third-order neighborhood-based topology for voxel-wise processing of 3D MR image data suitable for semantic segmentation. Each of the volumetric layers contains quantum neurons designated by qubits or quantum bits. The incorporation of tensor decomposition in quantum formalism leads to faster convergence of the network operations to preclude the inherent slow convergence problems faced by the self-supervised networks. The segmented volumes are obtained once the net...
In the proposed research work, COVID-19 is detected using transfer learning from CT scan images decomposed to three levels using a stationary wavelet. A three-phase detection model is proposed to improve the detection accuracy, and the procedures are as follows: Phase 1, data augmentation using stationary wavelets; Phase 2, COVID-19 detection using a pre-trained CNN model; and Phase 3, abnormality localization in CT scan images. This work has considered well-known pre-trained architectures, such as ResNet18, ResNet50, ResNet101, and SqueezeNet, for the experimental evaluation. In this work, 70% of the images are used to train the network and 30% are used to validate it. The performance of the considered architectures is evaluated by computing the common performance measures.