Kshitiz Kumar - Academia.edu

Papers by Kshitiz Kumar

Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, and Pretraining: an Ablation Study

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Modified surgical technique for lamellar macular holes with lamellar hole-associated epiretinal proliferation (LHEP)

International Ophthalmology

To evaluate the efficacy and safety of a modified lamellar hole-associated epiretinal proliferation (LHEP) embedding technique in the surgical management of degenerative lamellar macular hole (LMH). This is a retrospective case series of consecutive eyes that underwent pars plana vitrectomy with LHEP embedding and internal limiting membrane (ILM) inversion for degenerative LMH. The primary outcome measures were improvement in foveal contour and central foveal thickness (CFT). Secondary outcome measures were changes in best corrected visual acuity (BCVA), status of the outer retinal layers (external limiting membrane, ELM, and ellipsoid zone, EZ) and complications. Ten eyes were operated on using the modified LHEP embedding technique. Mean age was 65.8 ± 5.3 years with a 1:1 male to female ratio. Simultaneous cataract surgery was done in 70% of cases. Mean follow-up duration was 7.9 ± 0.87 months. Eight of 10 eyes (80%) showed improvement in foveal contour to a normal appearance, with an increase in residual foveal thickness from 90.2 ± 26.83 microns to a CFT of 226 ± 35.44 microns at 6 months (p = 0.0054). Mean BCVA improved from 0.69 ± 0.19 logMAR to 0.32 ± 0.29 logMAR (p = 0.012). ELM and EZ defects were present in four eyes (40%) pre-operatively. At the final visit, 2 eyes (20%) had persistent defects in both ELM and EZ. None of the eyes progressed to full-thickness macular hole following surgery. The modified surgical technique of LHEP embedding with ILM inversion is demonstrated to provide satisfactory results with reduced risk of complications for degenerative LMH. Larger studies with long-term follow-up are needed to establish this technique as a standard surgical procedure for LMH with LHEP.

Word Characters and Phone Pronunciation Embedding for ASR Confidence Classifier

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

A confidence classifier is an integral component of an automatic speech recognition (ASR) system. These classifiers predict the accuracy of an ASR hypothesis by associating a confidence score in the [0,1] range, where a larger score implies a higher probability of the hypothesis being correct. Confidence scores have significant applications in ASR system design, training data selection, model adaptation, and other ASR applications. In this work we focus on word embedding features to improve the confidence classifier, and introduce character and phone embeddings as confidence features. We motivate these features in the context of representing and factorizing acoustic scores along the proposed features. We evaluate our work on large-scale ASR tasks, and demonstrate significant improvement in confidence performance with the proposed features. At our typical operating point, we report 8% relative reduction in false alarm (FA) for a limited-vocabulary enUS Xbox task, and 9.9% relative reduction in FA for a large-vocabulary enUS server task. We also conducted server experiments for our proposed features in combination with natural language GloVe embeddings, and improved the overall relative reduction in FA to 16%.

Speaker Adaptation for End-to-End CTC Models

2018 IEEE Spoken Language Technology Workshop (SLT), 2018

Noise Robust Speaker Identification Using Bhattacharya Distance In Adapted Gaussian Models Space

Publication in the conference proceedings of EUSIPCO, Lausanne, Switzerland, 2008

A Spectro-Temporal Framework for Compensation of Reverberation for Speech Recognition

The objective of this thesis is the development of signal processing and analysis techniques that would provide sharply improved speech recognition accuracy in highly reverberant environments. Speech is a natural medium of communication for humans, and in the last decade various speech technologies such as automatic speech recognition (ASR) and voice response systems have matured considerably. These systems rely on the clarity of the captured speech, but many real-world environments include noise and reverberation that degrade system performance. The key focus of the thesis is on the robustness of ASR to reverberation. In our work, we first provide a new framework to adequately and efficiently represent the problem of reverberation in speech feature domains. Although our framework incurs modeling approximation errors, we believe that it provides a good basis for developing reverberation compensation algorithms. Based on our framework, we successfully develop a number o...

Design, development and statistical optimization of ginger peeling machine

Agricultural Engineering International: The CIGR Journal, 2018

The present research aims at developing a ginger peeling machine which can peel the outer skin of ginger with less mass loss. Machine and product parameters for the developed ginger peeler were optimized. Fresh gingers with a moisture content of 87.47% and pre-treated with 1% NaOH solution exhibited the highest peeling efficiency (70.20%), followed by hot-water soaking and overnight soaking. At constant moisture content, the reverse trend was observed for mass loss. The highest mass loss of about 4.13% was seen with hot-water soaked samples, followed by overnight soaking and NaOH treatment. Gingers with 87.47% moisture content and pre-treatment with 1% NaOH solution exhibited maximum peeling efficiency. Keywords: Ginger, Peeling machine, Peeling efficiency, Pre-treatment.

Incidence of Paradoxical Neurosensory Detachment in Diabetic Eyes Undergoing Hemodialysis for End-Stage Renal Disease

Cureus, 2021

Introduction: Ocular fluid dynamics are known to improve during hemodialysis, and the improvement of uremia after dialysis may lead to osmotic pressure changes in the retina, which eventually affect retinal edema. Recent studies using optical coherence tomography (OCT) to assess the effect of hemodialysis on macular thickness have shown variable results, with a majority of them finding a decrease in retinal thickness. Paradoxical neurosensory retinal detachment (NSD) may be defined as the accumulation of subretinal fluid under the macula in patients who are on continuous hemodialysis (HD). The purpose of the study was to determine the incidence of paradoxical neurosensory detachment in diabetic eyes undergoing HD and its management. Methods: This was a cross-sectional, prospective study involving end-stage renal disease (ESRD) patients secondary to diabetes. This study evaluated the changes in macular thickness in diabetic retinopathy patients with and without diabetic macular edema (DM...

Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain

Intermediate-layer DNN adaptation for offline and session-based iterative speaker adaptation

Interspeech 2015, 2015

In this work we present intermediate-layer deep neural network (DNN) adaptation techniques upon which we build offline as well as iterative speaker adaptation for online applications. We motivate our online work for task completion in Microsoft personal voice assistant, where we present different adaptation styles in a speech session, e.g., (a) adapt the speaker-independent (SI) model on the current utterance, (b) recursively adapt an incremental speaker-dependent (SD) model in the session for just the previous utterance, (c) adapt the SI model for all past utterances in the session. We considered a number of adaptation techniques and demonstrated that the intermediate-layer approach of inserting and adapting a linear layer on top of an intermediate singular-value-decomposition layer provides the best results for offline adaptation, where we obtained 22.6% and 12% relative reduction in word-error-rate (WER) for supervised and unsupervised adaptation, respectively, on 100 utterances. An alternative intermediate-layer recursive adaptation in a 5-utterance session provided 6% relative reduction in WER for online applications.

Advances in application of ultrasound in food processing: A review

Ultrasonics Sonochemistry, 2021

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

Severe Vaso-Occlusive Retinopathy in Systemic Lupus Erythematosus: A Case Series

Intravitreal dexamethasone implant with retinal photocoagulation for adult-onset Coats' disease

International Ophthalmology, Jan 29, 2018

To report a case of adult-onset Coats' disease with secondary retinal vasoproliferative tumor managed with dexamethasone intravitreal implant and retinal photocoagulation. Case study. A 41-year-old female with counting-finger vision was diagnosed with Coats' disease with secondary retinal vasoproliferative tumor in the right eye. Fundus examination revealed exudative retinopathy involving the posterior pole and a retinal tumor located in the inferotemporal quadrant. Optical coherence tomography confirmed massive exudative neurosensory detachment, and fundus fluorescein angiography showed areas of telangiectatic vessels with capillary non-perfusion. Intravitreal injection of the dexamethasone implant was done initially, followed by laser photocoagulation when the detachment resolved. There was significant improvement in the patient's visual acuity with no further recurrence of exudation. Intravitreal dexamethasone implant Ozurdex (Allergan, Inc., Irvine, Calif., USA) may be an effec...

Novel continuous roasting of chickpea (Cicer arietinum): Study on physico-functional, antioxidant and roasting characteristics

Non-negative intermediate-layer DNN adaptation for a 10-KB speaker adaptation profile

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

Previously we demonstrated that speaker adaptation of acoustic models (AM) can provide significant improvement in the accuracy of large-scale speech recognition systems. In this work we discuss numerous challenges in scaling speaker adaptation to millions of speakers, where the size of speaker-dependent (SD) parameters is a critical challenge. Subsequently, we formulate an intermediate-layer adaptation framework, upon which we build a non-negative adaptation for a very sparse set of non-negative SD parameters. We further improve this work with (a) non-negative adaptation with a small-positive threshold, and (b) setting small-positive weights in an already trained non-negative model to zero. We also discuss effective methods to store the non-negative SD parameters. We show that our methods reduce the SD parameters from 86KB for our previous best adaptation approach to 8.8KB, thus about 90% relative reduction in the size of SD parameters, and still retain 10+% word-error-rate-relative (WERR) gain over the baseline speaker-independent (SI) model.
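As a rough illustration of step (b) and of the size arithmetic in the abstract above, the following minimal sketch zeroes small positive entries in a non-negative weight matrix and recomputes the quoted relative reduction. It is not the authors' implementation: the matrix shape, the random values, the threshold of 0.005, and the function names `sparsify_nonnegative` and `relative_reduction` are all assumptions made for the example. Only the 86KB and 8.8KB figures come from the abstract.

```python
import numpy as np


def sparsify_nonnegative(weights, threshold):
    """Clip weights to be non-negative, then zero entries below `threshold`.

    Hypothetical helper: illustrates pruning small positive weights only,
    not the non-negative training procedure described in the paper.
    """
    w = np.maximum(weights, 0.0)  # new array, input is left untouched
    w[w < threshold] = 0.0        # drop small positive weights
    return w


def relative_reduction(before, after):
    """Relative reduction of `after` with respect to `before`."""
    return (before - after) / before


# Synthetic stand-in for a speaker-dependent adaptation matrix (assumed shape).
rng = np.random.default_rng(0)
delta = rng.normal(scale=0.01, size=(64, 64))

sparse = sparsify_nonnegative(delta, threshold=0.005)
nonzero_fraction = np.count_nonzero(sparse) / sparse.size

# Figures quoted in the abstract: 86KB before, 8.8KB after.
size_reduction = relative_reduction(86.0, 8.8)
print(f"non-zero fraction: {nonzero_fraction:.3f}")
print(f"relative size reduction: {size_reduction:.1%}")
```

The size calculation gives (86 - 8.8) / 86, roughly 0.898, which agrees with the "about 90%" stated in the abstract.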

Modeling the effect of temperature on the hydration kinetic whole moong grain

Journal of the Saudi Society of Agricultural Sciences, 2016

Spectral-domain optical coherence tomography features in fellow eyes of patients with idiopathic macular hole

European Journal of Ophthalmology

Purpose: To describe the vitreomacular interface and foveal structural changes in fellow eyes of patients with idiopathic macular holes using spectral-domain optical coherence tomography (SD-OCT). Methods: A retrospective analysis of consecutive medical records and SD-OCT images of the fellow eyes of patients with macular hole was done. Changes of the vitreoretinal interface and foveal structures on SD-OCT scans of the 101 fellow eyes of 101 subjects with full-thickness macular hole were studied and compared with 101 eyes of 101 age-matched healthy subjects. Results: Sixty-four patients (57.65%) were female. Mean age at presentation was 60.44 ± 12.17 years. The best-corrected visual acuity (BCVA) in eyes with macular hole was 0.86 logMAR units and in fellow eyes was 0.41 logMAR units. Seven eyes had macular hole in the fellow eye at the time of presentation. The majority of the fellow eyes (87/101, 78.37%) were phakic. The average base diameter of macular hole was 1105 ± 451.63 µm. Inc...

Predicting speech recognition confidence using deep learning with word identity and score features

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Binaural and Multiple-Microphone Signal Processing Motivated by Auditory Perception

2008 Hands-Free Speech Communication and Microphone Arrays, 2008
