Kshitiz Kumar - Academia.edu
Papers by Kshitiz Kumar
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
International Ophthalmology
To evaluate the efficacy and safety of a modified lamellar hole-associated epiretinal proliferation (LHEP) embedding technique in the surgical management of degenerative lamellar macular hole (LMH). This is a retrospective case series of consecutive eyes that underwent pars plana vitrectomy with LHEP embedding and internal limiting membrane (ILM) inversion for degenerative LMH. The primary outcome measure was improvement in foveal contour and central foveal thickness (CFT). Secondary outcome measures were changes in best corrected visual acuity (BCVA), status of the outer retinal layers (external limiting membrane, ELM, and ellipsoid zone, EZ) and complications. Ten eyes were operated on with the modified LHEP embedding technique. Mean age was 65.8 ± 5.3 years with a 1:1 male to female ratio. Simultaneous cataract surgery was done in 70% of cases. Mean follow-up duration was 7.9 ± 0.87 months. Eight of 10 eyes (80%) showed improvement in foveal contour to normal appearance, with an increase in residual foveal thickness from 90.2 ± 26.83 microns to a CFT of 226 ± 35.44 microns at 6 months (p = 0.0054). Mean BCVA improved from 0.69 ± 0.19 logMAR to 0.32 ± 0.29 logMAR (p = 0.012). ELM and EZ defects were present in four eyes (40%) pre-operatively. At the final visit, 2 eyes (20%) had a persistent defect in both ELM and EZ. None of the eyes progressed to full-thickness macular hole following surgery. The modified surgical technique of LHEP embedding with ILM inversion provided satisfactory results with reduced risk of complications for degenerative LMH. Larger and longer-term follow-up studies are needed to establish this technique as a standard surgical procedure for LMH with LHEP.
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
A confidence classifier is an integral component of an automatic speech recognition (ASR) system. These classifiers predict the accuracy of an ASR hypothesis by associating a confidence score in the [0,1] range, where a larger score implies a higher probability of the hypothesis being correct. Confidence scores have significant applications in ASR system design, training data selection, model adaptation, and other ASR applications. In this work we focus on word embedding features to improve the confidence classifier, and introduce character and phone embeddings as confidence features. We motivate these features in the context of representing and factorizing acoustic scores along the proposed features. We evaluate our work on large-scale ASR tasks, and demonstrate significant improvement in confidence performance with the proposed features. At our typical operating point, we report an 8% relative reduction in false alarm (FA) for a limited-vocabulary enUS Xbox task, and a 9.9% relative reduction in FA for a large-vocabulary enUS server task. We also conducted server experiments for our proposed features in combination with natural language GloVe embeddings, and improved the overall relative reduction in FA to 16%.
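As a rough illustration of the terms in this abstract, the sketch below computes a false-alarm rate at a chosen operating point and the relative reduction between two rates. The function names, the toy scores, and the definition of false alarm used here (an incorrect hypothesis accepted because its score meets the threshold) are assumptions for illustration, not details taken from the paper.

```python
def false_alarm_rate(scores, is_correct, threshold):
    """Fraction of incorrect hypotheses whose score is >= threshold.

    Assumed definition for illustration; the paper may define FA differently.
    """
    incorrect = [s for s, ok in zip(scores, is_correct) if not ok]
    if not incorrect:
        return 0.0
    accepted = sum(1 for s in incorrect if s >= threshold)
    return accepted / len(incorrect)


def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Toy data (hypothetical): confidence scores in [0, 1] and correctness labels.
scores = [0.9, 0.8, 0.4, 0.7, 0.2]
is_correct = [True, False, False, True, False]

fa = false_alarm_rate(scores, is_correct, threshold=0.5)
```

With these toy values, one of the three incorrect hypotheses is accepted at a threshold of 0.5, so `fa` is one third. A reported "8% relative reduction in FA" would correspond to `relative_reduction(baseline_fa, new_fa)` being about 0.08 at the same operating point.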
2018 IEEE Spoken Language Technology Workshop (SLT), 2018
Publication in the conference proceedings of EUSIPCO, Lausanne, Switzerland, 2008
The objective of this thesis is the development of signal processing and analysis techniques that would provide sharply improved speech recognition accuracy in highly reverberant environments. Speech is a natural medium of communication for humans, and in the last decade various speech technologies such as automatic speech recognition (ASR) and voice response systems have matured considerably. These systems rely on the clarity of the captured speech, but many real-world environments include noise and reverberation that degrade system performance. The key focus of the thesis is the robustness of ASR to reverberation. In our work, we first provide a new framework to adequately and efficiently represent the problem of reverberation in speech feature domains. Although our framework incurs modeling approximation errors, we believe that it provides a good basis for developing reverberation compensation algorithms. Based on our framework, we successfully develop a number o...
Agricultural Engineering International: The CIGR Journal, 2018
The present research aims at developing a ginger peeling machine which can peel the outer skin of ginger with less mass loss. Machine and product parameters for the developed ginger peeler were optimized. Fresh ginger with a moisture content of 87.47%, pre-treated with 1% NaOH solution, exhibited the highest peeling efficiency (70.20%), followed by hot-water soaking and overnight soaking. At constant moisture content, the reverse trend was observed for mass loss. The highest mass loss, about 4.13%, was seen with hot-water-soaked samples, followed by overnight soaking and NaOH treatment. Ginger with 87.47% moisture content and pre-treatment with 1% NaOH solution exhibited maximum peeling efficiency. Keywords: Ginger, Peeling machine, Peeling efficiency, Pre-treatment.
Cureus, 2021
Introduction: Ocular fluid dynamics are known to improve during hemodialysis (HD), and the improvement of uremia after dialysis may lead to osmotic pressure changes in the retina, which eventually affect retinal edema. Recent studies using optical coherence tomography (OCT) to assess the effect of hemodialysis on macular thickness have shown variable results, with a majority of them finding a decrease in retinal thickness. Paradoxical neurosensory retinal detachment (NSD) may be defined as the accumulation of subretinal fluid under the macula in patients who are on continuous HD. The purpose of the study was to determine the incidence of paradoxical neurosensory detachment in diabetic eyes undergoing HD and its management. Methods: This was a cross-sectional, prospective study involving end-stage renal disease (ESRD) patients secondary to diabetes. This study evaluated the changes in macular thickness in diabetic retinopathy patients with and without diabetic macular edema (DM...
Interspeech 2015, 2015
In this work we present intermediate-layer deep neural network (DNN) adaptation techniques upon which we build offline as well as iterative speaker adaptation for online applications. We motivate our online work for task completion in the Microsoft personal voice assistant, where we present different adaptation styles in a speech session, e.g., (a) adapt the speaker-independent (SI) model on the current utterance, (b) recursively adapt an incremental speaker-dependent (SD) model in the session for just the previous utterance, (c) adapt the SI model for all past utterances in the session. We considered a number of adaptation techniques and demonstrated that the intermediate-layer approach of inserting and adapting a linear layer on top of an intermediate singular-value-decomposition layer provides the best results for offline adaptation, where we obtained 22.6% and 12% relative reduction in word-error-rate (WER) for supervised and unsupervised adaptation, respectively, on 100 utterances. An alternative intermediate-layer recursive adaptation in a 5-utterance session provided a 6% relative reduction in WER for online applications.
Ultrasonics Sonochemistry, 2021
International ophthalmology, Jan 29, 2018
To report a case of adult-onset Coats' disease with secondary retinal vasoproliferative tumor managed with dexamethasone intravitreal implant and retinal photocoagulation. Case study. A 41-year-old female with counting-finger vision was diagnosed with Coats' disease with secondary retinal vasoproliferative tumor in the right eye. Fundus examination revealed exudative retinopathy involving the posterior pole and a retinal tumor located in the inferotemporal quadrant. Optical coherence tomography confirmed massive exudative neurosensory detachment, and fundus fluorescein angiography showed areas of telangiectatic vessels with capillary non-perfusion. Intravitreal injection of a dexamethasone implant was done initially, followed by laser photocoagulation when the detachment resolved. There was significant improvement in the patient's visual acuity with no further recurrence of exudation. Intravitreal dexamethasone implant Ozurdex (Allergan, Inc., Irvine, Calif., USA) may be an effec...
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016
Previously we demonstrated that speaker adaptation of acoustic models (AM) can provide significant improvement in the accuracy of large-scale speech recognition systems. In this work we discuss numerous challenges in scaling speaker adaptation to millions of speakers, where the size of speaker-dependent (SD) parameters is a critical challenge. Subsequently, we formulate an intermediate-layer adaptation framework, upon which we build a non-negative adaptation for a very sparse set of non-negative SD parameters. We further improve this work with (a) non-negative adaptation with a small positive threshold, and (b) setting small positive weights in an already trained non-negative model to zero. We also discuss effective methods to store the non-negative SD parameters. We show that our methods reduce the SD parameters from 86 KB for our previous best adaptation approach to 8.8 KB, about a 90% relative reduction in the size of SD parameters, and still retain 10+% word-error-rate-relative (WERR) gain over the baseline speaker-independent (SI) model.
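The size figures quoted above can be checked with a short calculation, and step (b) can be pictured as a simple thresholding pass. The sketch below is only an illustration under assumptions: the function names, the example weights, and the threshold value are hypothetical, and the paper's actual training and storage procedure is not reproduced here.

```python
def relative_reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


def sparsify_non_negative(weights, threshold):
    """Clamp negative weights to zero, then zero out weights below `threshold`.

    Hypothetical post-processing pass, loosely mirroring step (b) in the abstract.
    """
    clamped = (max(w, 0.0) for w in weights)
    return [w if w >= threshold else 0.0 for w in clamped]


# SD parameter size: 86 KB (previous best) versus 8.8 KB (this work).
size_reduction = relative_reduction(86.0, 8.8)
print(f"{size_reduction:.1%}")  # prints 89.8%, i.e. "about 90%"

# Toy weights (hypothetical) and a small positive threshold (assumed value).
sparse = sparsify_non_negative([0.5, -0.2, 0.001, 0.03], threshold=0.01)
```

Here `sparse` keeps only 0.5 and 0.03, and the remaining entries become zero, so only the few surviving values and their positions would need to be stored per speaker.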
Journal of the Saudi Society of Agricultural Sciences, 2016
European journal of ophthalmology
Purpose: To describe the vitreomacular interface and foveal structural changes in fellow eyes of patients with idiopathic macular holes using spectral-domain optical coherence tomography (SD-OCT). Methods: Retrospective analysis of consecutive medical records and SD-OCT images of the fellow eyes of patients with macular hole was done. Changes of the vitreoretinal interface and foveal structures on SD-OCT scan of the 101 fellow eyes of 101 subjects with full-thickness macular hole were studied and compared with 101 eyes of 101 age-matched healthy subjects. Results: Sixty-four patients (57.65%) were female. Mean age at presentation was 60.44 ± 12.17 years. The best-corrected visual acuity (BCVA) in eyes with macular hole was 0.86 logMAR units and in fellow eyes was 0.41 logMAR units. Seven eyes had macular hole in the fellow eye at the time of presentation. The majority of the fellow eyes (87/101, 78.37%) were phakic. The average base diameter of macular hole was 1105 ± 451.63 µm. Inc...
2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
2008 Hands-Free Speech Communication and Microphone Arrays, 2008