Sushant Hiray | IIT Bombay
Thesis Chapters by Sushant Hiray
We have designed a monitoring engine that generates various performance metrics by analyzing offline logs. In this report we present the application-layer metrics obtained by our monitoring engine and analyze them.
For now, our monitoring engine is applicable to only one VNF prototype, namely Project Clearwater. We will be building a software-based DPI tool that provides advanced traffic analysis and extensive reporting and is applicable to any VNF prototype.
Overview of the application of Principal Component Analysis and Principal Geodesic Analysis in Shape Analysis.
Talks by Sushant Hiray
Talk presented on the steps taken in building an offline SIP monitoring tool.
Code can be found at: https://github.com/sushant-hiray/sip-dpi
Seminar presenting the usage of Kalman filters in robot localization.
Introduction to Open Source and GSoC
Papers by Sushant Hiray
arXiv (Cornell University), Aug 21, 2017
The paper describes experiments on estimating emotion intensity in tweets using a generalized regressor system. The system combines lexical, syntactic and pretrained word embedding features, trains general regressors on them and finally combines the best performing models to create an ensemble. The proposed system ranked 3rd out of 22 systems on the leaderboard of the WASSA-2017 Shared Task on Emotion Intensity.
FIRE (Working Notes), 2017
Native Language Identification has played an important role in forensics, primarily for author profiling and identification. In this work, we discuss our approach to the shared task of Indian Language Identification. The task is to identify the native language of the writer from a given XML file containing a set of Facebook comments in the English language. We propose a hierarchical ensemble approach which combines various machine learning techniques with language-agnostic feature extraction to perform the final classification. Our hierarchical ensemble improves the TF-IDF based baseline accuracy by 3.9%. The proposed system ranked 3rd across unique team submissions.
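The abstract names a TF-IDF based baseline without describing it. As a rough illustration of what TF-IDF weighting computes (not the authors' baseline), the sketch below builds per-comment term weights using only the standard library. The example comments, the whitespace tokenizer, and the unsmoothed log(N/df) form of the inverse document frequency are all assumptions made for the illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (relative count) times inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical comments, standing in for the Facebook comments in the task.
comments = [
    "i am going to market",
    "i am going home",
    "market is closed today",
]
vectors = tfidf(comments)
```

Terms that occur in fewer comments receive a larger weight: in the first comment, "to" (found in one comment) outweighs "i" (found in two). A classifier trained on such vectors is one common way to form a TF-IDF baseline.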
arXiv (Cornell University), Aug 18, 2017
This paper presents models for detecting agreement/disagreement in online discussions. We show that by using a Siamese-inspired architecture to encode the discussions, we no longer need to rely on hand-crafted features to exploit the meta thread structure. We evaluate our model on the existing online discussion corpora ABCD, IAC and AWTP. Experimental results on the ABCD dataset show that by fusing lexical and word embedding features, our model achieves state-of-the-art performance of 0.804 average F1 score. We also show that the model trained on the ABCD dataset performs competitively on the relatively smaller annotated datasets (IAC and AWTP).
arXiv (Cornell University), Aug 19, 2017
Deep neural networks (DNNs) have recently achieved great success in a multitude of classification tasks. Ensembles of DNNs have been shown to improve performance. In this paper, we explore the recent state-of-the-art DNNs used for image classification. We modified these DNNs and applied them to the task of acoustic scene classification. We conducted a number of experiments on the TUT Acoustic Scenes 2017 dataset to empirically compare these methods. Finally, we show that the best model improves the baseline score for DCASE-2017 Task 1 by 3.1% on the test set and by 10% on the development set.
arXiv (Cornell University), Apr 17, 2018
The paper describes the best performing system for the SemEval-2018 Affect in Tweets (English) sub-tasks. The system focuses on the ordinal classification and regression sub-tasks for valence and emotion. For ordinal classification, valence is classified into 7 classes ranging from -3 to 3, whereas emotion is classified into 4 classes, 0 to 3, separately for each emotion, namely anger, fear, joy and sadness. The regression sub-tasks estimate the intensity of valence and each emotion. The system performs domain adaptation of 4 different models and creates an ensemble to give the final prediction. The proposed system achieved 1st position out of 75 teams which participated in the aforementioned sub-tasks. We outperform the baseline model by margins ranging from 49.2% to 76.4%, thus pushing the state of the art significantly.
arXiv (Cornell University), Jun 1, 2023
This work focuses on improving the Spoken Language Identification (LangId) system for a challenge on developing robust language identification systems that are reliable for non-standard, accented (Singaporean accent), spontaneous code-switched, and child-directed speech collected via Zoom. We propose a two-stage Encoder-Decoder-based E2E model. The encoder module consists of 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with a global context. The decoder module uses an attentive temporal pooling mechanism to obtain a fixed-length, time-independent feature representation. The total number of parameters in the model is around 22.1M, which is relatively light compared to some large-scale pre-trained speech models. We achieved an EER of 15.6% in the closed track and 11.1% in the open track (baseline system 22.1%). We also curated additional LangId data from YouTube videos (having Singaporean speakers), which will be released for public use.
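To make the idea of attentive temporal pooling concrete, here is a minimal sketch in plain Python, assuming the simplest variant: a softmax-weighted mean over frames, with one score per frame from a dot product with a parameter vector. The paper's actual pooling layer may differ, and the vector `w`, the frame values, and the function name are hypothetical. The point shown is that clips of different lengths map to vectors of the same size.

```python
import math


def attentive_pool(frames, w):
    """Collapse a variable-length list of frame vectors into one vector."""
    # One unnormalized attention score per frame: dot product with w.
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    alphas = [e / norm for e in exps]
    # Attention-weighted sum of the frames, one value per feature dimension.
    dim = len(frames[0])
    pooled = [
        sum(a * frame[d] for a, frame in zip(alphas, frames))
        for d in range(dim)
    ]
    return pooled, alphas


w = [0.5, -0.25, 1.0]  # hypothetical attention parameters (learned in practice)
short_clip = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
long_clip = short_clip + [[3.0, 1.0, 1.0], [0.5, 0.5, 0.5]]

pooled_short, alphas_short = attentive_pool(short_clip, w)
pooled_long, alphas_long = attentive_pool(long_clip, w)
```

Both `pooled_short` and `pooled_long` have three entries even though the clips contain two and four frames. The attention weights in each case sum to one.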
Handle large datasets which cannot be loaded into memory; TensorFlow Dataset API usage for the input pipeline; TFRecords integration with Keras model training; TensorBoard visualization for monitoring; logs and model checkpointing; multiprocessing or multithreading support whenever possible.
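The first item, working with data that does not fit in memory, comes down to reading records lazily and handing them on in batches. The sketch below illustrates that pattern with the Python standard library only. It is a stand-in for the idea, not the TensorFlow Dataset API mentioned above, and the CSV layout, the `iter_batches` name, and the in-memory `io.StringIO` "file" are assumptions made for the illustration.

```python
import csv
import io
from itertools import islice


def iter_batches(lines, batch_size):
    """Yield lists of parsed rows without materializing the whole dataset."""
    reader = csv.reader(lines)
    while True:
        # Pull at most batch_size rows from the underlying iterator.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a large file on disk; an open file handle works the same way.
fake_file = io.StringIO("\n".join(f"{i},{i % 2}" for i in range(10)))
batch_sizes = [len(batch) for batch in iter_batches(fake_file, batch_size=4)]
```

Only one batch is held in memory at a time, so the same loop works whether the source has ten rows or ten billion. The last batch may be smaller than `batch_size`.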
2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2017
This paper presents models for detecting agreement/disagreement in online discussions. We show that by using a Siamese-inspired architecture to encode the discussions, we no longer need to rely on hand-crafted features to exploit the meta thread structure. We evaluate our model on the existing online discussion corpora ABCD, IAC and AWTP. Experimental results on the ABCD dataset show that by fusing lexical and word embedding features, our model achieves state-of-the-art performance of 0.804 average F1 score. We also show that the model trained on the ABCD dataset performs competitively on the relatively smaller annotated datasets (IAC and AWTP).
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2017
The paper describes experiments on estimating emotion intensity in tweets using a generalized regressor system. The system combines lexical, syntactic and pretrained word embedding features, trains general regressors on them and finally combines the best performing models to create an ensemble. The proposed system ranked 3rd out of 22 systems on the leaderboard of the WASSA-2017 Shared Task on Emotion Intensity.
Proceedings of The 12th International Workshop on Semantic Evaluation, 2018
The paper describes the best performing system for the SemEval-2018 Affect in Tweets (English) sub-tasks. The system focuses on the ordinal classification and regression sub-tasks for valence and emotion. For ordinal classification, valence is classified into 7 classes ranging from -3 to 3, whereas emotion is classified into 4 classes, 0 to 3, separately for each emotion, namely anger, fear, joy and sadness. The regression sub-tasks estimate the intensity of valence and each emotion. The system performs domain adaptation of 4 different models and creates an ensemble to give the final prediction. The proposed system achieved 1st position out of 75 teams which participated in the aforementioned sub-tasks. We outperform the baseline model by margins ranging from 49.2% to 76.4%, thus pushing the state of the art significantly.
Forum for Information Retrieval Evaluation, 2017
Native Language Identification has played an important role in forensics, primarily for author profiling and identification. In this work, we discuss our approach to the shared task of Indian Language Identification. The task is to identify the native language of the writer from a given XML file containing a set of Facebook comments in the English language. We propose a hierarchical ensemble approach which combines various machine learning techniques with language-agnostic feature extraction to perform the final classification. Our hierarchical ensemble improves the TF-IDF based baseline accuracy by 3.9%. The proposed system ranked 3rd across unique team submissions.
arXiv, 2017
Deep neural networks (DNNs) have recently achieved great success in a multitude of classification tasks. Ensembles of DNNs have been shown to improve performance. In this paper, we explore the recent state-of-the-art DNNs used for image classification. We modified these DNNs and applied them to the task of acoustic scene classification. We conducted a number of experiments on the TUT Acoustic Scenes 2017 dataset to empirically compare these methods. Finally, we show that the best model improves the baseline score for DCASE-2017 Task 1 by 3.1% on the test set and by 10% on the development set.
Deep neural networks (DNNs) have recently achieved great success in a multitude of classification tasks. Ensembles of DNNs have been shown to improve performance. In this paper, we explore the recent state-of-the-art DNNs used for image classification. We modified these DNNs and applied them to the task of acoustic scene classification. We conducted a number of experiments on the TUT Acoustic Scenes 2017 dataset to empirically compare these methods. Finally, we show that the ensemble of these DNNs improves the baseline score for DCASE-2017 Task 1 by 10%.
Association for Computational Linguistics, Sep 2017
The paper describes experiments on estimating emotion intensity in tweets using a generalized regressor system. The system combines lexical, syntactic and pre-trained word embedding features, trains general regressors on them and finally combines the best performing models to create an ensemble. The proposed system ranked 3rd out of 22 systems on the leaderboard of the WASSA-2017 Shared Task on Emotion Intensity.
This paper presents models for detecting agreement/disagreement in online discussions. In this work, we show that by using a Siamese-inspired architecture to encode the discussions, we no longer need to rely on hand-crafted features to exploit the meta thread structure. We evaluate our model on the existing online discussion corpora ABCD, IAC and AWTP. Experimental results on the ABCD dataset show that by fusing lexical and word embedding features, our model achieves state-of-the-art performance of 0.804 average F1 score. We also show that the model trained on the ABCD dataset performs competitively on the relatively smaller annotated datasets (IAC and AWTP).