Shrimai Prabhumoye
Senior Research Scientist at NVIDIA
Adjunct Professor at Boston University
I am a Senior Research Scientist with the Applied Deep Learning Research Group at NVIDIA and an Adjunct Professor at Boston University. My research is dedicated to advancing the state of the art in large language models (LLMs) by enhancing their reasoning capabilities and ensuring their safety through rigorous mitigation of toxicity and bias. As the lead contributor to the Nemotron family of models, I have worked extensively on data curation, pretraining, and scaling. My current work focuses on optimizing pretraining pipelines, with an emphasis on data selection, blending, and ordering strategies to maximize downstream model accuracy. I am particularly focused on improving reasoning in LLMs, including generating synthetic data for advanced mathematical reasoning and enabling models to handle longer, more complex reasoning tasks that require deeper thought and understanding. My work has been featured in media outlets such as VentureBeat, Forbes, and TechCrunch.
Before that, I graduated with a PhD from the School of Computer Science at Carnegie Mellon University. At CMU, I was fortunate to be advised by Prof. Alan W. Black and Prof. Ruslan Salakhutdinov. My thesis focused on controllable text generation, covering style, content, and structure, as well as its ethical considerations. I co-designed the Computational Ethics for NLP course, which was offered for the first time in Spring 2018 at CMU. I graduated with a Master's in Language Technologies in Aug 2017. During that time, I led the CMU Magnus team in the Amazon Alexa Prize competition. I completed my undergraduate degree at the National Institute of Technology Karnataka, India.
Publications
24. AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models
Rafal Kocielnik, Shrimai Prabhumoye, Vivian Zhang, R Michael Alvarez, Anima Anandkumar.
Published on arXiv, 2023.
23. Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
In the proceedings of the European Chapter of the Association for Computational Linguistics (EACL) 2023.
22. Context Generation Improves Open Domain Question Answering
Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro.
In Findings of the European Chapter of the Association for Computational Linguistics (EACL) 2023.
21. Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions
Rafal Kocielnik, Sara Kangaslahti, Shrimai Prabhumoye, Meena Hari, Michael Alvarez, Anima Anandkumar.
In Transfer Learning for Natural Language Processing Workshop at NeurIPS 2022.
20. Evaluating Parameter Efficient Learning for Generation
Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022.
19. Multi-Stage Prompting for Knowledgeable Dialogue Generation
Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro.
In Findings of the Association for Computational Linguistics (ACL) 2022.
18. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith*, Mostofa Patwary*, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro.
Published on arXiv, 2022.
17. Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
Shrimai Prabhumoye*, Rafal Kocielnik*, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro.
Published on arXiv, 2023.
16. Five sources of bias in natural language processing
Dirk Hovy, Shrimai Prabhumoye.
Language and Linguistics Compass, 2021.
15. Focused Attention Improves Document-Grounded Generation
Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, Alan W Black, Ruslan Salakhutdinov.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.
14. Case Study: Deontological Ethics in NLP
Shrimai Prabhumoye*, Brendon Boldt*, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.
13. Exploring Controllable Text Generation Techniques
Shrimai Prabhumoye, Alan W Black, Ruslan Salakhutdinov.
Proceedings of the 28th International Conference on Computational Linguistics (COLING) 2020.
Selected for oral presentation
12. Topological Sort for Sentence Ordering
Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2020.
11. Politeness Transfer: A Tag and Generate Approach
Aman Madaan*, Amrith Setlur*, Tanmay Parekh*, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W Black, Shrimai Prabhumoye.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2020.
10. I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
Shrimai Prabhumoye*, Margaret Li*, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam.
arXiv:2002.02878 [cs.AI]
9. Generating Interactive Worlds with Text
Angela Fan*, Jack Urbanek*, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, Jason Weston.
In the Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence.
8. Principled Frameworks for Evaluating Ethics in NLP Systems
Shrimai Prabhumoye, Elijah Mayfield, Alan W Black.
Widening NLP Workshop at ACL 2019.
7. "My Way of Telling a Story": Persona based Grounded Story Generation
Shrimai Prabhumoye*, Khyathi Chandu*, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Storytelling Workshop at ACL 2019.
6. Equity Beyond Bias in Language Technologies for Education
Elijah Mayfield, Michael Madaio, Shrimai Prabhumoye, David Gerritsen, Brittany McLaughlin, Ezekiel Dixon-Román, Alan W Black.
In the Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications at ACL 2019.
5. Towards Content Transfer Through Grounded Text Generation
Shrimai Prabhumoye, Chris Quirk, Michel Galley.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2019.
Selected for oral presentation
4. A Dataset for Document Grounded Conversations
Kangyan Zhou, Shrimai Prabhumoye, Alan W Black.
In the proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018.
3. Style Transfer Through Back-Translation
Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2018.
Selected for oral presentation
2. Linguistic Markers of Influence in Informal Interactions
Shrimai Prabhumoye*, Samridhi Choudhary*, Evangelia Spiliopoulou, Christopher Bogart, Carolyn Penstein Rose, Alan W Black.
In the proceedings of Workshop on NLP+CSS at ACL 2017.
1. Building CMU Magnus from User Feedback
Shrimai Prabhumoye*, Fadi Botros*, Khyathi Chandu*, Samridhi Choudhary*, Esha Keni*, Chaitanya Malaviya*, Thomas Manzini*, Rama Pasumarthi*, Shivani Poddar*, Abhilasha Ravichander*, Zhou Yu, Alan Black.
In the proceedings of Alexa Prize 2017.
Talks
Controllable Text Generation: Should machines reflect the way humans interact in society?
Deep Learning: Classics and Trends, Oct 2020.
Allen Institute for Artificial Intelligence (AI2), Aug 2020.
Salesforce, Jul 2020.
Montreal Institute for Learning Algorithms (Mila), Jul 2020.
Apple, Seattle, Jul 2020.
The LTI Summer Seminar, Jul 2020.
Controlling style, content and structure in Natural Language Generation
University of Massachusetts Amherst, October 2019.
Google AI Research, NYC, June 2019.
Mentored Students
Politeness Transfer: A Tag and Generate Approach
This work introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning. We also provide a dataset of more than 1.39 million instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content.
Associated Publication: Politeness Transfer: A Tag and Generate Approach at ACL 2020
Downstream tasks to evaluate style transfer
Mukul Bhutani
Downstream tasks are influenced by the demographic skew of their training sets: for example, sentiment analysis is affected by the gender confound, and part-of-speech (POS) tagging is affected by the age confound. By building a generation engine that preserves content while controlling for style, we can produce demographically balanced datasets for these NLP tasks. We are also looking at using these downstream tasks to automatically evaluate style transfer models.
A Dataset for Document Grounded Conversations
Kangyan Zhou
This work introduces a document grounded dataset for conversations using Wikipedia articles on movies. The dataset contains 4112 conversations with an average of 21.43 turns per conversation. We describe two neural architectures that provide benchmark performance on the task of generating the next response.
Associated Publication: A Dataset for Document Grounded Conversations at EMNLP 2018
Teaching
Guest Lectures
Style Transfer
Machine Translation and Sequence-to-sequence Models
CS 11-731, Carnegie Mellon University, Fall 2018
Ethics in Conversational Agents
Computational Ethics in NLP
CS 11-830, Carnegie Mellon University, Spring 2018, Spring 2019 and Spring 2020
Chatbots
Speech Processing
CS 11-492 11-692 11-892, Carnegie Mellon University, Fall 2017, Fall 2018 and Fall 2019
Neural Dialogue
Speech Processing
CS 11-492 11-692 11-892, Carnegie Mellon University, Fall 2017, Fall 2018 and Fall 2019
Speech Processing
CS 11-492 11-692 11-892, Carnegie Mellon University, Fall 2017, Fall 2018 and Fall 2019
Chatting with Computers Workshop
OurCS, Carnegie Mellon University, Fall 2017.
Teaching Assistant
Computational Ethics in NLP
CS 11-830, Carnegie Mellon University, Spring 2018
Speech Processing
CS 11-492 11-692 11-892, Carnegie Mellon University, Fall 2017