Shrimai Prabhumoye

Senior Research Scientist at NVIDIA
Adjunct Professor at Boston University

I am a Senior Research Scientist with the Applied Deep Learning Research Group at NVIDIA and an Adjunct Professor at Boston University. My research is dedicated to advancing the state of the art in large language models (LLMs) by enhancing their reasoning capabilities and ensuring their safety through rigorous mitigation of toxicity and bias. As the lead contributor to the Nemotron family of models, I have worked extensively on data curation, pretraining, and scaling. My current work focuses on optimizing pretraining pipelines, with an emphasis on data selection, blending, and ordering strategies to maximize downstream model accuracy. I am particularly interested in improving reasoning in LLMs, including generating synthetic data for advanced mathematical reasoning and enabling models to handle longer, more complex reasoning tasks that require deeper thought and understanding. My work has been featured in media outlets such as VentureBeat, Forbes, and TechCrunch.

Before that, I graduated with a PhD from the School of Computer Science, Carnegie Mellon University. At CMU, I was fortunate to be advised by Prof. Alan W. Black and Prof. Ruslan Salakhutdinov. My thesis addressed controllable text generation, with a focus on style, content, and structure, as well as its ethical considerations. I co-designed the Computational Ethics for NLP course, which was first offered in Spring 2018 at CMU. I graduated with a Master's in Language Technologies in Aug 2017. During that time, I led the CMU Magnus team in the Amazon Alexa Prize competition. I completed my undergraduate degree at the National Institute of Technology, Karnataka, India.

Publications

Rafal Kocielnik, Shrimai Prabhumoye, Vivian Zhang, R Michael Alvarez, Anima Anandkumar.
Published on arXiv, 2023.

23. Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro.
In the proceedings of the European Chapter of the Association for Computational Linguistics (EACL) 2023.

22. Context Generation Improves Open Domain Question Answering

Dan Su, Mostofa Patwary, Shrimai Prabhumoye, Peng Xu, Ryan Prenger, Mohammad Shoeybi, Pascale Fung, Anima Anandkumar, Bryan Catanzaro.
In Findings of the European Chapter of the Association for Computational Linguistics (EACL) 2023.

21. Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions

Rafal Kocielnik, Sara Kangaslahti, Shrimai Prabhumoye, Meena Hari, Michael Alvarez, Anima Anandkumar.
In Transfer Learning for Natural Language Processing Workshop at NeurIPS 2022.

20. Evaluating Parameter Efficient Learning for Generation

Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022.

19. Multi-Stage Prompting for Knowledgeable Dialogue Generation

Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro.
In Findings of the Association for Computational Linguistics (ACL) 2022.

18. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Shaden Smith*, Mostofa Patwary*, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro.
Published on arXiv, 2022.

Shrimai Prabhumoye*, Rafal Kocielnik*, Mohammad Shoeybi, Anima Anandkumar, Bryan Catanzaro.
Published on arXiv, 2023.

16. Five sources of bias in natural language processing

Dirk Hovy, Shrimai Prabhumoye.
Language and Linguistics Compass, 2021.

15. Focused Attention Improves Document-Grounded Generation

Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, Alan W Black, Ruslan Salakhutdinov.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.

14. Case Study: Deontological Ethics in NLP

Shrimai Prabhumoye*, Brendon Boldt*, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2021.

13. Exploring Controllable Text Generation Techniques

Shrimai Prabhumoye, Alan W Black, Ruslan Salakhutdinov.
Proceedings of the 28th International Conference on Computational Linguistics (COLING) 2020.
Selected for oral presentation

12. Topological Sort for Sentence Ordering

Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2020.

11. Politeness Transfer: A Tag and Generate Approach

Aman Madaan*, Amrith Setlur*, Tanmay Parekh*, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W Black, Shrimai Prabhumoye.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2020.

10. I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

Shrimai Prabhumoye*, Margaret Li*, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam.
arXiv:2002.02878 [cs.AI]

9. Generating Interactive Worlds with Text

Angela Fan*, Jack Urbanek*, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktaschel, Arthur Szlam, Jason Weston.
In the Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence.

8. Principled Frameworks for Evaluating Ethics in NLP Systems

Shrimai Prabhumoye, Elijah Mayfield, Alan W Black.
Widening NLP Workshop at ACL 2019.

7. "My Way of Telling a Story": Persona based Grounded Story Generation

Shrimai Prabhumoye*, Khyathi Chandu*, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Storytelling Workshop at ACL 2019.

6. Equity Beyond Bias in Language Technologies for Education

Elijah Mayfield, Michael Madaio, Shrimai Prabhumoye, David Gerritsen, Brittany McLaughlin, Ezekiel Dixon-Román, Alan W Black.
In the Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications at ACL 2019.

5. Towards Content Transfer Through Grounded Text Generation

Shrimai Prabhumoye, Chris Quirk, Michel Galley.
In the proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2019.
Selected for oral presentation

4. A Dataset for Document Grounded Conversations

Kangyan Zhou, Shrimai Prabhumoye, Alan W Black.
In the proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018.

3. Style Transfer Through Back-Translation

Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W Black.
In the proceedings of Association for Computational Linguistics Conference (ACL) 2018.
Selected for oral presentation

2. Linguistic Markers of Influence in Informal Interactions

Shrimai Prabhumoye*, Samridhi Choudhary*, Evangelia Spiliopoulou, Christopher Bogart, Carolyn Penstein Rose, Alan W Black.
In the proceedings of Workshop on NLP+CSS at ACL 2017.

1. Building CMU Magnus from User Feedback

Shrimai Prabhumoye*, Fadi Botros*, Khyathi Chandu*, Samridhi Choudhary*, Esha Keni*, Chaitanya Malaviya*, Thomas Manzini*, Rama Pasumarthi*, Shivani Poddar*, Abhilasha Ravichander*, Zhou Yu, Alan Black.
In the proceedings of Alexa Prize 2017.

Talks

Controllable Text Generation: Should machines reflect the way humans interact in society?

Deep Learning: Classics and Trends, Oct 2020.
Allen Institute for Artificial Intelligence (AI2), Aug 2020.
Salesforce, Jul 2020.
Montreal Institute for Learning Algorithms (Mila), Jul 2020.
Apple, Seattle, Jul 2020.
The LTI Summer Seminar, Jul 2020.

Controlling style, content and structure in Natural Language Generation

University of Massachusetts Amherst, October 2019.
Google AI Research, NYC, June 2019.

Mentored Students

Politeness Transfer: A Tag and Generate Approach

This work introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning. We also provide a dataset of more than 1.39 million instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style while preserving most of the source content.

Associated Publication: Politeness Transfer: A Tag and Generate Approach at ACL 2020

Downstream tasks to evaluate style transfer

Mukul Bhutani

Downstream tasks are known to be influenced by the demographic skew of their training sets: for example, sentiment analysis is affected by the gender confound, and part-of-speech (POS) tagging is affected by the age confound. By building a generation engine that can preserve content while controlling for style, we can produce demographically balanced datasets for these NLP tasks. We are also looking at using these downstream tasks to automatically evaluate style transfer models.

A Dataset for Document Grounded Conversations

Kangyan Zhou

This work introduces a document grounded dataset for conversations using Wikipedia articles on movies. The dataset contains 4112 conversations with an average of 21.43 turns per conversation. We describe two neural architectures that provide benchmark performance on the task of generating the next response.

Associated Publication: A Dataset for Document Grounded Conversations at EMNLP 2018

Teaching

Guest Lectures

Style Transfer

Machine Translation and Sequence-to-sequence Models
CS 11-731, Carnegie Mellon University, Fall 2018

Ethics in Conversational Agents

Computational Ethics in NLP
CS 11-830, Carnegie Mellon University, Spring 2018, Spring 2019 and Spring 2020

Chatbots

Speech Processing
CS 11-492 11-692 11-892, Carnegie Mellon University, Fall 2017, Fall 2018, and Fall 2019

Neural Dialogue

Speech Processing
CS 11-492 11-692 11-892, Carnegie Mellon University, Fall 2017, Fall 2018, and Fall 2019

Chatting with Computers Workshop
OurCS, Carnegie Mellon University, Fall 2017.

Teaching Assistant

Computational Ethics in NLP

CS 11-830, Carnegie Mellon University, Spring 2018

Speech Processing

CS 11-492 11-692 11-892, Carnegie Mellon University, Fall 2017