Alon's Home Page

Dr. Alon Lavie
Consulting Professor
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, USA
Email: alavie AT cs DOT cmu DOT edu (anti-spam notation)

I am currently a Consulting (adjunct) Professor at the Language Technologies Institute (LTI) at Carnegie Mellon University (CMU), where I have been a member of the faculty since 1996. For almost 20 years (1996-2015) I was a Research Professor at the LTI.

Concurrently, I am the VP of AI Research at Phrase, a leading enterprise translation automation technology platform, where I lead and manage our AI research team in Pittsburgh, Prague, and Edinburgh, and provide strategic leadership for AI R&D and product development company-wide. Prior to joining Phrase in August 2023, I was the VP of Language Technologies at Unbabel, with leadership responsibilities for AI R&D company-wide and a focus on the development of Translation Quality Technologies.

My primary research interests and activities focus on Machine Translation (MT) and on MT Evaluation. I directed and led the ten-year development (2004-2014) of the METEOR automated MT evaluation metric. More recently, while at Unbabel, I directed the development of a new neural MT evaluation metric named COMET, and a complementary tool for MT quality analysis named MT-Telescope. My other main research interests focus on MT adaptation approaches with and without human feedback, applied to high-resource language pairs as well as low-resource and minority languages. Additional interests include translation Quality Estimation and methods for multi-engine MT system combination.

In 2009, I co-founded a technology start-up company by the name of Safaba Translation Solutions, and I served the company as Chairman of the Board, President, and CTO. Safaba developed automated translation solutions for large global enterprises that allowed them to translate large volumes of content into all the languages of their markets. Safaba's approach focused on generating client-adapted high-quality translations using machine-learning-based technology. In June 2015, Safaba was acquired by Amazon.

From June 2015 to March 2019, I was a senior manager at Amazon, where I led and managed the Amazon Machine Translation R&D group in Pittsburgh.

I served as President of the International Association for Machine Translation (IAMT) (2013-2015). I previously served two terms as president of the Association for Machine Translation in the Americas (AMTA) (2008-2012), and was General Chair of the AMTA 2010 and 2012 conferences, and of the 2015 MT Summit conference. I am also a member of the Association for Computational Linguistics (ACL), where I was president of SIGParse - ACL's special interest group on parsing (2008-2013).

In August 2021, at the 18th biennial Machine Translation Summit conference, I was honored to receive the 2021 Makoto Nagao IAMT Award of Honour for my contributions to the field of Machine Translation.


Research

My main areas of research are Machine Translation (MT) and Natural Language Processing (NLP), and in particular, NLP technologies applied to language translation and multi-lingual processing problems. My most active current areas of research are Machine Translation adaptation approaches with human feedback and syntax-driven statistical and hybrid approaches to Machine Translation, applied to high-resource language pairs as well as low-resource and minority languages. One main focus of work has been the development of novel syntax-based methods for acquisition of the resources that are necessary for MT. I have also actively worked on frameworks for Multi-Engine Machine Translation (MEMT) and on developing automatic metrics for MT evaluation (particularly METEOR). I have also worked extensively in the past on developing parsing approaches for accurate annotation of Grammatical Relations (GRs) in spoken language data, on robust parsing algorithms for analysis of spoken language, and on the design and development of Speech-to-Speech Machine Translation systems.

Select Research Projects:

The AVENUE and LETRAS Projects:

I was a co-PI of the AVENUE and LETRAS projects (funded by NSF). AVENUE is concerned with the design and rapid development of new Machine Translation methods for languages for which only scarce resources are available. Our goal in AVENUE is to apply these new MT methods to minority languages, with a specific focus on native languages of North and Latin America. We worked on developing MT systems between Spanish and Mapudungun, a native language spoken in southern Chile, and have started working on Quechua, a native language spoken mainly in Peru, Ecuador and Bolivia. The LETRAS project is a follow-on project to AVENUE, where we are focusing on further development of the underlying general MT framework and expanding its application to new languages, including Inupiaq (a native Alaskan language) and native languages in Bolivia and Brazil. Together with Jaime Carbonell, Lori Levin, and a team of several graduate students, the primary research topics I am working on include: the design and implementation of a transfer-based MT framework specifically suitable for learning from data and for rapid prototyping of MT systems (work with Erik Peterson); automatic learning of MT transfer-rules for languages with limited amounts of data resources (work with Kathrin Probst); automatic rule refinement based on feedback from users (work with Ariadna Font-Llitjos); and unsupervised learning of morphological inflection classes from monolingual data (work with Christian Monson).

The Hebrew-English MT Project:

As a direct follow-up to our AVENUE project work, and in collaboration with Shuly Wintner and his Computational Linguistics Group at the University of Haifa in Israel, we are developing a prototype Hebrew-to-English Machine Translation system that is based on the framework developed under AVENUE. This work is being supported by a small grant from the Caesarea Rothschild Institute at the University of Haifa.

The MEMT Project:

I was the lead PI of a project on a new approach to Multi-Engine Machine Translation (MEMT). The goal of MEMT is to synthesize the output of multiple MT systems into a new output that is of higher accuracy than all of the contributing systems. The new approach involves two main stages. An explicit word matcher is first used in order to identify the words that are common between the MT engine outputs. A decoding algorithm then uses this information, in conjunction with confidence estimates for the various engines and several statistical language model features, in order to score and rank a collection of sentence hypotheses that are synthetic combinations of words from the various original engines. The highest-scoring sentence hypothesis is selected as the final output of our system. The project was funded by the DARPA GALE program, where our MEMT system served as an essential component for combining the output from multiple MT engines within the Interoperability Demonstration system (IOD). The MEMT system has been made available for experimentation to other research groups. Contact me by email to obtain a copy.
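To make the two-stage description above concrete, here is a minimal, hypothetical Python sketch of the second stage only: scoring and ranking synthetic hypotheses with a weighted combination of confidence-weighted engine support and a simple language-model feature, then selecting the highest-scoring one. The feature weights, the toy add-one-smoothed bigram LM, and all function names are illustrative assumptions, not the GALE-era MEMT implementation.

```python
# Illustrative sketch of MEMT-style hypothesis scoring and ranking.
# Assumes the synthetic hypotheses were already generated by combining
# words from the engine outputs (the word-matching stage is not shown).
import math
from collections import Counter


def lm_logprob(hyp, bigrams, unigrams):
    """Toy add-one-smoothed bigram language-model feature."""
    vocab = max(len(unigrams), 1)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
        for prev, cur in zip(hyp, hyp[1:])
    )


def support(hyp, engine_outputs, engine_confidence):
    """Average confidence-weighted support for the hypothesis words
    across the contributing MT engines."""
    total = sum(
        engine_confidence[name]
        for word in hyp
        for name, output in engine_outputs.items()
        if word in output
    )
    return total / max(len(hyp), 1)


def rank_hypotheses(hypotheses, engine_outputs, engine_confidence,
                    bigrams, unigrams, w_support=1.0, w_lm=0.3):
    """Return the synthetic hypotheses sorted best-first by combined score."""
    def score(hyp):
        return (w_support * support(hyp, engine_outputs, engine_confidence)
                + w_lm * lm_logprob(hyp, bigrams, unigrams))
    return sorted(hypotheses, key=score, reverse=True)


if __name__ == "__main__":
    outputs = {
        "engine_a": "the senate approved the law".split(),
        "engine_b": "senate has approved law".split(),
    }
    confidence = {"engine_a": 0.7, "engine_b": 0.5}
    # Pretend LM counts collected from target-language text.
    lm_text = "the senate approved the law".split()
    unigrams = Counter(lm_text)
    bigrams = Counter(zip(lm_text, lm_text[1:]))
    hypotheses = [
        "the senate approved the law".split(),
        "senate has approved law".split(),
        "the senate has approved the law".split(),
    ]
    print(rank_hypotheses(hypotheses, outputs, confidence, bigrams, unigrams)[0])
```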

The METEOR Project:

METEOR is an automatic metric for MT evaluation that we have been developing at CMU over the past several years. METEOR is designed to address a number of weaknesses in the commonly used BLEU and NIST metrics. The metric relies heavily on an algorithm for finding an optimal word-to-word matching between a candidate MT translation and a human-produced reference translation for the same input sentence. METEOR produces normalized scores (in the range [0,1]), and has been demonstrated to have significantly higher levels of correlation with human judgments of MT quality than the more commonly used BLEU and NIST metrics. METEOR is freely available for download.
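To give a concrete sense of how the matching-based scoring works, below is a minimal Python sketch of a METEOR-like score using exact unigram matches only. It is an illustration under simplifying assumptions, not the released metric: real METEOR also matches stems and synonyms and selects the word alignment with the fewest crossings rather than matching greedily. The constants follow the original formulation (a recall-weighted harmonic mean combined with a fragmentation penalty).

```python
# Minimal METEOR-like score: exact unigram matching, recall-weighted
# harmonic mean, and a fragmentation ("chunk") penalty. Illustrative only.

def meteor_like_score(candidate: str, reference: str) -> float:
    """Score a candidate translation against a single reference; range [0, 1]."""
    cand = candidate.lower().split()
    ref = reference.lower().split()

    # Greedy left-to-right exact word alignment; each reference token
    # may be matched at most once. (Real METEOR optimizes the alignment.)
    used = [False] * len(ref)
    alignment = []  # (candidate_index, reference_index) pairs
    for ci, tok in enumerate(cand):
        for ri, rtok in enumerate(ref):
            if not used[ri] and tok == rtok:
                used[ri] = True
                alignment.append((ci, ri))
                break

    m = len(alignment)
    if m == 0:
        return 0.0

    precision = m / len(cand)
    recall = m / len(ref)
    # Harmonic mean weighted 9:1 toward recall, as in the original metric.
    fmean = (10 * precision * recall) / (recall + 9 * precision)

    # Chunks: maximal runs of matches that are contiguous and in the same
    # order in both candidate and reference; fewer chunks means better fluency.
    alignment.sort()
    chunks = 1
    for (c_prev, r_prev), (c_cur, r_cur) in zip(alignment, alignment[1:]):
        if not (c_cur == c_prev + 1 and r_cur == r_prev + 1):
            chunks += 1

    penalty = 0.5 * (chunks / m) ** 3
    return fmean * (1 - penalty)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(meteor_like_score("the cat sat on the mat", ref))  # ~0.998 (one chunk)
    print(meteor_like_score("on the mat sat the cat", ref))  # 0.5 (six chunks)
```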

The GRASP Project:

I am PI of the GRASP Project (funded by NSF), where I am working together with Brian MacWhinney (co-PI) and Kenji Sagae on developing a framework for robust high-accuracy parsing of grammatical relations in spoken language data. Our goal is to automatically annotate the CHILDES database (a large database of child-parent conversations) with grammatical relations, in order to support advanced corpus-based research of child language acquisition.

Previous Research Projects

I was a co-PI of the Nespole! and C-STAR speech translation projects and of the LingWear and Babylon mobile speech translation projects.

I was the lead PI of the AMTEXT project (2003-2005, funded by DoD), a small pilot project that investigated the feasibility of a rapid development approach to Machine Translation based on Information Extraction. The approach builds upon the MT transfer framework developed in the AVENUE project and on Fei Huang's work on translation of Named Entities. The main idea is to use a small elicitation corpus of translated and word-aligned sentences to semi-automatically learn pattern transfer-rules that can then be used both to extract the information of interest in the source language and to translate this information into the target language.
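As a toy illustration of the pattern transfer-rule idea (not the AMTEXT system itself), the sketch below shows how a single rule with named slots can both extract a structured event from an English source sentence and render it with a target-language template. The rule format, the example pattern, and the pseudo-Spanish template are invented for illustration; in practice such rules would be learned semi-automatically from the elicitation corpus, and the slot fillers (e.g., named entities) would themselves be translated.

```python
# Toy pattern transfer rule: one source-language pattern with named slots,
# paired with a target-language template and an event type. Invented example.
import re

RULE = {
    "source_pattern": re.compile(
        r"(?P<actor>\w+) attacked (?P<target>\w+) on (?P<date>\w+ \d+)"
    ),
    "target_template": "{actor} atacó a {target} el {date}",
    "event_type": "ATTACK",
}


def apply_rule(sentence: str, rule: dict):
    """Return (extracted event, target-language rendering) or None."""
    match = rule["source_pattern"].search(sentence)
    if match is None:
        return None
    slots = match.groupdict()
    extraction = {"event": rule["event_type"], **slots}
    # Slot fillers are passed through untranslated in this toy example.
    translation = rule["target_template"].format(**slots)
    return extraction, translation


if __name__ == "__main__":
    print(apply_rule("Rebels attacked Mosul on March 3", RULE))
```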

I was a co-PI of the Clarity project (1997-1999, funded by DoD) on the automatic detection and classification of the discourse structure of spoken language.

Other Research Interests

I have a general interest in parsing algorithms for natural and programming languages and in theoretical problems related to parsing. My own research has primarily focused on the area of robust analysis and understanding of spoken language. In my PhD work, I developed GLR*, one of the first robust parsers for spoken language analysis, and a key component in the earlier versions of the JANUS speech translation system.


Teaching

From 1996 to 2014, I was the lead instructor of the Algorithms for NLP (11-711) course at the LTI. Algorithms for NLP is an introductory graduate-level course on the computational properties of natural languages and the fundamental algorithms for processing natural languages. The course provides an in-depth presentation of the major algorithms used in NLP, including Lexical, Morphological, Syntactic and Semantic analysis, with the primary focus on parsing algorithms and their analysis.

I was also a co-instructor of the Machine Translation (11-731) and Advanced MT Seminar (11-734) courses, and co-supervised the NLP Lab (11-712) and the MT Lab (11-732) courses.


My Students

My Students that have Graduated


Select Talks and Presentations


My Full Publication List


Miscellaneous Information


Contact Information

Office:

5715 Gates-Hillman Complex

+1-412-268-5655

Fax: +1-412-268-6298

Administrative Assistant:

Mary Jo Bensasi

65xx Gates-Hillman Complex

maryjob AT cs DOT cmu DOT edu

+1-412-268-7517

Mailing Address:

Dr. Alon Lavie

Language Technologies Institute

School of Computer Science

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213-3891

Email:

alavie AT cs DOT cmu DOT edu (anti-spam notation)

Home:

5124 Beeler St.

Pittsburgh, PA 15217

+1-412-621-0933