Alon's Home Page

Dr. Alon Lavie
Consulting Professor
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, USA
Email: alavie AT cs DOT cmu DOT edu (anti-spam notation)

I am currently a Consulting (adjunct) Professor at the Language Technologies Institute (LTI) at Carnegie Mellon University (CMU), where I have been a member of the faculty since 1996. For almost 20 years (1996-2015) I was a Research Professor at the LTI.

Concurrently, I am the VP of AI Research at Phrase, a leading enterprise translation automation technology platform, where I lead and manage our AI research team in Pittsburgh, Prague, and Edinburgh, and provide strategic leadership for AI R&D and product development company-wide. Prior to joining Phrase in August 2023, I was the VP of Language Technologies at Unbabel, with leadership responsibilities for AI R&D company-wide and a focus on the development of Translation Quality Technologies.

My primary research interests and activities focus on Machine Translation (MT) and on MT Evaluation. I directed and led the ten-year development (2004-2014) of the METEOR automated MT evaluation metric. More recently, while at Unbabel, I directed the development of a new neural MT evaluation metric named COMET, and a complementary tool for MT quality analysis named MT-Telescope. My other main research interests focus on MT adaptation approaches with and without human feedback, applied to high-resource language pairs as well as low-resource and minority languages. Additional interests include translation Quality Estimation and methods for multi-engine MT system combination.

In 2009, I co-founded a technology start-up company by the name of Safaba Translation Solutions, and I served the company as Chairman of the Board, President, and CTO. Safaba developed automated translation solutions for large global enterprises that allowed them to translate large volumes of content into all the languages of their markets. Safaba's approach focused on generating client-adapted high-quality translations using machine-learning-based technology. In June 2015, Safaba was acquired by Amazon.

From June 2015 to March 2019, I was a senior manager at Amazon, where I led and managed the Amazon Machine Translation R&D group in Pittsburgh.

I served as President of the International Association for Machine Translation (IAMT) (2013-2015). I previously served two terms as president of the Association for Machine Translation in the Americas (AMTA) (2008-2012), and was General Chair of the AMTA 2010 and 2012 conferences, and of the 2015 MT Summit conference. I am also a member of the Association for Computational Linguistics (ACL), where I was president of SIGParse - ACL's special interest group on parsing (2008-2013).

In August 2021, at the 18th biennial Machine Translation Summit conference, I was honored to receive the 2021 Makoto Nagao IAMT Award of Honour for my contributions to the field of Machine Translation.


Research

My main areas of research are Machine Translation (MT) and Natural Language Processing (NLP), and in particular, NLP technologies applied to language translation and multi-lingual processing problems. My most active current areas of research are Machine Translation adaptation approaches with human feedback and syntax-driven statistical and hybrid approaches to Machine Translation, applied to high-resource language pairs as well as low-resource and minority languages. One main focus of work has been the development of novel syntax-based methods for acquisition of the resources that are necessary for MT. I have also actively worked on frameworks for Multi-Engine Machine Translation (MEMT) and on developing automatic metrics for MT evaluation (particularly METEOR). I have also worked extensively in the past on developing parsing approaches for accurate annotation of Grammatical Relations (GRs) in spoken language data, on robust parsing algorithms for analysis of spoken language, and on the design and development of Speech-to-Speech Machine Translation systems.

Select Research Projects:

The AVENUE and LETRAS Projects:

I was a co-PI of the AVENUE and LETRAS projects (funded by NSF). AVENUE is concerned with the design and rapid development of new Machine Translation methods for languages for which only scarce resources are available. Our goal in AVENUE is to apply these new MT methods to minority languages, with a specific focus on native languages of North and Latin America. We worked on developing MT systems between Spanish and Mapudungun, a native language spoken in southern Chile, and have started working on Quechua, a native language spoken mainly in Peru, Ecuador and Bolivia. The LETRAS project is a follow-on project to AVENUE, where we are focusing on further development of the underlying general MT framework and expanding its application to new languages, including Inupiaq (a native Alaskan language) and native languages in Bolivia and Brazil. Together with Jaime Carbonell, Lori Levin, and a team of several graduate students, the primary research topics I am working on include: the design and implementation of a transfer-based MT framework specifically suitable for learning from data and for rapid prototyping of MT systems (work with Erik Peterson); automatic learning of MT transfer-rules for languages with limited amounts of data resources (work with Kathrin Probst); automatic rule refinement based on feedback from users (work with Ariadna Font-Llitjos); and unsupervised learning of morphological inflection classes from monolingual data (work with Christian Monson).

The Hebrew-English MT Project:

As a direct follow-up to our AVENUE project work, and in collaboration with Shuly Wintner and his Computational Linguistics Group at the University of Haifa in Israel, we are developing a prototype Hebrew-to-English Machine Translation system that is based on the framework developed under AVENUE. This work is being supported by a small grant from the Caesarea Rothschild Institute at the University of Haifa.

The MEMT Project:

I was the lead PI of a project on a new approach to Multi-Engine Machine Translation (MEMT). The goal of MEMT is to synthesize the output of multiple MT systems into a new output that is of higher accuracy than all of the contributing systems. The new approach involves two main stages. An explicit word matcher is first used in order to identify the words that are common between the MT engine outputs. A decoding algorithm then uses this information, in conjunction with confidence estimates for the various engines and several statistical language model features, in order to score and rank a collection of sentence hypotheses that are synthetic combinations of words from the various original engines. The highest-scoring sentence hypothesis is selected as the final output of our system. The project was funded by the DARPA GALE program, where our MEMT system served as an essential component for combining the output from multiple MT engines within the Interoperability Demonstration system (IOD). The MEMT system has been made available for experimentation to other research groups. Contact me by email to obtain a copy.
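To make the two-stage description above concrete, here is a minimal, hypothetical Python sketch of the second stage only: scoring and ranking synthetic hypotheses with a weighted combination of confidence-weighted engine support and a simple language-model feature, then selecting the highest-scoring one. The feature weights, the toy add-one-smoothed bigram LM, and all function names are illustrative assumptions, not the GALE-era MEMT implementation.

```python
# Illustrative sketch of MEMT-style hypothesis scoring and ranking.
# Assumes the synthetic hypotheses were already generated by combining
# words from the engine outputs (the word-matching stage is not shown).
import math
from collections import Counter


def lm_logprob(hyp, bigrams, unigrams):
    """Toy add-one-smoothed bigram language-model feature."""
    vocab = max(len(unigrams), 1)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
        for prev, cur in zip(hyp, hyp[1:])
    )


def support(hyp, engine_outputs, engine_confidence):
    """Average confidence-weighted support for the hypothesis words
    across the contributing MT engines."""
    total = sum(
        engine_confidence[name]
        for word in hyp
        for name, output in engine_outputs.items()
        if word in output
    )
    return total / max(len(hyp), 1)


def rank_hypotheses(hypotheses, engine_outputs, engine_confidence,
                    bigrams, unigrams, w_support=1.0, w_lm=0.3):
    """Return the synthetic hypotheses sorted best-first by combined score."""
    def score(hyp):
        return (w_support * support(hyp, engine_outputs, engine_confidence)
                + w_lm * lm_logprob(hyp, bigrams, unigrams))
    return sorted(hypotheses, key=score, reverse=True)


if __name__ == "__main__":
    outputs = {
        "engine_a": "the senate approved the law".split(),
        "engine_b": "senate has approved law".split(),
    }
    confidence = {"engine_a": 0.7, "engine_b": 0.5}
    # Pretend LM counts collected from target-language text.
    lm_text = "the senate approved the law".split()
    unigrams = Counter(lm_text)
    bigrams = Counter(zip(lm_text, lm_text[1:]))
    hypotheses = [
        "the senate approved the law".split(),
        "senate has approved law".split(),
        "the senate has approved the law".split(),
    ]
    print(rank_hypotheses(hypotheses, outputs, confidence, bigrams, unigrams)[0])
```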

The METEOR Project:

METEOR is an automatic metric for MT evaluation that we have been developing at CMU over the past several years. METEOR is designed to address a number of weaknesses in the commonly used BLEU and NIST metrics. The metric relies heavily on an algorithm for finding an optimal word-to-word matching between a candidate MT translation and a human-produced reference translation for the same input sentence. METEOR produces normalized scores (in the range [0,1]), and has been demonstrated to have significantly higher levels of correlation with human judgments of MT quality than the more commonly used BLEU and NIST metrics. METEOR is freely available for download.
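To give a concrete sense of how the matching-based scoring works, below is a minimal Python sketch of a METEOR-like score using exact unigram matches only. It is an illustration under simplifying assumptions, not the released metric: real METEOR also matches stems and synonyms and selects the word alignment with the fewest crossings rather than matching greedily. The constants follow the original formulation (a recall-weighted harmonic mean combined with a fragmentation penalty).

```python
# Minimal METEOR-like score: exact unigram matching, recall-weighted
# harmonic mean, and a fragmentation ("chunk") penalty. Illustrative only.

def meteor_like_score(candidate: str, reference: str) -> float:
    """Score a candidate translation against a single reference; range [0, 1]."""
    cand = candidate.lower().split()
    ref = reference.lower().split()

    # Greedy left-to-right exact word alignment; each reference token
    # may be matched at most once. (Real METEOR optimizes the alignment.)
    used = [False] * len(ref)
    alignment = []  # (candidate_index, reference_index) pairs
    for ci, tok in enumerate(cand):
        for ri, rtok in enumerate(ref):
            if not used[ri] and tok == rtok:
                used[ri] = True
                alignment.append((ci, ri))
                break

    m = len(alignment)
    if m == 0:
        return 0.0

    precision = m / len(cand)
    recall = m / len(ref)
    # Harmonic mean weighted 9:1 toward recall, as in the original metric.
    fmean = (10 * precision * recall) / (recall + 9 * precision)

    # Chunks: maximal runs of matches that are contiguous and in the same
    # order in both candidate and reference; fewer chunks means better fluency.
    alignment.sort()
    chunks = 1
    for (c_prev, r_prev), (c_cur, r_cur) in zip(alignment, alignment[1:]):
        if not (c_cur == c_prev + 1 and r_cur == r_prev + 1):
            chunks += 1

    penalty = 0.5 * (chunks / m) ** 3
    return fmean * (1 - penalty)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(meteor_like_score("the cat sat on the mat", ref))  # ~0.998 (one chunk)
    print(meteor_like_score("on the mat sat the cat", ref))  # 0.5 (six chunks)
```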

The GRASP Project:

I am PI of the GRASP Project (funded by NSF), where I am working together with Brian MacWhinney (co-PI) and Kenji Sagae on developing a framework for robust high-accuracy parsing of grammatical relations in spoken language data. Our goal is to automatically annotate the CHILDES database (a large database of child-parent conversations) with grammatical relations, in order to support advanced corpus-based research of child language acquisition.

Previous Research Projects

I was a co-PI of the Nespole! and C-STAR speech translation projects and of the LingWear and Babylon mobile speech translation projects.

I was the lead PI of the AMTEXT project (2003-2005, funded by DoD), a small pilot project that investigated the feasibility of a rapid development approach to Machine Translation based on Information Extraction. The approach builds upon the MT transfer framework developed in the AVENUE project and on Fei Huang's work on translation of Named Entities. The main idea is to use a small elicitation corpus of translated and word-aligned sentences to semi-automatically learn pattern transfer-rules that can then be used both to extract the information of interest in the source language and to translate this information into the target language.
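As a toy illustration of the pattern transfer-rule idea (not the AMTEXT system itself), the sketch below shows how a single rule with named slots can both extract a structured event from an English source sentence and render it with a target-language template. The rule format, the example pattern, and the pseudo-Spanish template are invented for illustration; in practice such rules would be learned semi-automatically from the elicitation corpus, and the slot fillers (e.g., named entities) would themselves be translated.

```python
# Toy pattern transfer rule: one source-language pattern with named slots,
# paired with a target-language template and an event type. Invented example.
import re

RULE = {
    "source_pattern": re.compile(
        r"(?P<actor>\w+) attacked (?P<target>\w+) on (?P<date>\w+ \d+)"
    ),
    "target_template": "{actor} atacó a {target} el {date}",
    "event_type": "ATTACK",
}


def apply_rule(sentence: str, rule: dict):
    """Return (extracted event, target-language rendering) or None."""
    match = rule["source_pattern"].search(sentence)
    if match is None:
        return None
    slots = match.groupdict()
    extraction = {"event": rule["event_type"], **slots}
    # Slot fillers are passed through untranslated in this toy example.
    translation = rule["target_template"].format(**slots)
    return extraction, translation


if __name__ == "__main__":
    print(apply_rule("Rebels attacked Mosul on March 3", RULE))
```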

I was a co-PI of the Clarity project (1997-1999, funded by DoD) on the automatic detection and classification of the discourse structure of spoken language.

Other Research Interests

I have a general interest in parsing algorithms for natural and programming languages and in theoretical problems related to parsing. My own research has primarily focused on the area of robust analysis and understanding of spoken language. In my PhD work, I developed GLR*, one of the first robust parsers for spoken language analysis, and a key component in the earlier versions of the JANUS speech translation system.


Teaching

From 1996 to 2014, I was the lead instructor of the Algorithms for NLP (11-711) course at the LTI. Algorithms for NLP is an introductory graduate-level course on the computational properties of natural languages and the fundamental algorithms for processing natural languages. The course provides an in-depth presentation of the major algorithms used in NLP, including Lexical, Morphological, Syntactic and Semantic analysis, with the primary focus on parsing algorithms and their analysis.

I was also a co-instructor of the Machine Translation (11-731) and Advanced MT Seminar (11-734) courses, and co-supervised the NLP Lab (11-712) and the MT Lab (11-732) courses.


My Students

My Students that have Graduated


Select Talks and Presentations


My Full Publication List


Miscellaneous Information


Contact Information

Office:

5715 Gates-Hillman Complex

+1-412-268-5655

Fax: +1-412-268-6298

Administrative Assistant:

Mary Jo Bensasi

65xx Gates-Hillman Complex

maryjob AT cs DOT cmu DOT edu

+1-412-268-7517

Mailing Address:

Dr. Alon Lavie

Language Technologies Institute

School of Computer Science

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213-3891

Email:

alavie AT cs DOT cmu DOT edu (anti-spam notation)

Home:

5124 Beeler St.

Pittsburgh, PA 15217

+1-412-621-0933