Vitor Carvalho (frequently confused with Victor Carvalho)
This page is outdated. Please check my current page instead.
I'm a Lead Research Scientist/Manager at Snapchat Research. I live in San Diego, CA.
My PhD thesis advisor was the ingenious William W. Cohen. I have worked at Microsoft, Qualcomm Research, Inome and Ericsson R&D. I'm interested in applied research interfacing Machine Learning, Natural Language Processing, Information Retrieval, Text Mining, Data Mining and AI.
Writing and Publications:
- A few recent papers:
- IJCAI 2017, Exploring Personalized Neural Conversational Models
- GIS 2012 (GIBDA), Geocoding Billions of Addresses: Toward a Spatial Record Linkage System with Big Data
- NAACL 2012, The Intelius Nickname Collection: Quantitative Analyses from Billions of Public Records (the data is available through the LDC)
- VLDB-2011 (QDB), The Case for Cost-Sensitive and Easy-To-Interpret Models in Industrial Record Linkage
- ECIR-2011, An Analysis of Time-Instability in Web Search Results
- SIGIR Forum 2011, Crowdsourcing for Search and Data Mining
- CIKM 2010, Online Stratified Sampling: Evaluating Classifiers at Web-Scale
- SIGIR CSE-2010, Proceedings of the SIGIR2010 Workshop on Crowdsourcing for Search Evaluation
- SIGIR 2010 FGSIR Workshop, Online Feature Selection for Information Retrieval
- SIGIR 2010, Exploring Reductions in Long Web Queries
- SIGIR 2010, Predicting Query Performance on the Web
- SIGIR 2009, Reducing Long Queries Using Query Quality Predictors
- CEAS 2009, Information Leaks and Suggestions: a Case Study using Mozilla Thunderbird
- CIKM 2008, Suppressing Outliers in Pairwise Preference Ranking
- AAAI WS-08-04, Proceedings of the AAAI 2008 EMAIL Workshop
- SIGIR-2008 LR4IR, A Meta-Learning Approach for Robust Rank Learning
- AAAI-2008 EMAIL Workshop, CutOnce - Recipient Recommendation and Leak Detection in Action
- ECIR-2008, Ranking Users for Intelligent Message Addressing
- WSDM-2008, Fast Learning of Document Ranking Functions with the Committee Perceptron
- Some older publications you may be looking for:
- WWW 2006, Finding Advertising Keywords on Web Pages
- KDD 2006, Single-Pass Online Learning: Performance, Voting Schemes and Online Feature Selection
- SIGIR 2005, On the Collective Classification of Email "Speech Acts"
- EMNLP 2004, Learning to Classify Email into "Speech Acts"
- CEAS 2004, Learning to Extract Signature and Reply Lines from Email
- All older publications
All Publications:
- All Publications in chronological order. Google Scholar entry: Vitor R. Carvalho. DBLP entry: Vitor R. Carvalho.
Software:
- Ciranda - Java package for email-speech-act prediction
- Jangada - Java package for extraction of signatures (sig files) and reply-to (quotes) lines in email messages
- Cut Once - A Mozilla Thunderbird plug-in for email information leak prevention and recipient recommendation
- I contribute to Minorthird, a package for text learning, classification, extraction and annotations
Datasets:
- American English Nicknames Collection: created in our NAACL-HLT 2012 paper and distributed by the LDC (Linguistic Data Consortium)
- 617 messages from the 20 Newsgroups, annotated with reply bodies and signatures, used in the CEAS-2004 paper.
Other Stuff (may be outdated):
- My "academic lineage" tracing back all the way to Leibniz, James Clerk Maxwell, Poisson, Lagrange, Bernoulli and Euler (compiled by William Cohen)
- I've recently organized the Workshop on Crowdsourcing for Search and Data Mining at ACM WSDM 2011 with Matt Lease and Emine Yilmaz.
- I organized the SIGIR 2010 Workshop on "Crowdsourcing for Search Evaluation" with Matt Lease and Emine Yilmaz.
- A few recent program committees: SIGIR-2011, AAAI HCOMP 2011, NextMail-2011, CEAS-2010, NAACL-ACL 2010 Young Investigators, EMNLP-09, IEEE CEC-09 (E3C), CEAS-09, IJCAI-09, ICML-09, COLING/ACL-06, AAAI-07, CEAS-05-06-07-08, WWW-08
- I organized, with Mark Dredze and Tessa Lau, EMAIL-2008: the AAAI-2008 Workshop on Enhanced Messaging
- Check out our new IR group blog: Probably Irrelevant
- I used to organize the CMU Machine Learning Lunch and the CMU Information Retrieval Discussion Series
- I was a TA in the Machine Learning (10-601) course during Fall 2007
- I was the TA of the Information Extraction course (MLD 10-707 and LTI 11-748) during Spring 07
- Alma Matres: CMU-SCS-LTI, UNICAMP-FEEC, UFPE, Colegio Diocesano - Teresina
- When in Pittsburgh, check out our radio program on WRCT (88.3 FM)