Andrew McCallum Homepage

Research

The main goal of my research is to dramatically increase our ability to mine actionable knowledge from unstructured text. I am especially interested in information extraction from the Web, understanding the connections between people and between organizations, expert finding, social network analysis, and mining the scientific literature & community. Toward this end my group develops and employs various methods in statistical machine learning, natural language processing, information retrieval and data mining---tending toward probabilistic approaches and graphical models. For more information see our current projects and publications.

News

We are building an "open reviewing" system for ICLR 2013 and other venues. If you are interested in alternative approaches to peer review, please talk with me!
FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
I was the General Chair of ICML 2012, with Program Chairs Joelle Pineau and John Langford.
Generalized Expectation is an accurate way to train models by labeling features.
We have publicly launched Rexa, a new research paper search engine. It is a sibling to CiteSeer and Google Scholar, except that it provides search and browsing over more "object types", including not just papers, but also people, grants and topics.
Charles Sutton and I have a comprehensive introduction to conditional random fields now published by Foundations and Trends in Machine Learning.
I've written an introduction to information extraction by machine learning, intended for an audience that doesn't know machine learning. Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, Volume 3, Number 9, November 2005.
MALLET is a Java toolkit for machine learning applied to natural language. It provides facilities for document classification, information extraction, part-of-speech tagging, noun phrase segmentation, general finite state transducers and classification, and much more---all designed to be extremely efficient for large data and feature sets. Although MALLET is quite mature in functionality, its documentation is still sparse.
An analysis of topical trends in the five years of ICML before 2008.
Three of my papers made it into CiteSeer's list of most cited computer science papers.