Andrew McCallum Homepage
Research
The main goal of my research is to dramatically increase our ability to mine actionable knowledge from unstructured text. I am especially interested in information extraction from the Web, understanding the connections between people and between organizations, expert finding, social network analysis, and mining the scientific literature & community. Toward this end my group develops and employs various methods in statistical machine learning, natural language processing, information retrieval and data mining---tending toward probabilistic approaches and graphical models. For more information see our current projects and publications.
News
- We are building an "open reviewing" system for ICLR 2013 and other venues. If you are interested in alternative approaches to peer review, please talk with me!
- FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
- I was the General Chair of ICML 2012, with Program Chairs Joelle Pineau and John Langford.
- Generalized Expectation is an accurate way to train models by labeling features.
- We have publicly launched Rexa, a new research paper search engine. It is a sibling to CiteSeer and Google Scholar, except that it provides search and browsing over more "object types", including not just papers, but also people, grants and topics.
- Charles Sutton and I have a comprehensive introduction to conditional random fields now published by Foundations and Trends in Machine Learning.
- I've written an introduction to information extraction by machine learning, intended for an audience that doesn't know machine learning. Information Extraction: Distilling Structured Data from Unstructured Text. Andrew McCallum. ACM Queue, Volume 3, Number 9, November 2005.
- MALLET is a Java toolkit for machine learning applied to natural language. It provides facilities for document classification, information extraction, part-of-speech tagging, noun phrase segmentation, general finite state transducers and classification, and much more---all designed to be extremely efficient for large data and feature sets. Although the toolkit is quite mature in functionality, its documentation is still sparse.
- An analysis of topical trends in the five years of ICML before 2008.
- Three of my papers made it into CiteSeer's list of most cited computer science papers.