World Wide Knowledge Base (Web->KB) project

Goal:

To develop a probabilistic, symbolic knowledge base that mirrors the content of the World Wide Web. If successful, this will make text information on the web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.

Approach:

We are developing a system that can be trained to extract symbolic knowledge from hypertext, using a variety of machine learning methods.
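
As a rough illustration of what "trained to extract symbolic knowledge from hypertext" could look like, the sketch below trains a small multinomial naive Bayes classifier on labeled page text and emits a probabilistic fact about a new page. Naive Bayes is used here only as one simple, well-known learning method; it is not presented as the project's system. The class labels (`course`, `faculty`), the `instance_of` relation, the example URL, and the toy training pages are all hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesPageClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # pages seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocabulary = set()

    def train(self, labeled_pages):
        """labeled_pages: iterable of (page_text, class_label) pairs."""
        for text, label in labeled_pages:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class), per class."""
        total_pages = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        scores = {}
        for label, n_pages in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_pages / total_pages)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def posteriors(self, text):
        """Normalize the log scores into class probabilities."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exp = {label: math.exp(s - top) for label, s in scores.items()}
        norm = sum(exp.values())
        return {label: v / norm for label, v in exp.items()}


def extract_fact(classifier, url, text):
    """Return a (relation, url, class, probability) tuple for one page."""
    probs = classifier.posteriors(text)
    label = max(probs, key=probs.get)
    return ("instance_of", url, label, probs[label])


# Toy, made-up training pages; real pages would be far longer and noisier.
TRAINING_PAGES = [
    ("syllabus lectures homework exam instructor", "course"),
    ("course schedule assignments grading lectures", "course"),
    ("professor research interests publications students", "faculty"),
    ("faculty office hours publications teaching", "faculty"),
]

classifier = NaiveBayesPageClassifier()
classifier.train(TRAINING_PAGES)
fact = extract_fact(classifier, "http://example.edu/cs101",
                    "homework and exam schedule for the lectures")
print(fact)
```

The returned tuple pairs a symbolic assertion with a probability, which is one simple way to read "probabilistic, symbolic knowledge base". A fuller system would presumably also exploit hyperlink structure and extract relations between pages, which this sketch does not attempt.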

Datasets:

The first experiments consisted of extracting knowledge about computer science departments. We have assembled two data sets for this task:

Other Datasets used by the WebKB Group

See the other research on text learning by our research group.

Publications:

Overview of the Project:

Text and Hypertext Classification:

Relational Learning for Hypertext Domains:

Automatic Corpus Construction from the Web:

Spidering:

Information Extraction:

Student projects and unpublished reports:

Researchers:

Project Alumni:


theo-11-last update: Jan 2001 by Rayid Ghani

This web page is stored at /afs/cs.cmu.edu/project/theo-11/www/wwkb/index.html