a general purpose data processor for the semantic web

Cwm


Cwm (pronounced coom) is a general-purpose data processor for the semantic web, somewhat like sed, awk, etc. for text files or XSLT for XML. It is a forward chaining reasoner which can be used for querying, checking, transforming and filtering information. Its core language is RDF, extended to include rules, and it uses RDF/XML or RDF/N3 (see Notation3 Primer) serializations as required.

Cwm is written in Python; it is part of SWAP, a Semantic Web Application Platform. It is open source under the W3C software license.

[Download cwm.tar.gz now]

Quick Reference:

Discussion -- places to talk about this

public-cwm-announce

This low-traffic list is for announcements about releases of cwm software, and monthly summaries of changes to the bug/RFE list. You might find the cwm-announce RSS feed handy.

public-cwm-bugs

This is for the announcement and brief discussion/clarification of cwm bugs or Requests For Enhancement (RFE). If responding to an existing bug, only use mailers which send the reference headers so that the threads on this mailing list work. For new threads, please make the subject line informative, and use the word "bug" or "RFE" as appropriate. The current plan is to review changes to the bug/RFE list monthly and send a summary to the announce list.

public-cwm-talk

Discussion by users and/or developers of the use and abuse of project software.

Semantic Web Interest Group

Features and Tutorial

The Semantic Web Tutorial Using N3 covers features such as:

Loading files in RDF/XML and/or N3, generating RDF or N3 files from the result.
- (The obscure, boring parts of RDF/XML syntax, specifically reification and the XML Literal parse type, are not handled by the main parser.)
Pretty printing data so that anonymous nodes are used creatively to minimize the number of explicit existentials (generated Ids).
Applying rules written in N3 to the data
Filtering the data to the result of a particular query
Generating arbitrary formats (using --strings)
Using an internal knowledge of functions to resolve them within a query, including:
- Simple math and string operations
- Getting and parsing documents from the web
- Accessing command line arguments and environment variables
- Cryptography: hashing, generating keys, signing things and checking signatures.
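The rule application and query filtering listed above can be illustrated with a toy forward chainer. This is only a sketch of the general technique, assuming triples are plain Python tuples and variables are strings starting with "?"; it is not cwm's implementation, and every function name in it (match, solve, forward_chain and so on) is hypothetical.

```python
# Toy forward chaining over (subject, predicate, object) triples.
# NOT cwm's code or API: all names here are made up for illustration.

def is_var(term):
    """Variables are strings that start with '?', e.g. '?x'."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def solve(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in facts:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from solve(patterns[1:], facts, extended)

def substitute(pattern, bindings):
    """Replace bound variables in a pattern with their values."""
    return tuple(bindings.get(term, term) for term in pattern)

def forward_chain(facts, rules):
    """Apply (premises, conclusions) rules until nothing new is derived."""
    facts = set(facts)
    while True:
        derived = set()
        for premises, conclusions in rules:
            for b in solve(premises, facts):
                for conclusion in conclusions:
                    derived.add(substitute(conclusion, b))
        if derived <= facts:
            return facts
        facts |= derived

facts = {(":socrates", "a", ":Man")}
rules = [([("?x", "a", ":Man")], [("?x", "a", ":Mortal")])]
closure = forward_chain(facts, rules)

# Filtering: keep only the answers to one query pattern.
mortals = sorted(b["?who"] for b in solve([("?who", "a", ":Mortal")], closure))
print(mortals)  # [':socrates']
```

The loop stops at a fixed point, when a full pass over the rules adds no new triple. A real reasoner would index the store rather than scan every triple for every pattern.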

Other features are in development, and haven't been documented as thoroughly:

Accessing the web to directly or indirectly resolve a query, including:
- Getting schemas for terms in the query
- Using metadata to point to definitive documents
- Looking up data in local or remote SQL servers

Environment Variables

CWM_RDF_PARSER

rdflib2rdf or sax2rdf (default). Affects the choice of RDF parser module used by cwm.
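As a rough sketch of how a program might honour this variable, the following reads CWM_RDF_PARSER and falls back to sax2rdf when it is unset. The function name choose_rdf_parser is hypothetical, and rejecting unknown values with an error is an assumption; this page does not say how cwm itself treats other values.

```python
import os

# The two parser names documented above; sax2rdf is the documented default.
KNOWN_PARSERS = ("sax2rdf", "rdflib2rdf")

def choose_rdf_parser(environ=None):
    """Return the parser name selected by CWM_RDF_PARSER.

    Hypothetical helper, not part of cwm. Raising on an unknown
    value is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("CWM_RDF_PARSER", "sax2rdf")
    if name not in KNOWN_PARSERS:
        raise ValueError("unknown CWM_RDF_PARSER value: %r" % name)
    return name

print(choose_rdf_parser({"CWM_RDF_PARSER": "rdflib2rdf"}))
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.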

Security Issues

Be careful when using rules from an untrusted source.

Rules can read data from the web, indirectly letting data out by the URIs they use.
Rules can take up your resources such as processor time and memory.
Rules can pick data up from within the web you have access to, including confidential files.

Be careful even when using cryptography. I am not an expert, but a few things to watch are:

Always think about where the weakest link is. It is not always on the net.
Where do you keep the private key, anyway?
Beware of all forms of attack, including replay and man in the middle.
Always sign some random junk as well as the critical data to prevent the reverse engineering of the key.
Ask a crypto specialist to look over your stuff.
Make the techniques, rules, and code public. Public debugging is valuable. Trying to hide it from attackers by keeping it secret doesn't pay.
This code is not guaranteed anyway, or made for production use. It is designed for prototyping new semantic web applications. Use at your own risk.

About the name cwm

Originally, the name came from "Closed World Machine", because it processed information in a limited space. However, cwm does not make any assumptions about a closed world. Think of it as a defined area but with openings, like a valley.

See also Sean Palmer's guide to cwm -- sometimes it is more up to date than this!

Check out other programs which use the same language:

Euler - a backward-chaining reasoner by Jos de Roo. Euler will tell you whether a given set of facts and rules supports a given conclusion.
EulerSharp - a C# port of Euler
cwmclone - a partial clone of cwm by Bijan Parsia for the XSB Prolog engine - to demonstrate that conventional logic programming tools are efficient and straightforwardly adapted to semantic web work.
Jena RDF toolkit now accepts N3 as well as RDF/XML (2003/2)
RDF::Notation3 Perl module (submission 10 Oct 2001 by Petr Cimprich)
Swish - N3-capable semantic web code in Haskell by Graham Klyne (2003/6)
Pychinko - a Rete-based cwm clone; it should be much faster.

Development

This swap code is open source and available for those that want to play with it, but comes with no warranty.

Using CVS

Get the code from the public W3C CVS repository. Check out the whole tree to develop. This includes the test data - if you don't need that, delete the test subdirectory. Make a fresh directory where you want to put stuff from dev.w3.org.

$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public login
password? anonymous
$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public get 2000/10/swap

From the web

Get the files one by one. cwm.py is the main application file. You can browse the source files on the web, but this is not a practical way to install the system.

In the following, we assume $SWAP expands to the place where you have code checked out.

Test Driven Development (Don't trust the docs ;-)

The best test of what works is what has been tested. So the list of files in the regression test defines the set of features which are generally checked on each check-in. Cwm developers agree that all the tests have to pass before code is checked in. To run the tests, run make in the swap/test directory. We aim to add a test for each new bug, so that bugs don't recur in future versions.

Each subdirectory of test has its own detailed.tests file. In that you can put tests for new features. Note that the test commands are all written to be run in $SWAP/test.

How to make a release

Remember to cvs update to ensure you have any changes other people have made before running tests.
A quick test that your code still works is cd $SWAP/test; make fast
The test a release must pass before you make it is cd $SWAP/test; make pre-release
Update the releases page with details of the new bug fixes and/or features.
Edit $SWAP/Makefile
- Make sure the HTML files generated from any new .py files are added to the list HTMLS
- Change the version number if you are going to make a new tarball

Code Overview

Cwm developers agree to keep line lengths below 80 characters, though we have some code that predates that agreement.

llyn.py - The Store

An in-memory store which does the inference. See the Formula class methods for a more or less conventional RDF API. A Formula is a set of statements.
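To make "a Formula is a set of statements" concrete, here is a minimal sketch of a formula modelled as a Python set of triples. The class ToyFormula and its methods are hypothetical and are not the API of llyn.py; see the real Formula class for the actual method names.

```python
class ToyFormula:
    """A formula modelled as a set of (subject, predicate, object) statements.

    Hypothetical illustration only; not the Formula class from llyn.py.
    """

    def __init__(self, statements=()):
        self._statements = set(statements)

    def add(self, subj, pred, obj):
        """Add one statement; adding a duplicate has no effect."""
        self._statements.add((subj, pred, obj))

    def __contains__(self, statement):
        return tuple(statement) in self._statements

    def __len__(self):
        return len(self._statements)

    def matching(self, subj=None, pred=None, obj=None):
        """Return the statements that fit the pattern; None is a wildcard."""
        pattern = (subj, pred, obj)
        return sorted(
            s for s in self._statements
            if all(want is None or want == got for want, got in zip(pattern, s))
        )

f = ToyFormula()
f.add(":cwm", ":writtenIn", ":Python")
f.add(":cwm", ":writtenIn", ":Python")  # duplicate, ignored by the set
f.add(":cwm", ":partOf", ":SWAP")
print(len(f))  # 2
print(f.matching(pred=":partOf"))  # [(':cwm', ':partOf', ':SWAP')]
```

Because the statements live in a set, order carries no meaning and repeated statements collapse into one.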

notation3.py - Serializing/deserializing RDF/N3

Originally written by Dan Connolly. It uses a basic RDF stream parser interface, and is migrating to the API.

Parses N3
Generates N3

The command line form (alias n3 python notation3.py; n3 -help) allows RDF to be parsed and re-output.

The module will also run as a CGI script to convert N3 to RDF M&S 1.0 - by DanC magic.

Source

xml2rdf.py Parsing RDF/XML

Based on Python's xmllib, this parser is compatible with the RDF stream interface of notation3.py. It completes the square of parsers and generators. It is now defunct; use the SAX parser and sax2rdf.py instead.

Parses RDF

It has a command line mode for self-test purposes.

Source

cwm_xxx.py - builtin modules

These are quite easy to add to. Look at a few and clone one similar to the one you want to make.

Design issues

The code above investigated and raised issues discussed in the following documents.

not to mention

RDFM&S and schema issues
The question of quoting and BagIDs etc

Acknowledgements

Thanks to Dan Connolly for writing the first code and thereby introducing me to Python, and to him and Sean Palmer and Mark Nottingham for writing built-in function modules. Eric Prud'hommeaux wrote the remote database query and MySQL interface. Sandro Hawke has made various contributions. Yosi Scharf engineered the cwm 1.0.0 release, fixed various bugs, and added SPARQL support. Thanks to Sean for his guide to cwm, as well as n3p, which is the basis for Cwm's SPARQL parser. Thanks to all on #RDFIG for being everything which is #RDFIG.

Development of cwm is supported in part by funding from US Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-00-2-0593, "Semantic Web Development".

License

Cwm: http://www.w3.org/2000/10/swap/doc/cwm.html

Copyright © 2000-2004 World Wide Web Consortium, (Massachusetts Institute of Technology, European Research Consortium for Informatics and Mathematics, Keio University). Parts of the Sparql parser are Copyright © Sean Palmer. All Rights Reserved. This work is distributed under the W3C® Software License [1] in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231

Historical stuff from this page

Tim BL, with his director hat off
$Id: cwm.html,v 1.76 2009/10/20 13:06:52 syosi Exp $