a general purpose data processor for the semantic web
Cwm
Cwm (pronounced coom) is a general-purpose data processor for the semantic web, somewhat like sed, awk, etc. for text files or XSLT for XML. It is a forward chaining reasoner which can be used for querying, checking, transforming and filtering information. Its core language is RDF, extended to include rules, and it uses RDF/XML or RDF/N3 (see Notation3 Primer) serializations as required.
Cwm is written in Python; it is part of SWAP, a Semantic Web Application Platform. It is open source under the W3C software license.
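To make "forward chaining" concrete, here is a toy sketch in Python. It is not cwm's code or API: the function names (forward_chain, parent_implies_ancestor, ancestor_is_transitive) and the plain-tuple triples are invented for illustration. It applies rules to a set of triples again and again until nothing new can be derived.

```python
# Toy forward chainer. Hypothetical names throughout; this is NOT how
# cwm is implemented, only an illustration of the general technique.

def forward_chain(facts, rules):
    """Return facts plus everything the rules derive from them."""
    known = set(facts)
    while True:
        derived = set()
        for rule in rules:
            # Each rule is fully evaluated against the current snapshot
            # before the snapshot is changed.
            derived.update(rule(known))
        new = derived - known
        if not new:
            return known  # fixed point: no rule adds anything
        known |= new


def parent_implies_ancestor(triples):
    """If x is a parent of y, then x is an ancestor of y."""
    for s, p, o in triples:
        if p == "parent":
            yield (s, "ancestor", o)


def ancestor_is_transitive(triples):
    """If x is an ancestor of y and y of z, then x is an ancestor of z."""
    pairs = [(s, o) for s, p, o in triples if p == "ancestor"]
    for a, b in pairs:
        for c, d in pairs:
            if b == c:
                yield (a, "ancestor", d)


if __name__ == "__main__":
    facts = {("a", "parent", "b"), ("b", "parent", "c")}
    rules = [parent_implies_ancestor, ancestor_is_transitive]
    for triple in sorted(forward_chain(facts, rules)):
        print(triple)
```

In cwm itself the facts and the rules would both be written in N3 rather than as Python tuples and functions.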
Quick Reference:
Discussion -- places to talk about this
This low-traffic list is for announcements about releases of cwm software and monthly summaries of changes to the bug/RFE list. You might find the cwm-announce RSS feed handy.
This list is for the announcement and brief discussion or clarification of cwm bugs or Requests For Enhancement (RFE). If responding to an existing bug, only use mailers which send the reference headers, so that the threads on this mailing list work. For new threads, please make the subject line informative, and use the word "bug" or "RFE" as appropriate. The current plan is to review changes in this list monthly and send a summary to the announce list.
Discussion by users and/or developers of the use and abuse of project software.
Semantic Web Interest group
Features and Tutorial
The Semantic Web Tutorial Using N3 covers features such as:
- Loading files in RDF/XML and/or N3, generating RDF or N3 files from the result.
- (The obscure or boring parts of RDF/XML syntax, specifically reification and XML Literal parse type, are not handled by the main parser.)
- Pretty printing data so that anonymous nodes are used creatively to minimize the number of explicit existentials (generated Ids).
- Applying rules written in N3 to the data
- Filtering the data to the result of a particular query
- Generating arbitrary formats (using --strings)
- Using an internal knowledge of functions to resolve them within a query, including:
  - Simple math and string operations
  - Getting and parsing documents from the web
  - Accessing command line arguments and environment variables
  - Cryptography: hashing, generating keys, signing things and checking signatures.
See also: Cwm command line arguments reference.
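As a rough illustration of how the load, apply-rules and filter features listed above combine on one command line, the sketch below builds an argument list from Python. The helper name cwm_command and the file names are hypothetical. The flags --think, --filter=, --n3 and --rdf are quoted from memory of the command line reference, so check them against that reference before relying on them. The sketch only builds the list; it does not run cwm.

```python
def cwm_command(inputs, rules=None, query=None, output_format="n3",
                cwm="cwm"):
    """Build an argv list for a cwm run (hypothetical helper).

    inputs        -- N3 data files to load (assumed to be N3 already)
    rules         -- optional N3 rule files, applied with --think
    query         -- optional N3 file used to filter the result
    output_format -- "n3" or "rdf"; the flag is placed last so that it
                     governs the output (an assumption to verify)
    """
    if output_format not in ("n3", "rdf"):
        raise ValueError("output_format must be 'n3' or 'rdf'")
    argv = [cwm]
    argv.extend(inputs)
    if rules:
        argv.extend(rules)
        argv.append("--think")           # apply rules until nothing new
    if query:
        argv.append("--filter=" + query)  # keep only the query's results
    argv.append("--" + output_format)
    return argv


if __name__ == "__main__":
    print(" ".join(cwm_command(["data.n3"], rules=["rules.n3"],
                               query="query.n3")))
```

Returning a list rather than a shell string means it can be handed to subprocess.run without any quoting problems.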
Other features are in development, and haven't been documented as thoroughly:
- Accessing the web to directly or indirectly resolve a query, including:
- Getting schemas for terms in the query
- Using metadata to point to definitive documents
- Looking up data in local or remote SQL servers
Environment Variables
CWM_RDF_PARSER
rdflib2rdf or sax2rdf (default). Affects the choice of RDF parser module used by cwm.
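If you start cwm from another Python program, one way to choose the parser is to pass a modified environment to the child process. The following is a minimal sketch; the helper name env_with_rdf_parser is an invention for this example, and only the variable name and its two values come from the text above.

```python
import os

# The two values documented above; sax2rdf is the default.
KNOWN_PARSERS = ("sax2rdf", "rdflib2rdf")


def env_with_rdf_parser(parser, base_env=None):
    """Return a copy of the environment with CWM_RDF_PARSER set.

    Hypothetical helper: base_env defaults to the current process
    environment, which is copied and never modified in place.
    """
    if parser not in KNOWN_PARSERS:
        raise ValueError("unknown RDF parser: %r" % (parser,))
    env = dict(os.environ if base_env is None else base_env)
    env["CWM_RDF_PARSER"] = parser
    return env


if __name__ == "__main__":
    env = env_with_rdf_parser("rdflib2rdf")
    print(env["CWM_RDF_PARSER"])
```

The returned dictionary can be given to subprocess.run through its env argument, so the setting applies to that one child process only.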
Security Issues
Be careful when using rules from an untrusted source.
- Rules can read data from the web, indirectly letting data out by the URIs they use.
- Rules can take up your resources such as processor time and memory.
- Rules can pick data up from within the web you have access to, including confidential files.
Be careful even when using cryptography. I am not an expert, but a few things to watch are:
- Always think about where the weakest link is. It is not always on the net.
- Where do you keep the private key, anyway?
- Beware of all forms of attack, including replay and man in the middle.
- Always sign some random junk as well as the critical data to prevent the reverse engineering of the key.
- Ask a crypto specialist to look over your stuff.
- Make the techniques, rules, and code public. Public debugging is valuable. Trying to hide them from attackers by keeping them secret doesn't pay.
- This code is not guaranteed anyway, or made for production use. It is designed for prototyping new semantic web applications. Use at your own risk.
About the name cwm
The name originally came from "Closed World Machine", because it processed information in a limited space. However, cwm does not make any assumptions about a closed world. Think of it as a defined area but with openings, like a valley.
Related Work
See also Sean Palmer's guide to cwm -- sometimes it is more up to date than this!
Check out other programs which use the same language:
- Euler - a backward-chaining reasoner by Jos de Roo. Euler will tell you whether a given set of facts and rules supports a given conclusion.
- EulerSharp - a C# port of Euler
- cwmclone - a partial clone of cwm by Bijan Parsia for the XSB Prolog engine, to demonstrate that conventional logic programming tools are efficient and straightforwardly adapted to semantic web work.
- Jena RDF toolkit now accepts N3 as well as RDF/XML (2003/2)
- RDF::Notation3 Perl module (submission 10 Oct 2001 by Petr Cimprich)
- Swish - N3-capable semantic web code in Haskell by Graham Klyne (2003/6)
- Pychinko is a rete-based cwm clone - should be much faster.
Development
This swap code is open source and available for those that want to play with it, but comes with no warranty.
Using CVS
Check out the code from the public W3C CVS repository. Make a fresh directory where you want to put stuff from dev.w3.org, then check out the whole tree to develop. This includes the test data; if you don't need that, delete the test subdirectory.
$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public login
password? anonymous
$ cvs -d:pserver:anonymous@dev.w3.org:/sources/public get 2000/10/swap
From the web
Get the files one by one. cwm.py is the main application file. You can browse the source files on the web, but this is not a practical way to install the system.
In the following, we assume $SWAP expands to the place where you have code checked out.
Test Driven Development (Don't trust the docs ;-)
The best guide to what works is what has been tested. So the list of files in the regression test defines the set of features which are generally checked on each checkin. Cwm developers agree that all the tests have to pass before code is checked in. To run the tests, run make in the swap/test directory. We aim to add a test for each new bug, so that bugs don't recur in future versions.
Each subdirectory of test has its own detailed.tests file. In that you can put tests for new features. Note that the test commands are all written to be run in $SWAP/test.
How to make a release
- Remember to cvs update to ensure you have any changes other people have made before running tests.
- A quick test that your code still works is cd $SWAP/test; make fast
- The test a release must pass before you make it is cd $SWAP/test; make pre-release
- Update the releases page with details of the new bug fixes and/or features.
- Edit $SWAP/Makefile
  - Make sure the HTML files generated from any new .py files are added to the list HTMLS
  - Change the version number if you are going to make a new tarball
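The command-line part of the checklist above could be scripted along the following lines. This is only a sketch: the helper names release_steps and run_steps are hypothetical, and running cvs update from the top of $SWAP is an assumption, since the checklist does not say where to run it. Editing the releases page and the Makefile remains a manual job.

```python
import os
import subprocess


def release_steps(swap_dir):
    """List (working directory, argv) pairs for the pre-release checks.

    swap_dir is wherever $SWAP points. Running "cvs update" at the top
    of the tree is an assumption, not something the checklist states.
    """
    test_dir = os.path.join(swap_dir, "test")
    return [
        (swap_dir, ["cvs", "update"]),
        (test_dir, ["make", "fast"]),
        (test_dir, ["make", "pre-release"]),
    ]


def run_steps(steps, runner=subprocess.call):
    """Run steps in order; return the first failing step, or None."""
    for cwd, argv in steps:
        if runner(argv, cwd=cwd) != 0:
            return (cwd, argv)  # stop at the first non-zero exit status
    return None


if __name__ == "__main__":
    swap = os.environ.get("SWAP", ".")
    for cwd, argv in release_steps(swap):
        print(cwd, " ".join(argv))
```

The runner parameter exists so that the sequencing can be tried out with a stand-in function before it is pointed at a real checkout.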
Code Overview
Cwm developers agree to keep line lengths below 80 characters, though we have some code that predates that agreement.
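A quick way to find code that predates the agreement is to scan for long lines. The sketch below is not part of SWAP; the function name long_lines is made up, and it reads "below 80 characters" as "at most 79".

```python
def long_lines(text, limit=79):
    """Return (line number, length) for each line longer than limit.

    Line numbers start at 1. Line terminators are not counted.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py file1.py file2.py ...
    for path in sys.argv[1:]:
        with open(path) as source:
            for number, length in long_lines(source.read()):
                print("%s:%d: %d characters" % (path, number, length))
```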
llyn.py - The Store
An in-memory store which does the inference. See the Formula class methods for a more or less conventional RDF API. A Formula is a set of statements.
notation3.py - Serializing/deserializing RDF/N3
Originally written by Dan Connolly. It uses a basic RDF stream parser interface and is migrating to an API.
- Parses N3
- Generates N3
The command line form (alias n3 python notation3.py; n3 -help) allows RDF to be parsed and re-output.
The module will also run as a CGI script to convert N3 to RDF M&S 1.0 - by DanC magic.
xml2rdf.py - Parsing RDF/XML
Based on Python's xmllib, this parser is compatible with the RDF stream interface of notation3.py. It completes the square of parsers and generators. This module is defunct; now use the SAX parser and sax2rdf.py.
- Parses RDF
It has a command line mode for self-test purposes.
cwm_xxx.py - builtin modules
These are quite easy to add to. Look at a few and clone the one most similar to what you want to make.
Design issues
The code above investigated and raised issues discussed in the following documents.
not to mention
- RDFM&S and schema issues
- The question of quoting and BagIDs etc
Acknowledgements
Thanks to Dan Connolly for writing the first code and thereby introducing me to Python, and to him, Sean Palmer and Mark Nottingham for writing built-in function modules. Eric Prud'hommeaux wrote the remote database query and MySQL interface. Sandro Hawke has made various contributions. Yosi Scharf engineered the cwm 1.0.0 release, fixed various bugs and added SPARQL support. Thanks to Sean for his guide to cwm, as well as n3p, which is the basis for Cwm's SPARQL parser. Thanks to all on #RDFIG for being everything which is #RDFIG.
Development of cwm is supported in part by funding from US Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-00-2-0593, "Semantic Web Development".
License
Cwm: http://www.w3.org/2000/10/swap/doc/cwm.html
Copyright © 2000-2004 World Wide Web Consortium, (Massachusetts Institute of Technology, European Research Consortium for Informatics and Mathematics, Keio University). Parts of the Sparql parser are Copyright © Sean Palmer. All Rights Reserved. This work is distributed under the W3C® Software License [1] in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
Historical stuff from this page
Tim BL, with his director hat off
$Id: cwm.html,v 1.76 2009/10/20 13:06:52 syosi Exp $