Semantic Web Inference using Haskell (original) (raw)
Nine by Nine | G. Klyne |
---|---|
Nine by Nine | |
May 29, 2003 |
Semantic Web Inference using Haskell
Abstract
This memo describes the software package Swish, which is being developed to conduct experiments in using the programming language Haskell as a basis for performing inference on Semantic Web data.
© 2003 G. Klyne; Some rights reserved.
Id:swish−0.1.html,v1.32003/06/0316:22:57grahamExpId: swish-0.1.html,v 1.3 2003/06/03 16:22:57 graham Exp Id:swish−0.1.html,v1.32003/06/0316:22:57grahamExp
Table of Contents
1.Introduction 2.Background 3.Description of Swish software 3.1Swish software components 3.2Swish command format 3.3Function interfaces 4.Software installation 4.1System requirements 4.2Installation files 4.3System dependent details 4.3.1Installation using Hugs 4.3.2Installation using GHCi 4.3.3Installation using GHC 4.4Installation testing 4.5Additional libraries used 4.5.1Parsec 4.5.2Quicksort 4.5.3HUnit 5.Conclusions 6.Acknowledgements §References §Author's Address A.Revision history B.Unresolved issues C.CVS revision log
1. Introduction
This memo describes Swish, a software package that is intended to provide a framework for building programs in Haskell that perform inference on RDF [1] data.
Swish is primarily intended to be used as a starting point for developing new RDF applications in Haskell, but it also includes a stand-alone program that can be used to perform some simple manipulation of RDF data. Currently, only the Notation 3 [4] serialization form is supported, but I'd like to add support for full RDF/XML in due course.
The software can be downloaded from the web by following links from http://www.ninebynine.org/Software/Intro.html. This software is made generally available under the GNU General Public Licence (GPL), version 2 (or later), a full copy of which is available at http://www.gnu.org/licenses/gpl.html, and also as file GPL.TXT in the Swish software distribution. Other licensing arrangements may be negotiated with the author: see LICENSING.TXT in the Swish software distribution.
2. Background
The Swish software package was developed as a result of experience using CWM [5], an off-the-shelf general-purpose RDF processing program, for evaluating simple inference rules on network access control information expressed in RDF [19]. Specifically, while the general inference capabilities of CWM were almost sufficient for the network access application, some capabilities were required that are unlikely to be provided by any completely general-purpose tool; e.g. analysis of IP network addresses by subnet and host address.
Additionally, the framework for datatyped literals currently proposed by the RDFcore working group [8] is quite open-ended, and it is not specified how generic applications may provide support for new or non-standard datatypes.
In light of these considerations, I sought ways of combining the full expressive capability of a general purpose programming language with the declarative style of inference rules and formal specifications. Using Haskell [11], a pure functional programming language, is the approach adopted.
To use Haskell as a basis for performing inference on RDF data, certain capabilities are neded:
- a means to store and access RDF data in Haskell,
- a capability to read RDF data using Haskell, and
- a capability to write RDF data using Haskell.
Swish aims to provide these capabilities. Further, it provides capabilities to compare RDF graphs (insensitive to possible renaming of blank nodes), and to merge RDF graphs, renaming blank nodes as necessary to prevent unintended semantic consequences [9].
I anticipate that the main use for Swish will be as a support library for new utilities that apply predefined RDF inference rules. Where CWM is a general-purpose tool for manipulating RDF, I expect to use Swish as a toolkit for creating tools to deal with specific RDF processing requirements. In time, this may lead to identification of some useful capabilities that can guide the design of future general-purpose RDF processing tools.
The programming language Haskell was chosen for a number of reasons:
- it is a pure functional language supporting a style of programming that is directly based on a specification that is being implemented,
- it is non-strict, employing lazy evaluation, meaning that, expressions are generally not evaluated unless and until their value is needed; this further supports a style of programming that is close to the typical form of mathematical-style of description that is often used to specify RDF inference rules, and
- it is relatively mature, with good-quality implementations available for a variety of platforms, supported by an active and extensive user community, and performance for compiled code that is comparable with that of more conventional programming languages.
More information about Haskell can be found at[11]. A useful paper discussing some particular characteristics of functional programming languages is [17].
3. Description of Swish software
Swish comprises a number of modules that can be invoked by Haskell programs, and a stand-alone command-line utility that can be used to perform some basic processing of RDF data.
The Haskell source code for the stand-alone utility may also be used as a starting point for similar utilities that perform specific application processing of RDF data.
3.1 Swish software components
Swish has the following components:
- Swish.hs, SwishMain.hs, SwishMonad.hs, SwishCommands.hs: Swish is a command line utility that performas some simple RDF processing tasks: checking Notation 3 syntax, comparing RDF graphs, merging RDF graphs. The source code may also be used as a basis for creating new RDF processing utilities.
- SwishTest.hs, RDFGraphTest.hs, N3ParserTest.hs, N3FormatterTest.hs, GraphTest.hs, LookupMapTest.hs, DateTimeTest.hs, ParseURITest.hs, ParseTest.hs, URITest.hs: these are all test modules, and are not part of any swish application. They use the HUnit library to run a series of unit tests on the various swish components. Each of these compiles to a stand-alone program that runs a series of test cases.
- RDFGraph.hs: this defines data types and methods for RDF graphs. It defines a datatype RDFLabel (URIs, bnodes, RDF literals and a couple of other extensions for future use), and datatype NSGraph that implements the graph class interface. (NSGraph is similar to GraphMem but adds prefix and formula lists to the content of a graph, and also defines a graph merge function that merges graphs with renaming of bnodes as needed to preserve semantics.) RDFGraph is a type synonym for NSGraph RDFLabel.
- N3Parser.hs: a Notation3 parser that converts string into an RDFGraph value. The parser is built using the Parsec library (except the URI parser that is currently based on some private parsing components).
- N3Formatter.hs: a Notation3 formatter that converts an RDFGraph value to a string, or to a ShowS function. The formatter uses a monad to track formatter state. Currently, the formatter is pretty primitive, and is intended to minimally satisfy the round-tripping of RDF graphs via Notation3 format (also using N3Parser). Future versions may try to generate more human-friendly output.
- GraphClass.hs: some basic definitions for a type class for representing and manipulating labelled directed graphs.
- GraphMem.hs: a graph implementation based on a simple in-memory list structure. There is much scope for performance improvement by replacing the list structure with a more efficiently accessed structure. (Ralf Hinzehttp://www.informatik.uni-bonn.de/~ralf/ has some interesting options for this; I like the idea of using a splay trees.) This code needs a reworking to properly separate the graph container structure from the graph node type.
- GraphMatch.hs: graph comparison function. This function tests two graphs for isomorphism, taking account of nodes that are "variable" (i.e. may be freely renamed without changing the fundmanetal graph structure). The code implements an algorithm by Jeremy Carroll[6].
- URI.hs, ParseURI.hs: URI handling code, including methods to deduce and process relative URI references. The code is based on an early draft of the revised URI specification [7], and may be subject to revision as that specification progresses.
- Parse.hs: Simple parser code, based on some example programs in Simon Thompson's book The Craft of Functional Programming[18]. This was written to support the URI handler, which was my very first program in Haskell, and suffers from my early confusion about how to use Haskell. In due course, the URI parser will be revised to use the Parsec library, and this code will be retired.
- LookupMap.hs: class and function definitions for lookup tables, which are used in various ways throughout the RDF graph handling code. (The Swish package would benefit from greatly from this code using a hash table or some other more serious lookup algorithm.)
- ListHelpers.hs, MiscHelpers.hs: a few useful helper functions.
- Data/*.n3: subdirectory Data of the main installation directory contains a number of Notation3 data files that are used to test the Swish utility, and in particular are assumed to be present by the SwishTest program.
- ghcc.bat, MakeSwish.bat, TestSwish.bat: these are files used to build and test the Swish software on a Windows operating system, using the Glasgow Haskell Compiler (GHC). They have been tested under Windows 2000, and should work on any recent Windows operating system (NT, 2K, XP, 98, 95) ghcc.bat will need to be edited to reflect the local installation of GHC and the support libraries. Examination of these files should also provide any information needed to build and run the software on non-Windows operating systems.
- A small number of other files contain further information about the software and/or related topics.
Currently, the only supported RDF graph serialization format is Notation3, but future developments may add support for other formats. RDF/XML would clearly be most desirable. Meanwhile, utilities such as CWM [5] can be used convert RDF/XML to and from Notation 3 format.
3.2 Swish command format
The Swish utility is a command-line utility that performs some simple RDF processing functions. The capabilities provided are with a view to testing the underlying RDF library software rather than performing any particular application purpose.
A Swish command contains a one or more command line options that are processed from left-to-right. The Swish program maintains an internal graph workspace, which is updated or referenced as the command options are processed.
Swish command options:
-?
Displays a summary of the command line options.
-n3
Indicates that Notation3 be used for subsequent input and output. (Currently, this is the only format option, and is selected by default.)
-i[=file]
read file into the graph workspace, replacing any existing graph. If the filename is omitted, the graph is read from standard input.
-m[=file]
read and merge file with the graph workspace. Blank nodes in the input file are renamed as necessary to avoid node identifiers already used by the existing graph. If the filename is omitted, the graph is read from standard input.
-c[=file]
read file and compare the resulting graph with the workspace. Graph comparison is done in a fashion that treats isomorphic graphs as equivalence, and is insensitive to renaming of blank nodes. This is intended to match the definition of graph equivalence in the RDF abstract syntax specification [10]. If the filename is omitted, the graph is read from standard input. If the graphs are unequal, the exit status code is 1.
-o[=file]
write the graph workspace to a file. If the filename is omitted, the graph is written to the standard output.
The Swish program terminates with a status code that indicates the final status of the operation(s) performed. Haskell distinguishes between a success status code whose value is not specified, assumed to be system dependent, and a failure code which is associated with an integer value. The status code values returned by Swish are:
Success
Operation completed successfully; graphs compare equal.
1
Graphs compare different.
2
Input data file incorrect format.
3
File access problem.
4
Incorrect option in command line.
Here are some example Swish command lines:
- swish -n3 -i=file
Read 'file' as Notation3, and report any syntax errors. - swich -n3 -i=file1 -c=file2
Read 'file1' and 'file2' as Notation3, report any syntax errors, and if both are OK, compare the resulting graphs, indicating whether or not they are equivalent. - swish -n3 -i=file1 -o=file2
Read 'file1' as Notation3, report any syntax errors, and output the resulting graph as reformatted Notation3. (The output will generally be different form the input, and may be unpretty. It may be used to test round-tripping of Notation 3 data.)
3.3 Function interfaces
[[[To be provided; until then see the source files, notably SwishCommands.hs, GraphClass.hs, RDFGraph.hs, and the various test modules.]]]
4. Software installation
The Swish software is distributed as a single ZIP archive. Start installation by creating an empty directory for the software, and extracting the content of the ZIP archive into that directory. Select the ZIP option that uses directory information from the archive so that the sub-directory structure is preserved.
The following sections deal with how get get the software running in different Haskell environments. The instructions relate to MS Windows operating systems, but it should be fairly obvious how to adapt the procedures for Unix/Linux systems.
4.1 System requirements
Swish is written entirely in Haskell, and should work with any Haskell system that supports Haskell 98 together with the extensions noted below. The software has been tested using Hugs [12] (version November 2002), Glasgow Haskell Compiler (GHC) [13] (version 5.04.3) and the interactive version of GHC (GHCi).
The required extensions to standard Haskell-98 are:
- Multi-parameter classes.
- Hierarchical libraries.
- Control.Monad.Trans and Control.Monad.State libraries, as distributed with GHC or Hugs.
Some freely available additional Haskell libraries are used, as described later. For convenience, these are included with the Swish software distribution, but are not themselves part of the Swish software for licensing purpose. More details are given later.
My development has been performed mostly using Hugs on a 1.3GHz PC with 256Mb of memory. For most purposes, this has been more than adequate. Some of the larger test cases, and the more perverse graph comparisons, may take several minutes to run on this platform (SwishTest takes about 20 minutes). In practice, the applications are likely to be more demanding than basic requirements of Swish.
4.2 Installation files
The Swish software distribution includes the following files:
Install directory
Swish.html, Swish.xml: this documentation file, and XML source code.
*.hs: Haskell source files (see software overview above).
*Test.hs: unit test Haskell source files.
*.bat: MS-Windows command files for building and testing the software using GHC.
*.txt: additional information, including licensing details.
Data subdirectory
Contains Notation3 data files used by the SwishTest program.
Parsec subdirectory
Contains the Parsec library used by Swish.
HUnit subdirectory
Contains the HUnit library used by Swish test modules.
Sort subdirectory
Contains the Quicksort library used by Swish. (References to this module can be removed, and the standard Haskell function List.sort used in place of QuickSort.)
4.3 System dependent details
4.3.1 Installation using Hugs
Running the Swish software under Hugs is straightforward. The Hugs options -98 and +N must be specified.
Special steps that might help include:
- Edit the Hugs registry key to include additional directories for the Parsec, HUnit and Sort libraries.
- Edit the Hugs registry key to increase the Hugs heap size (Hugs -h option) for some of the test cases. (Since doing that I've done some code tuning, which may mean the number below is unnecessarily high.)
- I have applied changes by using RegEdt32 to modify the registry key HKEY_CURRENT_USER\Software\Haskell\Hugs\Nov 2002\Options, though some of these changes can be applied using the Hugs :set command. By saving these changes in the registry, I can run Hugs directly from my editor, or by activiating a source file in file manager, and have all the right options applied.
- To run a program in Hugs, use the :load command to load the main program module; if the library paths are set correctly, all other files required will be automatically located and loaded. When a program is loaded, type evaluate 'main' to execute it. The Swish program itself is an exception to the above: it is not designed to be run from an interactive shell. To achieve the same effect, load Swishtest into Hugs, and evaluate
runSwish "command options"
noting that the command line must be supplied as a Haskell string expression (e.g. in double quotes).
The full settings reported by my Hugs installation are:
Current settings: +fewuiRWXN -stgGl.qQkoOIHT -h5000000 -p"%s> " -r$$ -c40 Search path : -P{Hugs}\libraries;{Hugs}\libraries\HUnit; {Hugs}\libraries\Parsec;{Hugs}\libraries\Sort Project Path : Source suffixes : -S.hs;.lhs Editor setting : -E"C:\Program Files\TextPad 4\TextPad.exe" Preprocessor : -F Compatibility : Hugs Extensions (-98)
4.3.2 Installation using GHCi
Running the Swish software under GHCi is almost as easy as using Hugs. GHCi command line options used include '-fglasgow-exts' and '-iF:\Haskell\Lib\HUnit;F:\Haskell\Lib\Parsec;F:\Haskell\Lib\Sort' (adjusted according to the directories containing the library files). Working under MS-Windows, I find it convenient to create a desktop shortcut to run GHCi, specifying the Swish source directory and other options as properties of the shortcut.
To run a program in the GHCi command interpreter, follow the same procedure that is described for running a program under Hugs. The GHCi and Hugs command shells are very similar.
There is a GHCi initialization file '.ghci' that if placed in the appropriate startup directory is read automatically by GHCi and defines some convenient commands for running the non-interactive GHC compiler from within the GHCi shell.
4.3.3 Installation using GHC
MS-Windows command scripts have been prepared to compile and run the Swish software in an MS-Windows command window. It should be straightforward to create Unix equivalents using information from these. The relevant files are:
- MakeSwish.bat, to compile and link all the programs.
- ghcc.bat, a helper file used by MakeSwish.bat to compile and link a single program.
- TestSwish.bat, to run the various test programs.
The file ghcc.bat assumes a standard GHC installation, with the GHC compiler is on the current search path, and will probably need to be edited to refelct the actual locations of the support libraries used.
Once the programs have been compiled and linked, they can be run in the usual way by using entering the program name at a command prompt. The test programs do not expect any command line options and run to completion. The program Swish.exe takes command line options as descriped above.
4.4 Installation testing
A Swish installation under GHC can be tested by running the command script TestSwish.bat, and ensuring that all tests complete with zero errors. On a 1.7GHz PC running Windows 2000, the tests take a few minutes to complete.
To test the installation from an interactive shell, the test programs need to be loaded and executed individually. To confirm a successful installation, it is probably sufficient to load and run RDFGraphTest, which should complete quite quickly (about 30 seconds under Hugs on a 1.3GHz PC), then run SwishTest which takes about 20 minutes on the same system.
4.5 Additional libraries used
Swish uses some additional libraries that are not part of the swish software, but which are included with the Swish software distribution for the convenience of users.
Please note that these support libraries are distributed under their own licensing terms and conditions, which I have reproduced below where available. Please contact the respective authors for further information.
4.5.1 Parsec
Parsec [14] is a monadic parser combinator library for Haskell. I found it to be excellently documented and generally easy to use. It also serves as a useful introduction, to using monads in Haskell.
4.5.1.1 Parsec licence
Copyright 1999-2000, Daan Leijen. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
This software is provided by the copyright holders "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holders be liable for any direct, indirect, incidental, special, exemplary, or consequential damages ( including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
4.5.2 Quicksort
Quicksort is part of a collection of sorting functions in haskell, published by Ralf Hinze [16].
At the time of writing, I can find no claim for copyright or distribution licensing terms.
4.5.3 HUnit
HUnit [15] is a unit testing framework for Haskell, loosely modelled on the JUnit framework that is popular with Java programmers.
Swish application code does not use HUnit, but the test programs do make extensive use of it.
4.5.3.1 HUnit licence
HUnit is Copyright (c) Dean Herington, 2002, all rights reserved, and is distributed as free software under the following license.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.
- The names of the copyright holders may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT ( INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
5. Conclusions
Swish is very much a work-in-progress, and the present release is a first step along a path with many possible options for future developments.
The immediate next step is to use the Swish framework in the construction of some special-purpose RDF inference tools. My intent is to revisit my earlier work [19], and learn how that work may be served by using Haskell in place of a packaged RDF inference tool.
The Swish code itself is far from perfect, and there is much additional functionality and improvement that can be made. But it does pass an extensive array of tests, and I believe it is sufficiently stable and functional for this initial release.
The software distribution contains a file named TODO.TXT, which lists a number of specific possible enhancements that have been identified to date.
6. Acknowledgements
I would like to thank the following, whose previous work has been most helpful to me (though, of course, they bear no responsibility for the failings of my work):
- The Haskell community [11] for not just one, but at least two, really solid Haskell language implementations, a good range of supporting libraries, and many points of useful advice offered though the Haskell mailing list.
- Dean Herington for his HUnit testing framework.
- Daan Leijen, for his Parsec library.
This document has been authored in XML using the format described in RFC 2629 [3], and converted to HTML using the XML2RFC utility developed by Marshall Rose (http://xml.resource.org/).
References
Author's Address
| | Graham Klyne | | | -------------------------- | --------------------------------------------------------------------------------- | | | Nine by Nine | | | | 14 Chambrai Close | | | | Appleford | | | | Abingdon, Oxon OX14 4NT | | | | UK | | | Phone: | +44 1235 848491 | | Fax: | +44 1235 848562 | | EMail: | GK-swish@ninebynine.org | | URI: | http://www.ninebynine.net/ |
Appendix A. Revision history
2003-05-30:
- Document initially created.
Appendix B. Unresolved issues
- See file TODO.TXT
Appendix C. CVS revision log
Log:swish−0.1.html,vLog: swish-0.1.html,v Log:swish−0.1.html,v Revision 1.3 2003/06/03 16:22:57 graham Typo fixes to web site CVS
Revision 1.6 2003/06/03 16:17:44 graham Fix another typo
Revision 1.5 2003/06/03 11:31:13 graham Fix typos in documentation
Revision 1.4 2003/05/31 00:11:21 graham Fix various typos and omissions
Revision 1.3 2003/05/30 19:12:57 graham Fixed some document typos and added RDF semantics reference
Revision 1.2 2003/05/30 18:37:25 graham First formatted version of Swish documentation
Revision 1.1 2003/05/30 16:41:22 graham Swish documentation, initial version.