\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename aleph
@c The line below should work, but doesn't
@c @setcontentsaftertitlepage
@settitle The Aleph Manual
@c For double-sided printing, uncomment:
@c @setchapternewpage odd
@c %**end of header
@set VERSION 4 and above
@c @set EDITION 1
@set UPDATED October 2002
@setchapternewpage odd
@c @smallbook
@ifinfo
@format
START-INFO-DIR-ENTRY
* Aleph: (aleph). The Aleph Manual.
END-INFO-DIR-ENTRY
@end format
@end ifinfo
@titlepage
@title The Aleph Manual
@subtitle Version @value{VERSION}
@c @sp 1
@c @image{golem,,100mm}
@c @sp 1
@emph{Then the rabbi said, ``Golem, you have not been completely formed,
but I am about to finish you now...You will do as I will tell you.''
Saying these words, Rabbi Leib finished engraving the letter Aleph.
Immediately the golem began to rise.} From @emph{The Golem} by
Isaac Bashevis Singer, with illustrations by Uri Shulevitz.
@author Ashwin Srinivasan
@page
@vskip 2pc
The development of Aleph owes much to the advice and research of many
people. Special thanks are due to: Michael Bain, Rui Camacho, Vitor Costa,
James Cussens, Ross King, Donald Michie, Stephen Moyle, Stephen Muggleton,
David Page, Mark Reid, Claude Sammut, Jude Shavlik, Jan Wielemaker, and
Filip Zelezny.
@end titlepage
@ifinfo
@node Top, , , (dir)
@top
@end ifinfo
@ifinfo
@menu
* Intro:: Introduction.
* Getting Started:: Getting started.
* Advanced Use:: More advanced use.
* Other Programs:: Related versions and programs.
* Notes:: Notes.
* Change Logs:: Change logs and predicates used.
* Parameter Index:: Index of parameters and commands.
* Concept Index:: Index of concepts.

@detailmenu
Intro

* Aleph:: How to obtain Aleph.
* Manual:: How to use this manual.
* Aleph Algorithm:: The basic Aleph algorithm.

Getting Started

* Load:: Loading Aleph.
* Background Knowledge File:: Filestem.b
* Positive Examples File:: Filestem.f
* Negative Examples File:: Filestem.n
* Read Input Files:: Reading in data.
* Construct Theory:: Building a theory with Aleph.
* Save Theory :: Save theory from Aleph.
* Evaluate Theory :: Evaluate theory on data.
* Example Files :: Some simple examples.

Background Knowledge File

* Modes:: Mode declarations.
* Types:: Type restrictions.
* Determinations:: Determination declarations.

Advanced Use

* Other Settings:: Setting parameters.
* Other Searches:: Changing and customising search.
* Randomised Search:: Randomised search methods.
* Incremental Learning:: Incremental learning.
* Theory Learning:: Theory learning.
* Tree Learning:: Tree learning.
* Constraint Learning:: Constraint learning.
* Mode Learning:: Mode learning.
* Abductive Learning:: Abductive learning.
* Feature Construction:: Feature construction.
* Other Commands:: Other commands.

Other Searches

* Search Strategy:: Changing the search strategy.
* Search Function:: Changing the evaluation function.
* Pruning:: Specifying domain-specific pruning.
* Cost:: Specifying domain-specific costs.
* Constraints:: Specifying domain-specific constraints.
* Refinements:: Specifying a domain-specific refinement operator.

Notes

* Aleph Choice:: On the appropriateness of Aleph.
* Aleph Predicates:: On predicate-name clashes with Aleph.
* Aleph Bottom Clause:: On the role of the bottom clause.
* Aleph Use:: On using Aleph interactively.
* Aleph Theory:: On different ways of constructing a theory.
* Aleph Parameters:: On when and how the parameter settings should be changed.
* Aleph Implementation:: On how the search is implemented.
* Aleph Search:: On how to reduce the search space.
* Aleph Examples:: On how to use fewer examples.
* Aleph Portray:: On a user-defined view of hypotheses and search.
* Aleph Numerical Reasoning:: On numerical reasoning with Aleph.
* Aleph Applications:: On applications of Aleph.
* Aleph Combine:: On using Aleph with other techniques.
* Aleph GCWS:: On using Aleph to perform closed-world specialisation.
* ILP Basics:: On some basic concepts in ILP.
@end detailmenu
@end menu
@end ifinfo

@node Intro, Getting Started, , Top
@chapter Introduction

This document provides reference information on @strong{A} @strong{L}earning
@strong{E}ngine for @strong{P}roposing @strong{H}ypotheses (Aleph). Aleph is
an Inductive Logic Programming (ILP) system. This manual is not intended to
be a tutorial on ILP. A good introduction to the theory, implementation and
applications of ILP can be found in S.H. Muggleton and L. De Raedt (1994),
@emph{Inductive Logic Programming: Theory and Methods}, Jnl. Logic
Programming, 19,20:629--679, available at
@uref{ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/lpj.ps.gz}.
@*
Aleph is intended to be a prototype for exploring ideas. Earlier incarnations
(under the name P-Progol) originated in 1993 as part of a fun project
undertaken by Ashwin Srinivasan and Rui Camacho at Oxford University. The
main purpose was to understand ideas of inverse entailment, which eventually
appeared in Stephen Muggleton's 1995 paper: @emph{Inverse Entailment and
Progol}, New Gen. Comput., 13:245--286, available at
@uref{ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/InvEnt.ps.gz}.
Since then, the implementation has evolved to emulate some of the
functionality of several other ILP systems. Some of those relevant to Aleph
are: CProgol, FOIL, FORS, Indlog, MIDOS, SRT, Tilde, and WARMR. See
@ref{Other Programs} for more details on obtaining some of these programs.
@menu
* Aleph:: How to obtain Aleph.
* Manual:: How to use this manual.
* Aleph Algorithm:: The basic Aleph algorithm.
@end menu

@node Aleph,Manual, ,Intro
@section How to obtain Aleph
@cindex Obtaining Aleph

Aleph is written in Prolog, principally for use with the Yap Prolog
compiler. It should also run, albeit less efficiently, with SWI-Prolog. It is
maintained by Ashwin Srinivasan at the University of Oxford, and can be found
at:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/aleph.pl}.
If you obtain this version, and have not already done so, then subscribe to
the Aleph mailing list. You can do this by e-mailing
@email{majordomo@@comlab.ox.ac.uk} with the body of the message containing
the command: @strong{subscribe aleph}. This version is free for academic use
(research and teaching). If you intend to use it for commercial purposes then
please contact Ashwin Srinivasan (ashwin at comlab dot ox dot ac uk).
@c @email{ashwin@@comlab.ox.ac.uk}.

@strong{NB:} Yap is available at: @uref{http://yap.sourceforge.net/}. Aleph
requires Yap 4.1.15 or higher, compiled with the DEPTH_LIMIT flag set to 1
(that is, include -DDEPTH_LIMIT=1 in the compiler options). Aleph 5 requires
SWI-Prolog Version 5.1.10 or higher. SWI-Prolog is available at:
@uref{http://www.swi-prolog.org/}.

@node Manual,Aleph Algorithm,Aleph,Intro
@section How to use this manual
@cindex Manual usage

@itemize @bullet
@item
If you are a first-time user, proceed directly to @ref{Aleph Algorithm}.
@item
If you have mastered the naive use of Aleph then see @ref{Advanced Use} on
how to get more out of this program. You may also want to look at the
@ref{Concept Index}.
@item
If you are familiar with the idea of setting parameters, altering search
methods, etc. within Aleph, then see @ref{Notes} for ideas that have proved
worthwhile in applications.
@item
If you are interested in what is new with this version, see @ref{Change Logs}
for a change-log.
@end itemize

The Texinfo source file of this manual is available at:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/aleph.tex}
A ``Makefile'' is available for generating a variety of output formats:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/Makefile.txt}

@node Aleph Algorithm, ,Manual,Intro
@section The basic Aleph algorithm
@cindex Basic algorithm

During routine use, Aleph follows a very simple procedure that can be
described in 4 steps:
@enumerate
@item
@strong{Select example.} Select an example to be generalised. If none exist,
stop; otherwise proceed to the next step.
@item
@cindex Saturation
@strong{Build most-specific-clause.} Construct the most specific clause that
entails the example selected, and is within the language restrictions
provided. This is usually a definite clause with many literals, and is called
the ``bottom clause.'' This step is sometimes called the ``saturation'' step.
Details of constructing the bottom clause can be found in Stephen Muggleton's
1995 paper: @emph{Inverse Entailment and Progol}, New Gen. Comput.,
13:245--286, available at
@uref{ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/InvEnt.ps.gz}.
@item
@cindex Reduction
@strong{Search.} Find a clause more general than the bottom clause. This is
done by searching for some subset of the literals in the bottom clause that
has the ``best'' score. Two points should be noted. First, confining the
search to subsets of the bottom clause does not produce all the clauses more
general than it, but this is good enough for this thumbnail sketch. Second,
the exact nature of the score of a clause is not really important here. This
step is sometimes called the ``reduction'' step.
@item
@strong{Remove redundant.} The clause with the best score is added to the
current theory, and all examples made redundant are removed. This step is
sometimes called the ``cover removal'' step. Note here that the best clause
may make clauses other than the examples redundant. Again, this is ignored
here. Return to Step 1.
@end enumerate

A more advanced use of Aleph (see @ref{Advanced Use}) allows alterations to
each of these steps. At the core of Aleph is the ``reduction'' step,
presented above as a simple ``subset-selection'' algorithm. In fact, within
Aleph, this is implemented by a (restricted) branch-and-bound algorithm which
allows an intelligent enumeration of acceptable clauses under a range of
different conditions. More on this can be found in
@ref{Aleph Implementation}.

@node Getting Started, Advanced Use, Intro, Top
@chapter Getting started with Aleph
@cindex Getting started with Aleph

@menu
* Load:: Loading Aleph.
* Background Knowledge File:: Filestem.b
* Positive Examples File:: Filestem.f
* Negative Examples File:: Filestem.n
* Read Input Files:: Reading in data.
* Construct Theory:: Building a theory with Aleph.
* Save Theory :: Save theory from Aleph.
* Evaluate Theory :: Evaluate theory on data.
* Example Files :: Some simple examples.
@end menu

@node Load, Background Knowledge File, , Getting Started
@section Loading Aleph
@cindex Loading Aleph

Aleph code is contained in a single file, usually called @file{alephX.pl}
(the X stands for the current version number; for example, @file{aleph4.pl}
refers to Version 4).
To load Aleph, you will need to consult this file into your Prolog compiler,
with sufficient stack and heap size (the more, the better!). Here is an
example of loading Aleph into the Yap compiler, with a stack size of 5000 K
bytes and heap size of 20000 K bytes:
@example
yap -s5000 -h20000
[ Restoring file startup ]
yes
?- [aleph4].
@end example

Aleph requires 3 files to construct theories. The most straightforward use
of Aleph would involve:
@enumerate
@item
Construct the 3 data files called @file{filestem.b, filestem.f, filestem.n}.
See @ref{Background Knowledge File}, @ref{Positive Examples File}, and
@ref{Negative Examples File}.
@item
Read all data using the @code{read_all(filestem)} command. See
@ref{Read Input Files}.
@item
Construct a theory using the @code{induce} command. See
@ref{Construct Theory}.
@end enumerate

@node Background Knowledge File, Positive Examples File, Load, Getting Started
@section Background knowledge file
@cindex Background knowledge file

All background knowledge for Aleph is contained in a file with a @strong{.b}
extension. Background knowledge is in the form of Prolog clauses that encode
information relevant to the domain. It can also contain any directives
understood by the Prolog compiler being used (for example,
@code{:- consult(someotherfile).}). This file also contains language and
search restrictions for Aleph. The most basic amongst these refer to
@emph{modes, types} and @emph{determinations} (see @ref{Modes}, @ref{Types},
and @ref{Determinations}).
@menu
* Modes:: Mode declarations.
* Types:: Type restrictions.
* Determinations:: Determination declarations.
@end menu

@node Modes,Types, , Background Knowledge File
@subsection Mode declarations
@cindex Mode declarations
@findex mode/2

These declare the mode of call for predicates that can appear in any clause
hypothesised by Aleph. They take the form:
@example
mode(RecallNumber,PredicateMode).
@end example
where @code{RecallNumber} bounds the non-determinacy of a form of predicate
call, and @code{PredicateMode} specifies a legal form for calling a
predicate.

@code{RecallNumber} can be either (a) a number specifying the number of
successful calls to the predicate; or (b) @strong{*} specifying that the
predicate has bounded non-determinacy. It is usually easiest to specify
@code{RecallNumber} as @strong{*}.

@code{PredicateMode} is a template of the form:
@example
p(ModeType, ModeType,...)
@end example

Here are some examples of how they appear in a file:
@example
:- mode(1,mem(+number,+list)).
:- mode(1,dec(+integer,-integer)).
:- mode(1,mult(+integer,+integer,-integer)).
:- mode(1,plus(+integer,+integer,-integer)).
:- mode(1,(+integer)=(#integer)).
:- mode(*,has_car(+train,-car)).
@end example

Each ModeType is either (a) @strong{simple}; or (b) @strong{structured}.
A @strong{simple} ModeType is one of:
(a) @strong{+T}, specifying that when a literal with predicate symbol
@strong{p} appears in a hypothesised clause, the corresponding argument
should be an ``input'' variable of type @strong{T};
(b) @strong{-T}, specifying that the argument is an ``output'' variable of
type @strong{T}; or
(c) @strong{#T}, specifying that it should be a constant of type @strong{T}.
All the examples above have simple ModeTypes.
A @strong{structured} ModeType is of the form @strong{f(..)} where
@strong{f} is a function symbol, each argument of which is either a simple
or structured ModeType. Here is an example containing a structured ModeType:
@example
:- mode(1,mem(+number,[+number|+list])).
@end example

With these directives Aleph ensures that for any hypothesised clause of the
form @code{H:- B1, B2, ..., Bc}:
@enumerate
@item
@strong{Input variables.} Any input variable of type @strong{T} in a body
literal Bi appears as an output variable of type @strong{T} in a body
literal that appears before Bi, or appears as an input variable of type
@strong{T} in H.
@item
@strong{Output variables.} Any output variable of type @strong{T} in H
appears as an output variable of type @strong{T} in some body literal Bi.
@item
@strong{Constants.} Any arguments denoted by @strong{#T} in the modes have
only ground terms of type @strong{T}.
@end enumerate

@node Types, Determinations, Modes, Background Knowledge File
@subsection Type specifications
@cindex Type specifications

Types have to be specified for every argument of all predicates to be used
in constructing a hypothesis. This specification is done within a
@code{mode(...,...)} statement (see @ref{Modes}). For Aleph, types are just
names, and no type-checking is done. Variables of different types are
treated distinctly, even if one is a sub-type of the other.

@node Determinations, , Types, Background Knowledge File
@subsection Determinations
@cindex Determinations
@findex determination/2

Determination statements declare the predicates that can be used to
construct a hypothesis. They take the form:
@example
determination(TargetName/Arity,BackgroundName/Arity).
@end example

The first argument is the name and arity of the target predicate, that is,
the predicate that will appear in the head of hypothesised clauses. The
second argument is the name and arity of a predicate that can appear in the
body of such clauses. Typically there will be many determination
declarations for a target predicate, corresponding to the predicates thought
to be relevant in constructing hypotheses. If no determinations are present
Aleph does not construct any clauses. Determinations are only allowed for
one target predicate on any given run of Aleph: if multiple target
determinations occur, the first one is chosen.

Here are some examples of how they appear in a file:
@example
:- determination(eastbound/1,has_car/2).
:- determination(mult/3,mult/3).
:- determination(p/1,'='/2).
@end example

@node Positive Examples File, Negative Examples File, Background Knowledge File, Getting Started
@section Positive examples file
@cindex Positive examples file

Positive examples of a concept to be learned with Aleph are written in a
file with a @strong{.f} extension. The filestem should be the same as that
used for the background knowledge. The positive examples are simply ground
facts. Here are some examples of how they appear in a file:
@example
eastbound(east1).
eastbound(east2).
eastbound(east3).
@end example
Code exists for dealing with non-ground positive examples. However, this has
never been tested rigorously.

@node Negative Examples File, Read Input Files, Positive Examples File, Getting Started
@section Negative examples file
@cindex Negative examples file

Negative examples of a concept to be learned with Aleph are written in a
file with a @strong{.n} extension. The filestem should be the same as that
used for the background knowledge. The negative examples are simply ground
facts. Here are some examples of how they appear in a file:
@example
eastbound(west1).
eastbound(west2).
eastbound(west3).
@end example
Non-ground constraints can be a more compact way of expressing negative
information. Such constraints can be specified in the background knowledge
file (see @ref{Constraints}).
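For instance, in the trains example above, a single constraint can stand in
for an enumeration of negative ground facts. The following is a minimal
sketch using the @code{false/0} convention described in @ref{Constraints},
and assuming a hypothetical background predicate @code{westbound/1}:
@example
% Assumption: no train is both eastbound and westbound, so every
% westbound train acts as a negative example for eastbound/1.
false :- eastbound(T), westbound(T).
@end example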
Aleph is capable of learning from positive examples only. This is done using
a Bayesian evaluation function (see @code{posonly} in @ref{Search Function}).

@node Read Input Files, Construct Theory, Negative Examples File, Getting Started
@section Read all input files
@cindex Reading input files
@findex read_all/1

Once the @file{filestem.b, filestem.f} and @file{filestem.n} files are in
place, they can be read into Aleph with the command:
@example
read_all(filestem).
@end example
Finer-grained specification of the example files can be achieved by setting
the @code{train_pos} and @code{train_neg} flags (see @ref{Other Settings}).

@node Construct Theory, Save Theory, Read Input Files, Getting Started
@section Construct a theory
@cindex Constructing a theory
@findex induce/0

The basic command for selecting examples and constructing a theory is:
@example
induce.
@end example
When issued, Aleph performs the four steps described earlier (see
@ref{Aleph Algorithm}). The result is usually a trace that lists clauses
searched along with their positive and negative example coverage, like:
@example
eastbound(A) :-
   has_car(A,B).
[5/5]
eastbound(A) :-
   has_car(A,B), short(B).
[5/5]
eastbound(A) :-
   has_car(A,B), open_car(B).
[5/5]
eastbound(A) :-
   has_car(A,B), shape(B,rectangle).
[5/5]
@end example
and the final result that looks like:
@example
[theory]

[Rule 1] [Pos cover = 5 Neg cover = 0]
eastbound(A) :-
   has_car(A,B), short(B), closed(B).

[pos-neg] [5]
@end example

@code{induce} also reports the performance on the training data as a
confusion matrix that looks like:
@example
[Training set performance]
          Actual
        +      -
     +  5      0      5
Pred
     -  0      5      5

        5      5     10

Accuracy = 100%
@end example
Performance on test data is also reported if values for the parameters
@code{test_pos} and @code{test_neg} are set (see @ref{Other Settings}).

The simplest use of @code{induce} implements a simple greedy cover-set
algorithm. Aleph allows you to experiment with a number of other ways of
searching for answers (see @ref{Advanced Use}).

@node Save Theory, Evaluate Theory, Construct Theory, Getting Started
@section Save a theory
@cindex Saving a theory

The final theory constructed by Aleph can be saved in a file @code{FileName}
using the command:
@findex write_rules/1
@example
write_rules(FileName).
@end example
Alternatively, the command:
@findex write_rules/0
@example
write_rules.
@end example
calls @code{write_rules/1} with the current setting for the parameter
@code{rulefile}.

@node Evaluate Theory, Example Files, Save Theory, Getting Started
@section Evaluate a theory
@cindex Evaluating a theory

Besides automatic performance reporting, the theory constructed by Aleph can
be evaluated on examples in any data file using the command:
@findex test/4
@example
test(File,Flag,Covered,Total)
@end example
@code{File} is the name of the data file containing the examples.
@code{Flag} is one of @code{show} or @code{noshow}, to show examples covered
or otherwise. Both @code{File} and @code{Flag} have to be provided.
@code{test/4} then returns the following numbers. @code{Covered} is the
number of examples in the data file covered by the current theory.
@code{Total} is the total number of examples in the data file.
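Putting the commands in this chapter together, a complete session has a
simple shape. The following sketch assumes a hypothetical filestem
@code{trains}, with test examples in a (hypothetical) file
@file{trains.test}:
@example
?- read_all(trains).               % read trains.b, trains.f, trains.n
?- induce.                         % construct a theory, with a trace
?- write_rules('trains.rules').    % save the theory in trains.rules
?- test('trains.test',show,C,T).   % C of T test examples covered
@end example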
@node Example Files, , Evaluate Theory, Getting Started
@section Some simple examples
@cindex Example files

Some simple examples of Aleph usage can be found in
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}
In each sub-directory you should find Aleph input files and, usually, a
typescript of Aleph running on the data provided to accomplish some task.

@node Advanced Use, Other Programs, Getting Started, Top
@chapter Advanced use of Aleph

Advanced use of Aleph allows modifications to each of the steps of the basic
algorithm (see @ref{Aleph Algorithm}):
@enumerate
@item
@strong{Select example.} A sample of more than one example can be selected
(see @code{samplesize} in @ref{Other Settings}). The best clause obtained
from reducing each corresponding bottom clause is then added to the theory.
Alternatively, no sampling need be performed, and every example can be
saturated and reduced (see @code{induce_max} in @ref{Other Searches}).
@item
@strong{Build most-specific-clause.} Bottom clauses may be constructed
``lazily'' or not at all (see @code{construct_bottom} in
@ref{Other Settings}). Literals in the bottom clause may be evaluated
``lazily'' (see @code{lazy_evaluate} in @ref{Other Commands}). Individual
bottom clauses can be constructed and examined (see @code{sat} in
@ref{Other Commands}).
@item
@strong{Search.} The search for clauses can be altered and customised to try
different search strategies, evaluation functions, and refinement operators
(see @ref{Other Searches}). A bottom clause can be reduced repeatedly using
different search constraints (see @code{reduce} in @ref{Other Commands}).
@item
@strong{Remove redundant.} Examples covered may be retained to give better
estimates of clause scores (see @code{induce_cover} in @ref{Other Searches}).
@end enumerate

There is now some software in place that allows exploration of the
following:
@enumerate
@item
@strong{Randomised search.} The basic Aleph algorithm does a fairly standard
general-to-specific search. Some variation on this is possible by the user
specifying his or her own refinement operator. In other areas
(satisfiability of propositional formulae, simulation of discrete events),
randomised methods have proved extremely useful for searching very large
spaces. The implementation within Aleph is an adaptation of the standard
randomised methods: GSAT, WSAT, RRR, and the Metropolis algorithm (a special
case of simulated annealing with a fixed `temperature') (see
@ref{Randomised Search} and @ref{Other Searches}).
@item
@strong{Incremental learning.} The basic Aleph algorithm is a ``batch''
learner in the sense that all examples and background are expected to be in
place before learning commences. An incremental mode allows Aleph to acquire
new examples and background information by interacting with the user (see
@ref{Incremental Learning}).
@item
@strong{Theory learning.} The basic Aleph algorithm constructs a ``theory''
one clause at a time. This is an implementation of the greedy set-cover
approach to the problem of identifying a set of clauses. There is some
empirical and theoretical work done on ILP of sets of clauses at once: see
the work of I. Bratko and H. Midelfart in @emph{Proceedings of the Ninth
International Workshop on Inductive Logic Programming (ILP'99)}, LNAI-1634.
Theory learning by Aleph uses randomised search methods (see
@ref{Randomised Search}) to search through the space of theories. It has not
been tested to any significant extent (see @ref{Theory Learning}).
@item
@strong{Learning trees.} The basic Aleph algorithm constructs clauses using
a greedy set-covering algorithm. In some sense, this can be seen as the
first-order equivalent of propositional rule-learning algorithms like Clark
and Niblett's CN2. There is now a substantial body of empirical work (done
by researchers in Leuven and Freiburg) demonstrating the utility of
first-order equivalents of propositional tree-learning procedures.
Tree-based learning can be seen as a special case of theory learning, and
the implementation in Aleph uses the standard recursive-partitioning
approach to construct classification, regression, class probability, or
model trees (see @ref{Tree Learning}).
@item
@strong{Learning constraints.} The basic Aleph algorithm constructs definite
clauses normally intended to be components of a predictive model for data.
Early ILP work (in the Claudien system) demonstrated the value of
discovering all non-Horn constraints that hold in a database. The
implementation of these ideas in Aleph uses a naive generate-and-test
strategy to enumerate all constraints within the mode language provided (see
@ref{Constraint Learning}).
@item
@strong{Learning modes.} The basic Aleph algorithm assumes modes will be
declared by the user. There has been some work (by McCreath and Sharma) on
automatic extraction of mode and type information from the background
knowledge provided. The implementation in Aleph follows these ideas fairly
closely (see @ref{Mode Learning}).
@item
@strong{Learning features.} The basic Aleph algorithm constructs a set of
rules that, along with the background knowledge, entail the positive
examples. Good clauses found during the search for this set of rules can be
used to construct boolean features. These can then be used by techniques
like maximum entropy modelling, support vector machines and so on (see
@ref{Feature Construction}).
@end enumerate
These are all at very early stages of development and therefore even less
reliable than the rest of the code (probably).
@menu
* Other Settings:: Setting parameters.
* Other Searches:: Changing and customising search.
* Randomised Search:: Randomised search methods.
* Incremental Learning:: Incremental learning.
* Theory Learning:: Theory learning.
* Tree Learning:: Tree learning.
* Constraint Learning:: Constraint learning.
* Mode Learning:: Mode learning.
* Abductive Learning:: Abductive learning.
* Feature Construction:: Feature construction.
* Other Commands:: Other commands.
@end menu

@node Other Settings, Other Searches, ,Advanced Use
@section Setting Aleph parameters
@cindex Setting parameter values
@findex set/2

The @code{set/2} predicate forms the basis for setting a number of parameter
values for Aleph. Parameters are set to values using:
@example
set(Parameter,Value)
@end example
@findex setting/2
The current value of a parameter is obtained using:
@example
setting(Parameter,Value)
@end example
@findex noset/1
A parameter can be un-set by using:
@example
noset(Parameter)
@end example

Meaningful @code{set/2} statements for Aleph are:
@table @code
@item set(abduce,+@var{V})
@findex abduce
@cindex Allowing abduction
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@var{V} is @code{true} then abduction and subsequent generalisation of
abduced atoms is performed within the @code{induce} loop. Only predicates
declared to be abducible by @code{abducible/1} are candidates for abduction.
See @ref{Abductive Learning} for more details.
@item set(best,+@var{V})
@findex best
@cindex Setting a minimum score
@var{V} is a `clause label' obtained from an earlier run.
This is a list containing at least the number of positives covered, the
number of negatives covered, and the length of a clause found on a previous
search. Useful when performing searches iteratively.
@item set(cache_clauselength,+@var{V})
@findex cache_clauselength
@var{V} is a positive integer (default 3). Sets an upper bound on the length
of clauses whose coverages are cached for future use.
@item set(caching,+@var{V})
@findex caching
@cindex Caching clause coverage
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true} then clauses and coverage are cached for future use. Only
clauses up to the length set by @code{cache_clauselength} are stored in the
cache.
@item set(check_redundant,+@var{V})
@findex check_redundant
@cindex Redundancy check
@var{V} is one of: @code{true} or @code{false} (default @code{false}).
Specifies whether a call to @code{redundant/2} (see @ref{Other Commands})
should be made for checking redundant literals in a clause.
@item set(check_useless,+@var{V})
@findex check_useless
@cindex Useless literals in bottom
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
set to @code{true}, removes literals in the bottom clause that do not
contribute to establishing variable chains to output variables in the
positive literal, or produce output variables that are not used by any other
literal in the bottom clause.
@item set(classes,+@var{V})
@findex classes
@var{V} is a list of classes to be predicted by the tree learner (see
@ref{Tree Learning}).
@item set(clauselength,+@var{V})
@findex clauselength
@cindex Clause length restriction
@var{V} is a positive integer (default 4). Sets an upper bound on the number
of literals in an acceptable clause.
@item set(clauselength_distribution,+@var{V})
@findex clauselength_distribution
@cindex Distribution over clauselengths
@var{V} is a list of the form [p1-1,p2-2,...] where ``pi'' represents the
probability of drawing a clause with ``i'' literals. Used by randomised
search methods (see @ref{Randomised Search}).
@item set(clauses,+@var{V})
@findex clauses
@cindex Maximum clauses for theory-level search
@var{V} is a positive integer. Sets an upper bound on the number of clauses
in a theory when performing theory-level search (see @ref{Theory Learning}).
@item set(condition,+@var{V})
@findex condition
@cindex Conditioning random sample
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true} then randomly generated examples are obtained after conditioning
the stochastic generator with the positive examples.
@item set(confidence,+@var{V})
@findex confidence
@cindex Confidence for tree pruning
@var{V} is a floating point number in the interval (0.0,1.0) (default 0.95).
Determines the confidence for rule-pruning by the tree learner (see
@ref{Tree Learning}).
@item set(construct_bottom,+@var{V})
@findex construct_bottom
@cindex Lazy bottom clause generation
@var{V} is one of: @code{saturation}, @code{reduction} or @code{false}
(default @code{saturation}). Specifies the stage at which the bottom clause
is constructed. If @code{reduction} then it is constructed lazily during the
search. This is useful if the bottom clause is too large to be constructed
prior to search. This also sets the flag @code{lazy_bottom} to @code{true}.
The user has to provide a refinement operator definition (using
@code{refine/2}); if not, the @code{refine} parameter is set to @code{auto}.
If @code{false} then no bottom clause is constructed. The user would
normally provide a refinement operator definition in this case.
@item set(dependent,+@var{V})
@findex dependent
@cindex Dependent variable argument
@var{V} is a positive integer. Denotes the argument of the dependent
variable in the examples (see @ref{Tree Learning} and
@ref{Feature Construction}).
@item set(depth,+@var{V})
@findex depth
@cindex Theorem-proving depth
@var{V} is a positive integer (default 10). Sets an upper bound on the proof
depth to which theorem-proving proceeds.
@item set(explore,+@var{V})
@findex explore
@cindex Exploratory mode
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true} then forces search to continue until the point that all
remaining elements in the search space are definitely worse than the current
best element (normally, search would stop when it is certain that all
remaining elements are no better than the current best; this is a weaker
criterion). All internal pruning is turned off (see @ref{Pruning}).
@item set(evalfn,+@var{V})
@findex evalfn
@cindex Changing the evaluation function
@var{V} is one of: @code{coverage}, @code{compression}, @code{posonly},
@code{pbayes}, @code{accuracy}, @code{laplace}, @code{auto_m},
@code{mestimate}, @code{entropy}, @code{gini}, @code{sd}, @code{wracc}, or
@code{user} (default @code{coverage}). Sets the evaluation function for a
search. @xref{Other Searches}.
@item set(good,+@var{V})
@findex good
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true} then stores a Prolog encoding of ``good'' clauses found in the
search. A good clause is any clause with utility above that specified by the
setting for @code{minscore}. If @code{goodfile} is set to some filename then
this encoding is stored externally in that file.
@item set(goodfile,+@var{V})
@findex goodfile
@var{V} is a Prolog atom. Sets the filename for storing a Prolog encoding of
good clauses found in searches conducted to date. Any existing file with
this name will get appended.
@item set(gsamplesize,+@var{V})
@findex gsamplesize
@cindex Random sample size
@var{V} is a positive integer (default 100). The size of the randomly
generated example set produced for learning from positive examples only.
@xref{Other Searches}.
@item set(i,+@var{V})
@findex i
@cindex Variable chain depth
@var{V} is a positive integer (default 2). Sets an upper bound on layers of
new variables.
@item set(interactive,+@var{V})
@findex interactive
@cindex Interactive theory construction
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true} then constructs theories interactively with @code{induce_rules}
and @code{induce_tree}.
@item set(language,+@var{V})
@findex language
@cindex Language restriction
@var{V} is an integer >= 1 or @code{inf} (default @code{inf}). Specifies the
maximum number of occurrences of a predicate symbol in any clause.
@item set(lazy_on_contradiction,+@var{V})
@findex lazy_on_contradiction
@cindex Lazy coverage evaluation
@var{V} is one of: @code{true} or @code{false} (default @code{false}).
Specifies if theorem-proving should proceed if a constraint is violated.
@item set(lazy_on_cost,+@var{V})
@findex lazy_on_cost
@var{V} is one of: @code{true} or @code{false} (default @code{false}).
Specifies if user-defined cost-statements require clause coverages to be
evaluated. This is normally not user-set, and decided internally.
@item set(lazy_negs,+@var{V})
@findex lazy_negs
@cindex Lazy negative coverage evaluation
@var{V} is one of: @code{true} or @code{false} (default @code{false}).
If @code{true} then theorem-proving on negative examples stops once bounds
set by @code{noise} or @code{minacc} are violated.
@item set(lookahead,+@var{V})
@findex lookahead
@cindex Lookahead for refinement
@var{V} is a positive integer. Sets a lookahead value for the automatic
refinement operator (obtained by setting @code{refine} to @code{auto}).
@item set(m,+@var{V})
@findex m
@cindex M estimation
@var{V} is a floating point number. Sets a value for ``m-estimate''
calculations. @xref{Search Function}.
@item set(max_abducibles,+@var{V})
@findex max_abducibles
@cindex Maximum number of abducibles
@var{V} is a positive integer (default @code{2}). Sets an upper bound on the
number of ground atoms within any abductive explanation for an observation.
@xref{Abductive Learning}.
@item set(max_features,+@var{V})
@findex max_features
@cindex Maximum number of features
@var{V} is a positive integer (default @code{inf}). Sets an upper bound on
the number of boolean features constructed by searching for good clauses.
@xref{Feature Construction}.
@item set(minacc,+@var{V})
@findex minacc
@cindex Minimum clause accuracy
@var{V} is a floating point number between 0 and 1 (default 0.0). Sets a
lower bound on the accuracy of an acceptable clause. The accuracy of a
clause has the same meaning as precision: that is, it is @var{p/(p+n)} where
@var{p} is the number of positive examples covered by the clause (the true
positives) and @var{n} is the number of negative examples covered by the
clause (the false positives).
@item set(mingain,+@var{V})
@findex mingain
@cindex Minimum gain
@var{V} is a floating point number (default @code{0.05}). Specifies the
minimum expected gain from splitting a leaf when constructing trees.
@item set(minpos,+@var{V})
@findex minpos
@cindex Minimum positive coverage
@var{V} is a positive integer (default 1). Sets a lower bound on the number
of positive examples to be covered by an acceptable clause. If the best
clause covers positive examples below this number, then it is not added to
the current theory. This can be used to prevent Aleph from adding ground
unit clauses to the theory (by setting the value to 2). Beware: you can get
counter-intuitive results in conjunction with the @code{minscore} setting.
@item set(minposfrac,+@var{V})
@findex minposfrac
@cindex Minimum fractional positive coverage
@var{V} is a floating point number in the interval [0.0,1.0] (default 0.0).
Sets a lower bound on the positive examples covered by an acceptable clause
as a fraction of the positive examples covered by the head of that clause.
If the best clause has a ratio below this number, then it is not added to
the current theory. Beware: you can get counter-intuitive results in
conjunction with the @code{minpos} setting.
@item set(minscore,+@var{V})
@findex minscore
@cindex Minimum clause utility
@var{V} is a floating point number (default @code{-inf}). Sets a lower bound
on the utility of an acceptable clause. When constructing clauses, if the
best clause has utility below this number, then it is not added to the
current theory. Beware: you can get counter-intuitive results in conjunction
with the @code{minpos} setting.
@item set(moves,+@var{V})
@findex moves
@cindex Moves for randomised search
@var{V} is an integer >= 0. Sets an upper bound on the number of moves
allowed when performing a randomised local search. This only makes sense if
@code{search} is set to @code{rls} and @code{rls_type} is set to an
appropriate value.
@item set(newvars,+@var{V})
@findex newvars
@cindex Maximum existential variables
@var{V} is a positive integer or @code{inf} (default @code{inf}). Sets an
upper bound on the number of existential variables that can be introduced in
the body of a clause.
@item set(nodes,+@var{V})
@findex nodes
@cindex Maximum nodes searched
@var{V} is a positive integer (default 5000). Sets an upper bound on the
number of nodes to be explored when searching for an acceptable clause.
@item set(noise,+@var{V})
@findex noise
@cindex Maximum negative coverage
@var{V} is an integer >= 0 (default 0). Sets an upper bound on the number of
negative examples allowed to be covered by an acceptable clause.
@item set(openlist,+@var{V})
@findex openlist
@cindex Greedy search
@cindex Beam search
@var{V} is an integer >= 0 or @code{inf} (default @code{inf}). Sets an upper
bound on the beam-width to be used in a greedy search.
@item set(optimise_clauses,+@var{V})
@findex optimise_clauses
@cindex Clause optimisations
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true}, performs the query optimisations described by V.S. Costa,
A. Srinivasan, and R.C. Camacho in @emph{A note on two simple
transformations for improving the efficiency of an ILP system}.
@item set(portray_examples,+@var{V})
@findex portray_examples
@cindex Pretty printing of examples
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true}, executes the goal @code{aleph_portray(Term)}, where @code{Term}
is one of @code{train_pos}, @code{train_neg}, @code{test_pos}, or
@code{test_neg}, when executing the command @code{show(Term)}.
@item set(portray_hypothesis,+@var{V})
@findex portray_hypothesis
@cindex Pretty printing of hypothesis
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true}, executes the goal @code{aleph_portray(hypothesis)}. This is to
be written by the user.
@item set(portray_literals,+@var{V})
@findex portray_literals
@cindex Pretty printing of literals
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true}, executes the goal @code{aleph_portray(Literal)}, where
@code{Literal} is some literal. This is to be written by the user.
@item set(portray_search,+@var{V})
@findex portray_search
@cindex Pretty printing of search
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true}, executes the goal @code{aleph_portray(search)}. This is to be
written by the user.
@item set(print,+@var{V})
@findex print
@var{V} is a positive integer (default 4). Sets an upper bound on the number
of literals displayed on any one line of the trace.
@item set(proof_strategy,+@var{V})
@findex proof_strategy
@cindex Changing the proof strategy
@var{V} is one of: @code{restricted_sld} or @code{sld} (default
@code{restricted_sld}). If @code{restricted_sld}, then examples covered are
determined by forcing the current hypothesised clause to be the first parent
clause in an SLD resolution proof. If @code{sld} then this restriction is
not enforced. The former strategy is efficient, but not refutation complete.
It is sufficient if all that is needed is to determine how many examples are
covered by the current clause, which is what is needed when Aleph is used to
construct a set of non-recursive clauses greedily (for example using the
@code{induce/0} command: see @ref{Construct Theory}).
@item set(prooftime,+@var{V})
@findex prooftime
@cindex Changing the proof time
@var{V} is a positive integer or @code{inf} (default @code{inf}).
Sets an upper bound on the time (in seconds) for testing whether an example
is covered. Overrides any value set for @code{searchtime}.
@item set(prune_tree,+@var{V})
@findex prune_tree
@cindex Pruning for tree learning
@var{V} is one of: @code{true} or @code{false} (default @code{false}).
Determines whether rules constructed by the tree learner are subject to
pessimistic pruning (see @ref{Tree Learning}).
@item set(record,+@var{V})
@findex record
@cindex Writing trace to a file
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
@code{true} then a trace of Aleph execution is written to a file. The
filename is given by @code{recordfile}.
@item set(recordfile,+@var{V})
@findex recordfile
@var{V} is a Prolog atom. Sets the filename to write a trace of execution.
Only makes sense if @code{record} is set to @code{true}.
@item set(refine,+@var{V})
@findex refine
@cindex Refinement operator types
@var{V} is one of: @code{user}, @code{auto}, or @code{false} (default
@code{false}). Specifies the nature of the customised refinement operator.
In all cases, the resulting clauses are required to subsume the bottom
clause, if one exists. If @code{false} then no customisation is assumed and
standard operation results. If @code{user} then the user specifies a
domain-specific refinement operator with @code{refine/2} statements. If
@code{auto} then an automatic enumeration of all clauses in the mode
language (see @ref{Modes}) is performed. The result is a breadth-first
branch-and-bound search starting from the empty clause. This is useful if a
bottom clause is either not constructed or is constructed lazily. No attempt
is made to ensure any kind of optimality, and the same clauses may result
from several different refinement paths. Some rudimentary checking can be
achieved by setting @code{caching} to @code{true}. The user has to ensure
the following when @code{refine} is set to @code{auto}: (1) the setting to
@code{auto} is done after the modes and determinations commands, as these
are used to generate internally a set of clauses that allow enumeration of
clauses in the language; (2) all arguments that are annotated as @strong{#T}
in the modes contain generative definitions for type @strong{T}; these are
called by the clauses generated internally to obtain the appropriate
constants; and (3) the head mode is clearly specified using the @code{modeh}
construct.
@item set(rls_type,+@var{V})
@findex rls_type
@cindex Randomised search types
@var{V} is one of: @code{gsat}, @code{wsat}, @code{rrr}, or @code{anneal}.
Sets the randomised search method to be one of GSAT, WSAT, RRR or simulated
annealing. Requires @code{search} to be set to @code{rls}, and integer
values for @code{tries} and @code{moves}. @xref{Randomised Search}.
@item set(rulefile,+@var{V})
@findex rulefile
@var{V} is a Prolog atom. Sets the filename for storing clauses found in the
theory (used by @code{write_rules/0}).
@item set(samplesize,+@var{V})
@findex samplesize
@cindex Samples greater than 1
@var{V} is an integer >= 0 (default 0). Sets the number of examples selected
randomly by the @code{induce} or @code{induce_cover} commands. The best
clause from the sample is added to the theory. A value of 0 turns off random
sampling, and the next uncovered example in order of appearance in the file
of training examples is selected.
@item set(scs_percentile,+@var{V})
@findex scs_percentile
@cindex Good clauses in stochastic clause selection
@var{V} is a number in the range (0,100] (usually an integer).
This denotes that any clause in the top @var{V}-percentile of clauses is
considered ``good'' when performing stochastic clause selection. Only
meaningful if @code{search} is set to @code{scs}.
@item set(scs_prob,+@var{V})
@findex scs_prob
@cindex Probability of selecting a good clause in stochastic clause selection
@var{V} is a number in the range [0,1.0). This denotes the minimum
probability of obtaining a ``good'' clause when performing stochastic clause
selection. Only meaningful if @code{search} is set to @code{scs}.
@item set(scs_sample,+@var{V})
@findex scs_sample
@cindex Clauses sampled in stochastic clause selection
@var{V} is a positive integer that determines the number of clauses randomly
selected from the hypothesis space in a clause-level search. Only meaningful
if @code{search} is set to @code{scs}. This overrules any sample sizes
calculated from settings for @code{scs_percentile} and @code{scs_prob}.
@item set(search,+@var{V})
@findex search
@cindex Changing the search
@var{V} is one of: @code{bf}, @code{df}, @code{heuristic}, @code{ibs},
@code{ils}, @code{rls}, @code{scs}, @code{id}, @code{ic}, or @code{ar}
(default @code{bf}). Sets the search strategy. @xref{Other Searches}.
@item set(searchtime,+@var{V})
@findex searchtime
@cindex Time bounded search
@var{V} is an integer >= 0 or @code{inf} (default @code{inf}). Sets an upper
bound on the time (in seconds) for a search.
@item set(skolemvars,+@var{V})
@findex skolemvars
@cindex Skolem variable numbering in examples
@var{V} is an integer (default 10000). Sets the counter for variables in
non-ground positive examples. Each variable will be replaced by a skolem
variable that has a unique number which is no smaller than @var{V}. This
number has to be larger than the number of variables that would otherwise
appear in a bottom clause.
@item set(splitvars,+@var{V})
@findex splitvars
@var{V} is one of: @code{true} or @code{false} (default @code{false}). If
set to @code{true} before constructing a bottom clause, then variable
co-references in the bottom clause are split apart by new variables. The new
variables can occur at input or output positions of the head literal, and
only at output positions in body literals. Equality literals between new and
old variables are inserted into the bottom clause to maintain equivalence.
It may also result in variable-renamed versions of other literals being
inserted into the bottom clause. All of this increases the search space
considerably and can make the search explore redundant clauses. The current
version also elects to perform variable splitting whilst constructing the
bottom clause (in contrast to doing it dynamically whilst searching). This
was to avoid unnecessary checks that could slow down the search when
variable splitting was not required. This means the bottom clause can be
extremely large, and the whole process is probably not very practical for
large numbers of co-references. The procedure has not been rigorously tested
to quantify this.
@item set(stage,+@var{V})
@findex stage
@var{V} is one of: @code{saturation}, @code{reduction} or @code{command}
(default @code{command}). Sets the stage of current execution. This is
normally not user-set, and decided internally.
@item set(store_bottom,+@var{V})
@findex store_bottom
@var{V} is one of: @code{true} or @code{false} (default @code{false}).
Stores the bottom clause constructed for an example for future re-use.
@item set(temperature,+@var{V})
@findex temperature
@cindex Temperature for simulated annealing
@var{V} is a non-zero floating point number. Sets the temperature for
randomised search using annealing. Requires @code{search} to be set to
@code{rls} and @code{rls_type} to be set to @code{anneal}.
@item set(test_pos,+@var{V})
@findex test_pos
@cindex Positive examples for testing
@var{V} is a Prolog atom or a list of Prolog atoms. Sets the filename or
list of filenames containing the positive examples for testing. No filename
extensions are assumed and complete filenames have to be provided.
@item set(test_neg,+@var{V})
@findex test_neg
@cindex Negative examples for testing
@var{V} is a Prolog atom or a list of Prolog atoms. Sets the filename or
list of filenames containing the negative examples for testing. No filename
extensions are assumed and complete filenames have to be provided.
@item set(threads,+@var{V})
@findex threads
@cindex Number of concurrent threads
@var{V} is an integer >= 1 (default 1). This is experimental and should not
be changed from the default value until further notice.
@item set(train_pos,-@var{V})
@findex train_pos
@cindex Positive examples for training
@var{V} is a Prolog atom or a list of Prolog atoms. Sets the filename or
list of filenames containing the positive examples. If set, no filename
extensions are assumed and complete filenames have to be provided. If not
set, it is internally assigned a value after the @code{read_all} command.
@item set(train_neg,-@var{V})
@findex train_neg
@cindex Negative examples for training
@var{V} is a Prolog atom or a list of Prolog atoms. Sets the filename or
list of filenames containing the negative examples. If set, no filename
extensions are assumed and complete filenames have to be provided. If not
set, it is internally assigned a value after the @code{read_all} command.
@item set(tree_type,+@var{V})
@findex tree_type
@cindex Tree type
@var{V} is one of @code{classification}, @code{class_probability},
@code{regression}, or @code{model} (see @ref{Tree Learning}).
@item set(tries,+@var{V})
@findex tries
@cindex Restarts for randomised search
@var{V} is a positive integer. Sets the maximum number of restarts allowed
for randomised search methods. This only makes sense if @code{search} is set
to @code{rls} and @code{rls_type} is set to an appropriate value.
@item set(typeoverlap,+@var{V})
@findex typeoverlap
@var{V} is a floating point number in the interval (0.0,1.0]. Used by
@code{induce_modes/0} to determine if a pair of different types should be
given the same name. See @ref{Mode Learning} for more details.
@item set(uniform_sample,+@var{V})
@findex uniform_sample
@cindex Uniform sampling from clause space
@var{V} is one of: @code{true} or @code{false} (default @code{false}). Used
when drawing clauses randomly from the clause-space. If set to @code{true}
then clauses are drawn by uniform random selection from the space of legal
clauses. Since there are usually many more longer clauses than shorter ones,
this will mean that clauses drawn randomly are more likely to be long ones.
If set to @code{false} then Aleph assumes a uniform distribution over clause
lengths (up to the maximum length allowed by @code{clauselength}). This is
not necessarily uniform over legal clauses. If random clause selection is
done without a bottom clause for guidance then this parameter is set to
@code{false}.
@item set(updateback,+@var{V})
@findex updateback
@cindex Adding induced clauses to background
@var{V} is one of: @code{true} or @code{false} (default @code{true}). If
@code{false} then clauses found by the @code{induce} family are not
incorporated into the background. This is experimental.
@item set(verbosity,+@var{V})
@findex verbosity
@findex verbose
@cindex Verbosity
@var{V} is an integer >= 0 (default 1). Sets the level of verbosity. Also
sets the parameter @code{verbose} to the same value. A value of 0 shows very
little.
@item set(version,-@var{V})
@findex version
@cindex Version
@var{V} is the current version of Aleph. This is set internally.
@item set(walk,+@var{V})
@findex walk
@cindex Random walk probability for Walksat
@var{V} is a value between @code{0} and @code{1}. It represents the random
walk probability for the Walksat algorithm.
@item set(+@var{P},+@var{V})
Sets any user-defined parameter @var{P} to value @var{V}. This is
particularly useful when attaching notes with particular experiments, as all
settings can be written to a file (see @code{record}). For example,
@code{set(experiment,'Run 1 with background B0')}.
@end table

@node Other Searches, Randomised Search, Other Settings, Advanced Use
@section Altering the search
@cindex Search commands and options

Aleph allows the basic procedure for theory construction to be altered in a
number of ways. Besides the @code{induce} command, there are several other
commands that can be used to construct a theory. The @code{induce} family of
commands are:
@enumerate
@item
@code{induce/0}. This has already been described in detail previously (see
@ref{Construct Theory});
@item
@code{induce_cover/0}.
@findex induce_cover/0
This command is very similar to @code{induce}. The only difference is that
positive examples covered by a clause are not removed prior to seeding on a
new (uncovered) example. After a search with @code{induce_cover}, only the
examples covered by the best clause are removed from a pool of seed
examples. After this, a new example or set of examples is chosen from the
seeds left, and the process repeats. The theories returned by @code{induce}
and @code{induce_cover} are dependent on the order in which positive
examples are presented;
@item
@code{induce_max/0}.
@findex induce_max/0
The theory returned by this command is unaffected by the ordering of
positive examples. This is because it saturates and reduces every example.
The search is made more efficient by remembering the coverage of the best
clause obtained so far for each example being generalised. Both
@code{induce_cover} and @code{induce_max} are slower than @code{induce}, and
usually produce clauses with a great deal of overlap in coverage. A separate
program will have to be used to find some subset of these clauses that
minimises this overlap (see @emph{T-Reduce} in @ref{Other Programs}).
@item
@code{induce_incremental/0}.
@findex induce_incremental/0
This command constructs a theory in an incremental mode: the user is allowed
to update the examples and background knowledge. This mode of learning is
described further in @ref{Incremental Learning}.
@item
@code{induce_clauses/0}.
@findex induce_clauses/0
This command is simply @code{induce/0} or @code{induce_incremental/0}
depending on whether the flag @code{interactive} is @code{false} or
@code{true}.
@item
@code{induce_theory/0}.
@findex induce_theory/0
This command abandons the clause-by-clause approach to theory construction.
Instead, search is done at the theory-level.
This is untested and the current implementation should not be considered
definitive. See @ref{Theory Learning} for more details.
@item
@code{induce_tree/0}.
@findex induce_tree/0
This command abandons the clause-by-clause approach to theory construction.
Instead, search is done by constructing a tree using the standard
recursive-partitioning approach. See @ref{Tree Learning} for more details.
@item
@code{induce_constraints/0}.
@findex induce_constraints/0
This command abandons the search for predictive clauses. Instead, search
results in all constraints that hold within the background knowledge
provided. See @ref{Constraint Learning} for more details.
@item
@code{induce_modes/0}.
@findex induce_modes/0
This command searches for a mode and type assignment that is consistent with
the background knowledge provided. See @ref{Mode Learning} for more details.
@item
@code{induce_features/0}.
@findex induce_features/0
This command searches for boolean features given the examples and the
background knowledge. See @ref{Feature Construction} for more details.
@end enumerate

The search for individual clauses (when performed) is principally affected
by two parameters. One sets the search strategy (@code{search}) and the
other sets the evaluation function (@code{evalfn}).
@menu
* Search Strategy:: Changing the search strategy.
* Search Function:: Changing the evaluation function.
* Pruning:: Specifying domain-specific pruning.
* Cost:: Specifying domain-specific costs.
* Constraints:: Specifying domain-specific constraints.
* Refinements:: Specifying a domain-specific refinement operator.
@end menu

@node Search Strategy, Search Function, , Other Searches
@subsection Search strategies
@cindex Search strategies

A search strategy is set using @code{set(search,Strategy)}. The following
search strategies apply to the clause-by-clause searches conducted by Aleph:
@table @code
@item ar
@findex ar search
@cindex Association rule search
Implements a simplified form of the type of association rule search
conducted by the WARMR system (see L. Dehaspe, 1998, PhD Thesis, Katholieke
Universiteit Leuven). Here, Aleph simply finds all rules that cover at least
a pre-specified fraction of the positive examples. This fraction is
specified by the parameter @code{pos_fraction}.
@item bf
@findex bf search
@cindex Breadth-first search strategy
Enumerates shorter clauses before longer ones. At a given clauselength,
clauses are re-ordered based on their evaluation. This is the default search
strategy.
@item df
@findex df search
@cindex Depth-first search strategy
Enumerates longer clauses before shorter ones. At a given clauselength,
clauses are re-ordered based on their evaluation.
@item heuristic
@findex heuristic search
@cindex Heuristic search strategy
Enumerates clauses in a best-first manner.
@item ibs
@findex ibs search
@cindex Iterative beam search strategy
Performs an iterative beam search as described by Quinlan and Cameron-Jones,
IJCAI-95. The limit set by the value for @code{nodes} applies to any one
iteration.
@item ic
@findex ic search
@cindex Integrity constraints search
Performs a search for integrity constraints. Used by
@code{induce_constraints} (see @ref{Constraint Learning}).
@item id
@findex id search
@cindex Iterative deepening search
Performs an iterative deepening search up to the maximum clause length
specified.
@item ils
@findex ils search
@cindex Iterative language search strategy
An iterative @code{bf} search strategy that, starting from 1,
progressively increases the upper-bound on the number of occurrences of
a predicate symbol in any clause. The limit set by the value for
@code{nodes} applies to each iteration. This language-based search was
developed by Rui Camacho and is described in his PhD thesis.

@item rls
@findex rls search
@cindex Randomised search
Uses the GSAT, WSAT, RRR and simulated annealing algorithms for search
in ILP. The choice amongst these is specified by the parameter
@code{rls_type} (see @ref{Other Settings}). GSAT, RRR, and annealing
all employ random multiple restarts, each of which serves as the
starting point for local moves in the search space. A limit on the
number of restarts is specified by the parameter @code{tries} and that
on the number of moves by @code{moves}. Annealing is currently
restricted to using a fixed temperature, making it equivalent to an
algorithm due to Metropolis. The temperature is specified by setting
the parameter @code{temperature}. The implementation of WSAT requires a
``random-walk probability'', which is specified by the parameter
@code{walk}. A walk probability of 0 is equivalent to GSAT. More
details on randomised search can be found in @ref{Randomised Search}.

@item scs
@findex scs search
@cindex Stochastic clause selection
A special case of GSAT that results from repeated random selection of
clauses from the hypothesis space. The number of clauses is either set
by @code{scs_sample} or is calculated from the settings for
@code{scs_prob} and @code{scs_percentile}. These represent: the minimum
probability of selecting a ``good'' clause; and the meaning of a
``good'' clause, namely, that it is in the top K-percentile of clauses.
This invokes GSAT search with @code{tries} set to the sample size and
@code{moves} set to 0. Clause selection can either be blind or informed
by some preliminary Monte-Carlo style estimation. This is controlled by
@code{scs_type}. More details can be found in @ref{Randomised Search}.
@end table

@node Search Function, Pruning, Search Strategy, Other Searches
@subsection Evaluation functions
@cindex Evaluation functions

An evaluation function is set using @code{set(evalfn,Evalfn)}. The
following clause evaluation functions are recognised by Aleph:

@table @code
@item accuracy
@findex accuracy
Clause utility is @code{P/(P+N)}, where @code{P}, @code{N} are the
number of positive and negative examples covered by the clause.

@item auto_m
@findex m estimate (automatic m setting)
Clause utility is the m estimate (see @code{mestimate} below) with the
value of @code{m} automatically set to be the maximum likelihood
estimate of the best value of @code{m}.

@item compression
@findex compression
Clause utility is @code{P - N - L + 1}, where @code{P}, @code{N} are
the number of positive and negative examples covered by the clause, and
@code{L} the number of literals in the clause.

@item coverage
@findex coverage
Clause utility is @code{P - N}, where @code{P}, @code{N} are the number
of positive and negative examples covered by the clause.

@item entropy
@findex entropy
Clause utility is @code{p log p + (1-p) log (1-p)} where
@code{p = P/(P + N)} and @code{P}, @code{N} are the number of positive
and negative examples covered by the clause.

@item gini
@findex gini
Clause utility is @code{2p(1-p)} where @code{p = P/(P + N)} and
@code{P}, @code{N} are the number of positive and negative examples
covered by the clause.
@item laplace
@findex laplace
Clause utility is @code{(P+1)/(P+N+2)} where @code{P}, @code{N} are the
number of positive and negative examples covered by the clause.

@item mestimate
@findex m estimate (user set m)
Clause utility is its m estimate as described in S. Dzeroski and
I. Bratko (1992), @emph{Handling Noise in Inductive Logic Programming},
Proc. Second Intnl. Workshop on Inductive Logic Programming,
ICOT-TM-1182, Inst. for New Gen Comput Technology, Japan. The value of
@code{m} is set by @code{set(m,M)}.

@item pbayes
@findex pbayes (pseudo-Bayes estimate)
Clause utility is the pseudo-Bayes conditional probability of a clause
described in J. Cussens (1993), @emph{Bayes and Pseudo-Bayes Estimates
of Conditional Probability and their Reliability}, ECML-93,
Springer-Verlag, Berlin.

@item posonly
@findex posonly (positive only evaluation)
@cindex Positive-only learning
Clause utility is calculated using the Bayesian score described in
S. H. Muggleton, (1996), @emph{Learning from positive data}, Proc.
Sixth Intnl. Workshop on Inductive Logic Programming, LNAI 1314,
358-376, Springer-Verlag, Berlin. Note that all type definitions are
required to be generative for this evaluation function and a
@code{modeh} declaration is necessary.

@item sd
@findex sd
Clause utility is related to the standard deviation of values
predicted. This is only used when constructing regression trees and is
not available for use during clause-based search.

@item user
@findex user (cost function)
Clause utility is @code{-C}, where @code{C} is the value returned by a
user-defined cost function. @xref{Cost}.

@item wracc
@findex wracc
@cindex Weighted relative accuracy.
Clause utility is calculated using the weighted relative accuracy
function described by N. Lavrac, P. Flach and B. Zupan, (1999),
@emph{Rule Evaluation Measures: a Unifying View}, Proc. Ninth Intnl.
Workshop on Inductive Logic Programming, LNAI 1634, 174-185,
Springer-Verlag, Berlin.
@end table

@node Pruning, Cost, Search Function, Other Searches
@subsection Built-in and user-defined pruning
@cindex Pruning

Two sorts of pruning can be distinguished within Aleph when performing
a clause-level search. Internal pruning refers to built-in pruning that
performs admissible removal of clauses from a search. This is currently
available for the following evaluation functions: auto_m, compression,
coverage, laplace, mestimate, posonly, and wracc. User-defined prune
statements can be written to specify the conditions under which a user
knows for certain that a clause (or its refinements) could not possibly
be an acceptable hypothesis. Such clauses are pruned from the search.
The ``prune'' definition is written in the background knowledge file
(that has extension @file{.b}). The definition is distinguished by the
fact that it consists of rules of the form:
@findex prune/1
@example
prune((ClauseHead:-ClauseBody)) :-
        Body.
@end example

The following example is from a pharmaceutical application. It states
that every extension of a clause representing a ``pharmacophore'' with
six ``pieces'' is unacceptable, and that the search should be pruned at
such a clause.
@example
prune((Head:-Body)) :-
        violates_constraints(Body).

violates_constraints(Body) :-
        has_pieces(Body,Pieces),
        violates_constraints(Body,Pieces).

violates_constraints(Body,[_,_,_,_,_,_]).

has_pieces(...) :-
@end example

The use of such pruning can greatly improve Aleph's efficiency. Prune
statements can be seen as a special case of providing distributional
information about the hypothesis space.
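Here is a further minimal sketch of a prune statement. It assumes two
hypothetical background predicates @code{gteq/2} and @code{lteq/2} that
compare a variable against a constant threshold: any clause whose body
requires both @code{X >= Lo} and @code{X =< Hi} with @code{Hi < Lo} is
unsatisfiable, and remains so under refinement, so the search can
safely be pruned at such a clause.
@example
prune((_Head:-Body)) :-
        in_body(gteq(X,Lo),Body),       % X >= Lo required ...
        in_body(lteq(X,Hi),Body),       % ... and X =< Hi required
        number(Lo), number(Hi),
        Hi < Lo.                        % thresholds are contradictory

% in_body(L,B): literal L occurs in body B (a comma-conjunction)
in_body(L,L).
in_body(L,(L,_)).
in_body(L,(_,Rest)) :- in_body(L,Rest).
@end example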
@node Cost, Constraints, Pruning, Other Searches
@subsection User-defined cost specification
@cindex Cost specification

The use of a user-specified cost function is a fundamental construct in
statistical decision theory, and provides a general method of scoring
descriptions. Aleph allows the specification of the cost of a clause.
The cost statements are written in the background knowledge file (that
has extension @file{.b}), and are distinguished by the fact that they
are all rules of the form:
@findex cost/3
@example
cost(Clause,ClauseLabel,Cost):-
        Body.
@end example

where @code{ClauseLabel} is the list @code{[P,N,L]}, where @code{P} is
the number of positive examples covered by the clause, @code{N} is the
number of negative examples covered by the clause, and @code{L} is the
number of literals in the clause. It is usually not possible to devise
automatically admissible pruning strategies for an arbitrary cost
function. Thus, when using a user-defined cost measure, Aleph places
the burden of specifying a pruning strategy on the user.
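As an illustration, here is a minimal sketch of a cost definition (the
trade-off is invented for the example): each negative example covered
is penalised ten times more heavily than a positive example covered is
rewarded. Since, with @code{evalfn} set to @code{user}, clause utility
is the negative of the cost, this directs the search towards clauses
that cover very few negative examples.
@example
:- set(evalfn,user).

% Cost of a clause with label [P,N,L]: 10 units per negative
% example covered, minus 1 unit per positive example covered.
cost(_Clause,[P,N,_L],Cost):-
        Cost is 10*N - P.
@end example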
@node Constraints, Refinements, Cost, Other Searches
@subsection User-defined constraints
@cindex Constraint specification

Aleph accepts integrity constraints that should not be violated by a
hypothesis. These are written in the background knowledge file (that
has extension @file{.b}) and are similar to the integrity constraints
in the ILP programs Clint and Claudien. The constraints are
distinguished by the fact that they are all rules of the form:
@findex false/0
@example
false:-
        Body.
@end example

where @code{Body} is a set of literals that specify the condition(s)
that should not be violated by a clause found by Aleph. It is usual to
use the @code{hypothesis/3} command (see @ref{Other Commands}) to
obtain the clause currently being considered by Aleph. The following
example is from a pharmaceutical application. It states that hypotheses
are unacceptable if they have fewer than three ``pieces'' or if they do
not specify the distances between all pairs of pieces.
@findex false/0
@example
false:-
        hypothesis(Head,Body,_),
        has_pieces(Body,Pieces),
        length(Pieces,N),
        N =< 2.

false:-
        hypothesis(_,Body,_),
        has_pieces(Body,Pieces),
        incomplete_distances(Body,Pieces).
@end example

The use of constraints is another way for Aleph to obtain interesting
hypotheses without negative examples. Ordinarily, learning without
negative examples will result in a single clause that classifies every
example as positive. Such clauses can be precluded by constraints. Note
also that an integrity constraint does not state that a refinement of a
clause that violates one or more constraints will also be unacceptable.
When constructing clauses in an incremental mode, Aleph can be
instructed to add a special type of constraint to prevent the
construction of overly general clauses (see @ref{Incremental
Learning}).

@node Refinements, , Constraints, Other Searches
@subsection User-defined refinement
@cindex Refinement operator specification

Aleph allows a method of specifying the refinement operator to be used
in a clause-level search. This is done using a Prolog definition for
the predicate @code{refine/2}. The definition specifies the transitions
in the refinement graph traversed in a search. The ``refine''
definition is written in the background knowledge file (that has
extension @file{.b}). The definition is distinguished by the fact that
it consists of rules of the form:
@findex refine/2
@example
refine(Clause1,Clause2):-
        Body.
@end example

This specifies that Clause1 is refined to Clause2. The definition can
be nondeterministic, and the set of refinements for any one clause is
obtained by repeated backtracking. For any refinement Aleph ensures
that Clause2 implies the current most specific clause. Clause2 can
contain cuts (``!'') in its body.

The following example is from a pharmaceutical application. It
specifies a search for a ``pharmacophore'' that consists of 4
``pieces'' (each piece is some functional group), and associated
distances in 3-D space. Auxiliary definitions for predicates like
member/2 and dist/5 are not shown.
@example
refine(false,active(A)).
refine(active(A),Clause):-
        member(Pred1,[hacc(A,B),hdonor(A,B),zincsite(A,B)]),
        member(Pred2,[hacc(A,C),hdonor(A,C),zincsite(A,C)]),
        member(Pred3,[hacc(A,D),hdonor(A,D),zincsite(A,D)]),
        member(Pred4,[hacc(A,E),hdonor(A,E),zincsite(A,E)]),
        Clause = (active(A):-
                        Pred1,
                        Pred2, dist(A,B,C,D1,E1),
                        Pred3, dist(A,B,D,D2,E2), dist(A,C,D,D3,E3),
                        Pred4, dist(A,B,E,D4,E4), dist(A,C,E,D5,E5),
                               dist(A,D,E,D6,E6)).
@end example

To invoke the use of such statements requires setting @code{refine} to
@code{user}. For other settings of @code{refine} see the entry for
@code{refine} in @ref{Other Settings}.

@node Randomised Search, Incremental Learning, Other Searches, Advanced Use
@section Randomised search methods
@cindex Randomised search

The simplest kind of randomised search is the following: sample N
elements (clauses or theories) from the search space. Score these and
return the best element. Ordinal optimisation is a technique that
investigates the loss in optimality resulting from this form of search.
See:

@uref{http://hrl.harvard.edu/people/faculty/ho/DEDS/OO/OOTOC.html}

A study of the use of this in ILP can be found in: A. Srinivasan,
@emph{A study of two probabilistic methods for searching large spaces
with ILP} (under review), available at:

@uref{ftp://ftp.comlab.ox.ac.uk/pub/Packages/ILP/Papers/AS/dami99.ps.gz}

For a clause-level search, this is invoked by setting the parameter
@code{search} to @code{scs} (to denote ``stochastic clause
selection''). The number N is either set by assigning a value to
@code{scs_sample} or calculated automatically from settings for
@code{scs_prob} and @code{scs_percentile}. If these values are denoted
``P'' and ``K'' respectively, then the sample size is calculated to be
@code{log(1-P)/log(1-K/100)}, which denotes the number of clauses that
have to be sampled before obtaining, with probability at least P, at
least one clause in the top K-percentile of clauses. Sampling is
further controlled by specifying the setting @code{scs_type} to be one
of @code{blind} or @code{informed}. If ``blind'' then clauses are
uniform random selections from the space of all legal clauses. If
``informed'' then they are drawn from a specific distribution over
clauselengths. This can either be pre-specified (by setting
@code{clauselength_distribution}) or obtained automatically by a
Monte-Carlo like scheme that attempts to estimate, for each clause
length, the probability of obtaining a clause in the top K-percentile.
In either case, the resulting distribution over clauselengths is used
to first decide on the number of literals ``l'' in the clause. A legal
clause with ``l'' literals is then constructed.
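As an illustration, the following settings (the values are illustrative
only) request a blind stochastic clause selection with the sample size
computed from @code{scs_prob} and @code{scs_percentile}:
@example
:- set(search,scs).
:- set(scs_type,blind).
:- set(scs_prob,0.95).          % P: chance of getting a good clause
:- set(scs_percentile,1).       % K: ``good'' = top 1-percentile
@end example

With these values the sample size is @code{log(0.05)/log(0.99)}, or
roughly 300 clauses.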
In fact, this simple randomised search is a degenerate form of a more
general algorithm known as GSAT. Originally proposed within the context
of determining satisfiability of propositional formulae, the basic
algorithm is as follows:
@example
currentbest:= 0 (@strong{comment}: ``0'' is a conventional default answer)
@strong{for} i = 1 to N @strong{do}
        current:= randomly selected starting point
        @strong{if} current is better than currentbest @strong{then}
                currentbest:= current
        @strong{for} j = 1 to M @strong{do begin}
                next:= best local move from current
                @strong{if} next is better than currentbest @strong{then}
                        currentbest:= next
                current:= next
        @strong{end}
@strong{return} currentbest
@end example

@code{N} and @code{M} represent the number of tries and moves allowed.
It is apparent that when searching for clauses, an @code{M} value of 0
will result in the algorithm mimicking stochastic clause selection as
described above. A variant of this algorithm called Walksat introduces
a further random element at the point of selecting @code{next}. This
time, a biased coin is flipped. If a ``head'' results then the choice
is as per GSAT (that is, the best choice amongst the local neighbours),
otherwise @code{next} is randomly assigned to one of any ``potentially
good'' neighbours. Potentially good neighbours are those that
@emph{may} lead to a better score than the current best score. This is
somewhat like simulated annealing, where the choice is the best element
if that improves on the best score. Otherwise, the choice is made
according to a function that decays exponentially with the difference
in scores. This exponential decay is usually weighted by a
``temperature'' parameter.

The randomly selected start clause is usually constructed as follows:
(1) an example is selected; (2) the bottom clause is constructed for
the example; (3) a legal clause is randomly drawn from this bottom
clause. The example may be selected by the user (using the @code{sat}
command). If bottom clauses are not allowed (by setting
@code{construct_bottom} to @code{false}) then legal clauses are
constructed directly from the mode declarations. The clause selected is
either the result of uniform random selection from all legal clauses,
or the result of a specific distribution over clauselengths (specified
by setting @code{clauselength_distribution}). The latter is the only
method permitted when bottom clauses are not allowed. (In that case, if
there is no value specified for @code{clauselength_distribution}, then
a uniform distribution over all allowable lengths is used.)

RRR refers to the ``randomised rapid restarts'' described by
F. Zelezny, A. Srinivasan, and D. Page in @emph{Lattice Search Runtime
Distributions May Be Heavy-Tailed}, available at:

@uref{ftp://ftp.comlab.ox.ac.uk/pub/Packages/ILP/Papers/AS/rrr.ps.gz}

In the current implementation, RRR stops as soon as a clause with a
requisite minimum positive coverage (set using @code{minpos}) and
acceptable utility (set using @code{minscore}) is reached. The
procedure in the paper above stops as soon as a minimum acceptable
accuracy is reached. This same effect can be achieved by setting
@code{evalfn} to @code{accuracy}.

It is intended that the randomised local search methods (GSAT, Walksat,
RRR and annealing) can be used either for clause-level search or
theory-level search. No equivalent of stochastic clause selection is
provided for theory-level search: this has to be mimicked by using the
randomised local search, with appropriate settings.
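For example, the following (illustrative) settings request a
Walksat-style clause-level search with 10 restarts of 50 moves each,
and a random-walk probability of 0.25:
@example
:- set(search,rls).             % randomised local search ...
:- set(rls_type,wsat).          % ... of the Walksat variety
:- set(tries,10).               % number of restarts
:- set(moves,50).               % moves allowed per restart
:- set(walk,0.25).              % probability of a random-walk move
@end example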
At the clause level, local moves involve either adding or deleting a
literal from the current clause. Normally, local moves in the
clause-space would also involve operations on variables (introducing or
removing variable co-references, associating or disassociating
variables to constants). These have to be accomplished within Aleph by
the inclusion of an equality predicate with appropriate mode
declarations. Local moves for a theory-level search are described in
@ref{Theory Learning}.

Randomised local search is invoked within Aleph by setting the
parameter @code{search} to @code{rls}. In addition, the type of search
is specified by setting @code{rls_type} to one of @code{gsat},
@code{wsat}, @code{rrr} or @code{anneal}. Walksat requires a
specification of a biased coin. This is done by setting the parameter
@code{walk} to a number between @code{0} and @code{1}. This represents
an upper bound on the probability of obtaining a ``tail'' with the
coin. The implementation of simulated annealing is very simple and uses
a fixed temperature. This is done by setting the parameter
@code{temperature} to some real value.

@node Incremental Learning, Theory Learning, Randomised Search, Advanced Use
@section Incremental construction of theories
@cindex Incremental Learning

Most prominent ILP systems are ``batch learners'': all examples and
background knowledge are in place @emph{before} learning commences. The
ILP system then constructs a hypothesis for the examples. A less
popular, but nevertheless interesting alternative is that of
``incremental learning'', where examples, background and hypothesis are
incrementally updated @emph{during} the course of learning. Aleph
allows such an incremental construction of clauses by typing:
@example
induce_incremental.
@end example

This results in Aleph repeatedly performing the following steps:
@enumerate
@item
@strong{Ask user for an example.} The default is to use a new positive
example from the previous search. If the user responds with Ctrl-d
(eof) then search stops. If the user responds with ``ok.'' then the
default is used; otherwise the user has to provide a new example
(terminated by a full-stop);

@item
@strong{Construct bottom clause for example.} Aleph thus expects the
appropriate mode declarations. These can be added in Step 4;

@item
@strong{Search.} Aleph searches for the best clause;

@item
@strong{Ask user about best clause.} Aleph asks the user about the
clause @emph{C} returned by the search. At this point the user can
respond with:
@itemize @bullet
@item
@strong{ok.} Clause @emph{C} is added to the hypothesis;
@item
@strong{prune.} A statement is added to prevent @emph{C} and any
clauses subsumed by it from appearing as the result of future searches;
@item
@strong{overgeneral.} A constraint is added to prevent @emph{C} and
clauses subsuming it from appearing as the result of future searches;
@item
@strong{overgeneral because not E.} @emph{E} is added as a negative
example;
@item
@strong{overspecific.} @emph{C} is added as a positive example;
@item
@strong{overspecific because E.} @emph{E} is added as a positive
example;
@item
@strong{X.} @emph{X} is any Aleph command. This can be something like
@code{covers} or @code{mode(*,has_car(+train,-car))};
@item
@strong{Ctrl-d.} Returns to Step 1.
@end itemize
@end enumerate

Note: the command @code{induce_clauses/0} with the flag
@code{interactive} set to @code{true} simply performs the same function
as @code{induce_incremental}. The incremental mode does not preclude
the use of prior sets of examples or background information. These are
provided in the usual way (in files with @code{.b}, @code{.f} and
@code{.n} suffixes).
An example of using the incremental learner to construct a program for
list membership can be found in the @code{incremental} sub-directory
in:

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}

@node Theory Learning, Tree Learning, Incremental Learning, Advanced Use
@section Theory-level search
@cindex Theory-level search

An adequate explanation for a set of examples typically requires
several clauses. Most ILP systems attempt to construct such
explanations one clause at a time. The procedure is usually an
iterative greedy set-covering algorithm that finds the best single
clause (one that explains or ``covers'' most unexplained examples) on
each iteration. While this has been shown to work satisfactorily for
most problems, it is nevertheless interesting to consider
implementations that attempt to search directly at the
``theory-level''. In other words, elements of the search space are sets
of clauses, each of which can be considered a hypothesis for all the
examples. The implementation in Aleph of this idea is currently at a
very rudimentary level, and preliminary experiments have not
demonstrated great benefits. Nevertheless, the approach, with
development, could be promising. The implementation within Aleph is
invoked by the command:
@example
induce_theory.
@end example

This conducts a search that moves from one set of clauses to another.
Given a clause set @emph{S}, local moves are the result of the
following:
@enumerate
@item
@strong{Add clause.} A clause is added to @emph{S}. This is usually a
randomly selected legal clause constructed in the manner described in
@ref{Randomised Search};
@item
@strong{Delete clause.} A clause is deleted from @emph{S};
@item
@strong{Add literal.} A literal is added to a clause in @emph{S}; and
@item
@strong{Delete literal.} A literal is deleted from a clause in
@emph{S}.
@end enumerate

As noted in @ref{Randomised Search}, the use of an equality predicate
with appropriate mode declarations may be needed to achieve variable
co-references, etc. Currently, @code{induce_theory} starts with an
initial set of at most @emph{C} clauses, where this number is specified
by setting the @code{clauses} parameter. Each of these is a randomly
selected legal clause. @code{induce_theory} then performs theory-level
search either using as search strategy a randomised local search method
(obtained by setting the @code{search} parameter to @code{rls}: see
@ref{Randomised Search}), or a Markov chain Monte Carlo technique
(obtained by setting @code{search} to @code{mcmc}). The latter is
untested. The only evaluation function allowed is @code{accuracy}. For
theories, this is the number @code{(TP+TN)/(TP+TN+FP+FN)}, where
@code{TP}, @code{TN} are the numbers of positive and negative examples
correctly classified respectively; @code{FP} is the number of negative
examples incorrectly classified as positive; and @code{FN} is the
number of positive examples incorrectly classified as negative.
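A sketch of the settings involved (all values are illustrative) is:
@example
:- set(clauses,4).              % start with at most 4 clauses
:- set(search,rls).             % randomised local search ...
:- set(rls_type,anneal).        % ... using simulated annealing
:- set(temperature,10.0).       % fixed annealing temperature
:- set(evalfn,accuracy).        % the only permitted evaluation function
@end example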
@node Tree Learning, Constraint Learning, Theory Learning, Advanced Use
@section Tree-based theories
@cindex Tree-based theories

The algorithm embodied in @code{induce} can be seen as the first-order
equivalent of propositional rule-learning algorithms like Clark and
Niblett's CN2. There is now a substantial body of empirical work (done
by researchers in Leuven and Freiburg) demonstrating the utility of
first-order equivalents of propositional tree-learning procedures.
Tree-based learning can be seen as a special case of theory learning,
and the implementation in Aleph uses the standard
recursive-partitioning approach to construct classification,
regression, class probability, or model trees. Tree-based theory
construction is invoked by the command:
@example
induce_tree.
@end example

The type of tree constructed is determined by setting @code{tree_type}
to one of: @code{classification}, @code{regression},
@code{class_probability}, or @code{model}. The basic procedure attempts
to construct a tree to predict the output argument in the examples.
Note that the mode declarations must specify only a single argument as
output. Paths from root to leaf constitute clauses. Tree-construction
is viewed as a refinement operation: any leaf can currently be refined
(converted into a non-leaf) by extending the corresponding clause
(resulting in two new leaves). The extension is done using Aleph's
automatic refinement operator that extends clauses by a single literal
within the mode language. That is, Aleph sets @code{refine} to
@code{auto}. Note that using the automatic refinement operator means
that the user has to ensure that all arguments that are annotated as
@strong{#T} in the modes contain generative definitions for type
@strong{T}. The @code{lookahead} option allows additions of several
literals at once.

The impurity function is specified by setting the @code{evalfn}
parameter. Currently, for @code{classification} and
@code{class_probability} trees @code{evalfn} must be one of
@code{entropy} or @code{gini}. For @code{regression} trees the
evaluation function is automatically set to @code{sd} (standard
deviation). For @code{model} trees, @code{evalfn} must be one of
@code{mse} (mean square error) or @code{accuracy}. In all cases, the
result is always presented as a set of rules. Rules for
@code{class_probability} and @code{regression} trees make their
predictions probabilistically using the @code{random/2} predicate
provided within Aleph.

In addition, settings for the following parameters are relevant:
@code{classes}, the list of classes occurring in examples provided (for
@code{classification} or @code{class_probability} trees only);
@code{dependent}, for the argument constituting the dependent variable
in the examples; @code{prune_tree}, for pruning rules from a tree;
@code{confidence}, for error-based pruning of rules as described by
J. R. Quinlan in the C4.5 book; @code{lookahead}, specifying the
lookahead for the refinement operator to mitigate the horizon effect
from zero-gain literals; @code{mingain}, specifying the minimum gain
required for refinement to proceed; and @code{minpos}, specifying the
minimum number of examples required in a leaf for refinement to
proceed. An example set of such settings is shown below.
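Here is a sketch of the settings for constructing a classification tree
(the class names, the argument position of the dependent variable, and
the other values are all illustrative):
@example
:- set(tree_type,classification).
:- set(classes,[pos,neg]).      % classes occurring in the examples
:- set(dependent,2).            % argument 2 is the dependent variable
:- set(evalfn,entropy).         % impurity function
:- set(prune_tree,auto).        % error-based (C4.5-style) pruning
:- set(minpos,2).               % forward pruning
@end example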
Forward pruning is achieved by the parameters @code{mingain} and
@code{minpos}. The former should be set to some value greater than 0
and the latter to some value greater than 1. Backward pruning uses
error pruning of the final clauses in the tree by correcting error
estimates obtained from the training data. Automatic error-based
pruning is achieved by setting the parameter @code{prune_tree} to
@code{auto}. For @code{classification} trees the resulting procedure is
identical to the one for rule pruning described by Quinlan in C4.5:
Programs for Machine Learning, Morgan Kauffmann. For @code{regression}
trees, error-based pruning results in corrections to the sample
standard deviation. These corrections assume normality of observed
values in a leaf: the method has been studied empirically by L. Torgo
in ``A Comparative Study of Reliable Error Estimators for Pruning
Regression Trees''. Following work by F. Provost and P. Domingos,
pruning is not employed for class probability prediction. At this
stage, there is also no pruning for model trees.

The prediction at each ``leaf'' differs for each tree type. For
@code{classification} trees, prediction is the majority class as
estimated from the examples in the leaf; for @code{regression} trees,
prediction is a value drawn randomly from a normal distribution with
mean and standard deviation estimated from the examples in the leaf;
for @code{class_probability} trees, prediction is a value drawn
randomly from the (Laplace corrected) discrete distribution of classes
in the leaf; and for @code{model} trees, prediction is achieved by a
user-defined background predicate (see below).

Model trees in Aleph are constructed by examining, at each leaf, one or
more model construction predicates. These predicates are defined as
part of background knowledge, and can specify different kinds of
models. For example, the predicates may be for linear regression,
polynomial regression etc. for predicting a continuous variable; or a
decision tree, logistic regression etc. for predicting a nominal
variable. For each kind of model, the user has to provide a definition
for a predicate that is able to: (a) construct the model; and (b)
predict using the model constructed. The process is the same as that
for lazy evaluation. Each such predicate is specified using the
@code{model/1} command. If several different predicates are specified,
then, at each leaf, each predicate is called to construct a model and
the predicate that constructs the best model (evaluated using the
current setting for @code{evalfn}) is returned. This can be
computationally intensive, but can lead to the construction of fairly
complex theories, in which different leaves can contain different kinds
of models (for example, linear regression models in one leaf and
quadratic regression models in another).

Tree-learning can be performed interactively, with the user specifying
the split to be selected. This is done by setting @code{interactive} to
@code{true} before executing the @code{induce_tree} command. An example
of using the tree learner can be found in the @code{tree} sub-directory
in:

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}

@node Constraint Learning, Mode Learning, Tree Learning, Advanced Use
@section Constraint learning
@cindex Constraint learning

The basic Aleph algorithm constructs definite clauses normally intended
to be components of a predictive model for data. Early ILP work (for
example, in the Claudien system) demonstrated the value of discovering
all non-Horn constraints that hold in a database. A similar
functionality can be obtained within Aleph using the command:
@example
induce_constraints.
@end example

The implementation of these ideas in Aleph uses a naive
generate-and-test strategy to enumerate all constraints within the
background knowledge (for the mode language provided). All constraints
are of the form:
@example
false:- ...
@end example

and are stored in the user-specified @code{goodfile} (the specification
of this file is mandatory for @code{induce_constraints} to work). With
appropriate mode settings for @code{false} and @code{not} it is
possible to identify non-Horn constraints in the same way as Claudien.
For example, given the background knowledge:
@example
male('Fred').
female('Wilma').
human('Fred').
human('Wilma').
@end example

and the mode declarations:
@example
:- modeh(1,false).
:- modeb(*,human(-person)).
:- modeb(1,male(+person)).
:- modeb(1,female(+person)).
:- modeb(1,not(male(+person))).
:- modeb(1,not(female(+person))).
@end example

Aleph identifies the following constraints:
@example
false :- human(A), male(A), female(A).
false :- human(A), female(A), male(A).
false :- human(A), not male(A), not female(A).
false :- human(A), not female(A), not male(A).
@end example

After removing redundant constraints (which Aleph does not do), these
are equivalent to the following:
@example
false :- human(A), male(A), female(A).
male(A) ; female(A) :- human(A).
@end example

The validity of these constraints can only be guaranteed if the
background knowledge is assumed to be complete and correct. To account
for incorrect statements in the background knowledge it may sometimes
be relevant to alter the @code{noise} setting when obtaining
constraints: in this context it specifies the number of falsifying
substitutions tolerated. The @code{minacc} parameter is ignored.

An example of using the constraints learner can be found in the
@code{constraints} sub-directory in:

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}

@node Mode Learning, Abductive Learning, Constraint Learning, Advanced Use
@section Mode learning
@cindex Mode learning

The basic Aleph algorithm assumes that modes will be declared by the
user; in the past, this has been the source of some difficulty. There
has been some work (by E. McCreath and A. Sharma, Proc of the 8th
Australian Joint Conf on AI, pages 75-82, 1995) on the automatic
extraction of mode and type information from the background knowledge
provided. The implementation in Aleph follows these ideas fairly
closely and can be invoked by the command:
@example
induce_modes.
@end example

Given a set of determinations, the procedure works in two parts: (i)
finding equivalence classes of types; and (ii) finding an input/output
assignment. Unlike the McCreath and Sharma approach, types in the same
equivalence class are given the same name only if they ``overlap''
significantly (the overlap of type1 with type2 is the proportion of
elements of type1 that are also elements of type2). ``Significantly''
here means an overlap of at least some threshold T (set using
@code{typeoverlap}, with default 0.95). Values of @code{typeoverlap}
closer to 1.0 are more conservative, in that they require very strong
overlap before the elements are called the same type. Since this may
not be perfect, modes are also produced for equality statements that
re-introduce co-referencing amongst differently named types in the same
equivalence class. Note, however, that the user has to explicitly
include a determination declaration for the equality predicate.
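As a concrete sketch (the predicate names are illustrative, and the
equality predicate is here written as @code{'='/2}), the declarations
needed before calling @code{induce_modes} for a @code{grandfather/2}
target might look as follows:
@example
:- determination(grandfather/2,father/2).
:- determination(grandfather/2,mother/2).
:- determination(grandfather/2,'='/2).  % equality declared explicitly
:- set(typeoverlap,0.95).               % threshold for merging types
@end example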
The input/output assignment is not straightforward, as we may be
dealing with non-functional definitions. The assignment sought here is
one that maximises the number of input arguments, as this gives the
largest bottom clause. This assignment is sought by means of a search
procedure over mode sequences. Suppose we have a mode sequence
@code{M = <m1,m2,..m@{i-1@}>} that uses the types T. An argument of
type t in mode @code{m@{i@}} is an input iff t overlaps significantly
(used in the same sense as earlier) with some type in T. Otherwise the
argument is an output. The utility of each mode sequence M is
f(M) = g(M) + h(M), where g(M) is the number of input arguments in M,
and h(M) is a (lower) estimate of the number of input arguments in any
mode sequence of which M is a prefix. The search strategy adopted is a
simple hill-climbing one. Note that the procedure as implemented
assumes background predicates will be generative (which holds when the
background knowledge is ground).

An example of using the mode learner can be found in the @code{modes}
sub-directory in:

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}

@node Abductive Learning, Feature Construction, Mode Learning, Advanced Use
@section Abductive learning
@cindex Abductive learning

The basic Aleph algorithm assumes that the examples provided are
observations of the target predicate to be learned. There is, in fact,
nothing within the ILP framework that requires this to be the case. For
example, suppose the following was already provided in the background
knowledge:
@example
grandfather(X,Y):-
        father(X,Z),
        parent(Z,Y).
parent(X,Y):-
        father(X,Y).
father('Fred','Jane').
mother('Jane','Robert').
mother('Jane','Peter').
@end example

then the examples:
@example
grandfather('Fred','Robert').
grandfather('Fred','Peter').
@end example

are clearly not entailed by the background knowledge. Aleph would then
simply try to learn another clause for @code{grandfather/2}, perhaps
resulting in something like:
@example
grandfather(X,Y):-
        father(X,Z),
        mother(Z,Y).
@end example

In fact, the job would have just as easily been done, and the result
more useful, if Aleph could learn the following:
@example
parent(X,Y):-
        mother(X,Y).
@end example

This requires Aleph to be able to do two things. First, given
observations of @code{grandfather/2} that are not entailed by the
background knowledge, generate instances of @code{parent/2} that will
allow the observations to be entailed. Second, use the instances of
@code{parent/2} that were generated to obtain the clause for
@code{parent/2} above. The first of these steps requires a form of
abduction. The second requires generalisation in the form of learning.
It is the combination of these two steps that is called ``Abductive
Learning'' here.

The basic procedure used by Aleph is a simplified variant of S. Moyle's
Alecto program. Alecto is described in some detail in S. Moyle,
``Using Theory Completion to Learn a Navigation Control Program'',
Proceedings of the Twelfth International Conference on ILP (ILP2002),
S. Matwin and C.A. Sammut (Eds), LNAI 2583, pp 182-197, 2003. Alecto
does the following: for each positive example, an ``abductive
explanation'' is obtained. This explanation is a set of ground atoms.
The union of abductive explanations from all positive examples is
formed (this is also a set of ground atoms). These are then generalised
to give the final theory. The ground atoms in an abductive explanation
are obtained using Yamamoto's SOLD resolution or SOLDR (Skip Ordered
Linear resolution for Definite clauses).
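The settings involved are described in the paragraphs that follow. As a
concrete sketch for the example above (the values are illustrative, and
@code{parent/2} is the background predicate with an incomplete
definition):
@example
:- set(abduce,true).            % let induce abduce ground atoms
:- set(max_abducibles,2).       % limit the size of explanations
:- abducible(parent/2).         % parent/2 atoms may be abduced
@end example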
Currently, abductive learning is only incorporated within the
@code{induce} command. If @code{abduce} is set to @code{true} then
Aleph first tries to obtain the best clause for the observed predicate
(for example, the best clause for @code{grandfather/2}). Abductive
explanations are then generated for all predicates marked as being
abducible (see @code{abducible/1}) and generalisations constructed
using these. The best generalisation overall is then selected and
greedy clause identification by @code{induce} repeats with the
observations left. Care has to be taken to ensure that abductive
explanations are indeed ground (this can be achieved by using
appropriate type predicates within the definitions of the abducible
predicates) and limited to some maximum number (this latter requirement
is for reasons of efficiency: see the setting for
@code{max_abducibles}).

It should be evident that abductive learning as described here
implements a restricted form of theory revision, in which revisions are
restricted to completing definitions of background predicates other
than those for which observations are provided. This assumes that the
background knowledge is correct, but incomplete. In general, if
background predicates are both incorrect and incomplete, then a more
elaborate procedure would be required.

@node Feature Construction, Other Commands, Abductive Learning, Advanced Use
@section Feature Construction
@cindex Feature Construction

One promising role for ILP is in the area of feature construction. A
good review of the use of ILP for this can be found in S. Kramer,
N. Lavrac and P. Flach (2001), @emph{Propositionalization Approaches to
Relational Data Mining}, in Relational Data Mining, S. Dzeroski and
N. Lavrac (eds.), Springer.

Aleph uses a simple procedure to construct boolean features. The
procedure is invoked using the @code{induce_features/0} command. This
is almost identical to the @code{induce_cover/0} command. Recall that
@code{induce_cover/0} uses a covering strategy to construct rules that
explain the examples (the slight twist being that all positive examples
are retained when evaluating clauses at any given stage). The
difference with @code{induce_features/0} is that all good clauses that
are found during the course of constructing such rules are stored as
new features. A feature stored by Aleph contains two bits of
information: (1) a number, that acts as a feature identifier; and (2) a
clause @code{(Head:-Body)}. Here @code{Head} is a literal that unifies
with any of the examples that have the same predicate name and arity as
@code{Head}, and @code{Body} is a conjunction of literals. The intent
is that the feature is @code{true} for an example if and only if the
example unifies with @code{Head} and @code{Body} is @code{true}. For
classification problems, the user has to specify the dependent
variable. This is done using @code{set(dependent,...)}.

The process of finding rules (and the corresponding features) continues
until all examples are covered by the rules found or the number of
features exceeds a pre-defined upper limit (controlled by
@code{set(max_features,...)}). What constitutes a ``good clause'' is
dictated by settings for various Aleph parameters. The following
settings are an example of some parameters that are relevant:
@example
:- set(clauselength,10).
:- set(minacc,0.6).
:- set(minscore,3).
:- set(minpos,3).
:- set(noise,50).
:- set(nodes,5000).
:- set(explore,true).
:- set(max_features,20000).
@end example

Features found by Aleph can be shown by the @code{show(features)}
command. Aleph can be used to show the boolean vectors for the train
and test examples using a combination of
@code{set(portray_examples,...)}, @code{features/2}, appropriate
definitions for @code{aleph_portray/1}, and @code{show(train_pos)},
@code{show(train_neg)} etc.
Here is an example of the use of @code{aleph_portray/1} for examples in
the training set:
@example
aleph_portray(train_pos):-
        setting(train_pos,File),
        show_features(File,positive).
aleph_portray(train_neg):-
        setting(train_neg,File),
        show_features(File,negative).

show_features(File,Class):-
        open(File,read,Stream),
        repeat,
        read(Stream,Example),
        (Example = end_of_file -> close(Stream);
                write_features(Example,Class),
                fail).

write_features(Example,_):-
        features(_,(Example:- Body)),
        (Body -> write(1), write(' '); write(0), write(' ')),
        fail.
write_features(_,Class):-
        writeq(Class), nl.
@end example

If @code{portray_examples} is set to @code{true}, Aleph will call
@code{aleph_portray(Term)} when the command @code{show(Term)} is
executed (with @code{Term} being one of @code{train_pos},
@code{train_neg}, @code{test_pos} or @code{test_neg}).

@node Other Commands, , Feature Construction, Advanced Use
@section Other commands

There are a number of other useful commands in Aleph. These are:

@table @code
@item rdhyp
@findex rdhyp/0
Read a hypothesised clause from the user.

@item addhyp
@findex addhyp/0
Add the current hypothesised clause to the theory. If a search is
interrupted, then the current best hypothesis will be added to the
theory.

@item sphyp
@findex sphyp/0
Perform Generalised Closed World Specialisation (GCWS) on the current
hypothesis. This can result in the creation of new abnormality
predicates which define exceptional conditions (see @ref{Notes}).

@item addgcws
@findex addgcws/0
Add the hypothesis constructed by performing GCWS to the theory.

@item covers
@findex covers/0
Show positive examples covered by the hypothesised clause.

@item coversn
@findex coversn/0
Show negative examples covered by the hypothesised clause.

@item reduce
@findex reduce/0
@cindex Reducing a single bottom clause
Run a search on the current bottom clause, which can be obtained with
the @code{sat/1} command.

@item man(-@var{V})
@findex man/1
@var{V} is the location of the on-line manual.

@item abducible(+@var{V})
@findex abducible/1
@var{V} is of the form @var{N/A}, where the atom @var{N} is the name of
the predicate, and @var{A} its arity. Specifies that ground atoms with
symbol @var{N/A} can be abduced if required.

@item commutative(+@var{V})
@findex commutative/1
@var{V} is of the form @var{N/A}, where the atom @var{N} is the name of
the predicate, and @var{A} its arity. Specifies that literals with
symbol @var{N/A} are commutative.

@item symmetric(+@var{V})
@findex symmetric/1
@var{V} is of the form @var{N/A}, where the atom @var{N} is the name of
the predicate, and @var{A} its arity. Specifies that literals with
symbol @var{N/A} are symmetric.

@item lazy_evaluate(+@var{V})
@findex lazy_evaluate/1
@cindex Lazy evaluation
@var{V} is of the form @var{N/A}, where the atom @var{N} is the name of
the predicate, and @var{A} its arity. Specifies that outputs and
constants for literals with symbol @var{N/A} are to be evaluated lazily
during the search. This is particularly useful if the constants
required cannot be obtained from the bottom clause constructed by using
a single example. During the search, the literal is called with a list
containing a pair of lists for each input argument representing
`positive' and `negative' substitutions obtained for the input
arguments of the literal. These substitutions are obtained by executing
the partial clause without this literal on the positive and negative
examples.
The user needs to provide a definition capable of processing a call
with a list of list-pairs in each argument, and of computing the
outputs from such information. For further details see A. Srinivasan
and R. Camacho, @emph{Experiments in numerical reasoning with ILP}, To
appear: Jnl. Logic Programming.

@item model(+@var{V})
@findex model/1
@cindex Model tree construction
@var{V} is of the form @var{N/A}, where the atom @var{N} is the name of
the predicate, and @var{A} its arity. Specifies that predicate
@var{N/A} will be used to construct and execute models in the leaves of
model trees (see @ref{Tree Learning}). This automatically results in
predicate @var{N/A} being lazily evaluated (see
@code{lazy_evaluate/1}).

@item positive_only(+@var{V})
@findex positive_only/1
@var{V} is of the form @var{N/A}, where the atom @var{N} is the name of
the predicate, and @var{A} its arity. States that only positive
substitutions are required during lazy evaluation of literals with
symbol @var{N/A}. This saves some theorem-proving effort.

@item random(@var{V},+@var{D})
@findex random/2
@var{V} is a random variable from distribution @var{D}. @var{D} is the
specification of a discrete or normal distribution. The discrete
distribution is specified as [p1-a,p2-b,...] where ``p1'' represents
the probability of drawing element ``a'', ``p2'' the probability of
drawing element ``b'' and so on. A normal distribution with mean ``m''
and standard deviation ``s'' is specified by the term ``normal(m,s)''.
If @var{V} is bound to a value then the predicate succeeds if and only
if the value has a non-zero probability of occurrence (which is
trivially satisfied for a normal distribution).

@item sat(+@var{V})
@findex sat/1
@cindex Saturating a single example
@var{V} is an integer. Builds the bottom clause for positive example
number @var{V}. Positive examples are numbered from 1, and the
numbering corresponds to the order of appearance in the @file{.f} file.

@item example_saturated(-@var{V})
@findex example_saturated/1
@var{V} is a positive example. This is the example that is currently
saturated.

@item show(+@var{V})
@findex show/1
@cindex Show things
Different values of @var{V} result in showing the following.
@table @code
@item bottom
Current bottom clause.
@item constraints
Constraints found by @code{induce_constraints}.
@item determinations
Current determination declarations.
@item features
Propositional features constructed from good clauses found so far.
@item gcws
Hypothesis constructed by the @code{gcws} procedure.
@item good
Good clauses found in searches conducted so far (good clauses all have
a utility above that specified by @code{minscore}).
@item hypothesis
Current hypothesised clause.
@item modes
Current mode declarations (including all modeh and modeb declarations).
@item modehs
Current modeh declarations.
@item modebs
Current modeb declarations.
@item neg
Current negative examples.
@item pos
Current positive examples.
@item posleft
Positive examples not covered by the theory so far.
@item rand
Current randomly-generated examples (used when @code{evalfn} is
@code{posonly}).
@item search
Current search (requires a definition for @code{portray(search)}).
@item settings
Current parameter settings.
@item sizes
Current sizes of positive and negative examples.
@item theory
Current theory constructed.
@item test_neg
Examples in the file associated with the parameter @code{test_neg}.
@item test_pos
Examples in the file associated with the parameter @code{test_pos}.
@item train_neg
Examples in the file associated with the parameter @code{train_neg}.
@item train_pos
Examples in the file associated with the parameter @code{train_pos}.
@item Name/Arity
Current definition of the predicate Name/Arity.
@end table

@item redundant(+@var{Clause},+@var{Lit})
@findex redundant/2
A user-specified predicate that defines when a literal @code{Lit} is
redundant in a clause @code{Clause}. @code{Clause} can be the special
term @code{bot}, in which case it refers to the current bottom clause.
Calls to this predicate are only made if the flag
@code{check_redundant} is set to @code{true}.

@item modeh(+@var{Recall},+@var{Mode})
@findex modeh/2
@var{Recall} is one of: a positive integer or @code{*}. @var{Mode} is a
mode template as in a @code{mode/2} declaration. Declares a mode for
the head of a hypothesised clause. Required when @code{evalfn} is
@code{posonly}.

@item modeb(+@var{Recall},+@var{Mode})
@findex modeb/2
@var{Recall} is one of: a positive integer or @code{*}. @var{Mode} is a
mode template as in a @code{mode/2} declaration. Declares a mode for a
literal in the body of a hypothesised clause.

@item text(+@var{L},+@var{T})
@findex text/2
@var{L} is a literal that can appear in the head or body of a clause.
@var{T} is a list of terms that contain the text to be printed in place
of the literal. Variables in the list will be co-referenced to
variables in the literal. For example,
@code{text(active(X),[X, 'is active'])}. Then the clause
@code{active(d1)} will be written as @code{d1 is active}.

@item hypothesis(-@var{Head},-@var{Body},-@var{Label})
@findex hypothesis/3
@var{Head} is the head of the current hypothesised clause. @var{Body}
is the body of the current hypothesised clause. @var{Label} is the list
@code{[P,N,L]}, where @code{P} is the number of positive examples
covered by the hypothesised clause, @code{N} is the number of negative
examples covered by the hypothesised clause, and @code{L} is the number
of literals in the hypothesised clause.

@item feature(+@var{Id},+@var{(Head:-Body)})
@findex feature/2
Declares a new feature. @var{Id} is a feature identifier (usually a
number). @var{Head} is a literal that can unify with one or more of the
examples. @var{Body} is a conjunction of literals that constitutes the
feature.

@item features(?@var{Id},?@var{(Head:-Body)})
@findex features/2
Checks for an existing feature. @var{Id} is a feature identifier
(usually a number). @var{Head} is a literal that can unify with one or
more of the examples. @var{Body} is a conjunction of literals that
constitutes the feature.
@end table

@node Other Programs, Notes, Advanced Use, Top
@chapter Related versions and programs
@cindex Related versions and programs

With appropriate settings, Aleph can emulate some of the functionality
of the following programs: P-Progol, CProgol, FOIL, FORS, Indlog,
MIDOS, SRT, Tilde and WARMR. Descriptions and pointers to these
programs are available at:

@uref{http://www-ai.ijs.si/\~ilpnet2/systems/}

In addition the following programs and scripts are relevant.

@table @code
@item T-Reduce
@findex T-Reduce
@cindex T-Reduce
T-Reduce is a companion program to Aleph that can be used to process
the clauses found by the commands @code{induce_cover} and
@code{induce_max}. This finds a subset of these clauses that explain
the examples adequately, and have lesser overlap in coverage. T-Reduce
uses the @strong{Yap} Prolog compiler.
A copy of this program is available (without support) at:

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/treduce.pl}

This has not been used for several years and is vulnerable to the usual
forces of decay that afflict old programs.

@item GUI
@findex GUI
@cindex Graphical interface
A graphical user interface to Aleph has been developed by J. Wielemaker
and S. Moyle. This is written for SWI-Prolog and uses the XPCE library.
Details of this can be obtained from S. Moyle (sam at comlab dot ox dot
ac dot uk).

@item Scripts
@findex Scripts
@cindex Useful scripts
There are some scripts available for performing cross-validation with
Aleph. Here is a copy of a Perl script written by M. Reid (mreid at cse
dot unsw dot edu dot au):

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/xval\_pl.txt}

S. Konstantopoulos (konstant at let dot rug dot nl) and colleagues have
a shell script and a Python script for the same purpose. Copies of
these are at:

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/xval\_sh.txt}

and

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/xval\_py.txt}
@end table

@node Notes, Change Logs, Other Programs, Top
@chapter Notes
@cindex Notes

This section contains ideas and suggestions that have surfaced during
the development of Aleph and its predecessor programs. The topics
themselves are in no particular order. They are written in a somewhat
stylised manner and reflect various personal biases. They should,
therefore, not be considered normative in any way.

@menu
* Aleph Choice:: On the appropriateness of Aleph.
* Aleph Predicates:: On predicate-name clashes with Aleph.
* Aleph Bottom Clause:: On the role of the bottom clause.
* Aleph Use:: On using Aleph interactively.
* Aleph Theory:: On different ways of constructing a theory.
* Aleph Parameters:: On when and how the parameter settings should be changed.
* Aleph Implementation:: On how the search is implemented.
* Aleph Search:: On how to reduce the search space.
* Aleph Examples:: On how to use fewer examples.
* Aleph Portray:: On a user-defined view of hypotheses and search.
* Aleph Numerical Reasoning:: On numerical reasoning with Aleph.
* Aleph Applications:: On applications of Aleph.
* Aleph Combine:: On using Aleph with other techniques.
* Aleph GCWS:: On using Aleph to perform closed-world specialisation.
* ILP Basics:: On some basic concepts in ILP
@end menu

@node Aleph Choice,Aleph Predicates,,Notes
@section On the appropriateness of Aleph
@cindex Choice of Aleph

@enumerate
@item
There are many ILP programs. Aleph is not particularly special.

@item
Check whether the problem needs a relational learning program. Is it
clear that statistical programs, neural networks, Bayesian nets,
tree-learners etc. are unsuitable or insufficient?

@item
Aleph's emulation of other systems is at the ``ideas'' level. For
example, with a setting of @code{search} to @code{heuristic},
@code{evalfn} to @code{compression}, @code{construct_bottom} to
@code{saturation}, and @code{samplesize} to @code{0}, the command
@code{induce} will construct a theory along the lines of the Progol
algorithm described by S. Muggleton. This is, however, no substitute
for the original. If you want an implementation of S. Muggleton's
Progol algorithm exactly as described in his paper, then Aleph is not
suitable for you. Try CProgol instead. The same comment applies to
other programs listed in @ref{Other Programs}.
@item
Aleph is quite flexible in that it allows customisation of search, cost
functions, output-display etc. This allows it to approximate the
functionality of many other techniques. It could also mean that it may
not be as efficient as special-purpose implementations. See also:

@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/ilp\_and\_aleph.ps}
@end enumerate

@node Aleph Predicates,Aleph Bottom Clause,Aleph Choice,Notes
@section On predicate-name clashes with Aleph
@cindex Avoiding predicate-name clashes

@enumerate
@item
You may get into trouble if predicate names in the background knowledge
clash with those already used within Aleph. This may be benign (for
example, two different predicates that encode the same relation) or
malignant (with predicates that have the same name encoding quite
different things). The list of predicate names already in use can be
obtained by repeated calls to the @code{current_predicate(X)} goal
provided by the Prolog engine.

@item
It would be better if Aleph predicates were renamed, or some modular
approach was adopted. None of this is done so far.
@end enumerate

@node Aleph Bottom Clause,Aleph Use,Aleph Predicates,Notes
@section On the role of the bottom clause
@cindex Role of the bottom clause

@enumerate
@item
Besides its theoretical role of anchoring one end of the search space,
the bottom clause is really useful to introduce constants (these are
obtained from the seed example), and variable co-references.

@item
If you are not interested in particular constants or the bottom clause
introduces too many spurious co-references, it may be better not to
construct a bottom clause. Try using the automatic refinement operator,
or write your own refinement operator.

@item
If the bottom clause is too large (> 500 literals), then simply
printing it on screen takes a long time. Turn this off by setting
@code{verbosity} to @code{0}.

@item
If the bottom clause is too large (> 500 literals), then you can
construct it lazily (during the search) by setting the
@code{construct_bottom} flag to @code{reduction}.
@end enumerate

@node Aleph Use,Aleph Theory,Aleph Bottom Clause,Notes
@section On using Aleph interactively.
@cindex Interactive use of Aleph

@enumerate
@item
It is always worth experimenting with Aleph before constructing a full
theory. The commands @code{sat/1} or @code{rsat/0}, followed by the
command @code{reduce/0}, are useful for this (a sketch of such a
session follows this list). @code{sat(N)} constructs the bottom clause
for example number @code{N}. @code{rsat} constructs a bottom clause for
a randomly selected example. @code{reduce} does a search for an
acceptable clause.

@item
You can interrupt a search at any time. The command @code{addhyp/0}
then adds the current best clause to the theory. This has the flavour
of anytime-learning.

@item
The @code{induce_incremental} command is highly interactive. It
requires the user to provide examples, and also categorise the result
of searches. This may prove quite demanding on the user, but has the
flavour of the kind of search done by a version-space algorithm.

@item
Setting @code{interactive} to @code{true} and calling
@code{induce_clauses} has the same effect as calling
@code{induce_incremental}. Trees can also be constructed interactively
by setting @code{interactive} to @code{true} and calling
@code{induce_tree}.
@end enumerate
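Here is a sketch of such a session (the example number is
illustrative):
@example
sat(1).         % build the bottom clause for positive example 1
reduce.         % search for the best acceptable clause
addhyp.         % add that clause to the current theory
@end example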
@end enumerate

@node Aleph Theory,Aleph Parameters,Aleph Use,Notes
@section On different ways of constructing a theory
@cindex Different ways for theory-construction
@enumerate
@item
The routine way of using @code{induce/0} is often sufficient.
@item
@code{induce/0}, @code{induce_cover/0}, @code{induce_max/0},
@code{induce_clauses/0} and @code{induce_incremental/0} encode control
strategies for clause-level search. They will use any user-defined
refinement operators, search and evaluation functions, beam-width
restrictions, etc. that are set. In terms of speed, @code{induce/0} is
usually faster than @code{induce_cover/0}, which in turn is faster than
@code{induce_max/0}. The time taken by @code{induce_incremental/0} is
not as easily characterisable. @code{induce_clauses/0} is simply
@code{induce/0} or @code{induce_incremental/0} depending on whether the
flag @code{interactive} is @code{false} or @code{true} respectively.
@item
@code{induce_max/0} results in a set of clauses that is invariant of
example ordering. None of @code{induce_cover/0}, @code{induce/0} or
@code{induce_incremental/0} has this property.
@item
Use the T-Reduce program after @code{induce_max/0} or
@code{induce_cover/0} to obtain a compact theory for prediction.
@item
You can construct a theory manually by repeatedly using @code{sat/1} (or
@code{rsat/0}), @code{reduce/0} and @code{addhyp/0}.
@item
You can mitigate the effects of a poor choice of seed example in the
saturation step by setting the @code{samplesize} flag. This sets the
number of examples to be selected randomly by the @code{induce} or
@code{induce_cover} commands. Each example seeds a different search and
the best clause is added to the theory.
@item
If you set @code{samplesize} to @code{0}, examples will be selected in
their order of appearance in the positive examples file. This allows
replication of results without worrying about variations due to
sampling.
@item
The @code{induce_tree} command will construct tree-structured theories.
@item
The @code{induce_theory} command is to be used at your own peril.
@end enumerate

@node Aleph Parameters,Aleph Implementation,Aleph Theory,Notes
@section On a categorisation of parameters
@cindex Categorisation of parameters
@enumerate
@item
The following parameters can affect the size of the search space:
@code{i}, @code{clauselength}, @code{nodes}, @code{minpos},
@code{minacc}, @code{noise}, @code{explore}, @code{best},
@code{openlist}, @code{splitvars}.
@item
The following parameters affect the type of search: @code{search},
@code{evalfn}, @code{refine}, @code{samplesize}.
@item
The following parameters have an effect on the speed of execution:
@code{caching}, @code{lazy_negs}, @code{proof_strategy}, @code{depth},
@code{lazy_on_cost}, @code{lazy_on_contradiction}, @code{searchtime},
@code{prooftime}.
@item
The following parameters alter the way things are presented to the user:
@code{print}, @code{record}, @code{portray_hypothesis},
@code{portray_search}, @code{portray_literals}, @code{verbosity}.
@item
The following parameters are concerned with testing theories:
@code{test_pos}, @code{test_neg}, @code{train_pos}, @code{train_neg}.
@end enumerate

@node Aleph Implementation,Aleph Search,Aleph Parameters,Notes
@section On how the single-clause search is implemented
@cindex Implementation of single-clause search
@enumerate
@item
The search for a clause is implemented by a restricted form of a general
branch-and-bound algorithm. A description of the algorithm follows. It
is a slight modification of that presented by C.H.
Papadimitriou and K. Steiglitz (1982), @emph{Combinatorial Optimisation},
Prentice-Hall, Edgewood-Cliffs, NJ. In the code that follows,
@emph{active} contains the set of ``live'' nodes at any point; the
variable @emph{C} is used to hold the cost of the best complete solution
at any given time.

@example
@strong{begin}
   active:= @{0@}; (@strong{comment}: ``0'' is a conventional starting point)
   C:= inf;
   currentbest:= anything;
   @strong{while} active is not empty @strong{do begin}
      remove first node k from active; (@strong{comment}: k is a branching node)
      generate the children i=1,...,Nk of node k, and
         compute corresponding costs Ci and lower bounds on costs Li;
      @strong{for} i = 1,...,Nk @strong{do}
         @strong{if} Li >= C @strong{then} prune child i
         @strong{else begin}
            @strong{if} child i is a complete solution and Ci < C @strong{then begin}
               C:= Ci, currentbest:= child i;
               prune nodes in active with lower bounds more than Ci
            @strong{end}
            add child i to active
         @strong{end}
   @strong{end}
@strong{end}
@end example

@item
The algorithm above results in a search tree. In Aleph, each node
contains a clause.
@item
A number of choices are made in implementing a branch-and-bound
algorithm for a given problem. Here is how these choices are made in
Aleph:
(a) @emph{Branch node}. The choice of node to branch on in the active
set is based on comparisons of a dual (primary and secondary) search key
associated with each node. The value of this key depends on the search
method and evaluation function. For example, with @code{search} set to
@code{bf} and @code{evalfn} set to @code{coverage} (the default settings
for Aleph), the primary and secondary keys are @code{-L,P-N}
respectively. Here @code{L} is the number of literals in the clause, and
@code{P,N} are the positive and negative examples covered by the clause.
This ensures that clauses with fewer literals will be chosen first. They
will further be ordered on difference in coverage;
(b) @emph{Branch set}. Children are generated by refinement steps that
are either built-in (add 1 literal at a time) or user-specified. With
built-in refinement, loop detection is performed to prevent duplicate
addition of literals;
(c) @emph{Lower bounds}. This represents the lowest cost that can be
achieved at this node and the sub-tree below it. This calculation is
dependent on the search method and evaluation function. In cases where
no easy lower bound is obtainable, it is taken as @code{0}, resulting in
minimal pruning;
(d) @emph{Restrictions}. The search need not proceed until the active
set is empty. It may be terminated prematurely by setting the
@code{nodes} parameter. Complete solutions are taken to be ones that
satisfy the language restrictions and any other hypothesis-related
constraints.
@end enumerate

@node Aleph Search,Aleph Examples,Aleph Implementation,Notes
@section On how to reduce the search space
@cindex Reducing the search space
@enumerate
@item
Use a smaller @code{i} setting, or a smaller @code{clauselength} or
@code{nodes} setting. Avoid setting @code{splitvars} to @code{true} (it
is not even clear whether this works correctly anyway). Try relaxing
@code{minacc} or @code{noise} to allow clauses with lower accuracy. Set
@code{minpos} to some value larger than the default. Set @code{best} to
a different value.
@item
Write constraints and prune statements (see the sketch below).
@item
Use a refinement operator that enumerates a smaller space.
@item
Restrict the language by allowing fewer determinations.
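As a sketch of the first two points (the parameter values are arbitrary,
and @code{p/2} is a hypothetical background predicate), the background
knowledge file might contain:

@example
% tighter bounds on the search space (values are arbitrary)
:- set(i,2).
:- set(clauselength,4).
:- set(nodes,5000).

% domain-specific pruning: reject any clause whose body contains
% a literal of the form p(X,X) (p/2 is hypothetical)
prune((_Head:-Body)) :-
        in_body(p(X,Y),Body),
        X == Y.

% in_body(?L,+B): L is a literal in the comma-conjunction B
in_body(L,(A,B)) :- !, (L = A ; in_body(L,B)).
in_body(L,L).
@end example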
@item
Restrict the search space by setting a beam-width (using the parameter
@code{openlist}); or using an iterative beam-width search (setting
@code{search} to @code{ibs}); or using randomised local search (setting
@code{search} to @code{rls}, with appropriate settings for associated
parameters); or using Camacho's language search (using the parameter
@code{language}, or setting @code{search} to @code{ils}).
@item
Use a time-bounded search by setting @code{searchtime} to some small
value.
@end enumerate

@node Aleph Examples,Aleph Portray,Aleph Search,Notes
@section On how to use fewer examples
@cindex Using fewer examples
@enumerate
@item
It may not be necessary to test on the entire dataset to obtain good
estimates of the cost of a clause.
@item
Methods like sub-sampling or windowing can be incorporated into ILP
programs to avoid examining entire datasets. These are not yet
incorporated within Aleph, although windowing can be achieved within a
general-purpose theory-revision program called @strong{T-Revise} which
can use any ILP program as its generalisation engine (available from
Ashwin Srinivasan, ashwin at comlab dot ox dot ac dot uk). More details
on this are available in: A. Srinivasan (1999), @emph{A study of two
sampling methods for analysing large datasets with ILP}, Data Mining and
Knowledge Discovery, 3(1):95-123.
@item
Using the @code{posonly} evaluation function will allow construction of
theories using positive examples only (thus, some savings can be made by
ignoring negative examples).
@end enumerate

@node Aleph Portray,Aleph Numerical Reasoning,Aleph Examples,Notes
@section On a user-defined view of hypotheses and search
@cindex Portrayal of hypotheses and search
@enumerate
@item
User definitions of @code{portray/1} provide a general mechanism for
altering the view of the hypotheses and search seen by the user.
@item
There are three flags that are used to control portrayal. These are
@code{portray_hypothesis}, @code{portray_search} and
@code{portray_literals}. If the first is set to @code{true} then the
command @code{show(hypothesis)} will execute @code{portray(hypothesis)}.
If the second is set to @code{true} then the command @code{show(search)}
will execute @code{portray(search)}. If the third is set to @code{true}
then any literal @code{L} in a clause constructed during the search will
be shown on screen by executing @code{portray(L)}. In each case, the
corresponding @code{portray/1} clause has to be user-defined.
@item
Examples of using these predicates can be found in the @code{portray}
sub-directory in:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}
@end enumerate

@node Aleph Numerical Reasoning,Aleph Applications,Aleph Portray,Notes
@section On numerical reasoning with Aleph
@cindex Numerical reasoning with Aleph
@enumerate
@item
There are many programs specialised to accomplish numerical reasoning;
Aleph is not one of them. Consider parametric techniques, regression
trees, etc. The ILP program @strong{FORS} is an example of an ILP
program particularly suited to performing regression-like tasks (see A.
Karalic and I. Bratko (1997), @emph{First-Order Regression}, Machine
Learning, 26:147-176). The program @strong{SRT} is a first-order variant
of a regression tree builder (see S. Kramer (1996), @emph{Structural
Regression Trees}, Proc. of the 13th National Conference on Artificial
Intelligence (AAAI-96)), and the program @strong{Tilde} has the
capability of performing regression-like tasks (see H. Blockeel, L. De
Raedt and J.
Ramon (1998), @emph{Top-down induction of clustering trees}, Proc. of
the 15th International Conference on Machine Learning, pp 55-63). Aleph
does have a simple tree-based learner that can construct regression
trees (see @ref{Tree Learning}).
@item
It is possible to attempt guesses at numerical constants; these add
additional literals to the bottom clause. An example of how this can be
done with a predicate with multiple recall is in the Aleph files
@code{guess.b}, @code{guess.f}, and @code{guess.n} in the @code{numbers}
sub-directory in:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}
@item
Guessing may not always work. The problem may then be amenable to the
technique of lazy evaluation. Here, an appropriate constant in a literal
Li is obtained during the search by calling a definition in the
background knowledge that calculates the constant by collecting bindings
from the positive examples entailed by the ordered clause L0, L1, ...
Li-1, and from the negative examples inconsistent with that clause (that
is, the positive and negative examples ``covered'' by the clause). An
example of how this can be done is in the Aleph files @code{ineq.b},
@code{ineq.f}, and @code{ineq.n} in the @code{numbers} sub-directory in:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}
@item
The technique of lazy evaluation can be used with more than one input
argument and to calculate more than one constant. With several input
arguments, values in lists of substitutions can be paired off. An
example illustrating how a line can be constructed by picking two such
substitution-pairs can be found in the Aleph files @code{ineq.b},
@code{ineq.f}, and @code{ineq.n} in the @code{numbers} sub-directory in:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/examples.zip}
@item
The use of lazy evaluation, in combination with user-defined search
specifications, can result in quite powerful (and complex) clauses. The
file:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/mut.b}
contains the background knowledge used to construct theories for a
subset of the ``mutagenesis'' problem. It illustrates the call to a C
function to compute linear regression, user-defined refinement
operators, and a user-defined cost function that forces clauses to be
scored on mean-squared error (rather than coverage).
@end enumerate

@node Aleph Applications,Aleph Combine,Aleph Numerical Reasoning,Notes
@section On applications of Aleph
@cindex Applications of Aleph
@enumerate
@item
Earlier incarnations of Aleph (called P-Progol) have been applied to a
number of real-world problems. Prominent amongst these is the
construction of structure-activity relations for biological activity. In
particular, the results for mutagenic and carcinogenic activity have
received some attention. Also prominent has been the use of Aleph for
identifying pharmacophores -- the three-dimensional arrangement of
functional groups on small molecules that enables them to bind to drug
targets. See:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/applications.html}.
@item
Applications to problems in natural language processing have been made
by James Cussens and others.
See: @uref{http://www.cs.york.ac.uk/\~jc/}
@end enumerate

@node Aleph Combine,Aleph GCWS,Aleph Applications,Notes
@section On using Aleph with other techniques
@cindex Using Aleph with other techniques
@enumerate
@item
There is often a significant advantage in combining the results of Aleph
with those of established prediction methods.
@item
Three ways of doing this are evident:
(a) @emph{As background knowledge}. Incorporate other prediction methods
as part of the background knowledge for Aleph. An example is the use of
linear regression as background knowledge;
(b) @emph{As new features}. Incorporate the results from Aleph into an
established prediction method. An example is the conversion of
Aleph-derived alerts into ``indicator'' variables for linear regression;
and
(c) @emph{For outlier analysis}. Use Aleph to explain only those
instances that are inadequately modelled by established techniques. An
example is the use of Aleph to explain the non-linearities that remain
after removing the linear component adequately explained by regression.
@end enumerate

@node Aleph GCWS,ILP Basics,Aleph Combine,Notes
@section On performing closed-world specialisation with Aleph
@cindex Generalised Closed World Specialisation (GCWS)
@enumerate
@item
Generalised Closed-World Specialisation (GCWS) is a way of obtaining
structured theories in ILP. Given an overgeneral clause C, GCWS
specialises it by automatically constructing new ``abnormality''
predicates that encode exceptions to C, exceptions to those exceptions,
and so on.
@item
A classic example is provided by the Gregorian Calendar currently in use
in parts of the world. From 45 B.C.E. to 1581 C.E. the Holy Roman Empire
subscribed to the Julian calendar commissioned by Julius Caesar. This
specified that every year that was a multiple of 4 would contain an
intercalary day to reconcile the calendar with a solar year (that is,
one extra day would be added). This rule is correct to around one part
in a hundred, and so up until 1582 errors could simply be treated as
noise. In 1582 C.E. Pope Gregory XIII introduced the Gregorian calendar,
in which the following corrections were implemented. Every fourth year
would be an intercalary year, except every hundredth year. This rule was
itself to be overruled every four hundredth year, which would be an
intercalary year. As a set of clauses the Gregorian calendar is:

@example
normal(Y):- not(ab0(Y)).
ab0(Y):- divisible(4,Y), not(ab1(Y)).
ab1(Y):- divisible(100,Y), not(ab2(Y)).
ab2(Y):- divisible(400,Y).
@end example

where @code{normal} is a year that does not contain an intercalary day.
With background knowledge of @code{divisible/2}, GCWS would
automatically specialise the clause:

@example
normal(Y).
@end example

by constructing the more elaborate theory above. This involves the
invention of the @code{ab0,ab1,ab2} predicates.
@item
See M. Bain, (1991), @emph{Experiments in non-monotonic learning},
Eighth International Conference on Machine Learning, pp 380-384, Morgan
Kaufmann, CA; and A. Srinivasan, S.H. Muggleton, and M. Bain (1992):
@emph{Distinguishing Noise from Exceptions in Non-Monotonic Learning},
Second International Workshop on ILP, for more details of GCWS.
@item
The way to use GCWS within Aleph is as follows. First try to learn a
clause in the standard manner (that is, using the @code{sat} and
@code{reduce} commands). If no acceptable clause is found, decrease the
minimum accuracy of acceptable clauses (by setting @code{minacc} or
@code{noise}). Now do the search again.
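For instance, a session sketch (the example number and @code{noise}
value are arbitrary):

@example
?- sat(1).
?- set(noise,10).    % accept clauses covering up to 10 negatives
?- reduce.
@end example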
You will probably get an overgeneral clause (that is, one that covers
more negative examples than preferable). Now use the @code{sphyp}
command to specialise this hypothesis. Aleph will repeatedly create
examples for new abnormality predicates and generalise them until the
original overgeneral clause does not cover any negative examples. You
can then elect to add this theory by using the @code{addgcws} command.
@item
The implementation of GCWS within Aleph is relatively inefficient, as it
requires creating new examples for the abnormality predicates on disk.
@end enumerate

@node ILP Basics,,Aleph GCWS,Notes
@section On some basic ideas relevant to ILP
@cindex ILP ideas
@enumerate
@item
Some basic ideas relevant to ILP can be found at:
@uref{http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/misc/basic.html}
@end enumerate

@node Change Logs, Parameter Index, Notes, Top
@chapter Change Logs
@cindex Changes in versions

@menu
* Version 1:: Version 1
* Version 2:: Version 2
* Version 3:: Version 3
* Version 4:: Version 4
* Version 5:: Version 5
@end menu

@node Version 1, Version 2, , Change Logs
@section Changes in Version 1
@itemize @bullet
@item
@strong{Wed Nov 10 10:15:44 GMT 1999:} fixed bug in bug fix of Fri Oct 8
10:06:55 BST 1999.
@item
@strong{Mon Oct 25 14:06:07 BST 1999:} minor improvement to code for
stochastic clause selection; added mailing list info in header.
@item
@strong{Fri Oct 8 10:06:55 BST 1999:} fixed bug in
@code{record_testclause} to add depth bound call to body literals.
@item
@strong{Mon Sep 20 09:50:23 BST 1999:} fixed bug in
@code{continue_search} for user-defined cost function; fixed bug in
stochastic clause selection that attempts to select more literals than
present in bottom clause.
@end itemize

@node Version 2, Version 3, Version 1, Change Logs
@section Changes in Version 2
@itemize @bullet
@item
@strong{Fri Mar 31 17:12:52 BST 2000:} Some predicates called during
variable-splitting did not account for the change that allows arbitrary
terms in mode declarations. Changed split_args/4 to split_args/5 to fix
bug concerning multiple modes for the same predicate.
@item
@strong{Thu Mar 23 09:57:15 GMT 2000:} Minor fixes. Some predicates
called during lazy evaluation did not account for the change that allows
arbitrary terms in mode declarations.
@item
@strong{Fri Jan 28 14:57:32 GMT 2000:} Arbitrary terms now allowed in
mode declarations; logfile no longer records date of trace automatically
(a system call to `date' causes Yap to crash on some non-Unix systems --
use set(date,...) to record the date).
@end itemize

@node Version 3, Version 4, Version 2, Change Logs
@section Changes in Version 3
@itemize @bullet
@item @strong{Wed May 16 06:22:52 BST 2001:}
@itemize @bullet
@item
Changed retractall to retract_all.
@item
Added check for setting(refine,user) in check_auto_refine (reported by
Khalid Khan).
@item
Added clause to select_nextbest/2 for RefineType = user.
@item
Fixed call to get_gains in reduce(_) to include StartClause when using a
refinement operator.
@item
Some calls to idb entries for last_refinement and best_refinement were
incorrectly using the key ``search'' instead of ``aleph''.
@item
Clauses in get_refine_gain and get_refine_gain1 when RefineType \= rls
had a variable-name clash for the variable E. Renamed one of these to
Example.
@item
Changed representation of gains for openlist. This is now the term
[P|S], where P is the primary key and S is the secondary key. This used
to be converted into a unique number, which required the setting of a
base.
This is no longer required, and so the fix_base predicate has been
removed. Corresponding changes to the structure for the gains idb were
also implemented by including P and S as its first two arguments, and to
uniq_insert to compare keys lexicographically.
@item
Calls to reduce now catch aborts and reinstate the values of any
parameters saved via the use of catch/3.
@item
The rls search type is now correctly heuristic (and not bf; reported by
David Page and Khalid Khan).
@item
Incorporated Filip Zelezny's corrections to the posonly estimate by
ensuring that rm_seeds updates atoms_left for rand examples.
@item
sample_clauses can now use a probability distribution over
clauselengths.
@item
Reinstated search `scs' to perform stochastic clause selection (after
Aleph 0 this was being done as a special case of rls).
@item
Removed call to store_cover to fix problem identified by Stasinos
Konstantopoulos when get_hyp_label/2 calls covers/1 and coversn/1.
@item
Updated manual to mirror the style of Yap's manual and added patches
sent by Stasinos Konstantopoulos.
@end itemize
@item
@strong{Fri May 18 07:44:02 BST 2001:} Yap was unable to parse calls of
the form recorded(openlist,[[K1|K2]|_],_) (reported by Khalid Khan).
Worked around by changing to recorded(openlist,[H|_],_), H = [K1|K2].
@item @strong{Wed Jul 25 05:50:12 BST 2001:}
@itemize @bullet
@item
Changed calls to val(ArgNo,Pos). This was causing variable-splitting to
fail.
@item
Both input and output variables can now be split in the head literal.
@item
Posonly learning now adds an SLP generator clause for each modeh
declaration.
@item
Modes can now contain ground terms.
@item
Restored proper operation of user-defined refinement operators.
@item
Added facility for time-restricted proofs.
@item
Added facility for a new computation rule that selects leftmost literals
with delaying.
@end itemize
@item @strong{Mon Mar 18 12:49:10 GMT 2002:}
@itemize @bullet
@item
Changed update atoms/2 to check the mode of the ground literal used to
produce the bottom clause. This means copies of ground literals can now
exist, if the corresponding variables are typed differently by the mode
declarations. This was prompted by discussions with Mark Rich.
@item
continue_search/3 replaced by discontinue_search/3.
@item
Added setting for newvars. This bounds the maximum number of new
variables that can be introduced in the body of a clause.
@item
Added code developed by Filip Zelezny to implement randomised local
search using `randomised rapid restarts'.
@item
Changed pos_ok/6 to check the minpos constraint for any refinement
operator r such that, for a hypothesis H, poscover(r(H)) <= poscover(H).
This cannot be guaranteed when search = rls or refine = user. In other
situations, the built-in refinement operator that adds literals is used
and this property holds. This was prompted by discussions with James
Cussens.
@item
Fixed bug in randomised search: rls_nextbest/4 had args for gain/4 in
the wrong order.
@item
Fixed closed-world specialisation: was not checking for lazy evaluation;
also changed tmp file names to alephtmp.[fn].
@item
subsumes/2 renamed aleph_subsumes/2.
@item
Changes to lazy evaluation code that allow a set of input bindings from
an example. This makes multi-instance learning possible.
@item
Automatic general-to-specific refinement from modes now ensures that it
does not generate clauses that would succeed on prune/1.
@item
Built-in local clause moves in randomised search now ensure that they do
not generate clauses that would succeed on prune/1.
@item
Random sampling of clauses from the hypothesis space now returns the
most general clause on failure.
@item
Added code check_recursive_calls/0. This allows calls to the positive
examples when building the bottom clause if recursion is allowed.
@item
Changed covers/1 and coversn/1 to check if being called during
induce/0.
@item
Miscellaneous changes of write/1 to writeq/1.
@end itemize
@end itemize

@node Version 4, Version 5, Version 3, Change Logs
@section Changes in Version 4
@itemize @bullet
@item @strong{Wed Nov 13 16:18:53 GMT 2002:}
@itemize @bullet
@item
Added portability to SWI-Prolog.
@item
Lazy-evaluation now creates literals identified by numbers that are less
than 0 (rather than by positive numbers beyond those obtained from the
bottom clause).
@item
Fixed error in mark_redundant_lits/2 that checked for redundant literals
in the bottom clause.
@item
Avoided overloading of the refine flag by introducing a secondary flag
refineop that is actually used by Aleph.
@item
Avoided overloading of the search flag by introducing a secondary flag
searchstrat that is actually used by Aleph.
@item
Removed defunct flags verbose, computation_rule.
@item
Added code symmetric_match/2 for checking for symmetric literals.
@item
Added new flags including minposfrac, minscore, mingain, prune_tree,
confidence, classes, newvars etc.
@item
Changed flags so that noise/minacc can co-exist. It is now the user's
problem to check that these are consistent.
@item
Introduced new predicate find_clause/1 to perform basic searches (this
was previously done by reduce/1).
@item
Miscellaneous rewrites of code for checking lang_ok and newvars_ok.
@item
Miscellaneous rewrites of code for optimising clauses.
@item
Rationalised pruning code.
@item
Fixed bug in pos_ok that affected posonly mode.
@item
Added code for dealing uniformly with plus and minus infinities in SWI
and Yap.
@item
Added code for dealing uniformly with alarms in SWI and Yap.
@item
Added code for dealing uniformly with random number generation in SWI
and Yap.
@item
Added code for dealing with cputime in cygnus (from Mark Reid).
@item
Added code for checking flag settings and specification of default
values.
@item
Added code for new evaluation functions entropy, gini, wracc.
@item
Added code for new search strategy id.
@item
Added code for showing positive examples left, good clauses,
constraints.
@item
Added code for calculating pos/neg cover of the head of clauses. Needed
for checking minposfrac and evaluating wracc.
@item
Added code for write_rules/0 (from Mark Reid) and rewrote code for
reading input files to be compatible with patches used by Mark Reid and
Stasinos Konstantopoulos.
@item
Added code in auto_refine to check for tautologies.
@item
Added code to add lookahead to the automatic refinement operator.
@item
Added code to check whether clauses found by induce should be added to
the background (controlled by the flag updateback).
@item
Added code for generating random variables from normal and chi-square
distributions.
@item
Added code to check that clauses below minpos are not added to the
theory.
@item
Added code for testing a theory on sets of files pointed to by
train_pos, train_neg, test_pos, and test_neg.
@item
Added code to store ``good'' clauses either in a file or in memory.
@item
Added code for Claudien-style induction of constraints in
induce_constraints.
@item
Added code for Tilde-style induction of trees in induce_tree.
@item
Added code for McCreath-Sharma induction of modes in induce_modes.
@item
Added code for generation of propositional boolean features from good
clauses.
@item
Removed code for list_profile/0.
@item
Removed code for probabilistic refinement operators.
@item
Removed code for doing pre-computation of background predicates.
@item
Removed code for Markov-Chain Monte-Carlo search.
@end itemize
@end itemize

@node Version 5, , Version 4, Change Logs
@section Changes in Version 5
@itemize @bullet
@item @strong{Sun Jun 4 10:51:31 UTC 2006:}
@itemize @bullet
@item
Removed cut from call_with_depth_limit/3 for SWI.
@item
Fixed bug in gen_layer/2 with negated predicates.
@item
Changed call to portray/1 to aleph_portray/1.
@item
Included value of lookahead into automatic refinement in
get_user_refinement.
@item
Included check for LazyOnContra in prove_examples for evalfn=posonly.
@item
Ensured update_gsample correctly updates counts of rand data.
@item
Corrected bug in modes/2 to get Pred before checking for modes.
@item
Corrected code generated for constructing automatic refinement using
modes to account correctly for multiple mode declarations for the same
predicate.
@item
Corrected copy_modeterms to account for variables in mode declarations.
@item
Added code for induce_features/0.
@item
Changed tree code to allow specification of the dependent variable.
@end itemize
@item @strong{Sun Nov 6 12:49:12 UTC 2005:}
@itemize @bullet
@item
Allow @code{minacc} and @code{noise} settings when @code{evalfn} is set
to @code{user}. Incorporated bug fixes to @code{get_max_negs} reported
by Daniel Fredouille.
@item
Bug fix reported by Vasili Vrubleuski for removal of commutative
literals with SWI.
@item
Inserted code for abduction within the @code{induce} loop.
@end itemize
@item @strong{Sun Jun 5 05:51:32 UTC 2005:}
@itemize @bullet
@item
Fixed miscellaneous bugs in the code.
@item
Modified code to generate features correctly.
@end itemize
@item @strong{Sun Oct 10 06:59:50 BST 2004:}
@itemize @bullet
@item
Fixed code to alter odd behaviour with cut being introduced in
hypothesised clauses by altering @code{gen_nlitnum}.
@end itemize
@item @strong{Wed Jun 30 14:38:44 BST 2004:}
@itemize @bullet
@item
Fixed @code{posonly} bug by fixing typo for @code{gsamplesize}.
@end itemize
@item @strong{Mon Jun 2 15:05:24 BST 2003:}
@itemize @bullet
@item
Complete rewrite of the code to remove references to internal databases.
@item
Preliminary support for concurrent operation on shared-memory machines
(using Prolog threads).
@item
Miscellaneous bug fixes in code.
@end itemize
@item @strong{Wed Jun 30 14:38:44 BST 2004:}
@itemize @bullet
@item
Corrections to @code{best_value/4} after discussions with James Cussens,
Mark Reid and Jude Shavlik.
@item
Added @code{depth_bound_call} to cover test in @code{test_file/2}
(reported by James Cussens).
@item
Changed code to ignore settings for @code{noise} and @code{minacc} when
@code{evalfn} is @code{user}.
@item
@code{discontinue_search} now fails if @code{evalfn} is @code{user}.
@item
Added @code{interactive} flag to control interactive construction of
clauses and trees.
@item
Added command @code{induce_clauses}.
@item
Added code for constructing model trees.
@end itemize
@end itemize

@node Parameter Index, Concept Index, Change Logs, Top
@unnumbered Commands and Parameters Index
@printindex fn

@node Concept Index, , Parameter Index, Top
@unnumbered Concept Index
@printindex cp

@contents
@bye