Chunks and Rules

This specification defines a cognitive database model based on graphs and on rules that operate on them in conjunction with highly scalable graph algorithms suitable for handling big data, as well as a format to serialize these graphs. The model is designed to facilitate machine learning of vocabularies and rules, and is inspired by advances in the cognitive sciences on the organisation of the mammalian brain.

This document is at early stages of development. Feedback is welcome through GitHub issues or on the public-cogai@w3.org mailing-list (with public archives).

Introduction

This specification defines a cognitive database model based on graphs and rules that operate on them, modeled after the organisation of the mammalian brain as described by the cognitive sciences, with the aim of facilitating machine learning over big data.

Architecture of the cognitive database model

At its heart, the model is based on [=graphs of chunks=] composed of a collection of [=chunks=], where each [=chunk=] represents a collection of basic familiar units that have been grouped together and stored in memory. To ease manipulation of procedural knowledge as declarative knowledge, [=chunks=] are used to model both declarative knowledge (i.e. data) as well as procedural knowledge (i.e. rules). See [[CHUNKS-INTRO]] for details.

The [=rule engine=] operates on a set of [=modules=], where each [=module=] has a [=graph of chunks=] and supports a common set of operations on chunks. Each module also has a single [=module buffer=] that the [=rule engine=] can process and that can hold one and only one [=chunk=] at a time.

This specification also defines a serialization format for graphs of chunks, used in examples throughout this specification.

The grammatical rules in this document are to be interpreted as described in [[[RFC5234]]] [[RFC5234]].

Conformance classes

Conformance to this specification is defined for four conformance classes:

Chunks document

A serialization of a [=graph of chunks=] as a file. A [=chunks document=] is conformant to this specification if it follows the grammar described in the Chunks grammar section.

Authoring tool

An application that writes a [=chunks document=]. An [=authoring tool=] is conformant to this specification if it writes conforming [=chunks documents=].

Parser

A [=parser=] transforms a [=chunks document=] into another representation. A [=parser=] is conformant to this specification if it accepts any conforming [=chunks document=].

Rule engine

A processing application that operates on graphs of chunks and rules, organized following the cognitive agent architecture described in this specification. A [=rule engine=] is conformant to this specification if it follows the algorithms defined in the Rule engine execution section.

Data types

This document uses the following restricted set of data types to describe [=chunks=]. See the Chunks grammar section for a formal definition of their serialization.

A number represents a double-precision (binary64) floating-point value as specified in the IEEE Standard for Floating-Point Arithmetic [[IEEE-754-2019]]. It is serialized in base 10 using decimal digits, following the same grammar as numbers in JSON [[RFC8259]].

A boolean represents a logical entity having two values. It is serialized as either the literal name true, which gets interpreted as a truthy value, or the literal name false, which gets interpreted as a falsy value.

A date is an [[ISO8601]] string that represents a date. A [=date=] value implicitly creates a read-only chunk whose type is iso8601 with properties that match actual date components.

Clarify that only a subset of ISO8601 is supported (see issue #8).

To prepend iso8601 and related properties with @ or not to prepend with @, that is the question (see issue #2).

A string literal is an arbitrary sequence of characters. It is serialized enclosed in double quotes, following the same grammar as strings in JSON [[RFC8259]].

A name is a string that can include letters, digits, period, hyphen, underscore and slash characters, and that cannot be interpreted as a [=number=] or a [=boolean=]. Additionally, depending on the context under which it is used, a [=name=] may start with one of the name operators (see [[[#name-operators]]]).
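As an informal illustration, the following sketch classifies a single serialized atomic value into one of the data types above. It is a minimal sketch, not the normative grammar: it assumes the JSON number and string grammars, treats any remaining run of letters, digits, underscore, period, hyphen and slash characters (optionally preceded by ?, ! or @) as a name, and does not try to recognize dates, since the supported ISO 8601 subset is still open. The function name `classify_atomic` is hypothetical.

```python
import json
import re
from typing import Any, Tuple

# JSON number grammar (RFC 8259): optional minus, no leading zeros,
# optional fraction, optional exponent.
NUMBER_RE = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
# Name characters (\w covers letters, digits and underscore), plus period,
# slash and hyphen, optionally preceded by name operators (assumed: ?, !, @).
NAME_RE = re.compile(r"[!?@]*[\w./-]+")


def classify_atomic(token: str) -> Tuple[str, Any]:
    """Return (kind, parsed value) for one serialized atomic value."""
    if NUMBER_RE.fullmatch(token):
        return ("number", float(token))
    if token in ("true", "false"):
        return ("boolean", token == "true")
    if token.startswith('"'):
        # Reuse the JSON string grammar for escapes; raises ValueError if malformed.
        return ("string", json.loads(token))
    if token == "*" or NAME_RE.fullmatch(token):
        return ("name", token)
    raise ValueError(f"not an atomic value: {token!r}")
```

Note that `01` ends up classified as a name, because the JSON number grammar disallows leading zeros.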

Chunks and graphs

A chunk is a named typed collection of [=properties=]. A [=chunk=] is used to model both declarative knowledge and procedural knowledge as a collection of basic familiar units that have been grouped together and stored in memory.

A [=chunk=] has a [=type=] and an optional [=identifier=].

Chunk type

A chunk type is a [=name=] that documents the nature of a chunk. The [=type=] is used to group and index chunks. [=Rules=] typically apply to chunks of a given [=type=].

As a special case, the [=type=] may be formed by a single asterisk (*), which is used to describe a [=condition=] or [=action=] that matches any chunk [=type=].

Chunk identifier

The chunk identifier is a [=name=] that uniquely identifies a chunk within the graph it is defined in.

The chunk [=identifier=] is optional.

Chunk properties

A chunk property is a [=name=]/[=value=] pair that describes a chunk along the particular dimension identified by the property name.

A value is either an [=atomic value=] or an ordered list of [=atomic values=] (values are comma-separated in serialized form).
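Because string literals may themselves contain commas, splitting a serialized list value takes slightly more than a plain split on commas. The following Python sketch, built around a hypothetical `split_value` helper, assumes that commas only separate items outside double-quoted strings and that backslash escapes follow the JSON string grammar. It returns a single token for an atomic value and a list of tokens otherwise.

```python
from typing import List, Union


def split_value(text: str) -> Union[str, List[str]]:
    """Split a serialized property value into its atomic tokens."""
    items: List[str] = []
    current: List[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            current.append(ch)
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False        # closing quote
        elif ch == '"':
            in_string = True             # opening quote
            current.append(ch)
        elif ch == ",":
            items.append("".join(current).strip())  # separator outside strings
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return items[0] if len(items) == 1 else items
```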

An atomic value is either:

A property [=value=] |a| equals property [=value=] |b| when the following algorithm returns true:

Chunk context

A [=chunk=] may be scoped to a context, which identifies the specific situation under which the [=chunk=] should be considered to be true. This mechanism allows [=chunks=] to describe things that are only true in hypothetical situations.

[=Contexts=] can be used to express situations that involve statements about statements, including beliefs, stories, reported speech, examples in lessons, abductive reasoning and even search query patterns. They are also useful for episodic memory, to describe facts that are true in a given situation, for instance an event in which a person visited a restaurant for lunch, sat by the window, and had soup for starters followed by mushroom risotto for the main course. A sequence of episodes can then be modelled as relationships between contexts.

A [=chunk=] gets associated with a specific [=context=] through an [=@context=] [=property=].

A [=chunk=] that is not explicitly associated with a [=context=] (i.e. in the absence of an [=@context=] property) belongs to the default context.

As illustrated in the previous example, [=contexts=] can be chained, e.g. to describe the beliefs of someone in a fictional story or movie, and to indicate when a context is part of several other contexts, thus creating a tree of [=contexts=].

Practically speaking, [=contexts=] make it possible to filter out irrelevant [=chunks=] in [=conditions=] and [=actions=]. Two [=chunks=] may only [=match=] when they belong to the same context. For instance, a [=chunk=] that belongs to the context tom-belief-1 can only [=match=] [=chunks=] that also belong to that context, and de facto cannot match [=chunks=] that belong to the [=default context=]. In particular, a [=chunk=] that belongs to the [=default context=] can only [=match=] [=chunks=] that also belong to the [=default context=].
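The context rule can be sketched as a simple filter. In this minimal Python sketch, chunks are assumed to be dictionaries of properties, the default context is represented by the absence of an `@context` key, and `context_of` and `filter_by_context` are hypothetical names. Only context equality is checked; the other matching conditions are out of scope.

```python
from typing import Iterable, List, Optional


def context_of(chunk: dict) -> Optional[str]:
    # None stands for the default context (no @context property).
    return chunk.get("@context")


def filter_by_context(pattern: dict, chunks: Iterable[dict]) -> List[dict]:
    # Keep only the chunks that belong to exactly the same context as the pattern.
    wanted = context_of(pattern)
    return [chunk for chunk in chunks if context_of(chunk) == wanted]
```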

In this document, a link is a directed and labeled connection between two [=chunks=]. A [=link=] is automatically created whenever a chunk property [=value=] is a [=name=] that references an existing chunk [=identifier=].

The subject of the [=link=] identifies the [=chunk=] at the origin of the connection. The object of the [=link=] identifies the [=chunk=] targeted by the connection. The label of the [=link=] is the property [=name=].

When a [=chunk=] links to another [=chunk=], this implicitly creates a third [=chunk=] whose [=type=] is the name of the [=property=] that creates the [=link=], and that has two [=properties=]:

Graph of chunks

A graph of chunks is simply a collection of [=chunks=]. The vertices of the graph are the [=chunks=]. The edges of the graph are the [=links=] between the chunks.

Since [=links=] are directed, a [=graph of chunks=] is a directed graph.
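The way links emerge from property values can be sketched as follows. This Python sketch is illustrative only and assumes that a graph is a dictionary from chunk identifiers to property dictionaries, that names are held as Python strings (string literals are out of scope), that each atomic value of a list may create its own link, and that @-prefixed properties create no links. The function `links` is hypothetical; it returns (subject, label, object) triples.

```python
from typing import Dict, List, Tuple


def links(graph: Dict[str, dict]) -> List[Tuple[str, str, str]]:
    triples = []
    for subject, props in graph.items():
        for label, value in props.items():
            if label.startswith("@"):
                continue  # assumption: reserved properties are skipped
            for item in value if isinstance(value, list) else [value]:
                # A link exists only when the value names an existing chunk.
                if isinstance(item, str) and item in graph:
                    triples.append((subject, label, item))
    return triples
```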

Rules and modules

Rules

A rule is a [=chunk=] whose [=type=] is rule and that has:

A [=rule=] represents a unit of procedural knowledge. Rules consist of [=conditions=] and [=actions=].

Conditions

A condition is a [=chunk=] that describes the premises that must hold true for a [=rule=] to apply. A [=condition=] identifies which [=module=] it relates to through an [=@module=] property, defaulting to the goal module. A [=condition=] holds true when the [=chunk=] in the related [=module buffer=] is a [=matching chunk=] for the [=condition=].

Actions

An action is a [=chunk=] that can directly update [=module buffers=], or can do so indirectly, e.g. by sending messages to the [=module=] to invoke graph algorithms, such as graph queries and updates, or to carry out operations, e.g. instructing a robot to move its arm. When the algorithm or operation is complete, a response can be sent back to update the module's buffer. This in turn can trigger further rules as needed.

In many cases, the operation that an [=action=] carries out is given by an [=@do=] property. Built-in operations are always supported (see the Built-in operations section). Additional operations may be supported.

Matching chunks

A [=chunk=] |A| matches [=chunk=] |B| if the conditions below are all met:

@-properties for conditions and actions

The [=reserved names=] defined in this section may be used as [=property=] names in [=conditions=] and [=actions=] to control their behavior.

The @context property

When used in a regular [=chunk=], identifies a chunk's [=context=]. When used in a [=condition=] or in an [=action=], [=matches=] a [=chunk=]'s [=context=].

The @do property

Specifies the graph algorithm or operation to execute. See the Built-in operations section for a list of common operations that are supported across modules.

The @for property

Iterates over a set of items in a comma-separated list. The [=@from=] and [=@to=] properties may be used to restrict the iteration range.

The @from property

Specifies the zero-based starting index of an [=@for=] iteration. Value must be an integer.

The @id property

[=Matches=] a [=chunk=]'s [=identifier=], or binds a variable to the [=chunk=]'s [=identifier=].

The @kindof property

[=Matches=] a [=chunk=]'s [=type=] when that [=type=] is linked to the value of the [=@kindof=] property through a chain of kindof links. The property should be used in conjunction with a * type to match subclasses of a given class in a taxonomy.
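Following a chain of kindof links is a plain graph traversal. In this hedged Python sketch, kindof links are assumed to be available as a dictionary mapping each type to its direct parent types, a chain is assumed to contain at least one link (so a type is not a kind of itself), and a `seen` set guards against cycles in the taxonomy. The function `is_kind_of` is hypothetical.

```python
from typing import Dict, List, Set


def is_kind_of(chunk_type: str, target: str, kindof: Dict[str, List[str]]) -> bool:
    stack = list(kindof.get(chunk_type, []))  # direct parents: first link of a chain
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(kindof.get(current, []))  # follow the next kindof link
    return False
```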

The @module property

References the [=module=] a [=condition=] or [=action=] relates to. Value must be the [=module name=] of the targeted [=module=]. In the absence of an [=@module=] property, [=conditions=] and [=actions=] apply to the goal module.

The @more property

Queries the [=boolean=] flag that the [=rule engine=] sets on the current [=chunk=] during [=@for=] and [=@do properties=] iterations: the flag is true when there are remaining [=chunks=] to iterate over, and false for the final [=chunk=] of the iteration.

The @pop property

An [=action=] property that removes the last [=atomic value=] from a [=value=]. If the [=value=] to process is already an [=atomic value=], the underlying property is removed.

If a [=@to=] property is also present, the removed [=atomic value=] is assigned to the [=property=] identified by the [=@to=] property. In the absence of a [=@to=] property, the removed [=atomic value=] is discarded.

The @push property

An [=action=] property that pushes an [=atomic value=] to the end of the [=value=] of the property identified by a companion [=@to=] property. If the targeted property does not exist yet, it is created.

In the absence of a [=@to=] property, this operation has no effect.

The @shift property

An [=action=] property that removes the first [=atomic value=] from a [=value=]. If the [=value=] to process is already an [=atomic value=], the underlying property is removed.

If a [=@to=] property is also present, the removed [=atomic value=] is assigned to the [=property=] identified by the [=@to=] property. In the absence of a [=@to=] property, the removed [=atomic value=] is discarded.

The @status property

Queries the [=module buffer/status=] of a [=module buffer=]. The [=rule engine=] sets the status of a [=module buffer=] to reflect the outcome of the [=rule=]'s execution. Most operations are asynchronous, except [=@do clear=], [=@do update=] and [=@do queue=].

The @to property

Companion [=action=] property used in [=@do properties=], [=@for=], [=@pop=], [=@push=], [=@shift=], [=@unshift=] operations.

Meaning and value constraints depend on the operation. See individual operations for details. For instance, when used in a [=@for=] operation, the property specifies the zero-based ending index of the iteration. Value must be an integer. When used in a [=@do properties=] operation, the property specifies the name of the [=module buffer=] onto which to write the current [=chunk=].

The @type property

[=Matches=] a [=chunk=]'s [=type=], or binds a variable to the [=chunk=]'s [=type=].

The @unshift property

An [=action=] property that pushes an [=atomic value=] to the beginning of the [=value=] of the property identified by a companion [=@to=] property. If the targeted property does not exist yet, it is created.

In the absence of a [=@to=] property, this operation has no effect.
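The four list operations above can be sketched in Python as follows. This is a minimal sketch under several assumptions: the chunk is a dictionary, the @to property names a property of that same chunk, a list value is a Python list and an atomic value is any other object, and a list reduced to a single item collapses back into an atomic value (the text does not say how single-item lists are represented). All function names are hypothetical.

```python
from typing import Any, List, Optional


def _as_list(value: Any) -> List[Any]:
    # Treat an atomic value as a one-item list.
    return list(value) if isinstance(value, list) else [value]


def _store(chunk: dict, prop: str, items: List[Any]) -> None:
    if not items:
        chunk.pop(prop, None)    # nothing left: the property is removed
    elif len(items) == 1:
        chunk[prop] = items[0]   # assumption: collapse to an atomic value
    else:
        chunk[prop] = items


def _remove(chunk: dict, prop: str, index: int, to: Optional[str]) -> Any:
    items = _as_list(chunk[prop])
    removed = items.pop(index)
    _store(chunk, prop, items)
    if to is not None:
        chunk[to] = removed      # without @to, the removed value is discarded
    return removed


def pop(chunk: dict, prop: str, to: Optional[str] = None) -> Any:
    return _remove(chunk, prop, -1, to)   # @pop removes the last value


def shift(chunk: dict, prop: str, to: Optional[str] = None) -> Any:
    return _remove(chunk, prop, 0, to)    # @shift removes the first value


def _insert(chunk: dict, value: Any, to: Optional[str], at_end: bool) -> None:
    if to is None:
        return                   # without @to, the operation has no effect
    items = _as_list(chunk[to]) if to in chunk else []
    _store(chunk, to, items + [value] if at_end else [value] + items)


def push(chunk: dict, value: Any, to: Optional[str] = None) -> None:
    _insert(chunk, value, to, at_end=True)


def unshift(chunk: dict, value: Any, to: Optional[str] = None) -> None:
    _insert(chunk, value, to, at_end=False)
```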

TODO: Complete list with additional reserved names: @undefined, @unique, @compile, @index, @map, @priority, @source, @tag, @uncompile, @undefine.

Modules

A module is a [=graph of chunks=] associated with one and only one [=module buffer=]. A [=module=] has a module name that follows the [=name=] data type, and that is typically used to target the [=module=] in [=@module=] properties.

A [=module=] supports [=built-in operations=], and may support additional operations defined by the application when the [=module=] is initialized.

A [=module=] represents a cognitive database on which the [=rule engine=] may operate. It may be viewed as a region in the cerebral cortex, where the [=module buffer=] corresponds to the bundle of nerve fibres connecting to that region.

The [=rule engine=] automatically creates a module named goal, which will therefore always exist in a rule execution context.

The [=@module=] property allows [=conditions=] and [=actions=] to reference the [=module name=] of the [=module=] they relate to. In the absence of an [=@module=] property, [=conditions=] and [=actions=] apply to the goal module.

Module buffers

A module buffer is a container for at most one [=chunk=]. The mammalian brain is richly connected locally, and weakly connected remotely. A [=module buffer=] mimics the constrained capacity of the mammalian brain for such long-range communication.

The [=rule engine=] operates on a module's [=graph of chunks=] through its [=module buffer=].

A [=module buffer=] has a status, whose value is initially [=status/okay=], and which reflects the outcome of the last [=action=] held and performed by the [=module buffer=]. Values can be:

pending

The operation is still pending.

okay

The operation completed successfully.

Switch to ok? This seems more common in technologies (e.g. HTTP) than okay.

forbidden

The operation was not allowed.

nomatch

The operation failed because there was no [=matching chunk=] for the [=action=] in the targeted [=module=].

failed

The operation failed.

A [=module buffer=] has a queue, which is a set of [=chunks=], initially empty. Each chunk in the [=queue=] has a priority, represented by an integer from 1 to 10, with 10 the highest priority. The default [=priority=] is 5. [=Chunks=] are ordered by descending [=priority=] in a [=queue=]. When [=priorities=] match, [=chunks=] are ordered by insertion order (first in, first out).

The @priority property lets [=actions=] set the [=priority=] of a [=chunk=] when they add it to a [=queue=].

A [=module buffer=] is automatically cleared when the [=actions=] associated with the [=rule=] it contained did not update the contents of targeted [=module buffers=]. This pops the [=queue=] if it is not already empty.
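The queue ordering rules can be sketched with a binary heap. In this Python sketch, the `ModuleBuffer` class and its `queue` and `clear` methods are hypothetical names. Priorities outside the range 1 to 10 fall back to the default of 5, an insertion counter preserves first-in, first-out order among equal priorities, and clearing the buffer is assumed to load the chunk popped from the queue, if any, into the buffer.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class ModuleBuffer:
    DEFAULT_PRIORITY = 5

    def __init__(self) -> None:
        self.chunk: Optional[Any] = None
        self._heap: List[Tuple[int, int, Any]] = []
        self._counter = itertools.count()  # insertion order breaks ties

    def queue(self, chunk: Any, priority: Optional[int] = None) -> None:
        valid = (isinstance(priority, int) and not isinstance(priority, bool)
                 and 1 <= priority <= 10)
        if not valid:
            priority = self.DEFAULT_PRIORITY
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), chunk))

    def clear(self) -> None:
        # Empty the buffer, then (assumption) load the popped chunk, if any.
        self.chunk = heapq.heappop(self._heap)[2] if self._heap else None
```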

Built-in operations

The [=@do=] property lets an [=action=] specify the graph algorithm or operation to execute. The default operation is to update the [=module buffer=], similar to calling [=@do update=].

All [=modules=] support the built-in operations defined in this section.

All [=modules=] also support the [=@for=] property to iterate over a set of items in a comma-separated list. This has the effect of loading the [=module buffer=] with the first item in the list. The index range can optionally be specified with [=@from=] and [=@to=], where the first item in the list has index 0.
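The iteration range can be sketched as a Python generator that also yields the value of the @more flag alongside each item. This sketch assumes that @to is inclusive and that an @to beyond the end of the list is clamped to the last item; neither point is settled by the text above. The name `for_items` is hypothetical.

```python
from typing import Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def for_items(items: List[T], start: int = 0,
              end: Optional[int] = None) -> Iterator[Tuple[T, bool]]:
    # Zero-based indices; @to treated as inclusive (assumption).
    last = len(items) - 1 if end is None else min(end, len(items) - 1)
    for index in range(start, last + 1):
        yield items[index], index < last  # @more is False only for the final item
```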

The @do clear operation

Clears the [=module buffer=] and pops the [=queue=].

The @do delete operation

Forgets [=matching chunks=] in the [=graph of chunks=].

The @do get operation

Looks for a [=matching chunk=] in the module's [=graph of chunks=] and puts a copy of it in the [=module buffer=] if found. Modifying the [=properties=] of a [=chunk=] copied from a [=graph of chunks=] (e.g. through a [=@do update=] operation) will not alter the underlying [=graph of chunks=]. To save an updated [=chunk=], a [=@do put=] or [=@do patch=] command needs to be issued.
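The copy semantics of @do get can be illustrated with a short Python sketch. Because this sketch does not implement the full matching conditions, the hypothetical `simple_match` helper uses a much simpler rule: the pattern's type must equal the chunk's type unless it is *, and every non-@ property of the pattern must have an equal value in the chunk. Chunks are assumed to be dictionaries with an `@type` key and an optional `@id` key; `do_get` is also hypothetical.

```python
import copy
from typing import List, Optional


def simple_match(pattern: dict, chunk: dict) -> bool:
    if pattern.get("@type", "*") not in ("*", chunk.get("@type")):
        return False
    return all(chunk.get(k) == v for k, v in pattern.items() if not k.startswith("@"))


def do_get(graph: List[dict], pattern: dict) -> Optional[dict]:
    for chunk in graph:
        if simple_match(pattern, chunk):
            # The buffer receives a deep copy, so editing it leaves the graph intact.
            return copy.deepcopy(chunk)
    return None
```

Persisting changes made to the copy then requires an explicit put or patch.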

The @do next operation

Loads the next [=matching chunk=] into the targeted [=module buffer=], in an implementation-dependent order.

The @do patch operation

If the [=chunk=] in the targeted [=module buffer=] has the same [=identifier=] as a [=chunk=] in the underlying [=graph of chunks=], patches the [=chunk=] in the [=graph of chunks=] with the [=properties=] that appear in the [=module buffer=], excluding [=properties=] prefixed with an @ character.
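A minimal Python sketch of the patch semantics follows, assuming that the graph is a dictionary from identifiers to chunk dictionaries. Properties absent from the buffer are left untouched in the stored chunk, and @-prefixed properties are never copied. What should happen when no stored chunk has the same identifier is not specified; the hypothetical `do_patch` function simply returns False in that case.

```python
import copy


def do_patch(graph: dict, buffer: dict) -> bool:
    target = graph.get(buffer.get("@id"))
    if target is None:
        return False  # assumption: no chunk with that identifier, nothing to patch
    for name, value in buffer.items():
        if not name.startswith("@"):
            target[name] = copy.deepcopy(value)  # other stored properties are kept
    return True
```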

What is the expected behavior when the action has an @id property?

The @do properties operation

Initiates an iteration over the [=properties=] of the [=matching chunk=] whose names do not begin with @. Each [=property=] is mapped to a new [=chunk=] with the same [=type=] as the [=action=]. The action's properties are copied over, and name and value properties are used to pass the [=property=] name and value respectively. The [=@more=] property is set to true for every [=chunk=] in the iteration except the final one, for which it is set to false. By default, the iteration is written to the same [=module buffer=] as designated by the [=action=] that initiated it; a different [=module buffer=] can be designated with the [=@to=] property. Setting additional properties in the initiating action makes it possible to ensure that the rules used to process the property name and value are distinct from those used in other such iterations.
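The mapping from properties to iteration chunks can be sketched as follows. This Python sketch returns the whole iteration as a list instead of delivering chunks to a module buffer one at a time, and it assumes that only the non-@ properties of the action are copied over, since the text does not say whether @-properties such as @do are. The function name `do_properties` is hypothetical.

```python
from typing import List


def do_properties(chunk: dict, action: dict) -> List[dict]:
    names = [n for n in chunk if not n.startswith("@")]
    iteration = []
    for i, name in enumerate(names):
        # Copy the action's own properties (assumption: @-properties excluded).
        item = {k: v for k, v in action.items() if not k.startswith("@")}
        item["@type"] = action["@type"]       # same type as the initiating action
        item["name"] = name
        item["value"] = chunk[name]
        item["@more"] = i < len(names) - 1    # False only for the final chunk
        iteration.append(item)
    return iteration
```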

The @do put operation

Saves the contents of the [=module buffer=] as a [=chunk=] to the module's [=graph of chunks=]. If the [=action=] has an [=@id=] property, this operation will overwrite the [=chunk=] with the same [=identifier=] or will create a new [=chunk=] with the given [=identifier=] if it does not exist already. This operation will also create a new [=chunk=] in the absence of an [=@id=] property.
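Unlike patch, put replaces the stored chunk as a whole. In this Python sketch the graph is a dictionary from identifiers to chunk dictionaries, any identifier already carried by the buffered chunk is ignored (one possible reading), and the scheme used to mint identifiers when the action has no @id property (`_:c1`, `_:c2`, and so on) is a hypothetical choice, since none is defined here. The function name `do_put` is also hypothetical.

```python
import copy
import itertools

_fresh_ids = itertools.count(1)  # hypothetical identifier scheme: _:c1, _:c2, ...


def do_put(graph: dict, buffer: dict, action: dict) -> str:
    chunk_id = action.get("@id")
    if chunk_id is None:
        # No @id on the action: mint an identifier that is not in use yet.
        chunk_id = f"_:c{next(_fresh_ids)}"
        while chunk_id in graph:
            chunk_id = f"_:c{next(_fresh_ids)}"
    chunk = copy.deepcopy(buffer)
    chunk["@id"] = chunk_id
    graph[chunk_id] = chunk  # overwrites any existing chunk, no merging
    return chunk_id
```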

If a chunk was loaded with @do get, then updated with @do update, would a call to @do patch create a new chunk if @id is not set? Or would it rather update the chunk in the graph?

The @do queue operation

Pushes a [=chunk=] to the [=queue=] for the [=module buffer=]. If a [=@priority=] property is set to an integer value between 1 and 10, the [=priority=] of the [=chunk=] in the [=queue=] is set to that value.

The @do update operation

Directly updates the [=module buffer=] if the chunk [=type=] of the [=action=] is the same as the [=type=] of the [=chunk=] currently held in the [=module buffer=]. The operation updates the properties given in the [=action=], leaving aside properties prefixed with an @ character, and leaves other existing properties unchanged. If the chunk [=type=] of the action differs from the [=type=] of the [=chunk=] currently held in the [=module buffer=], a new [=chunk=] is created with the properties given in the [=action=], excluding properties prefixed with an @ character. This is the default operation when an [=action=] has neither an [=@do=] property nor an [=@for=] property.
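The two branches of @do update can be sketched as a pure function that returns the new buffer content. In this Python sketch, chunks are dictionaries whose type is held under an `@type` key (an assumption about representation), an empty buffer is represented by None, and `do_update` is a hypothetical name.

```python
import copy
from typing import Optional


def do_update(buffer: Optional[dict], action: dict) -> dict:
    # Properties prefixed with @ (such as @do or @type) are left aside.
    props = {k: copy.deepcopy(v) for k, v in action.items() if not k.startswith("@")}
    if buffer is not None and buffer.get("@type") == action.get("@type"):
        updated = dict(buffer)  # same type: overwrite given properties, keep the rest
        updated.update(props)
        return updated
    # Different type or empty buffer: a new chunk of the action's type.
    return {"@type": action.get("@type"), **props}
```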

How can one update the properties prefixed with an @ character in a [=chunk=] such as [=@context=], [=@subject=] or [=@object=]?

Name operators

Operators defined in this section may be used on their own as [=names=] or in front of [=names=] to alter their meaning.

The variable operator ?

The variable operator ? may be prepended to a [=name=] when used as a [=property=] value to turn the [=name=] into a variable, which is a symbolic name for a [=value=]. A [=variable=] gets bound to a [=value=] in a [=condition=]. The [=value=] can then be referenced in [=actions=] using the [=variable=]'s name. Effectively, [=variables=] allow applications to copy information from rule [=conditions=] to rule [=actions=].

[=Variables=] are scoped to the [=rule=] where they appear.

[=Variables=] are [=bound=] to a [=value=] the first time they appear in a [=condition=]. Subsequent occurrences of the same [=variable=] in a [=condition=] reference their [=value=].

[=Variables=] may represent any type of [=value=]. In particular, a [=variable=] that [=matches=] a property whose value is a list of [=atomic values=] gets bound to the list of [=atomic values=].

A [=condition=] that contains a [=property=] with a list of [=variables=] [=matches=] a list of [=atomic values=] that has the same length. The [=condition=] does not [=match=] when the lengths of the lists differ.
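Variable binding during matching can be sketched as a function that threads a dictionary of bindings through the comparison. In this Python sketch, property values are Python strings or lists, variables are strings starting with ?, and the hypothetical `match_value` function returns the extended bindings on success and None on failure. It never mutates its input, so a failed match leaves the caller's bindings unchanged. The * and ! operators are not handled here.

```python
from typing import Any, Dict, Optional

Bindings = Dict[str, Any]


def match_value(pattern: Any, value: Any, bindings: Bindings) -> Optional[Bindings]:
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # Later occurrences reference the value bound earlier.
            return bindings if bindings[pattern] == value else None
        return {**bindings, pattern: value}  # first occurrence binds, even to a list
    if isinstance(pattern, list):
        # A list pattern only matches a list of the same length.
        if not isinstance(value, list) or len(pattern) != len(value):
            return None
        for p, v in zip(pattern, value):
            bindings = match_value(p, v, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == value else None
```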

The wild card operator *

The wild card operator * may be used on its own in lieu of a [=name=] in one of the following cases:

The negation operator !

The negation operator ! may be prepended to [=names=] to negate the outcome of their evaluation. The [=negation operator=] may be used in one of the following cases:

The [=negation operator=] cannot be prepended to a [=variable=] that has not yet been [=bound=]. Similarly, the [=negation operator=] cannot be used on its own as an [=atomic value=] in a list. More generally, the [=negation operator=] cannot be used elsewhere than in the cases detailed above.

A second [=negation operator=] prepended to a [=name=] that starts with a [=negation operator=] cancels the effect of the first [=negation operator=].
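The effect of the negation operator on a single name can be sketched as follows. In this Python sketch, the hypothetical `match_name` function compares a condition value against a chunk value, resolves a variable only if it is already bound, and rejects an unbound variable, as required for negated variables. Generalizing the cancellation rule to any number of leading ! characters by parity is an assumption; the text above only covers two.

```python
from typing import Any, Dict


def match_name(pattern: str, value: Any, bindings: Dict[str, Any]) -> bool:
    negations = len(pattern) - len(pattern.lstrip("!"))  # leading ! operators
    target: Any = pattern[negations:]
    if target.startswith("?"):
        if target not in bindings:
            raise ValueError(f"variable {target} is not bound")
        target = bindings[target]
    matched = value == target
    # Assumption: each pair of ! operators cancels out.
    return not matched if negations % 2 else matched
```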

The reserved name operator @

The reserved name operator @ may be prepended to a [=name=] to denote a reserved name with specific meaning. Most [=reserved names=] are to be used as [=property=] names, typically in [=conditions=] and [=actions=] to control their behavior (see [[[#properties-for-conditions-and-actions]]]). Some of them may be used as chunk [=types=] to denote a chunk with specific meaning (see [[[#mapping-to-rdf]]]).

Rule engine execution

TODO: Describe algorithms for the [=rule engine=].

Chunks documents

Parsing a chunks document

TODO: Formally describe parsing algorithm

If multiple chunk definitions share the same [=identifier=] in a set of chunks, the last definition overrides former definitions.
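Assuming a parser produces (identifier, chunk) pairs in document order, the override rule amounts to keeping the last chunk seen for each identifier. This Python sketch assumes that a later definition replaces an earlier one as a whole rather than being merged with it, and keeps chunks without an identifier separately, since they cannot collide. The function `load_chunks` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def load_chunks(
    definitions: Iterable[Tuple[Optional[str], dict]],
) -> Tuple[Dict[str, dict], List[dict]]:
    named: Dict[str, dict] = {}
    anonymous: List[dict] = []
    for identifier, chunk in definitions:
        if identifier is None:
            anonymous.append(chunk)    # no identifier: nothing to override
        else:
            named[identifier] = chunk  # last definition replaces earlier ones
    return named, anonymous
```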

Chunks grammar

A [=chunks document=] MUST follow the grammar defined below.

Railroad diagrams

This section presents an informative view of the tokens that the grammar defines, in the form of railroad diagrams. These diagrams are provided solely to make it easier to get an intuitive grasp of the syntax of each token.

Mapping to RDF

Linked Data [[LINKED-DATA]], which builds on RDF [[RDF11-CONCEPTS]], is a way to create a network of standards-based machine interpretable data across different documents and Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.

[=Names=] used in [=chunks=] are local to the [=graph of chunks=] in which they appear. For [=names=] to be usable as linked data, there needs to be a way to associate them to global identifiers. [=Reserved names=] defined in this section may be used to create such a mapping.

In turn, this mechanism can be used to map a [=graph of chunks=] to an RDF graph and vice versa.

Algorithms to serialize/deserialize a [=graph of chunks=] to an RDF graph and vice versa are out of scope of this document but may be specified in future revisions of it.

The @rdfmap type

The [=@rdfmap=] chunk [=type=] identifies a chunk that defines a mapping between [=names=] and IRIs [[IRI]]. Each [=property=] it defines whose [=name=] does not start with one of the name operators (see [[[#name-operators]]]) creates a mapping between this [=name=] and the property [=value=], interpreted as an IRI.

The [=@rdfmap=] keyword plays the same role in [=chunks=] as the @context keyword in JSON-LD. See the notion of Context in JSON-LD for details [[JSON-LD]]. The term [=context=] and the [=@context=] keyword have a different meaning in [=chunks=], where they identify the specific situation under which a [=chunk=] should be considered to be true. A distinct [=@rdfmap=] keyword is used here to avoid any confusion.

If multiple [=@rdfmap=] chunks create a mapping for the same [=name=], the last definition overrides previous ones.

An [=@rdfmap=] chunk may also contain a [=@base=] property to create a default IRI namespace for [=names=] and [=@prefix=] properties to define prefixes for compact IRIs.

The @base property

The [=@base=] [=property=] defines a default IRI namespace for [=names=] that are not explicitly declared in an [=@rdfmap=].

There can be only one default IRI namespace for a given [=graph of chunks=]. If [=@base=] is used in multiple [=@rdfmap=] chunks, the last definition overrides previous ones.

The @prefix property

The [=@prefix=] [=property=] can be used in an [=@rdfmap=] chunk to reference a [=chunk=] that defines IRI prefixes, which in turn allow the use of compact IRIs in the [=@rdfmap=] chunk.

The actual [=chunk=] that defines prefixes can have any [=type=]. Each [=property=] it defines whose [=name=] does not start with one of the name operators (see [[[#name-operators]]]) creates a prefix between the property [=name=] and the property [=value=], interpreted as an IRI.

As with [=@base=], in case of conflicts, the definition that gets referenced last overrides former definitions.
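The mapping rules of this section can be sketched in Python as two steps: collecting declarations from @rdfmap chunks in document order, so that later definitions win, and then resolving a name to an IRI. Several details are assumptions: chunks are dictionaries whose @-keys such as @type are skipped, @prefix holds the identifier of a single prefix chunk, a compact IRI has the form prefix:suffix, and @base is applied by plain string concatenation. The functions `build_rdfmap` and `resolve_iri` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

OPERATORS = ("?", "!", "@", "*")  # names starting with these create no mapping


def build_rdfmap(rdfmaps: Iterable[dict], chunks_by_id: Dict[str, dict]
                 ) -> Tuple[Dict[str, str], Dict[str, str], Optional[str]]:
    mappings: Dict[str, str] = {}
    prefixes: Dict[str, str] = {}
    base: Optional[str] = None
    for rdfmap in rdfmaps:  # document order: later definitions override
        for name, value in rdfmap.items():
            if name == "@base":
                base = value
            elif name == "@prefix":
                for prefix, iri in chunks_by_id[value].items():
                    if not prefix.startswith(OPERATORS):
                        prefixes[prefix] = iri
            elif not name.startswith(OPERATORS):
                mappings[name] = value
    return mappings, prefixes, base


def resolve_iri(name: str, mappings: Dict[str, str], prefixes: Dict[str, str],
                base: Optional[str]) -> Optional[str]:
    if name not in mappings:
        return base + name if base is not None else None  # fall back to @base
    iri = mappings[name]
    prefix, colon, suffix = iri.partition(":")
    if colon and prefix in prefixes:
        return prefixes[prefix] + suffix  # expand a compact IRI
    return iri
```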