Atomese

Atomese is a collection of structural primitives meant to describe relationships as they are witnessed in "reality". This includes descriptions of physical and biological nature; of psychological, social, cultural, political and economic phenomena; and, of course, of the mathematical and technological. So, software and programming.

Motivation

The idea of representing "everything" is as old as Aristotle. Set theory was an early mathematical framework. This was followed by combinators and the lambda calculus, by means of which "anything sayable can be said". Modern mathematics offers Category Theory and Topos Theory, along with Proof Theory and Model Theory, as ways of talking about "anything". The goals of mathematicians, however, are not the same as those of the more entrepreneurial-minded, and the latter have created the trillion-dollar computer industry, with only token acknowledgement of the mathematical foundations. The computer industry gives us relational databases, knowledge representation, upper ontologies, and now LLMs, transformers and weights as mechanisms by which "anything" can be represented.

Atomese is an ongoing attempt to roll all of this up into one, and to do so in a way that makes general intelligence algorithmically accessible. Until now, all attempts to extract structure from the universe have been complex systems hand-crafted by human engineers. These might be financial credit-worthiness rating systems, or astronomical stellar-redshift analysis tools. The software for these systems is written by humans, applying conventional software development methodologies, using conventional programming languages designed to make it easy for the human software engineer to perform their task.

What if, instead, we ask: what would it take to make it easy for algorithmic systems to automatically explore and extract structure? To create world-models that can be stored in short-term or long-term memory, to process and transform sensory information, to drive motors and perform actions in the real world? That is, rather than have a small army of humans hand-craft custom robots for others to use, why not provide a recursive infrastructure that allows, umm, err, robots to craft themselves? This is the driving vision of Atomese.

History

Atomese originally arose as an attempt by Ben Goertzel and company to combine symbolic AI methods with probability theory, resulting in the definition of PLN, Probabilistic Logic Networks, articulated in several books devoted to the topic. In this articulation, the primitives of knowledge representation theory are mashed up with mathematical logic to provide Nodes and Links, which are general enough to represent almost any kind of relational structure. The base object then becomes a collection of graphs, or, more properly, hypergraphs. To process, digest, reason over and manipulate these hypergraphs, they are placed in a (hyper-)graph database, the AtomSpace.
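
A concrete feel for this, in the Scheme shell (a minimal sketch; the particular names are arbitrary):

    (use-modules (opencog))

    ; Nodes are named vertices; Links connect Nodes -- and other Links.
    (Concept "cat")
    (Concept "animal")

    ; An InheritanceLink relating the two Nodes.
    (Inheritance (Concept "cat") (Concept "animal"))

    ; Links may contain other Links; this nesting is what makes the
    ; whole collection a hypergraph rather than an ordinary graph.
    (Member
        (Inheritance (Concept "cat") (Concept "animal"))
        (Concept "zoology facts"))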

To layer probability theory onto what is otherwise a purely symbolic representation of nature, the SimpleTruthValue is introduced. This is a pair of floating-point numbers, representing the probability, and the confidence, of any given symbolic factual assertion. The goal is to support logical reasoning systems of any type: not only conventional Bayesian inference, but any collection of rule systems and axioms, as might be encountered in mathematical proof theory. This would include, for example, any of the rich varieties of modal logic, but also fuzzy logic, the so-called "non-axiomatic reasoning systems" and statistical-mechanical systems like Markov logic.
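
For example, in the Scheme shell, a SimpleTruthValue (abbreviated stv) can be attached to an assertion like so:

    ; "Crows are black", with probability (strength) 0.9 and confidence 0.6.
    (Inheritance (stv 0.9 0.6)
        (Concept "crow")
        (Concept "black thing"))

    ; The truth value can be fetched back:
    (cog-tv (Inheritance (Concept "crow") (Concept "black thing")))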

The word-phrase TruthValue, and more generally Value, has its roots in mathematical logic, where any given assertion in first-order logic (or higher-order logic) can be assigned a "valuation", indicating its binary truth/falsehood. Probability theory forces a replacement of the crisp 0/1 by a floating-point number. Probabilistic logic (along with neural nets) famously has issues with converging rapidly enough to a given solution. For this reason, an extra float is introduced, the "confidence". This helps, but is still not enough to capture the concept of an ensemble, e.g. a "Bell curve", a Gaussian, or more generally any kind of probability distribution: a "histogram" or, more simply, "a vector of numbers". This leads to the idea of a FloatValue, and then rapidly to a Value in general, which is a vector of anything at all, representing truths in any ensemble, hypothetical modal universe, or set of Bayesian priors, as the case may be. Of course, vectors of floats are the bread-n-butter of neural nets.
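
In the Scheme shell, a FloatValue (a vector of floats) can be attached to any Atom, indexed under a key Atom of one's choosing; a minimal sketch (the key name here is arbitrary):

    (use-modules (opencog))

    ; Attach a small histogram to an Atom, under a chosen key.
    (cog-set-value!
        (Concept "crow")
        (Predicate "*-histogram-key-*")
        (FloatValue 0.1 0.2 0.4 0.2 0.1))

    ; Fetch it back.
    (cog-value (Concept "crow") (Predicate "*-histogram-key-*"))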

Parallel universes, such as the hypothetical worlds of modal logic, thermodynamic canonical ensembles, the infinite collection of Bayesian priors, or, god forbid, quantum-mechanical decompositions, are often imagined to live "in parallel" or to somehow co-exist temporally. In physical reality, though, the changing network of relationships and likelihoods is time-varying, and is usually accessible only through sensory devices, rather than through pure reason. This motivates the recasting of Values as streams that flow data. It relegates the AtomSpace to being a form of memory, a repository for world-models, while flowing streams encapsulate the process of, well, "processing information". This fits well with present-day software practice, which offers generators, futures and promises as primitive constructs for creating sensory, agentic systems. The backends of large commercial websites use futures and promises as extremely low-level programming constructs to achieve millisecond reaction times when customers click on their favorite TikTok influencer. The point of having streams in Atomese is not to be hopelessly abstract, but to capture an idea that is already widespread in the design and development of agentic software systems.
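
A small sketch of the streaming idea, using the RandomStream value that ships with the AtomSpace (the Atom and key names here are arbitrary): the stream is sampled anew each time it is inspected, so it behaves like a sensory input rather than a stored fact.

    (use-modules (opencog))

    ; Attach a stream of three random floats to an Atom.
    (cog-set-value!
        (Concept "thermometer")
        (Predicate "*-reading-*")
        (RandomStream 3))

    (define reading
        (cog-value (Concept "thermometer") (Predicate "*-reading-*")))

    ; Each sampling yields a fresh vector of three floats:
    (cog-value->list reading)
    (cog-value->list reading)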

This brings Atomese to its present-day state: an infrastructure for symbolic AI, together with a (hyper-)graph database, offering dynamic sensori-motor processing primitives. The hope is that this is an appropriate toolset for agentic systems that can reify, transform and transmute their own content. It remains a research platform for figuring out how this is possible, or, perhaps being more honest, whether this is possible.

Reification

A short note on reification and transformation. Compilers and assemblers take a structure, represented in a high-level language, and convert it to assembly code and thence machine code. This is a form of transformation. Many kinds of rewriting ideas, such as that of values flowing from one CPU register to another under the action of an instruction, serve as inspiration for Atomese. It is no accident that Atomese looks like an intermediate representation language. Other influences include VHDL and Verilog: these describe electronic circuits programmatically, and the graphical networks they describe are intended to flow actual electrons, actual charge carriers, when realized as physical wires and transistors. Atomese is again inspired by this, but the idea is that any kind of Value can flow, not just electrons. This is hardly a new idea: the TensorFlow language describes GPU networks suitable for flowing deep-learning neural-network data. Flowing tensors is a "thing". (BTW, Links are like tensors; this is elaborated in detail elsewhere on this wiki.)

Neural Nets

The advent of transformers and LLMs poses a challenge to the vision of Atomese. Why develop an arcane, difficult-to-understand, strange programming-database-flow-agentic language, optimized for algorithmic transformation, when you can just ask Microsoft's Copilot (an LLM-driven code wizard) for help, directly in the English language? If Copilot understands Python natively, then why invent an abstract representational language, like Atomese, that only seems to make things harder? Good question, and we don't have a short, sweet, simple answer to it.

The broader answer is that neural-net weight matrices remain inscrutable, and we hope to capture relationship patterns within those weight matrices using Atomese. Exactly how this is to work is a topic of current research. More general hand-waving leads us to blurt out that there is some interplay between the smooth, analytic, differentiable physical spacetime and the abstract mathematical world of symbolic algebra. Forty years ago, we called this "fractals and chaos theory", and the connections between recursion and physics began to be made. Some of the most elegant examples of this are found in Prusinkiewicz's "Algorithmic Botany". LLMs and neural nets remain opaque, however, and do little to expose or explain "what is really going on", never mind their magical-seeming abilities and obvious commercial applications.

So, yes, Atomese is a research project, and no, we are not ignoring neural nets.

A narrower definition of Atomese

Atomese is the concept of writing programs with atoms. It is a programming language, but not one intended for humans; rather, it is a programming language that algorithms and AIs can use. It's a bit verbose, ugly and complicated for humans, but is meant to be easy to manipulate, transform and rewrite by software systems, including algorithms written in Atomese itself.
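
A tiny example, in the Scheme shell: the "program" below is itself just a tree of Atoms sitting in the AtomSpace, and executing it yields another Atom.

    (use-modules (opencog) (opencog exec))

    ; A program, stored as a graph of Atoms.
    (define two-plus-three
        (Plus (Number 2) (Number 3)))

    ; Running it produces (Number 5).
    (cog-execute! two-plus-three)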

As a language, it borrows ideas from Prolog, but with types; from SQL and relational algebra, but without tables; and from functional programming, but with the side-effect of being a graph database. It borrows ideas from intermediate languages, discarding the rest of the compiler. It is a graph-rewriting, term-rewriting system with types.

Atomese was originally intended to be a language for knowledge representation (KR): that is, a way of encoding facts and hypotheses in a machine-readable way, such that the knowledge can be manipulated, data-mined and reasoned with. This language subset was vaguely inspired by Prolog and Datalog. More correctly, it was constructed by layering concepts from mathematical logic onto a graph database: representing logical, symbolic statements as graphs.
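
The classic KR subset looks like this in the Scheme shell: facts are encoded as EvaluationLinks over Predicates (the particular names here are illustrative).

    ; "Alice likes pizza", encoded as a typed graph.
    (Evaluation
        (Predicate "likes")
        (List (Concept "Alice") (Concept "pizza")))

    (Evaluation
        (Predicate "likes")
        (List (Concept "Bob") (Concept "pizza")))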

Accessing data in the KR graph database required a query language, and as this was created and explored, concepts from relational algebra crept in: ideas from SQL, but without tables, as well as ideas from various graph query languages. These can be found in the GetLink and BindLink atoms. It is here that the query language (a pattern-matching language) becomes closely tied to the concept of a graph rewriting system and a rule engine. Formally, the AtomSpace pattern matcher solves the subgraph isomorphism problem.
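
Given facts like those above, a GetLink expresses the query "who likes pizza?"; executing it asks the pattern matcher to find every subgraph that fits the pattern.

    (use-modules (opencog) (opencog exec))

    ; "Find every $x such that $x likes pizza."
    (define who-likes-pizza
        (Get
            (Variable "$x")
            (Evaluation
                (Predicate "likes")
                (List (Variable "$x") (Concept "pizza")))))

    ; Returns the matches, wrapped in a SetLink:
    (cog-execute! who-likes-pizza)
    ; => (SetLink (Concept "Alice") (Concept "Bob"))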

However, with increasing sophistication and usage, it was found convenient to add programming-language-like features to the knowledge-representation system. One motivation for Atomese was the need to implement procedural scripting and behavior trees within OpenCog -- that is, to define algorithmic snippets that could "do things" in the real world, such as turning robot motors and the like. This resulted in the appropriation of ideas from the lambda calculus and functional programming, such as the LambdaLink, PutLink and FunctionLink. It also resulted in an extensive type system, with many or most of the basic facilities provided by advanced type systems, such as those of F# or Agda.
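
A sketch of the lambda-calculus subset: PutLink performs beta-reduction, substituting its argument into the body of a LambdaLink.

    (use-modules (opencog) (opencog exec))

    ; A one-argument "function", stored as an Atom.
    (define likes-pizza
        (Lambda
            (Variable "$x")
            (Evaluation
                (Predicate "likes")
                (List (Variable "$x") (Concept "pizza")))))

    ; Beta-reduce: substitute (Concept "Carol") for $x.
    (cog-execute! (Put likes-pizza (Concept "Carol")))
    ; => (Evaluation (Predicate "likes")
    ;        (List (Concept "Carol") (Concept "pizza")))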

Atomese is not, and was never intended to be, a programming language with which humans would write "code", like they might with Java, Python, LISP or Haskell. Rather, it is a language that computer algorithms, such as term rewriting systems and rule engines, are able to manipulate. It is a language that genetic programming systems, such as MOSES, can manipulate. It is a graph database that pattern mining algorithms can search, manipulate and transform. It is a language designed for easy self-introspection. As such, it goes well beyond ordinary object reflection or metaprogramming, and provides an implementation of self-modifying code that is actually usable in ordinary machine-learning scenarios. As a core design principle, all Atomese programs are always accessible as term trees. In this, Atomese resembles, in many ways, the intermediate languages found inside compilers: the intermediate language is where a compiler can best "understand" a program, analyzing its structure to perform the needed rewrites and optimizations.
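
Because programs are just Atoms, introspection is ordinary data access. For instance, the Scheme utility cog-get-atoms lists every LambdaLink currently in the AtomSpace -- that is, every "function definition" -- each of which can then be pattern-matched or rewritten like any other data:

    (use-modules (opencog))

    ; Every Atomese "function" in the AtomSpace, as plain data.
    (cog-get-atoms 'LambdaLink)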

As a runtime, Atomese has horrible performance: it is many thousands of times slower than a natively compiled language. For now, this is OK, because Atomese was not meant to be a runtime; rather, it is meant for knowledge representation and self-introspection. Well, actually, that is not OK; this needs to be fixed. This is an open call for experienced compiler developers to join hands with experienced database developers and create a high-performance system for storing, querying and executing Atomese. (So, at this time, Atomese is like the intermediate language in a compiler, but without any existing backend to assembly or byte-code.)

Despite this orientation away from human programmers, some exploratory work is being done on allowing human programmers to create Atomese more simply and elegantly. One idea is to add syntactic sugar to the Scheme or Python shells, making Atomese more concise and transparent when coded in those languages. Another idea is to do something more radical, such as creating an Agda Atomese, exploiting dependent type theory (Atomese has a relatively rich type system). Moving in a different direction, Ghost allows authors to create chat subsystems that can control robot movements and respond to sensory input.
