Clojure - The Reader

Clojure is a homoiconic language, which is a fancy term describing the fact that Clojure programs are represented by Clojure data structures. This is a very important difference between Clojure (and Common Lisp) and most other programming languages - Clojure is defined in terms of the evaluation of data structures and not in terms of the syntax of character streams/files. It is quite common, and easy, for Clojure programs to manipulate, transform and produce other Clojure programs.

That said, most Clojure programs begin life as text files, and it is the task of the reader to parse the text and produce the data structure the compiler will see. This is not merely a phase of the compiler. The reader, and the Clojure data representations, have utility on their own in many of the same contexts one might use XML or JSON etc.

One might say the reader has syntax defined in terms of characters, and the Clojure language has syntax defined in terms of symbols, lists, vectors, maps etc. The reader is represented by the function read, which reads the next form (not character) from a stream, and returns the object represented by that form.
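As a minimal sketch of reading as a step distinct from evaluation, the snippet below uses read-string, the string-based counterpart of read. The input text is made up for illustration.

```clojure
;; read-string consumes one complete form (not one character)
;; from the string and returns it as ordinary Clojure data.
(def form (read-string "(+ 1 2) (this form is not read)"))

(list? form)            ;=> true, the form is a list
(= form '(+ 1 2))       ;=> true
(symbol? (first form))  ;=> true, + is read as a symbol
(eval form)             ;=> 3, evaluation is a separate step
```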

Since we have to start somewhere, this reference starts where evaluation starts, with the reader forms. This will inevitably entail talking about data structures whose descriptive details, and interpretation by the compiler, will follow.

Reader forms

Symbols

Literals

Lists

Lists are zero or more forms enclosed in parentheses: (a b c)

Vectors

Vectors are zero or more forms enclosed in square brackets: [1 2 3]

Maps

Map namespace syntax

Added in Clojure 1.9

Map literals can optionally specify a default namespace context for keys in the map using a #:ns prefix, where ns is the name of a namespace and the prefix precedes the opening brace { of the map. Additionally, #:: can be used to auto-resolve namespaces with the same semantics as auto-resolved keywords.

A map literal with namespace syntax is read with the following differences from a map without:

For example, the following map literal with namespace syntax:

#:person{:first "Han"
         :last "Solo"
         :ship #:ship{:name "Millennium Falcon"
                      :model "YT-1300f light freighter"}}

is read as:

{:person/first "Han"
 :person/last "Solo"
 :person/ship {:ship/name "Millennium Falcon"
               :ship/model "YT-1300f light freighter"}}
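The equivalence can be checked at the REPL. This is a small sketch, assuming Clojure 1.9 or later, that reads a shortened version of the map above from a string and compares it with the expanded form. The var names prefixed and expanded are illustrative only.

```clojure
;; Read a map literal that uses namespace syntax, including a nested one.
(def prefixed
  (read-string
    "#:person{:first \"Han\" :ship #:ship{:name \"Millennium Falcon\"}}"))

;; The same data written with fully qualified keys.
(def expanded
  {:person/first "Han"
   :person/ship  {:ship/name "Millennium Falcon"}})

(= prefixed expanded) ;=> true
```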

Sets

Sets are zero or more forms enclosed in braces preceded by #: #{:a :b :c}

deftype, defrecord, constructor calls

Added in Clojure 1.3

Macro characters

The behavior of the reader is driven by a combination of built-in constructs and an extension system called the read table. Entries in the read table provide mappings from certain characters, called macro characters, to specific reading behavior, called reader macros. Unless indicated otherwise, macro characters cannot be used in user symbols.

Quote (')

'form ⇒ (quote form)
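A short sketch of the expansion, using read-string so that the form produced by the reader can be inspected before it is evaluated:

```clojure
;; The reader turns 'form into the list (quote form).
(def quoted (read-string "'form"))

(= quoted (list 'quote 'form)) ;=> true
(= 'form (eval quoted))        ;=> true, evaluating it yields the symbol
```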

Character (\)

As per above, yields a character literal. Example character literals are: \a \b \c.

The following special character literals can be used for common characters: \newline, \space, \tab, \formfeed, \backspace, and \return.

Unicode support follows Java conventions with support corresponding to the underlying Java version. A Unicode literal is of the form \uNNNN, for example \u03A9 is the literal for Ω.
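A few comparisons that illustrate these literals. This is only a sketch, relating each literal to its character code or to a character taken from a string:

```clojure
(char? \a)               ;=> true
(= 937 (int \u03A9))     ;=> true, 0x03A9 is 937 in decimal
(= \newline (char 10))   ;=> true
(= \space (first " "))   ;=> true
(= \tab (first "\t"))    ;=> true
```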

Comment (;)

Single-line comment: causes the reader to ignore everything from the semicolon to the end of the line.

Deref (@)

@form ⇒ (deref form)
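A small sketch using a hypothetical atom named counter. Note that, in the JVM implementation as far as I am aware, the symbol emitted by the reader is the namespace-qualified clojure.core/deref:

```clojure
(def counter (atom 41))

;; @counter and (deref counter) return the same value.
(= 41 @counter (deref counter)) ;=> true

;; Inspecting the form the reader produces for @counter.
(= (read-string "@counter") '(clojure.core/deref counter)) ;=> true
```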

Metadata (^)

Metadata is a map associated with some kinds of objects: Symbols, Lists, Vectors, Sets, Maps, tagged literals returning an IMeta, and record, type, and constructor calls. The metadata reader macro first reads the metadata and attaches it to the next form read (see with-meta to attach meta to an object):
^{:a 1 :b 2} [1 2 3] yields the vector [1 2 3] with a metadata map of {:a 1 :b 2}.

A shorthand version allows the metadata to be a simple symbol or string, in which case it is treated as a single entry map with a key of :tag and a value of the (resolved) symbol or string, e.g.:
^String x is the same as ^{:tag java.lang.String} x.

A shorthand version for type signatures allows the metadata to be a vector, in which case it is treated as a single entry map with a key of :param-tags and a value of the (resolved) type hints, a vector of :tag values or _, e.g.: ^[String long _] is the same as ^{:param-tags [java.lang.String long _]}. See :param-tags on how param-tags are used.

Another shorthand version allows the metadata to be a keyword, in which case it is treated as a single entry map with a key of the keyword and a value of true, e.g.:
^:dynamic x is the same as ^{:dynamic true} x

Metadata can be chained, in which case the metadata maps are merged from right to left.
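The metadata forms above can be explored with read-string and meta. The following is a sketch only. It reads from strings so that only reader behavior is involved, and the chained example uses keys that do not conflict.

```clojure
;; Map metadata attached to a vector.
(def v (read-string "^{:a 1 :b 2} [1 2 3]"))

(= v [1 2 3])            ;=> true, metadata does not affect equality
(= (meta v) {:a 1 :b 2}) ;=> true

;; Keyword shorthand.
(= (meta (read-string "^:dynamic x")) {:dynamic true}) ;=> true

;; Chained metadata is merged into one map.
(= (meta (read-string "^:private ^:dynamic x"))
   {:private true :dynamic true}) ;=> true
```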

Dispatch (#)

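The dispatch macro causes the reader to use a reader macro from another table, indexed by the character following the #. Dispatch forms described elsewhere in this document include sets (#{}), map namespace syntax (#:) and reader conditionals (#? and #?@).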

Syntax-quote (`, note, the "backquote" character), Unquote (~) and Unquote-splicing (~@)

For all forms other than Symbols, Lists, Vectors, Sets and Maps, `x is the same as 'x.

For Symbols, syntax-quote resolves the symbol in the current context, yielding a fully-qualified symbol (i.e. namespace/name or fully.qualified.Classname). If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended. e.g. x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.

For Lists/Vectors/Sets/Maps, syntax-quote establishes a template of the corresponding data structure. Within the template, unqualified forms behave as if recursively syntax-quoted, but forms can be exempted from such recursive quoting by qualifying them with unquote or unquote-splicing, in which case they will be treated as expressions and be replaced in the template by their value, or sequence of values, respectively.

For example:

user=> (def x 5)
user=> (def lst '(a b c))
user=> `(fred x ~x lst ~@lst 7 8 :nine)
(user/fred user/x 5 user/lst a b c 7 8 :nine)
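The generated-symbol and unquote rules can also be checked programmatically. This is a sketch in which tmp# and total are arbitrary illustrative names. The exact text of the generated symbol is an implementation detail, so the example only checks that both uses agree.

```clojure
;; Both occurrences of tmp# inside one syntax-quote resolve
;; to the same generated, unqualified symbol.
(def form `(let [tmp# 1] (inc tmp#)))
(def bound (first (second form)))  ; symbol in the binding vector
(def used  (second (nth form 2)))  ; symbol in the body

(= bound used)           ;=> true
(nil? (namespace bound)) ;=> true

;; Unquote inserts a value, unquote-splicing inserts a sequence of values.
(= 5 (second `(total ~(+ 2 3))))  ;=> true
(= '(1 2 3 4) `(1 ~@[2 3] 4))     ;=> true
```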

The read table is currently not accessible to user programs.

extensible data notation (edn)

Clojure’s reader supports a superset of extensible data notation (edn). The edn specification is under active development, and complements this document by defining a subset of Clojure data syntax in a language-neutral way.

Tagged Literals

Tagged literals are Clojure’s implementation of edn tagged elements.

When Clojure starts, it searches for files named data_readers.clj or data_readers.cljc at the root of the classpath. Each such file must contain a Clojure map of symbols, like this:

{foo/bar my.project.foo/bar
 foo/baz my.project/baz}

The key in each pair is a tag that will be recognized by the Clojure reader. The value in the pair is the fully-qualified name of a Var which will be invoked by the reader to parse the form following the tag. For example, given the data_readers.clj file above, the Clojure reader would parse this form:

#foo/bar [1 2 3]

by invoking the Var #'my.project.foo/bar on the vector [1 2 3]. The data reader function is invoked on the form AFTER it has been read as a normal Clojure data structure by the reader. For your own data reader functions, you should report errors by throwing instances of RuntimeException with messages providing error information.

Reader tags without namespace qualifiers are reserved for Clojure. Default reader tags are defined in default-data-readers but may be overridden in data_readers.clj / data_readers.cljc or by rebinding *data-readers*. If no data reader is found for a tag, the function bound in *default-data-reader-fn* will be invoked with the tag and value to produce a value. If *default-data-reader-fn* is nil (the default), a RuntimeException will be thrown.
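As a sketch of the rebinding route, the example below installs a data reader for a single read without any data_readers.clj file. The tag demo/sum and the function read-sum are hypothetical names invented for this illustration.

```clojure
;; Hypothetical data reader: receives the already-read form that
;; follows the tag and returns the value the reader should produce.
(defn read-sum [form]
  (when-not (and (vector? form) (every? number? form))
    ;; Report problems with a RuntimeException, as recommended above.
    (throw (RuntimeException.
             (str "#demo/sum expects a vector of numbers, got: "
                  (pr-str form)))))
  (reduce + form))

;; Bind *data-readers* only for the duration of this read.
(def result
  (binding [*data-readers* {'demo/sum read-sum}]
    (read-string "#demo/sum [1 2 3]")))

result ;=> 6
```

Using binding keeps the custom tag local to the calling code, which can be convenient in tests.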

If a data_readers.cljc is provided, it is read with the same semantics as any other cljc source file with reader conditionals.

Built-in tagged literals

Clojure 1.4 introduced the instant and UUID tagged literals. Instants have the format #inst "yyyy-mm-ddThh:mm:ss.fff+hh:mm". NOTE: Some of the elements of this format are optional. See the code for details. By default, the reader parses the supplied string into a java.util.Date. For example:

(def instant #inst "2018-03-28T10:48:00.000")
(= java.util.Date (class instant))
;=> true

Since *data-readers* is a dynamic var that can be bound, you can replace the default reader with a different one. For example, clojure.instant/read-instant-calendar will parse the literal into a java.util.Calendar, while clojure.instant/read-instant-timestamp will parse it into a java.sql.Timestamp:

;; instance? is used because the returned calendar is a concrete
;; subclass of the abstract java.util.Calendar
(binding [*data-readers* {'inst clojure.instant/read-instant-calendar}]
  (instance? java.util.Calendar (read-string (pr-str instant))))
;=> true

(binding [*data-readers* {'inst clojure.instant/read-instant-timestamp}]
  (instance? java.sql.Timestamp (read-string (pr-str instant))))
;=> true

The #uuid tagged literal will be parsed into a java.util.UUID:

(= java.util.UUID (class (read-string "#uuid \"3b8a31ed-fd89-4f1b-a00f-42e3d60cf5ce\"")))
;=> true

Default data reader function

If no data reader is found when reading a tagged literal, the *default-data-reader-fn* is invoked. You can set your own default data reader function and the provided tagged-literal function can be used to build an object that can store an unhandled literal. The object returned by tagged-literal supports keyword lookup of the :tag and :form:

(set! *default-data-reader-fn* tagged-literal)

;; read #object as a generic TaggedLiteral object
(def x #object[clojure.lang.Namespace 0x23bff419 "user"])

[(:tag x) (:form x)]
;=> [object [clojure.lang.Namespace 599782425 "user"]]
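The same idea can be sketched with binding instead of set!, which scopes the change to one read and does not depend on the var already having a thread-local binding. The tag my/unknown is a made-up example with no registered data reader.

```clojure
;; Read an unknown tag as a generic tagged literal object.
(def unknown
  (binding [*default-data-reader-fn* tagged-literal]
    (read-string "#my/unknown {:a 1}")))

(tagged-literal? unknown) ;=> true
(:tag unknown)            ;=> my/unknown
(:form unknown)           ;=> {:a 1}
```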

Reader Conditionals

Clojure 1.7 introduced a new extension (.cljc) for portable files that can be loaded by multiple Clojure platforms. The primary mechanism for managing platform-specific code is to isolate that code into a minimal set of namespaces, and then provide platform-specific versions (.clj/.class or .cljs) of those namespaces.

In cases where it is not feasible to isolate the varying parts of the code, or where the code is mostly portable with only small platform-specific parts, 1.7 also introduced reader conditionals, which are supported only in cljc files and at the default REPL. Reader conditionals should be used sparingly and only when necessary.

Reader conditionals are a new reader dispatch form starting with #? or #?@. Both consist of a series of alternating features and expressions, similar to cond. Every Clojure platform has a well-known "platform feature" - :clj, :cljs, :cljr. Each condition in a reader conditional is checked in order until a feature matching the platform feature is found. The reader conditional will read and return that feature’s expression. The expression on each non-selected branch will be read but skipped. A well-known :default feature will always match and can be used to provide a default. If no branches match, no form will be read (as if no reader conditional expression was present).

Note: Implementors of non-official Clojure platforms should use a qualified keyword for their platform feature to avoid name collisions. Unqualified platform features are reserved for official platforms.

The following example will read as Double/NaN in Clojure, js/NaN in ClojureScript, and nil in any other platform:

#?(:clj     Double/NaN
   :cljs    js/NaN
   :default nil)

The syntax for #?@ is exactly the same but the expression is expected to return a collection that can be spliced into the surrounding context, similar to unquote-splicing in syntax quote. Use of reader conditional splicing at the top level is not supported and will throw an exception. An example:

[1 2 #?@(:clj [3 4] :cljs [5 6])]
;; in clj =>        [1 2 3 4]
;; in cljs =>       [1 2 5 6]
;; anywhere else => [1 2]

The read and read-string functions optionally take a map of options as a first argument. The current feature set and reader conditional behavior can be set in the options map with these keys and values:

  :read-cond - :allow to process reader conditionals, or
               :preserve to keep all branches
  :features - persistent set of feature keywords that are active

An example of how to test ClojureScript reader conditionals from Clojure:

(read-string
  {:read-cond :allow
   :features #{:cljs}}
  "#?(:cljs :works! :default :boo)")
;; :works!

However, note that the Clojure reader will always inject the platform feature :clj as well. For platform-agnostic reading, see tools.reader.
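A sketch of what that injection means in practice, assuming the code runs on JVM Clojure: because branches are checked in order and :clj is always active, a :clj branch listed first wins even when :features names only :cljs. The var names are illustrative.

```clojure
(def opts {:read-cond :allow :features #{:cljs}})

;; :cljs is listed first and is in the feature set, so it is selected.
(def cljs-first (read-string opts "#?(:cljs :js :clj :jvm)"))

;; :clj is listed first and is always injected, so it is selected.
(def clj-first (read-string opts "#?(:clj :jvm :cljs :js)"))

;; Splicing form with only the platform feature active.
(def spliced
  (read-string {:read-cond :allow} "[1 2 #?@(:clj [3 4] :cljs [5 6])]"))

cljs-first ;=> :js
clj-first  ;=> :jvm
spliced    ;=> [1 2 3 4]
```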

If the reader is invoked with {:read-cond :preserve}, the reader conditional and non-executed branches will be preserved, as data, in the returned form. The reader conditional will be returned as a type that supports keyword lookup of the :form and :splicing? keys. Tagged literals that are read but skipped will be returned as a type that supports keyword lookup of the :form and :tag keys.

(read-string
  {:read-cond :preserve}
  "[1 2 #?@(:clj [3 4] :cljs [5 6])]")
;; [1 2 #?@(:clj [3 4] :cljs [5 6])]
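The preserved conditional can then be inspected as data. A brief sketch, with preserved and rc as illustrative var names:

```clojure
(def preserved
  (read-string {:read-cond :preserve}
               "[1 2 #?@(:clj [3 4] :cljs [5 6])]"))

;; The third element is the preserved reader conditional itself.
(def rc (nth preserved 2))

(count preserved)         ;=> 3
(reader-conditional? rc)  ;=> true
(:splicing? rc)           ;=> true
(:form rc)                ;=> (:clj [3 4] :cljs [5 6])
```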