Bidirectional parsing: a functional / logic perspective
Related papers
The design and implementation of Object Grammars
Science of Computer Programming, 2014
An Object Grammar is a variation on traditional BNF grammars, where the notation is extended to support declarative bidirectional mappings between text and object graphs. The two directions for interpreting Object Grammars are parsing and formatting. Parsing transforms text into an object graph by recognizing syntactic features and creating the corresponding object structure. In the reverse direction, formatting recognizes object graph features and generates an appropriate textual presentation. The key to Object Grammars is the expressive power of the mapping, which decouples the syntactic structure from the graph structure. To handle graphs, Object Grammars support declarative annotations for resolving textual names that refer to arbitrary objects in the graph structure. Predicates on the semantic structure provide additional control over the mapping. Furthermore, Object Grammars are compositional so that languages may be defined in a modular fashion. We have implemented our approach to Object Grammars as one of the foundations of the Ensō system and illustrate the utility of our approach by showing how it enables definition and composition of domain-specific languages (DSLs).
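The two directions of interpretation can be illustrated with a toy, hand-written mapping. This is a minimal Python sketch in the spirit of Object Grammars, not Ensō's actual notation; the `Point` class and the regular expression are illustrative assumptions.

```python
# Minimal sketch of a bidirectional text <-> object mapping, in the spirit
# of Object Grammars (NOT Ensō's notation; all names are illustrative).
import re
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

POINT_RE = re.compile(r"\((\d+),\s*(\d+)\)")

def parse_point(text: str) -> Point:
    """Parsing direction: recognize syntactic features, build the object."""
    m = POINT_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a point: {text!r}")
    return Point(int(m.group(1)), int(m.group(2)))

def format_point(p: Point) -> str:
    """Formatting direction: recognize object features, emit text."""
    return f"({p.x},{p.y})"
```

On well-formed input the two directions are mutually inverse: `parse_point(format_point(p)) == p`.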
DyLan: Parser for Dynamic Syntax
2011
This document describes some of the details of the prototype implementation of the Dynamic Syntax (DS) grammar formalism, DyLan. As such, it should be read in conjunction with the documentation (Javadoc) that comes with the implementation, and various papers/books that describe the Dynamic Syntax framework itself.
Compiling declarative specifications of parsing algorithms
Database and Expert Systems …, 2007
The parsing schemata formalism allows us to describe parsing algorithms in a simple, declarative way by capturing their fundamental semantics while abstracting low-level detail. In this work, we present a compilation technique allowing the automatic transformation of parsing schemata to efficient executable implementations of their corresponding algorithms. Our technique is general enough to be able to handle all kinds of schemata for context-free grammars, tree adjoining grammars and other grammatical formalisms, providing an extensibility mechanism which allows the user to define custom notational elements.
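The deductive flavor of parsing schemata can be sketched as a chart of items closed under inference rules. The following hand-written Python recognizer phrases CYK that way; it is an illustration of the style of algorithm the schemata describe, not the output of the paper's compiler.

```python
# CYK recognition phrased as a deduction system over items (A, i, j),
# meaning: nonterminal A derives the span words[i:j].
def cyk_recognize(words, unary, binary, start="S"):
    # unary:  dict word -> set of nonterminals   (lexical rules A -> word)
    # binary: dict (B, C) -> set of nonterminals (rules A -> B C)
    n = len(words)
    chart = set()
    # Axioms: scan items from the lexical rules.
    for i, w in enumerate(words):
        for a in unary.get(w, ()):
            chart.add((a, i, i + 1))
    # Deduction step: from (B, i, k) and (C, k, j) infer (A, i, j)
    # whenever A -> B C; iterate to a fixed point.
    changed = True
    while changed:
        changed = False
        for (b, i, k) in list(chart):
            for (c, k2, j) in list(chart):
                if k2 != k:
                    continue
                for a in binary.get((b, c), ()):
                    item = (a, i, j)
                    if item not in chart:
                        chart.add(item)
                        changed = True
    return (start, 0, n) in chart
```

For example, with `unary = {"she": {"NP"}, "eats": {"V"}}` and `binary = {("NP", "V"): {"S"}}`, the sentence `["she", "eats"]` is recognized.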
Concrete Syntax with Black Box Parsers
The Art, Science, and Engineering of Programming, 2019
Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type describing the abstract syntax. As a result, meta programming is error-prone, and meta programs are not resilient to evolution of the structure of such ASTs, requiring invasive, fault-prone change to these programs. Inquiry: Concrete syntax patterns alleviate this problem by allowing the meta programmer to match and create syntax trees using the actual syntax of the object language. Systems supporting concrete syntax patterns, however, require a concrete grammar of the object language in their own formalism. Creating such grammars is a costly and error-prone process, especially for realistic languages such as Java and C++. Approach: In this paper we present Concretely, a technique to extend meta programming systems with pluggable concrete syntax patterns, based on external, black box parsers. We illustrate Concretely in the context of Rascal, an open-source meta programming system and language workbench, and show how to reuse existing parsers for Java, JavaScript, and C++. Furthermore, we propose Tympanic, a DSL to declaratively map external AST structures to Rascal's internal data structures. Tympanic allows implementors of Concretely to solve the impedance mismatch between object-oriented class hierarchies in Java and Rascal's algebraic data types. Both the algebraic data type and the AST marshalling code are automatically generated. Knowledge: The conceptual architecture of Concretely and Tympanic supports the reuse of pre-existing, external parsers, and their AST representation, in meta programming systems that feature concrete syntax patterns for matching and constructing syntax trees.
As such, this opens up concrete syntax pattern matching for a host of realistic languages for which writing a grammar from scratch is time consuming and error-prone, but for which industry-strength parsers exist in the wild. Grounding: We evaluate Concretely in terms of source lines of code (SLOC), relative to the size of the AST data type and marshalling code. We show that for real programming languages such as C++ and Java, adding support for concrete syntax patterns requires only on the order of dozens of SLOC. Similarly, we evaluate Tympanic in terms of SLOC, showing an order-of-magnitude reduction in SLOC compared to manual implementation of the AST data types and marshalling code. Importance: Meta programming has applications in reverse engineering, reengineering, source code analysis, static analysis, software renovation, domain-specific language engineering, and many others. Processing of syntax trees is central to all of these tasks. Concrete syntax patterns improve the practice of constructing meta programs. The combination of Concretely and Tympanic has the potential to make concrete syntax patterns available with very little effort, thereby improving and promoting the application of meta programming in the general software engineering context.
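The abstract-vs-concrete gap the paper addresses can be seen even with Python's own `ast` module, which I use here as a neutral illustration (this is unrelated to Rascal's actual API): matching a pattern as trivial as `x + 0` over abstract syntax requires knowing node and field names, whereas a concrete syntax pattern would simply be written as `_ + 0`.

```python
# Matching "<expr> + 0" over Python's abstract syntax: the meta programmer
# must know the BinOp/Add/Constant node names and their field layout.
import ast

def is_plus_zero(expr: str) -> bool:
    tree = ast.parse(expr, mode="eval").body
    return (isinstance(tree, ast.BinOp)
            and isinstance(tree.op, ast.Add)
            and isinstance(tree.right, ast.Constant)
            and tree.right.value == 0)
```

Any change to the AST's shape (as happened when Python replaced `ast.Num` with `ast.Constant`) breaks such code, which is exactly the fragility the abstract argues concrete syntax patterns avoid.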
Generic conversions of abstract syntax representations
Proceedings of the 8th ACM SIGPLAN workshop on Generic programming - WGP '12, 2012
In this paper we present a datatype-generic approach to syntax with variable binding. A universe specifies the binding and scoping structure of object languages, including binders that bind multiple variables as well as sequential and recursive scoping. Two interpretations of the universe are given: one based on parametric higher-order abstract syntax and one on well-typed de Bruijn indices. The former provides convenient interfaces to embedded domain-specific languages, but is awkward to analyse and manipulate directly, while the latter is a convenient representation in implementations, but is unusable as a surface language. We show how to generically convert from the parametric HOAS interpretation to the de Bruijn interpretation, thereby sparing DSL developers the pain of writing the conversion themselves.
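The direction of conversion in question can be sketched by hand for the plain lambda calculus. This Python version uses host-language functions as binders and is illustrative only; the paper derives such conversions generically for a whole universe of binding structures, and the `HVar`/`HLam`/`HApp` names are mine.

```python
# Hand-written sketch: HOAS terms -> de Bruijn terms.
from dataclasses import dataclass
from typing import Callable

# HOAS source representation: binders are host-language functions.
@dataclass
class HVar:
    level: int          # internal marker: depth at which the variable was bound
@dataclass
class HLam:
    body: Callable      # term -> term
@dataclass
class HApp:
    fun: object
    arg: object

# de Bruijn target representation.
@dataclass
class Var:
    index: int
@dataclass
class Lam:
    body: object
@dataclass
class App:
    fun: object
    arg: object

def to_debruijn(t, depth=0):
    if isinstance(t, HVar):
        # index = number of binders crossed between use site and binder
        return Var(depth - t.level - 1)
    if isinstance(t, HLam):
        # instantiate the binder with a marker recording the current depth
        return Lam(to_debruijn(t.body(HVar(depth)), depth + 1))
    if isinstance(t, HApp):
        return App(to_debruijn(t.fun, depth), to_debruijn(t.arg, depth))
    raise TypeError(f"not a term: {t!r}")
```

For example, `HLam(lambda x: HLam(lambda y: HApp(x, y)))` converts to `Lam(Lam(App(Var(1), Var(0))))`.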
We provide a systematic transformation of an LL(1) grammar to an object model that consists of
• an object structure representing the non-terminal symbols and their corresponding grammar production rules,
• a union of classes representing the terminal symbols (tokens).
We present a variant form of the visitor pattern and apply it to the above union of token classes to model a predictive recursive descent parser on the given grammar. Parsing a non-terminal is represented by a visitor to the tokens. For non-terminals that have more than one production rule, the corresponding visitors are chained together according to the chain of responsibility pattern in order to be processed correctly by a valid token. The abstract factory pattern, where each concrete factory corresponds to a non-terminal symbol, is used to manufacture appropriate parsing visitors. Our object-oriented formulation of predictive recursive descent parsing eliminates the traditional construction of the predictive parsing table and yields a parser that is declarative and has minimal conditionals. It not only serves to teach standard techniques in parsing but also as a non-trivial exercise in object modeling for objects-first introductory courses.
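A minimal rendering of this design might look as follows. The paper's formulation is Java-style object-oriented; the Python class names here are illustrative, and the `expect()` helper is a shortcut of mine rather than part of the paper's pattern. The grammar is the LL(1) toy `E -> num | ( E + E )`, with each production of `E` modeled as one token visitor, chained by the chain of responsibility pattern.

```python
# Hedged sketch of visitor-based predictive recursive descent parsing.
class ParseError(Exception):
    pass

# --- union of token classes; each token accepts a visitor -------------------
class Token:
    def accept(self, visitor):
        raise NotImplementedError

class Num(Token):
    def __init__(self, value):
        self.value = value
    def accept(self, v):
        return v.visit_num(self)

class LParen(Token):
    def accept(self, v):
        return v.visit_lparen(self)

class Plus(Token):
    def accept(self, v):
        return v.visit_plus(self)

class RParen(Token):
    def accept(self, v):
        return v.visit_rparen(self)

class Stream:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

def expect(stream, cls):
    tok = stream.next()
    if not isinstance(tok, cls):
        raise ParseError(f"expected {cls.__name__}, got {type(tok).__name__}")

# --- one visitor per production of E; unhandled tokens defer ----------------
class EVisitor:
    """Chain of responsibility: a token not handled by this production's
    visitor is passed on to the successor visitor, or rejected."""
    def __init__(self, stream, successor=None):
        self.stream = stream
        self.successor = successor
    def _defer(self, tok):
        if self.successor is None:
            raise ParseError(f"unexpected token {type(tok).__name__}")
        return tok.accept(self.successor)
    visit_num = visit_lparen = visit_plus = visit_rparen = _defer

class NumProduction(EVisitor):
    def visit_num(self, tok):        # E -> num
        return tok.value

class ParenProduction(EVisitor):
    def visit_lparen(self, tok):     # E -> ( E + E ), evaluated on the fly
        left = parse_E(self.stream)
        expect(self.stream, Plus)
        right = parse_E(self.stream)
        expect(self.stream, RParen)
        return left + right

def parse_E(stream):
    # Plays the role of the concrete factory for E: assemble the chain of
    # production visitors and dispatch on the next token (no parsing table).
    chain = NumProduction(stream, successor=ParenProduction(stream))
    return stream.next().accept(chain)
```

Note how prediction is done by the token itself: dispatching `tok.accept(chain)` replaces the lookup in a predictive parsing table, and the only conditionals left are error checks.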
Maptool — supporting modular syntax development
Lecture Notes in Computer Science, 1996
In building textual translators, implementors often distinguish between a concrete syntax and an abstract syntax. The concrete syntax describes the phrase structure of the input language and the abstract syntax describes a tree structure that can be used as the basis for performing semantic computations. Having two grammars imposes the requirement that there exist a mapping from the concrete syntax to the abstract syntax. The research presented in this paper led to a tool, called Maptool, that is designed to simplify the development of the two grammars. Maptool supports a modular approach to syntax development that mirrors the modularity found in semantic computations. This is done by allowing users to specify each of the syntaxes only partially as long as the sum of the fragments allows deduction of the complete syntaxes.
Towards a flexible syntax/semantics interface
2004
We present a syntax/semantics interface that was developed with a set of problems in mind that were identified in the Edite system, which was based on a traditional syntax/semantics interface. In our syntax/semantics interface, syntactic and semantic rules are independent, semantic rules are hierarchically organized, and partial analyses can be produced.
Lecture Notes in Computer Science, 2009
It is a time-honored fashion to implement a domain-specific language (DSL) by translation to a general-purpose language. Such an implementation is more portable, but an unidiomatic translation jeopardizes performance because, in practice, language implementations favor the common cases. This tension arises especially when the domain calls for complex control structures. We illustrate this tension by revisiting Landin's original correspondence between Algol and Church's lambda-notation.
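Landin's correspondence can be glimpsed in miniature: an Algol-style while loop becomes nothing but abstraction, application, and recursion over an explicit state. This is a hedged Python sketch of the general idea, not the paper's actual translation.

```python
# An Algol-like "while cond do body" expressed as pure recursion over an
# explicit state value (Landin-style translation into lambda-notation).
def while_loop(cond, body, state):
    return while_loop(cond, body, body(state)) if cond(state) else state

# Summing 0..4 with state (counter, accumulator):
result = while_loop(lambda s: s[0] < 5,
                    lambda s: (s[0] + 1, s[1] + s[0]),
                    (0, 0))
# result == (5, 10)
```

The translation is faithful but unidiomatic in the target language: a host compiler that favors native loops over deep recursion will execute it more slowly, which is precisely the tension the abstract describes.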