Fennel: ExecStreamDesign Struct Reference (original) (raw)

Detailed Description

Overview

This document describes the components making up the ExecStream library, which is Fennel's infrastructure for execution of queries and data manipulation. The focus is on theory; for practice, see ExecStreamHowTo.

ExecStreamGraph Structure

Fennel queries are physically implemented as dataflow graphs, where each graph vertex is a specialized execution processor called an ExecStream (sometimes also referred to as an execution object or XO). A related collection of streams is manipulated as a unit called an ExecStreamGraph.

Traditional DBMS executors use a tree dataflow structure together with a simple "iterator" model, where a fetch request on a top-level stream is implemented by recursively fetching from lower-level streams until leaves are reached. Fennel departs from this in two important ways:

Non-tree graph structures such as DAG's are allowed. Currently cycles are prohibited, but that may change when CONNECT BY query processing is implemented. This makes possible non-traditional query processing such as global query optimization (commonly used subqueries can be shared by multiple user-level queries).
Instead of relying on implicit scheduling by allowing streams to invoke each other directly, Fennel streams are completely passive and never call other streams. Instead, a separate scheduler object is responsible for governing dataflow order and invoking streams in series or in parallel. This enables a variety of features such as efficient TOP N query processing, latency/throughput tradeoffs, and adaptive vertical parallelism.

The diagram below illustrates a graph for carrying out a join query:

In this case, the graph structure is a tree, where the "leaves" read from table storage and produce tuples. These tuples flow rightward, getting combined and transformed, until they are emitted by the "root" node (the Calc stream) and returned to the user who issued the query.

NOTE: the DiskBuffer stream implies additional dataflow to and from disk, but this external flow is not managed or understood by the ExecStreamGraph. It is entirely encapsulated by the DiskBuffer stream. Likewise, the BTreeScan streams imply dataflow from disk.

Here is some common ExecStreamGraph terminology:

stream: a vertex in an ExecStreamGraph.
dataflow: a directed edge in an ExecStreamGraph. Streams are connected by dataflow edges. In the example diagram, there is dataflow between the CartesianJoin and the Calc, but there is no dataflow between the two instances of BTreeScan.
upstream: reachable by traversing dataflow edges in the opposite direction of the arrows.
downstream: reachable by traversing dataflow edges in the direction of the arrows.
producer: in a dataflow edge, the stream which produces the data. In diagrams, the dataflow arrow is oriented so that it originates with the producer and terminates with the consumer (pointed to by the arrowhead). In the example diagram, the CartesianJoin stream is a producer with respect to the Calc. A producer is upstream from its consumer.
consumer: in a dataflow edge, the stream which consumes the data generated by the producer. In the example diagram, the Calc stream is a consumer with respect to the CartesianJoin. A consumer is downstream from its producer.
source: a vertex which is not a consumer. In the example diagram, both instances of BTreeScan are sources. When the graph structure is a tree, sources are also referred to as leaves.
sink: a vertex which is not a producer. In the example diagram, the Calc stream is a sink. When the graph structure is a tree, the (unique) sink is also referred to as the root.
input: a dataflow edge as seen by its consumer. Somtimes also used to refer to the corresponding producer. In the example diagram, the edge originating from the CartesianJoin is an input to the Calc.
output: a dataflow edge as seen by its producer. Sometimes also used to refer to the corresponding consumer. In the example diagram, the edge originating from a BTreeScan is an output of that scan.

ExecStream Classes

The diagram below shows some typical vertex types used in building query graphs. These are common enough that they have corresponding abstract base classes from which concrete stream implementations derive:

Memory Buffers

Streams consume tuple data from input buffers and produce tuple data into output buffers. This interaction is mediated by objects known as buffer accessors (encapsulated by class ExecStreamBufAccessor). The buffer access design allows for both by-value and by-reference semantics, with the goal being to minimize the number of copy operations required throughout the tree.

Below is the same graph from the earlier example, but this time embellished with extra buffers and buffer accessors:

The small rectangles are buffer accessors. One buffer accessor is associated with each dataflow edge, and for each such edge, the producer and consumer streams retain references to the corresponding accessor. Also note that two extra MemBuffer streams have been added to the graph. The job of these "adapter" streams is to allocate memory to be written and read by the adjacent streams. For example, as the CartesianJoin stream produces join tuples, it writes them into the downstream MemBuffer. The Calc stream reads them from that same MemBuffer.

There are no MemBuffer streams adjacent to the DiskBuffer stream. The reason is that the DiskBuffer stream is capable of allocating pages directly from the cache for I/O purposes. It can provide these pages to the BTreeScan for writing, and to the CartesianJoin for reading. This way, two extra copies are avoided. Each stream is responsible for declaring its buffer provisioning requirements so that each dataflow can be optimized automatically.

ExecStream Lifecycle

The ExecStream and ExecStreamGraph classes have a similar lifecyle. An ExecStream is always an element in an ExecStreamGraph; in the simple case this is the same graph, so the lifecycles coincide. It is also possible for a stream to change graphs: that is, it can be constructed in one graph (its preparation context), and then that graph can be merged into a larger graph (its execution context). Note that merger is allowed but not arbitrary edits, which could easily produce an invalid graph. This feature is intended as a basis for query optimization across multiple statements, etc.

After construction, the stream starts out in "embryonic" form, meaning that it is not yet fully ready to execute, but all of its parameters are defined. This state is represented via the ExecStreamEmbryo class (and corresponding ExecStreamGraphEmbryo).
Embryonic streams are constructed and added to an embryonic graph, and then dataflow edges are added. As dataflow is defined, extra adapter streams are inserted automatically as needed.
Once the graph is fully defined, it can be prepared. This involves
1. performing a topological sort by dataflow dependencies
2. creating buffer accessors for each dataflow edge
3. binding buffer accessors to the adjacent streams which will employ them during execution
4. preparing each stream in dataflow order (from sources to sinks); stream implementations provide a prepare method which precomputes data structures needed during execution
After the graph has been prepared, resources are set aside for the graph by an ExecStreamGovernor, based on the resource requirements of each stream in the graph. See ResourceGovernorDesign for a detailed description of how resources are allocated to streams within a graph.
Before execution, a prepared graph must be opened. Opening a graph clears the state of all buffer accessors and opens each of the constituent streams; stream implementations provide an open method which clears stream-specific state and allocates any resources (such as memory, connections, or threads) needed during execution.
Once opened, a graph can be executed via an ExecStreamScheduler.
While open, streams can be reopened individually (usually by adjacent streams). For example, a nested loop join needs to reset the state of its "inner" input for each row from its "outer" input.
Once execution completes or is aborted, the graph should be closed to release all resources acquired during execution.
A graph can be closed and then opened again repeatedly.
A graph which is currently closed (possibly never opened) can be deleted.

ExecStream Execution

For details on how ExecStreams are executed, please see the SchedulerDesign.

Definition at line 273 of file ExecStreamDesign.cpp.

The documentation for this struct was generated from the following file:

/home/pub/open/dev/fennel/exec/ExecStreamDesign.cpp

Generated on Mon Jun 22 04:00:30 2009 for Fennel by 1.5.1