Batch HTTP endpoint

Not everyone can use the embedded Java API for fast batch graph import, so we should add a new streaming HTTP endpoint for bulk-loading large numbers of vertices and edges in CSV or JSONL format.

POST /api/v1/batch/{database}

The endpoint should support two input formats: JSONL (newline-delimited JSON) and CSV. Both are streamed: the server never loads the entire payload into memory, so you can push millions of records in a single request.
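
On the client side, the body can be generated lazily so that the upload is streamed end to end. The following is a minimal sketch using only Python's standard library. It relies on urllib falling back to chunked transfer encoding when the request body is an iterable of bytes and no Content-Length is set. Whether the endpoint accepts chunked request bodies is an assumption, and the helper names jsonl_chunks and post_stream are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator


def jsonl_chunks(records: Iterable[dict], lines_per_chunk: int = 1000) -> Iterator[bytes]:
    """Yield the records as UTF-8 JSONL, a few lines at a time."""
    buf = []
    for rec in records:
        buf.append(json.dumps(rec, separators=(",", ":")))
        if len(buf) >= lines_per_chunk:
            yield ("\n".join(buf) + "\n").encode("utf-8")
            buf = []
    if buf:
        yield ("\n".join(buf) + "\n").encode("utf-8")


def post_stream(url: str, user: str, password: str, records: Iterable[dict]) -> dict:
    """POST lazily generated JSONL (hypothetical helper, not part of the endpoint)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=jsonl_chunks(records),  # iterable body: sent chunked by urllib
        method="POST",
        headers={
            "Content-Type": "application/x-ndjson",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because records can be any iterable, it can be a generator that reads from a file or database cursor, so the client never holds the full payload in memory either.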

JSONL Format

{"@type":"vertex","@class":"Person","@id":"t1","name":"Alice","age":30}
{"@type":"vertex","@class":"Person","@id":"t2","name":"Bob","age":25}
{"@type":"edge","@class":"KNOWS","@from":"t1","@to":"t2","since":2020}

CSV Format

@type,@class,@id,name,age
vertex,Person,t1,Alice,30
vertex,Person,t2,Bob,25

@type,@class,@from,@to,since
edge,KNOWS,t1,t2,2020

In both formats, vertices come first, then edges. Vertices can have temporary IDs (@id) that edges reference via @from/@to. Edges can also reference existing database RIDs directly (e.g., #12:0).
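
The ordering and reference rules above can be checked on the client before uploading. This is a sketch only: the helper check_records is hypothetical, the RID pattern is inferred from the "#12:0" example and may be narrower than what the database accepts, and how the server itself reacts to a bad reference is not specified here.

```python
import re
from typing import Iterable, List

# Inferred from the "#12:0" example; the real RID grammar may differ.
RID_PATTERN = re.compile(r"^#\d+:\d+$")


def check_records(records: Iterable[dict]) -> List[str]:
    """Return a list of problems; an empty list means the input looks well-formed."""
    problems = []
    known_ids = set()
    seen_edge = False
    for n, rec in enumerate(records, start=1):
        kind = rec.get("@type")
        if kind == "vertex":
            if seen_edge:
                problems.append(f"record {n}: vertex appears after an edge")
            if "@id" in rec:
                known_ids.add(rec["@id"])
        elif kind == "edge":
            seen_edge = True
            for key in ("@from", "@to"):
                ref = rec.get(key)
                if ref is None:
                    problems.append(f"record {n}: missing {key}")
                elif ref not in known_ids and not RID_PATTERN.match(str(ref)):
                    problems.append(
                        f"record {n}: {key}={ref!r} is neither a temporary ID nor a RID"
                    )
        else:
            problems.append(f"record {n}: unknown @type {kind!r}")
    return problems
```

The check is a single pass and only keeps the set of temporary IDs in memory, so it can run over the same generator that feeds the upload.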

Temporary ID Mapping

The response includes an idMapping object so you know what RIDs were assigned:

{
  "verticesCreated": 2,
  "edgesCreated": 1,
  "elapsedMs": 42,
  "idMapping": {"t1": "#9:0", "t2": "#9:1"}
}
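
A typical use of idMapping is to re-key local data by the assigned RIDs once the load has finished. The sketch below parses the sample response shown above; the resolve helper and the people dictionary are illustrative, not part of the endpoint.

```python
import json


def resolve(temp_or_rid: str, id_mapping: dict) -> str:
    """Translate a temporary ID to its assigned RID; pass anything else through."""
    return id_mapping.get(temp_or_rid, temp_or_rid)


# Sample response body, copied from the example in this section.
response_body = (
    '{"verticesCreated": 2, "edgesCreated": 1, "elapsedMs": 42, '
    '"idMapping": {"t1": "#9:0", "t2": "#9:1"}}'
)
result = json.loads(response_body)
id_mapping = result["idMapping"]

# Local rows keyed by the temporary IDs that were used in the upload.
people = {"t1": {"name": "Alice"}, "t2": {"name": "Bob"}}
people_by_rid = {resolve(tmp, id_mapping): row for tmp, row in people.items()}
print(people_by_rid)
```

Existing RIDs that were used directly in @from/@to do not appear in the mapping, which is why resolve returns its input unchanged when no entry is found.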

Tuning via Query Parameters

All GraphBatch configuration options are exposed as query parameters:

Parameter               Default  Description
batchSize               100000   Max edges buffered before auto-flush
lightEdges              false    Property-less edges stored as connectivity only (saves ~33% I/O)
wal                     false    Enable Write-Ahead Logging for crash safety
parallelFlush           true     Parallelize edge connection across async threads
preAllocateEdgeChunks   true     Pre-allocate edge segments on vertex creation
edgeListInitialSize     2048     Initial segment size in bytes (64–8192)
bidirectional           true     Connect both outgoing and incoming edges
commitEvery             50000    Edges per sub-transaction within a flush
expectedEdgeCount       0        Hint for auto-tuning batch size
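
Building the request URL from these options can be wrapped in a small helper. This sketch assumes booleans are passed as lowercase true/false, as in the lightEdges=true example in this document; the function name batch_url is hypothetical.

```python
from urllib.parse import quote, urlencode


def batch_url(base: str, database: str, **options) -> str:
    """Build the batch endpoint URL with tuning options as query parameters."""
    params = {}
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: the server expects lowercase "true"/"false".
            value = "true" if value else "false"
        params[name] = value
    path = f"{base.rstrip('/')}/api/v1/batch/{quote(database, safe='')}"
    return f"{path}?{urlencode(params)}" if params else path


print(batch_url("http://localhost:2480", "mydb", lightEdges=True, commitEvery=10000))
```

Options that are omitted keep the defaults listed in the table, so only the values being tuned need to be passed.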

Examples

curl (JSONL):

curl -X POST "http://localhost:2480/api/v1/batch/mydb?lightEdges=true" \
  -u root:password \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @graph-data.jsonl

curl (CSV):

curl -X POST "http://localhost:2480/api/v1/batch/mydb" \
  -u root:password \
  -H "Content-Type: text/csv" \
  --data-binary @graph-data.csv

Python:

import requests

data = (
    '{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}\n'
    '{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}\n'
    '{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}\n'
)

resp = requests.post(
    "http://localhost:2480/api/v1/batch/mydb?lightEdges=true",
    auth=("root", "password"),
    headers={"Content-Type": "application/x-ndjson"},
    data=data,
)
print(resp.json())

{'verticesCreated': 2, 'edgesCreated': 1, 'elapsedMs': 15, 'idMapping': {'p1': '#9:0', 'p2': '#9:1'}}

JavaScript (Node.js):

const resp = await fetch("http://localhost:2480/api/v1/batch/mydb", {
  method: "POST",
  headers: {
    "Content-Type": "application/x-ndjson",
    Authorization: "Basic " + btoa("root:password"),
  },
  body: [
    '{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}',
    '{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}',
    '{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}',
  ].join("\n"),
});
console.log(await resp.json());

Tip: For maximum throughput, group vertices by type in the input. The endpoint batches consecutive same-type vertices into a single createVertices() call. Interleaving types forces smaller batches.
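
One way to apply this tip is to reorder records before upload. The sketch assumes that "type" in the tip refers to the vertex @class, and the helper order_for_upload is hypothetical. It holds all records in memory, so it only suits inputs that fit in RAM; larger inputs would need to be grouped when they are exported.

```python
from typing import Iterable, List


def order_for_upload(records: Iterable[dict]) -> List[dict]:
    """Vertices first, grouped by @class in first-seen order, then all edges."""
    vertices_by_class = {}  # dicts preserve insertion order
    edges = []
    for rec in records:
        if rec.get("@type") == "vertex":
            vertices_by_class.setdefault(rec.get("@class"), []).append(rec)
        else:
            edges.append(rec)
    ordered = [v for group in vertices_by_class.values() for v in group]
    return ordered + edges


mixed = [
    {"@type": "vertex", "@class": "Person", "@id": "p1"},
    {"@type": "edge", "@class": "WORKS_AT", "@from": "p1", "@to": "c1"},
    {"@type": "vertex", "@class": "Company", "@id": "c1"},
    {"@type": "vertex", "@class": "Person", "@id": "p2"},
]
print([r.get("@id", r["@class"]) for r in order_for_upload(mixed)])
```

The relative order of records within each class, and of the edges, is preserved.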

Tip: The endpoint is NOT atomic by design — GraphBatch commits internally in chunks for maximum throughput. Treat it as a bulk-loading operation, not a transactional one. The response tells you exactly how many records were committed.
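
Since a load can stop partway through, a client may want to compare the counts in the response with what it sent. This sketch uses only the verticesCreated and edgesCreated fields shown in the sample response; how the endpoint reports a failure in the middle of a load is not specified here, and the helper load_shortfall is hypothetical.

```python
def load_shortfall(result: dict, vertices_sent: int, edges_sent: int) -> dict:
    """Return how many vertices and edges were sent but not reported as created."""
    return {
        "vertices": vertices_sent - result.get("verticesCreated", 0),
        "edges": edges_sent - result.get("edgesCreated", 0),
    }


# Example: 2 vertices and 3 edges were sent, but only 1 edge was committed.
result = {"verticesCreated": 2, "edgesCreated": 1, "elapsedMs": 42}
missing = load_shortfall(result, vertices_sent=2, edges_sent=3)
if any(missing.values()):
    print(f"partial load, not committed: {missing}")
```

A non-zero shortfall means the remaining records have to be re-sent or the already committed ones cleaned up by the caller, because the endpoint does not roll them back.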