Batch HTTP endpoint
Not everyone can use the embedded Java API for fast batch graph imports, so we should add a new streaming HTTP endpoint that bulk-loads large numbers of vertices and edges in CSV or JSONL format.
POST /api/v1/batch/{database}
The endpoint should support two input formats: JSONL (newline-delimited JSON) and CSV. Both are streamed: the server never loads the entire payload into memory, so you can push millions of records in a single request.
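To avoid building the whole payload in client memory either, the request body can be produced lazily. The following is a minimal sketch, assuming the `requests` library, which sends a generator body using chunked transfer encoding. `iter_jsonl` and `stream_batch` are hypothetical helper names, and whether the server accepts chunked uploads is an assumption to verify.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(records: Iterable[dict]) -> Iterator[bytes]:
    """Yield one UTF-8 encoded JSON line per record, one record at a time."""
    for record in records:
        yield (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")


def stream_batch(url: str, records: Iterable[dict], auth: tuple) -> dict:
    """POST records lazily (hypothetical helper, not part of any official client)."""
    import requests  # third-party; imported here so iter_jsonl works without it

    resp = requests.post(
        url,
        data=iter_jsonl(records),  # generator body -> chunked transfer encoding
        auth=auth,
        headers={"Content-Type": "application/x-ndjson"},
    )
    resp.raise_for_status()
    return resp.json()
```

Because `records` can itself be a generator (for example, rows read from a database cursor), neither side needs to hold the full dataset at once.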
JSONL Format
{"@type":"vertex","@class":"Person","@id":"t1","name":"Alice","age":30}
{"@type":"vertex","@class":"Person","@id":"t2","name":"Bob","age":25}
{"@type":"edge","@class":"KNOWS","@from":"t1","@to":"t2","since":2020}
CSV Format
@type,@class,@id,name,age
vertex,Person,t1,Alice,30
vertex,Person,t2,Bob,25

@type,@class,@from,@to,since
edge,KNOWS,t1,t2,2020
In both formats, vertices come first, then edges. Vertices can have temporary IDs (@id) that edges reference via @from/@to. Edges can also reference existing database RIDs directly (e.g., #12:0).
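These ordering and reference rules can be checked on the client before uploading. The sketch below is illustrative only: `check_batch` is a hypothetical helper, and the RID pattern is an assumption inferred from the `#12:0` example rather than a documented grammar.

```python
import re

# Assumed RID shape, based on the "#12:0" example above
RID_PATTERN = re.compile(r"^#\d+:\d+$")


def check_batch(records: list[dict]) -> list[str]:
    """Return problems: vertices after edges, or edge endpoints that cannot resolve."""
    problems = []
    seen_ids = set()
    edges_started = False
    for n, rec in enumerate(records, start=1):
        if rec["@type"] == "vertex":
            if edges_started:
                problems.append(f"record {n}: vertex appears after an edge")
            if "@id" in rec:
                seen_ids.add(rec["@id"])
        elif rec["@type"] == "edge":
            edges_started = True
            for key in ("@from", "@to"):
                ref = rec[key]
                # An endpoint must be a known temporary ID or look like a RID
                if ref not in seen_ids and not RID_PATTERN.match(ref):
                    problems.append(
                        f"record {n}: {key}={ref!r} is neither a temporary ID nor a RID"
                    )
    return problems
```

An empty list means the batch respects the vertices-first ordering and every edge endpoint is either a declared temporary ID or RID-shaped.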
Temporary ID Mapping
The response includes an idMapping object so you know what RIDs were assigned:
{ "verticesCreated": 2, "edgesCreated": 1, "elapsedMs": 42, "idMapping": {"t1": "#9:0", "t2": "#9:1"} }
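A client will typically use `idMapping` to translate its own temporary IDs into RIDs for follow-up requests. A minimal sketch, where `to_rid` is a hypothetical helper and the response dictionary simply mirrors the example above:

```python
response = {
    "verticesCreated": 2,
    "edgesCreated": 1,
    "elapsedMs": 42,
    "idMapping": {"t1": "#9:0", "t2": "#9:1"},
}


def to_rid(ref: str, id_mapping: dict) -> str:
    """Translate a temporary ID to its assigned RID; pass existing RIDs through."""
    if ref.startswith("#"):
        return ref  # already a RID such as "#12:0"
    return id_mapping[ref]  # KeyError: the response did not report this ID


alice_rid = to_rid("t1", response["idMapping"])
```

Raising `KeyError` for an unknown temporary ID is a design choice of this sketch; it surfaces records that were never created instead of silently skipping them.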
Tuning via Query Parameters
All GraphBatch configuration options are exposed as query parameters:
| Parameter | Default | Description |
|---|---|---|
| batchSize | 100000 | Max edges buffered before auto-flush |
| lightEdges | false | Property-less edges stored as connectivity only (saves ~33% I/O) |
| wal | false | Enable Write-Ahead Logging for crash safety |
| parallelFlush | true | Parallelize edge connection across async threads |
| preAllocateEdgeChunks | true | Pre-allocate edge segments on vertex creation |
| edgeListInitialSize | 2048 | Initial segment size in bytes (64–8192) |
| bidirectional | true | Connect both outgoing and incoming edges |
| commitEvery | 50000 | Edges per sub-transaction within a flush |
| expectedEdgeCount | 0 | Hint for auto-tuning batch size |
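The request URL can be assembled from these options programmatically. The sketch below uses only the standard library; `batch_url` is a hypothetical helper, and writing booleans as lowercase `true`/`false` is an assumption based on the `lightEdges=true` example.

```python
from urllib.parse import quote, urlencode


def batch_url(base: str, database: str, **options) -> str:
    """Build the batch endpoint URL, appending tuning options as query parameters."""
    params = {
        # Assumption: the server expects lowercase booleans, as in "lightEdges=true"
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in options.items()
    }
    url = f"{base}/api/v1/batch/{quote(database, safe='')}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


url = batch_url("http://localhost:2480", "mydb", lightEdges=True, commitEvery=50000)
```

Options that are not passed are left out of the query string, so the server-side defaults from the table apply.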
Examples
curl (JSONL):
curl -X POST "http://localhost:2480/api/v1/batch/mydb?lightEdges=true" \
  -u root:password \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @graph-data.jsonl
curl (CSV):
curl -X POST "http://localhost:2480/api/v1/batch/mydb" \
  -u root:password \
  -H "Content-Type: text/csv" \
  --data-binary @graph-data.csv
Python:
import requests

# JSONL payload: vertices first, then the edge that references their temporary IDs
data = (
    '{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}\n'
    '{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}\n'
    '{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}\n'
)

resp = requests.post(
    "http://localhost:2480/api/v1/batch/mydb?lightEdges=true",
    auth=("root", "password"),
    headers={"Content-Type": "application/x-ndjson"},
    data=data,
)
print(resp.json())
# {'verticesCreated': 2, 'edgesCreated': 1, 'elapsedMs': 15, 'idMapping': {'p1': '#9:0', 'p2': '#9:1'}}
JavaScript (Node.js):
const resp = await fetch("http://localhost:2480/api/v1/batch/mydb", {
  method: "POST",
  headers: {
    "Content-Type": "application/x-ndjson",
    Authorization: "Basic " + btoa("root:password"),
  },
  // One JSON record per line: vertices first, then edges
  body: [
    '{"@type":"vertex","@class":"Person","@id":"p1","name":"Alice"}',
    '{"@type":"vertex","@class":"Person","@id":"p2","name":"Bob"}',
    '{"@type":"edge","@class":"KNOWS","@from":"p1","@to":"p2"}',
  ].join("\n"),
});
console.log(await resp.json());
Tip: For maximum throughput, group vertices by type in the input. The endpoint batches consecutive same-type vertices into a single createVertices() call. Interleaving types forces smaller batches.
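Grouping can be done on the client with a stable sort before serializing. A minimal sketch, assuming "type" here means the vertex type given in `@class`; `group_vertices_by_class` is a hypothetical helper name.

```python
def group_vertices_by_class(vertices: list[dict]) -> list[dict]:
    """Stable sort by @class so vertices of the same type become consecutive."""
    return sorted(vertices, key=lambda v: v["@class"])


mixed = [
    {"@type": "vertex", "@class": "Person", "@id": "p1"},
    {"@type": "vertex", "@class": "City", "@id": "c1"},
    {"@type": "vertex", "@class": "Person", "@id": "p2"},
]
grouped = group_vertices_by_class(mixed)
```

Because Python's `sorted` is stable, vertices of the same class keep their original relative order. This holds the vertex list in memory, so for very large inputs it may be preferable to group at the data source instead.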
Tip: The endpoint is NOT atomic by design — GraphBatch commits internally in chunks for maximum throughput. Treat it as a bulk-loading operation, not a transactional one. The response tells you exactly how many records were committed.
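Since a request can be partially applied, it is worth comparing the reported counts with what was sent. The sketch below is illustrative: `check_counts` is a hypothetical helper, and it assumes the response carries the `verticesCreated` and `edgesCreated` fields shown earlier. How the response looks after a mid-batch failure is not specified here.

```python
def check_counts(response: dict, expected_vertices: int, expected_edges: int) -> list[str]:
    """Compare committed counts from the response with the number of records sent."""
    problems = []
    expectations = (
        ("verticesCreated", expected_vertices),
        ("edgesCreated", expected_edges),
    )
    for field, expected in expectations:
        actual = response.get(field, 0)  # a missing field is treated as zero
        if actual != expected:
            problems.append(f"{field}: expected {expected}, got {actual}")
    return problems
```

A non-empty result signals a partial load. Recovery then has to happen at the application level (for example, re-sending only the missing records), since there is no transaction to roll back.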