Kuzu - Graph Database

Kuzu implements a structured property graph model and requires a pre-defined schema.

Persistence

Kuzu supports both on-disk and in-memory modes of operation. The mode is determined at the time of creating the database, as explained below.

On-disk database

If you specify a database path when initializing a database, such as example.kuzu, Kuzu will operate in the on-disk mode. In this mode, Kuzu persists all data to disk at the given path. All transactions are logged to a Write-Ahead Log (WAL) and updates are periodically merged into the database files during checkpoints.

In-memory database

If you omit the database path, or specify it as "" or :memory:, Kuzu will operate in the in-memory mode. In this mode, there are no writes to the WAL, and no data is persisted to disk. All data is lost when the process finishes.
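The mode-selection rule above can be sketched as a small helper. This is an illustrative sketch rather than part of Kuzu: `is_in_memory` and `open_database` are hypothetical names, and the only Kuzu call used is the `kuzu.Database(path)` constructor that also appears in the quick start.

```python
def is_in_memory(path: str) -> bool:
    """Return True if `path` selects in-memory mode under the rule above."""
    return path in ("", ":memory:")


def open_database(path: str = ":memory:"):
    """Open a Kuzu database and report the mode implied by `path`.

    Hypothetical convenience wrapper, not a Kuzu API.
    """
    import kuzu  # imported lazily so is_in_memory() works without Kuzu installed

    mode = "in-memory" if is_in_memory(path) else "on-disk"
    print(f"Opening {path!r} in {mode} mode")
    return kuzu.Database(path)
```

Calling `open_database("example.kuzu")` would persist data at that path, while `open_database()` would keep everything in memory for the lifetime of the process.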

Quick start

Ensure that you have installed Kuzu using the CLI or your preferred client API. Also download the example CSV files from our GitHub repo.


mkdir ./data/
curl -L -o ./data/city.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/city.csv
curl -L -o ./data/user.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/user.csv
curl -L -o ./data/follows.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/follows.csv
curl -L -o ./data/lives-in.csv https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv/lives-in.csv
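If curl is not available, the same four files can be fetched from Python using only the standard library. The sketch below is one possible equivalent of the commands above; `demo_csv_urls` and `download_demo_csvs` are hypothetical helper names, and the URLs are the ones used by the curl commands.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names taken from the curl commands above
BASE_URL = "https://raw.githubusercontent.com/kuzudb/kuzu/refs/heads/master/dataset/demo-db/csv"
FILES = ["city.csv", "user.csv", "follows.csv", "lives-in.csv"]


def demo_csv_urls() -> dict:
    """Map each demo file name to its download URL."""
    return {name: f"{BASE_URL}/{name}" for name in FILES}


def download_demo_csvs(dest: str = "./data") -> list:
    """Download the demo CSV files into `dest`, creating the directory if needed."""
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, url in demo_csv_urls().items():
        path = target / name
        urlretrieve(url, str(path))  # performs the network request
        paths.append(path)
    return paths
```

Calling `download_demo_csvs()` places the files under `./data/`, the location the quick start expects.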

In this example, we will create a graph with two node types, User and City, and two relationship types, Follows and LivesIn.

Because Kuzu is an embedded database, there are no servers to set up. You can simply import the kuzu module in your code and run queries on the database. The Python example below demonstrates how to create a graph schema and import data into an on-disk Kuzu database.


import kuzu

def main():
    # Create an empty on-disk database and connect to it
    db = kuzu.Database("example.kuzu")
    conn = kuzu.Connection(db)

    # Create schema
    conn.execute("CREATE NODE TABLE User(name STRING PRIMARY KEY, age INT64)")
    conn.execute("CREATE NODE TABLE City(name STRING PRIMARY KEY, population INT64)")
    conn.execute("CREATE REL TABLE Follows(FROM User TO User, since INT64)")
    conn.execute("CREATE REL TABLE LivesIn(FROM User TO City)")

    # Insert data
    conn.execute('COPY User FROM "./data/user.csv"')
    conn.execute('COPY City FROM "./data/city.csv"')
    conn.execute('COPY Follows FROM "./data/follows.csv"')
    conn.execute('COPY LivesIn FROM "./data/lives-in.csv"')

    # Execute Cypher query
    response = conn.execute(
        """
        MATCH (a:User)-[f:Follows]->(b:User)
        RETURN a.name, b.name, f.since;
        """
    )
    for row in response:
        print(row)

if __name__ == "__main__":
    main()


['Adam', 'Karissa', 2020]
['Adam', 'Zhang', 2020]
['Karissa', 'Zhang', 2021]
['Zhang', 'Noura', 2022]

Iterating over the response as shown above yields each result row as a list of column values. See below for more output options for Python.
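Rows can also be pulled one at a time instead of looping over the whole result. The sketch below assumes the result object exposes `has_next()` and `get_next()`, as Kuzu's Python query result does to the best of our knowledge; `fetch_rows` is a hypothetical helper, not part of the Kuzu API.

```python
def fetch_rows(result, limit):
    """Collect at most `limit` rows from a query result.

    `result` is assumed to provide has_next() and get_next();
    rows that are not consumed remain available for later calls.
    """
    rows = []
    while len(rows) < limit and result.has_next():
        rows.append(result.get_next())
    return rows
```

With the `response` object from the example above, `fetch_rows(response, 2)` would return the first two rows and leave the rest to be read later.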

Output as a dictionary

You can also get the results of a Cypher query as a dictionary.


response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
for row in response.rows_as_dict():
    print(row)


{'a.name': 'Adam', 'b.name': 'Karissa', 'f.since': 2020}
{'a.name': 'Adam', 'b.name': 'Zhang', 'f.since': 2020}
{'a.name': 'Karissa', 'b.name': 'Zhang', 'f.since': 2021}
{'a.name': 'Zhang', 'b.name': 'Noura', 'f.since': 2022}
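To make the relationship between the list output and the dictionary output concrete, the sketch below pairs each row with the column names from the RETURN clause. It is a plain-Python illustration of the shape of the data, not how Kuzu implements `rows_as_dict()`; `rows_to_dicts` is a hypothetical helper, and the rows are copied from the output shown earlier.

```python
def rows_to_dicts(column_names, rows):
    """Pair each row's values with the column names, one dict per row."""
    return [dict(zip(column_names, row)) for row in rows]


# Column names as they appear in the RETURN clause of the query
columns = ["a.name", "b.name", "f.since"]

# Rows copied from the list output of the quick start example
rows = [
    ["Adam", "Karissa", 2020],
    ["Adam", "Zhang", 2020],
    ["Karissa", "Zhang", 2021],
    ["Zhang", "Noura", 2022],
]

for record in rows_to_dicts(columns, rows):
    print(record)
```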

Pandas

You can also pass the results of a Cypher query to a Pandas DataFrame for downstream tasks. This assumes that pandas is installed in your environment.


# pip install pandas
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_df())


    a.name   b.name  f.since
0     Adam  Karissa     2020
1     Adam    Zhang     2020
2  Karissa    Zhang     2021
3    Zhang    Noura     2022
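As an example of a downstream task, the sketch below rebuilds the DataFrame shown above from its literal values (so it runs without a database) and derives two small summaries. The helper names `follower_counts` and `follows_since` are hypothetical; in practice the DataFrame would come from `response.get_as_df()`.

```python
import pandas as pd

# Same values as the DataFrame printed above
df = pd.DataFrame(
    [
        ["Adam", "Karissa", 2020],
        ["Adam", "Zhang", 2020],
        ["Karissa", "Zhang", 2021],
        ["Zhang", "Noura", 2022],
    ],
    columns=["a.name", "b.name", "f.since"],
)


def follower_counts(frame: pd.DataFrame) -> pd.Series:
    """Count how many users follow each followed user (the b.name column)."""
    return frame["b.name"].value_counts()


def follows_since(frame: pd.DataFrame, year: int) -> pd.DataFrame:
    """Keep only the Follows relationships created in `year` or later."""
    return frame[frame["f.since"] >= year]


print(follower_counts(df))
print(follows_since(df, 2021))
```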

Polars

Polars is another popular DataFrames library for Python, and you can process the results of a Cypher query in much the same way you did with Pandas. This assumes that polars is installed in your environment.


# pip install polars
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_pl())


shape: (4, 3)
┌─────────┬─────────┬─────────┐
│ a.name  ┆ b.name  ┆ f.since │
│ ---     ┆ ---     ┆ ---     │
│ str     ┆ str     ┆ i64     │
╞═════════╪═════════╪═════════╡
│ Adam    ┆ Karissa ┆ 2020    │
│ Adam    ┆ Zhang   ┆ 2020    │
│ Karissa ┆ Zhang   ┆ 2021    │
│ Zhang   ┆ Noura   ┆ 2022    │
└─────────┴─────────┴─────────┘

Arrow Table

You can also use the PyArrow library to work with Arrow Tables in Python. This assumes that pyarrow is installed in your environment. This approach is useful when you need to interoperate with other systems that use Arrow as a backend. In fact, the get_as_pl() method shown above for Polars materializes a pyarrow.Table under the hood.


# pip install pyarrow
response = conn.execute(
    """
    MATCH (a:User)-[f:Follows]->(b:User)
    RETURN a.name, b.name, f.since;
    """
)
print(response.get_as_arrow())


pyarrow.Table
a.name: string
b.name: string
f.since: int64
----
a.name: [["Adam","Adam","Karissa","Zhang"]]
b.name: [["Karissa","Zhang","Zhang","Noura"]]
f.since: [[2020,2020,2021,2022]]