Use SQLAlchemy with Databricks | Databricks on AWS

Databricks provides a SQLAlchemy dialect for Databricks. A dialect is the system SQLAlchemy uses to communicate with various types of database API implementations and databases. SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM) that provides a suite of well-known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language. See Features and Philosophy.

You must install the SQLAlchemy dialect for Databricks to use SQLAlchemy features with Databricks. This article covers the SQLAlchemy dialect for Databricks versions 1.0 and 2.0, which are based on Databricks SQL Connector for Python version 4.0.0 or above.

Requirements

Get started

Authentication

The SQLAlchemy dialect for Databricks supports Databricks personal access token authentication.

To create a Databricks personal access token, follow the steps in Create personal access tokens for workspace users.

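To authenticate the SQLAlchemy dialect, use the following code snippet. This snippet assumes that you have set the following environment variables: DATABRICKS_TOKEN, DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_CATALOG, and DATABRICKS_SCHEMA.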

To set environment variables, see your operating system's documentation.
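Because os.getenv returns None for an unset variable, a missing setting would otherwise end up in the connection URL as the literal text "None". The following is a minimal sketch of a pre-flight check, not part of the Databricks dialect. The helper name missing_databricks_vars and the REQUIRED_VARS tuple are hypothetical; the variable names are the ones read by the authentication snippet in this section.

```python
import os

# Names read by the authentication snippet in this article.
REQUIRED_VARS = (
    "DATABRICKS_TOKEN",
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_CATALOG",
    "DATABRICKS_SCHEMA",
)


def missing_databricks_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `environ` can be any mapping,
    which makes the check easy to exercise without touching the real
    process environment.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_databricks_vars()
    if missing:
        raise SystemExit("Set these environment variables first: " + ", ".join(missing))
```

Running the check before calling create_engine turns a confusing connection failure into an explicit message that names the variables still to be set.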

Python

import os
from sqlalchemy import create_engine

# Connection settings are read from environment variables.
access_token    = os.getenv("DATABRICKS_TOKEN")
server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path       = os.getenv("DATABRICKS_HTTP_PATH")
catalog         = os.getenv("DATABRICKS_CATALOG")
schema          = os.getenv("DATABRICKS_SCHEMA")

# Build the engine from a databricks:// URL that authenticates with the
# personal access token and targets the given catalog and schema.
engine = create_engine(
    url=f"databricks://token:{access_token}@{server_hostname}?"
        f"http_path={http_path}&catalog={catalog}&schema={schema}"
)

# ...

You use the preceding engine variable to connect to your specified catalog and schema through your Databricks compute resource.
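The connection URL above is assembled by plain string interpolation, so a value that contains characters such as "@", "&", or "?" could be parsed incorrectly. The following sketch shows one way to percent-encode the variable parts using only the Python standard library. The function name build_databricks_url is hypothetical, and the sketch assumes that SQLAlchemy decodes percent-encoded passwords and query values when it parses the URL, which is its documented behavior for special characters in connection URLs.

```python
from urllib.parse import quote, urlencode


def build_databricks_url(access_token, server_hostname, http_path, catalog, schema):
    """Assemble a databricks:// URL with percent-encoded variable parts.

    Hypothetical helper for illustration; it produces the same URL shape
    as the snippet above.
    """
    # urlencode percent-encodes each query value (for example "/" becomes "%2F").
    query = urlencode({"http_path": http_path, "catalog": catalog, "schema": schema})
    # safe="" makes quote() encode every reserved character in the token,
    # including "/" and "@".
    return f"databricks://token:{quote(access_token, safe='')}@{server_hostname}?{query}"
```

The returned string can be passed to create_engine in place of the f-string used above. Personal access tokens are typically alphanumeric, so the encoding mainly guards the http_path, catalog, and schema values.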

SQLAlchemy v1

For connection examples, refer to the example.py file.

SQLAlchemy v2

For connection examples, see the following section and the sqlalchemy_example.py file in GitHub.

DBAPI reference

Additional resources