Accept SQLAlchemy Connection as con param in pd.read_sql_query() · Issue #8533 · pandas-dev/pandas (original) (raw)

Currently pd.read_sql_query() accepts an sqlalchemy.engine.base.Engine instance as the con parameter. Using that, it presumably creates a new sqlalchemy.engine.base.Connection to connect to the DB server and issue the query.
However, as most (all?) DB servers associate a session with a connection, this precludes issuing queries for an existing session/connection.

For example, it is common to break up large queries into a series of steps that create TEMPorary intermediate result tables, possibly joining some into further intermediate tables, and then querying for the final result. The lifetime and scope of a TEMP table is that of the session/connection.

Hence, something like:

con = engine.connect() con.execute('CREATE TEMP TABLE tmptable ...') result = pd.read_sql_query('SELECT * FROM tmptable ..', engine)

will fail as the tmptable isn't available to the connection/session created by read_sql_query().

Concatenating the SQL statements into one query string with ';' statement seperator, or using the SQLAlchemy text class to attempt to create the table and query it in one SQL query also fails as leads to SQLAlchemy claiming the query doesn't return results (perhaps because it doesn't being with "SELECT"?)

This, and other similar use-cases involving multiple queries within a single session/connection context, could be easily supported by allowing a sqlalchemy.engine.base.Connection to be passed to the read_sql_query() and similar functions, whereby they would use the supplied connection rather than creating a new one. An added benefit would be the reduction in overhead of repeatedly creating and tearing down connections to the DB server and also allowing users to implement connection pooling when necessary.

(Issue created at suggestion of @joris in this StackOverflow question )