Poor performance when running a large statement. sqlalchemy + oracledb · Issue #172 · oracle/python-oracledb (original) (raw)

Hi,
I have the following code:

import pandas as pd from sqlalchemy.orm.session import Session

def compute(engine: Session) -> pd.DataFrame: # some steps # .... df = engine.query(...)

    with engine.get_bind().connect() as conn:
            df = pd.read_sql(sql=df.statement, con=conn)

    return df
  1. If I compile SQL code from the df variable and then run it directly in DBeaver - this statement will complete in 17-20 seconds. And it's okay. Compiled SQL contains 347k characters.
  2. If I run the calculation through pd.read_sql, then this code will run for about 16 minutes. I noticed that the delay happens in the _prepare method (see trace below).

File "/opt/homebrew/Caskroom/miniforge/base/envs/venv/lib/python3.10/site-packages/oracledb/cursor.py", line 137, in _prepare self._impl.prepare(statement, tag, cache_statement) File "src/oracledb/impl/thin/cursor.pyx", line 213, in oracledb.thin_impl.ThinCursorImpl.prepare File "src/oracledb/impl/thin/connection.pyx", line 235, in oracledb.thin_impl.ThinConnImpl._get_statement File "src/oracledb/impl/thin/connection.pyx", line 239, in oracledb.thin_impl.ThinConnImpl._get_statement File "src/oracledb/impl/thin/statement.pyx", line 184, in oracledb.thin_impl.Statement._prepare File "/opt/homebrew/Caskroom/miniforge/base/envs/venv/lib/python3.10/re.py", line 209, in sub return _compile(pattern, flags).sub(repl, string, count)

  1. At the same time, I can run simpler scripts via pd.read_sql and they execute normally.
  2. Also, I build the same script like in df variable, but for PostgreSQL. And it run for about 35 seconds.

Issue

  1. Is this expected behavior when working with large scripts?
  2. Is there any way I can bypass this prepare step?

requirements.txt

Apple M1 macOS Ventura 13.0

python 3.10

SQLAlchemy==2.0.9 numpy==1.23.5 pandas==1.5.2 oracledb==1.3.0

Thank you!