mysql support by danielballan · Pull Request #2482 · pandas-dev/pandas (original) (raw)


Conversation


danielballan

I added MySQL support and (untested!) support for Oracle. Some parts of the code are ready to support other flavors -- it should be easy to extend.

@changhiskhan

Thanks for the PR!
One thing that will make it easier is if you can make the test cases optional. If you look at pandas/io/tests/test_excel.py, for example, xlwt/xlrd/openpyxl are all optional dependencies. If people don't have those drivers installed, the test suite should still pass.
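The optional-dependency pattern referenced above can be sketched roughly as follows. This is an illustrative skeleton, not the actual test_excel.py code; the `MySQLdb` import and `TestMySQL` class are stand-ins for whatever driver and test case are involved:

```python
# Hypothetical sketch of the optional-dependency test pattern:
# skip the whole test case when the driver is not installed,
# so the suite still passes on machines without it.
import unittest

try:
    import MySQLdb  # optional driver; any flavor-specific module works the same way
    _HAS_MYSQL = True
except ImportError:
    _HAS_MYSQL = False


class TestMySQL(unittest.TestCase):
    def setUp(self):
        if not _HAS_MYSQL:
            raise unittest.SkipTest("MySQLdb not installed, skipping MySQL tests")

    def test_roundtrip(self):
        # real round-trip assertions against a MySQL connection would go here
        pass
```

With this shape, a missing driver shows up as a skip rather than an error, which is what "the test suite should still pass" asks for.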

@paulproteus

The next step here would be for someone to review the pull request and see if you can add a commit on top that makes the test cases optional. See @changhiskhan 's most recent comment for sample code to look at.

@danielballan

OK, I think everything is in order.

@danielballan

@wesm

marked for review for 0.10.1

@changhiskhan

@danielballan I merged in the MySQL flavor but took out the other flavors for lack of tests. I made some tweaks to clean things up just a little bit. Unfortunately, while I was merging I f'ed up somewhere in the process and the whole thing ended up marked as my commit. I added a line note in each file in that commit. Since there are still other SQL flavors in there, I'm moving this to a later milestone; when you get a chance to implement more test cases, we'll merge in the rest.

@changhiskhan

ok, properly attributed a835118
sorry for the snafu @danielballan

@wesm let's keep this open until someone has a chance to write tests for Postgres/Oracle/odbc?

@wesm

@danielballan

yarikoptic added a commit to neurodebian/pandas that referenced this pull request

Jan 23, 2013

@yarikoptic

Version 0.10.1

@garaud

I'm interested in the Postgres support and testing in pandas. Do you know if someone is working on it? I would like to dig into the io.sql and PostgreSQL stuff in a few weeks if possible. I'll begin by creating a branch from changhiskhan@a835118 (better idea?) in order to avoid any painful conflicts and to keep the MySQL test framework.

Cheers.

@danielballan

Sounds good to me. SQLite and MySQL are the flavors I use in my work, so I'm glad to see someone else pick up PostgreSQL. I will direct my efforts to more specific data type detection, as noted in the comments of the current release.

@changhiskhan

@garaud I don't think I merged @danielballan's Postgres support in my fork there. If you don't want to mess with merging, just fork from @danielballan's PR branch and add test cases. It looked perfectly fine to me; I just didn't have time to add test cases for it. I can take care of merging into master at the end if you don't want to mess with it.

@garaud

OK. Thanks. I forked the branch mysql from @danielballan and created a branch postgre from it. I'll keep you posted.

@mangecoeur

Just wondering: if this is to be an optional module in any case, wouldn't it make more sense to take advantage of SQLAlchemy's well-tested DB support, and instead add SQLAlchemy as an optional dependency with sqlite as a fallback? I'm interested in this approach because I use pandas and SQLAlchemy together quite a bit anyway; it allows you to abstract away the differences in SQL flavors.
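The appeal of the SQLAlchemy approach can be shown in a few lines: one engine URL hides the flavor-specific driver, so the same pandas code runs unchanged against SQLite, MySQL, Postgres, etc. A minimal sketch, using an in-memory SQLite engine and the `to_sql`/`read_sql` round-trip that later pandas releases expose (not the io.sql code under discussion in this PR):

```python
# One engine object abstracts the SQL flavor; swap the URL to target
# MySQL ("mysql+mysqldb://...") or Postgres ("postgresql://...") instead.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df.to_sql("demo", engine, index=False)          # write through the engine

roundtrip = pd.read_sql("SELECT * FROM demo", engine)
print(roundtrip.shape)  # (3, 2)
```

The flavor-specific SQL generation (quoting, type names, placeholders) is handled by the dialect behind the engine, which is exactly the duplication the hand-rolled flavors in io.sql would otherwise carry.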

@danielballan

I was not aware of SQLAlchemy. This looks dead useful. Can you post a gist of an example where you've used them together?

@mangecoeur

I actually tried doing the integration myself; I have a work in progress here (very, very alpha):
https://github.com/mangecoeur/pandas/blob/sqlalchemy-integration/pandas/io/sql.py

I'm currently working on the "write_frame2" function so that I can compare results with the existing "write_frame" function (the idea would be to merge those later). It doesn't work for me yet because I need to find the best way to convert from numpy dtypes to SQL DB supported types, preferably with the least effort duplication. I'm thinking there might be a way to re-use the CSV read/write parsers.

@jreback

You guys might find #2752 useful
(convert datetimes to NaN when you astype to object)
and the df.blocks property
(which gives you a dict of dtype to homogeneous frame).
These are both new in 0.11 and in current master.
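The dtype-to-homogeneous-frame dict that `df.blocks` produced can be reconstructed by hand, which is useful to see what the property was for. A version-independent sketch (this is not the `df.blocks` implementation itself):

```python
# Group a DataFrame's columns by dtype, yielding one homogeneous
# sub-frame per dtype -- the shape of what df.blocks returned.
import pandas as pd

df = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5], "s": ["a", "b"]})

blocks = {}
for dtype in df.dtypes.unique():
    cols = df.dtypes[df.dtypes == dtype].index
    blocks[str(dtype)] = df[cols]

print(sorted(blocks))  # ['float64', 'int64', 'object']
```

For SQL writing this matters because each homogeneous block can be converted to its SQL type in one vectorized pass instead of column by column.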

@jreback

You also might want to take a look at pandas/io/pytables.py,
Table.create_axes, for this kind of type mapping (which is somewhat non-trivial).

@danielballan

@garaud

Hi,

Is this issue still open? There are some features with dedicated tests in pandas/io/sql.py. See the commit a835118.

However, this feature does not appear in the RELEASE.rst file for the 0.11 release.

@danielballan

@garaud, this is still open because tests and writing capabilities for several more flavors of SQL remain to be done. (See above.) Also see the SQLAlchemy discussion. As for the docs, yes, they are incomplete. This was my first PR and I didn't put documentation everywhere it belongs. I think changhiskhan took care of some of it, though. Feel free to elaborate.

@hayd

Does it make more sense to break this into (several?) new issues? At least these two, I think, may make more sense as separate issues:

Or maybe also a global "SQL support" issue with several parts/roadmap?

@danielballan

I think a SQL support issue is the way to go.

One additional thought: one way or another, we need to infer data types more carefully. jreback provides some helpful references to other parts of pandas. Before anyone takes that on, we should decide whether we are ultimately going to toss all of this out in favor of SQLAlchemy.

@hayd hayd mentioned this pull request

Jul 8, 2013
