COMPAT: reading json with lines=True from s3, xref #17200 by alph486 · Pull Request #17201 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation81 Commits22 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Attempt to decode the bytes array with encoding
passed to the call.
@alph486 : Thanks for the PR! You will need to write tests for us to merge this. I would also take a look at what @TomAugspurger suggested in terms of patching this.
Is there a particular way you guys go about handling s3 in your tests? I see some files in a bucket somewhere, but idk if there are any jsonl
type files or how to add one. For that matter, I don't see any tests for read_json
where the file is in s3.
Hello @alph486! Thanks for updating the PR.
Cheers ! There are no PEP8 issues in this Pull Request. 🍻
Comment last updated on November 27, 2017 at 00:04 Hours UTC
@gfyoung After going back to clean this up I realized jsonl from URL based file also suffers from the same issue. I have a cleaner fix that I think works in both cases.
Can you let me know how I should proceed for getting a JSONL file uploaded to the public s3 bucket for testing this in the test suite? I don't have write access.
Can you let me know how I should proceed for getting a JSONL file uploaded to the public s3 bucket for testing this in the test suite? I don't have write access.
@TomAugspurger @jreback : thoughts?
if you put the file on a gist somewhere @TomAugspurger or I can upload to the bucket.
jreback changed the title
Fix for #17200 COMPAT: reading json with lines=True from s3, xref #17200
@jreback @gfyoung Thanks for the assist here. I'll wrap this up soon; just after I get my current project completed that requires this change.
@alph486 : Are you still waiting on the updated URL to patch the failing tests?
@gfyoung yeah, i just need a target s3 url to put in my code
yeah I don't think I have this either. @TomAugspurger IIRC you can login using the Continuum dashboard.
if that works we should prob change it :>
sorry to keep nudging here, but any progress from IT @TomAugspurger ? Id love to try to a patched official version out so we can use our feature in production.
No word yet.
Really, we should be using https://github.com/spulec/moto to mock all our S3 calls. Now that s3fs (and boto3) are handling all the network stuff, we don't need to be testing it. Just our text parsing. I'll open an issue about that (and you can look at fixing it if you're interested :)
I can try to implement my tests using moto... we'll see if that works Thanks, Kevin
On Aug 24, 2017, 6:04 AM -0500, Tom Augspurger ***@***.***>, wrote: No word yet. Really, we should be using https://github.com/spulec/moto to mock all our S3 calls. Now that s3fs (and boto3) are handling all the network stuff, we don't need to be testing it. Just our text parsing. I'll open an issue about that (and you can look at fixing it if you're interested :) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Thanks, I'd be totally fine with just using moto for your test for now, and following up moving most of the other S3 tests to use moto.
You'll need to edit one of the ci files to include moto. Probably ci/requirements-3.6.run
Ah, thanks for that. I hope to get to trying that locally today, I'll update here when I know. Thanks, Kevin
On Aug 24, 2017, 8:28 AM -0500, Tom Augspurger ***@***.***>, wrote: Thanks, I'd be totally fine with just using moto for your test for now, and following up moving most of the other S3 tests to use moto. You'll need to edit one of the ci files to include moto. Probably ci/requirements-3.6.run — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Any guidance on which requirements files I should change for both release packaging, development, and CI? (I currently have ci/requirements-3.6.run
and requirements_dev.txt
). I just have to add s3fs
to some things and moto
where appropriate.
you shouldn't add s3fs
anywhere
Unless I'm misunderstanding: you cannot run any of the test suites that require pandas.io.s3
locally unless s3fs is part of requirements_dev.txt
no?
Edit:
For example from my attempt of running pytest pandas/tests/io/json/test_pandas.py -v
:
pandas/tests/io/json/test_pandas.py:996:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/io/json/json.py:324: in read_json
encoding=encoding)
pandas/io/common.py:208: in get_filepath_or_buffer
from pandas.io import s3
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
""" s3 support for remote file interactivity """
from pandas import compat
try:
import s3fs
from botocore.exceptions import NoCredentialsError
except:
> raise ImportError("The s3fs library is required to handle s3 files")
E ImportError: The s3fs library is required to handle s3 files
pandas/io/s3.py:7: ImportError
=====================================
Kevin Kuhl added 2 commits
@TomAugspurger Yeah, but part of those unused imports are a backwards way of me leveraging the s3_resource
that arent necessarily just cleanup. I replied to a comment of yours earlier on about using s3_resource
. I'd need to move it and potentially refactor some of that code to use it in my module. Can you check that out and lmk your thoughts?
@alph486 sorry I missed your comment earlier. I'll push a commit in a minute that refactors the s3_resource stuff.
Ok thanks, sorry about it taking so long. I'm not super confident in here.
OK pushed two commits. One gets you back up to date with master, and 6979fb8 is the actual changes. I added a pandas/tests/io/conftest.py
with the s3_resource
that both places use now.
If you git pull
to test locally, you'll have to rebuild the C extensions (`python setup.py build_ext --inplace). Hopefully everything is OK though.
@jreback are you okay with the changes made per your review?
Should be all good. ping on green @alph486
@@ -1,2 +1,3 @@ |
---|
xarray==0.9.1 |
pandas-gbq |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needds to be reverted
@@ -5,4 +5,4 @@ cython |
---|
pytest>=3.1.0 |
pytest-cov |
flake8 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this, s3fs is NOT a requirement for dev; we should be robust to not having this installed
@@ -26,3 +26,4 @@ sqlalchemy |
---|
bottleneck |
pymysql |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok
""" |
---|
If PY3 and/or isinstance(json, bytes) |
""" |
if isinstance(json, bytes): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 line comment
@@ -0,0 +1,2 @@ |
---|
{"a": 1, "b": 2} |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this file?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok you have to have this named .json
otherwise it won't be picked up by setup.py
(IOW the install test will fail).
@pytest.fixture(scope='module') |
---|
def tips_file(): |
return os.path.join(tm.get_data_path(), 'tips.csv') |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
@@ -0,0 +1,2 @@ |
---|
{"a": 1, "b": 2} |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok you have to have this named .json
otherwise it won't be picked up by setup.py
(IOW the install test will fail).
pandas objects compatability with Numpy or Python functions
label
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request
Labels
pandas objects compatability with Numpy or Python functions
read_json, to_json, json_normalize