ENH: Add transparent compression to json reading/writing by simongibbons · Pull Request #17798 · pandas-dev/pandas



Conversation


simongibbons

This works in the same way as the compression argument to read_csv and to_csv.

I've added tests confirming that it works with both file paths, and S3 URLs. (obviously there will be edge cases I've missed - please let me know if there are important ones that I should add coverage for).

The implementation is mostly plumbing, using the logic that was in place for the same functionality in read_csv.
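To illustrate, a minimal sketch of the intended round trip (the file name data.json.gz is just an example; compression='gzip' behaves like the same keyword on read_csv/to_csv):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# write gzip-compressed JSON, mirroring to_csv(..., compression='gzip')
df.to_json("data.json.gz", compression="gzip")

# read it back; the gzip layer is handled transparently
roundtripped = pd.read_json("data.json.gz", compression="gzip")
assert roundtripped["a"].tolist() == [1, 2, 3]
```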

@pep8speaks

Hello @simongibbons! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 06, 2017 at 08:18 UTC

@simongibbons

This works in the same way as the argument to read_csv and to_csv.

I've added tests confirming that it works with file paths, as well as file URLs and S3 URLs.

@simongibbons

@codecov

jreback

COMPRESSION_TYPES = [None, 'bz2', 'gzip', 'xz']
def test_compress_gzip():


pls parametrize all of this

see how this is done in other compression tests


I used the pattern from the pickle compression tests, where a helper function decompresses files by compression type.
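Roughly, the pattern looks like this (a sketch only; COMPRESSION_TYPES, decompress_file and the test body here are illustrative rather than the exact code in the branch):

```python
import bz2
import gzip
import lzma

import pandas as pd
import pytest

COMPRESSION_TYPES = [None, 'bz2', 'gzip', 'xz']


def decompress_file(path, compression):
    """Return the raw bytes of *path*, undoing *compression* if any."""
    if compression is None:
        opener = open
    elif compression == 'gzip':
        opener = gzip.open
    elif compression == 'bz2':
        opener = bz2.BZ2File
    elif compression == 'xz':
        opener = lzma.LZMAFile
    else:
        raise ValueError('unknown compression: {}'.format(compression))
    with opener(path, 'rb') as fh:
        return fh.read()


@pytest.mark.parametrize('compression', COMPRESSION_TYPES)
def test_to_json_compression(tmp_path, compression):
    df = pd.DataFrame({'A': [1, 2, 3]})
    compressed = str(tmp_path / 'out.json')
    plain = str(tmp_path / 'plain.json')

    df.to_json(compressed, compression=compression)
    df.to_json(plain, compression=None)

    # the decompressed payload should match the uncompressed output
    with open(plain, 'rb') as fh:
        assert decompress_file(compressed, compression) == fh.read()
```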


The zip tests will IMO always need to be special-cased: since there isn't a zip writer, we will always need to read from a fixture.
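For the zip case I have something like the following in mind, reading a pre-built fixture instead of writing one (the fixture file names here are hypothetical):

```python
import os

import pandas as pd
import pandas.testing as tm

HERE = os.path.dirname(__file__)


def test_read_zipped_json():
    # 'example.json' / 'example.json.zip' are hypothetical fixture names;
    # the .zip file holds the same payload, compressed ahead of time.
    uncompressed = os.path.join(HERE, 'data', 'example.json')
    compressed = os.path.join(HERE, 'data', 'example.json.zip')

    expected = pd.read_json(uncompressed)
    result = pd.read_json(compressed, compression='zip')

    tm.assert_frame_equal(expected, result)
```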

@@ -195,7 +195,7 @@ Other Enhancements
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names
- Improved the import time of pandas by about 2.25x (:issue:`16764`)
- :func:`read_json` and :func:`to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`XXXXXXX`)


update this


updated

jreback

@jreback

@simongibbons

Let me know if you want me to squash this when it's ready to merge.

@jreback

@simongibbons no need to squash, it's done automatically on merging.

TomAugspurger


+1.

Does the ZIP file need to be added to MANIFEST.in?

@pytest.mark.parametrize('compression', COMPRESSION_TYPES)
def test_with_s3_url(compression):


This shares some code with the (to be merged) #17201.

I think it's fine for now, but we'll want to clean it up whenever the latter is merged. Since this is clean at the moment, I think we'll merge it, and then refactor this test in #17201.

@jreback

@TomAugspurger

Probably covered by pandas.tests.io: ['json/data/*.json'] in the setup.py.

@TomAugspurger

@jreback

@TomAugspurger NO it's NOT covered by that. See the failing tests. This NEEDS to be in setup.py

@jreback

you can change it to '/json/data/*.json*' and it will work I think
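Roughly, the setup.py entry under discussion would look like this (abbreviated to the relevant key; the broader glob also matches the zipped fixture):

```python
from setuptools import setup

setup(
    # ... other arguments elided ...
    package_data={
        # broadened from 'json/data/*.json' so the new .json.zip
        # fixture ships with the test data
        'pandas.tests.io': ['json/data/*.json*'],
    },
)
```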

jreback added a commit to jreback/pandas that referenced this pull request

Oct 6, 2017

@jreback

jreback added a commit that referenced this pull request

Oct 6, 2017

@jreback

ghost pushed a commit to reef-technologies/pandas that referenced this pull request

Oct 16, 2017

@simongibbons


ghost pushed a commit to reef-technologies/pandas that referenced this pull request

Oct 16, 2017

@jreback

alanbato pushed a commit to alanbato/pandas that referenced this pull request

Nov 10, 2017

@simongibbons @alanbato


alanbato pushed a commit to alanbato/pandas that referenced this pull request

Nov 10, 2017

@jreback @alanbato

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request

Nov 28, 2017

@simongibbons @No-Stream


No-Stream pushed a commit to No-Stream/pandas that referenced this pull request

Nov 28, 2017

@jreback @No-Stream