Raise error in read_csv when arguments header and prefix both are not None by rushabh-v · Pull Request #31383 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation39 Commits17 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

closes pd.read_csv prefix parameter seems do not works #27394
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rushabh-v. can you add a what's new note. What's the current behaviour? Does this need a deprecation cycle?

All checks successful.
No, in my view it doesn't need depreciation cycle.

And I have made all the requested changes @simonjayhawkins

No, it doesn't need depreciation cycle.

IIUC then currently prefix is ignored if header passed.

from io import StringIO s = StringIO("0,1\n2,3")

import pandas as pd pd.version '1.0.0rc0+228.g4edcc5541'

pd.read_csv(s, header=0, prefix="_X") 0 1 0 2 3

This would now raise a ValueError.

IMO this is a breaking change. see what others think.

IMO this is a breaking change

oh, I think you are right.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. The intention here is to raise a ValueError (its a bug and confusing to silently ignore)

@TomAugspurger

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look OK, though the test is likely in the wrong file. Should be in pandas/tests/io/parser/test_common.py probably.

@@ -575,6 +575,13 @@ def test_to_csv_headers(self):
recons.reset_index(inplace=True)
tm.assert_frame_equal(to_df, recons)

def test_to_csv_raises_on_header_prefix(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this testing to_csv or read_csv? Looks like it's in the to_csv file, but is testing read_csv?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's testing read_csv that's a mistake in naming it.

But I actually tried putting this test in pandas/tests/io/parser/test_common.py. But the problem there is that all the tests use io.parsers.read_csv there. And io.parsers.read_csv doesn't seem to be going into CParserWrapper. So I tried putting it here. But now I am putting it back in test_common.py and will import the pandas.read_csv there.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you take a look, please?

@@ -2040,6 +2040,14 @@ def test_read_csv_memory_growth_chunksize(all_parsers):
pass


def test_read_csv_raises_on_header_prefix():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the all_parsers fixture here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first tried to do that in c053a8f but looks likey parsers.read_csv is not going to CParserWrapper. Which is what we want to test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure it is, what exactly was failing? this is the standard pattern for parser tests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is to check whether the error is being raised or not, and the error is raised from CParserWrapper So, the whole test fails here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to apply your fix to all engines. You just applied to CParserWrapper it seems.

I think the fix should actually be in ParserBase so that all engines get the fix.

c053a8f is the correct test.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rushabh-v : You're on the right track, but I think you tried too hard to adjust the tests to your fixes, when it really should be the other way around. See my earlier comments.

Hello @rushabh-v! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-02-03 09:25:48 UTC

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good. Just want to make sure that tests pass.

If they fail, we can update those accordingly.

Indeed, those test failures look unrelated. I just restarted the build.

Indeed, those test failures look unrelated. I just restarted the build.

same test failure again.

You may need to rebase or merge. Unclear why these warnings are popping up.

cc @pandas-dev/pandas-core : is there something with matplotlib that changed recently?

vinceatbluelabs added a commit to bluelabsio/records-mover that referenced this pull request

Aug 12, 2020