BUG: Respect 'usecols' parameter even when CSV rows are uneven by gfyoung · Pull Request #12551 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation23 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

gfyoung

Closes #12203 by overriding the row alignment checks for both engines when the usecols parameter is passed into read_csv.

@jreback

@gfyoung its not necessary for you to keep rebasing every time master is slightly updated.

@gfyoung

@jreback : Oh whoops, sorry! I have a small script that updates my branches (including master in my fork) whenever I see there are commits, and I forgot to specify to just update the master branch.

@jreback

oh ok maybe just do it manually for a while

trying to get out another rc

@jreback

the issue is that Travis runs and it can get very busy

@gfyoung

No, I perfectly understand. As a precaution, I'll add [ci skip] to all of my open PR's, since I imagine there will be changes to be made once reviewed after which I'll remove it so that Travis can run.

jreback

# make sure that an error is still thrown
# when the 'usecols' parameter is not provided
name = self.__class__.__name__
msg = "Expected \d+ fields in line \d+, saw \d+"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too much gymnastics here. Why is this needed? e.g. diffs in the errors from python/c parser? if that IS the case. then I'd like to change the raise for this kind of error to a ValueError in the cparser.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that was why I had initially created a PythonParserError to keep it consistent with what you would see with the C engine. ValueError though makes sense for consistency and semantic purposes. Will change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking at where the CParserError is raised, it's buried in the tokenizer.c code, which is supposed to be for parsing errors in general. Suggestions on how to externalize this particular CParserError?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no its in parser.pyx (but it inherits from ValueError). I would actually change that, but that might cause some issues (you can check that soln though)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, sorry! Do you mean just getting rid of CParserError altogether and replacing with just ValueError?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you wont' be able to remove it (well shouldn't as its in the public API). But it inherites from Exception. if you can make it inherit from ValueError then I think we'd take that. See what happens. lmk.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. That seems like a good compromise because ValueError inherits from Exception anyways. Will give it a shot.

@gfyoung

@jreback : Travis has no problems making CParserError inherit from ValueError, so I think this is also good to merge if there is nothing else.

@jreback

need s note in the API changes section wrt error class

@gfyoung

Added note in the API changes. Should be good to merge if there is nothing else.

jreback

@@ -50,6 +50,12 @@ API changes
- ``CParserError`` is now a ``ValueError`` instead of just an ``Exception``

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number for the PR here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gfyoung

@gfyoung

@jreback : Made all of the requested changes. Should be ready to merge if there is nothing else.

@jreback

@gfyoung gfyoung deleted the usecol_long_lines branch

March 20, 2016 15:16

This was referenced

Mar 20, 2016

2 participants

@gfyoung @jreback