API: add dtype= option to python parser by chris-b1 · Pull Request #14295 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation28 Commits24 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Current coverage is 85.22% (diff: 100%)
@@ master #14295 diff @@
Files 143 143
Lines 50804 50833 +29
Methods 0 0
Messages 0 0
Branches 0 0
- Hits 43292 43324 +32
- Misses 7512 7509 -3
Partials 0 0
Powered by Codecov. Last update 75b606a...3abb0bd
| return result |
|---|
| def _convert_types(self, values, na_values, try_num_bool=True): |
| def _infer_types(self, values, na_values, try_num_bool=True): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are at it, can you add a docstring here?
| not interpret dtype. |
|---|
| Use `str` or `object` to preserve and not interpret dtype. |
| If converters are specified, they will be applied AFTER |
| dtype conversion. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test that exercises this assumption?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong, the c-parser actually ignores the dtype and just uses the converter - changed docstring and added matching test.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice to me! (added two minor comments)
| values, conv_f, na_values, |
|---|
| col_na_values, col_na_fvalues) |
| else: |
| try_num_bool = True |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just this:
try_num_bool = cast_type and is_object_dtype(cast_type)
| return result, na_count |
|---|
| def _cast_types(self, values, cast_type, column): |
| """ cast column to type specified in dtypes= param """ |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I know this isn't done too much for internal functions, it would be good to start documenting these functions more thoroughly for development purposes.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I know this isn't done too much for internal functions,
But we should do it more! (therefore my comment above) It makes familiarizing yourself with a module a lot easier.
BTW, a PR to go through a file and document some of the functions (eg things you haven been working on in recent PRs and so you know better) is always very welcome :-)
@chris-b1 xref #14558. You say that dtype is ignored when converters is specified. In that case it would be nice to raise a warning about this if both are specified I think.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small comments, looks good to me!
| .. ipython:: python |
| from io import StringIO |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not work on python 2. Should check how it is done in other places in the docs
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from pandas.compat import StringIO is imported in a suppressed ipython code block at the top of the file, so you can just use StringIO
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, the io module was backported to 2.6, but I will remove this, as it is already imported like you mentioned.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-b1 That's true, but then the input must be unicode strings, not plain python 2 strings (which I think is not the case in the docs, but not sure)
| If converters are specified, they will be applied INSTEAD |
|---|
| of dtype conversion. |
| .. versionadded:: 0.20.0 support for the Python parser. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would leave this out of the docstring (this is already so long ..), but not strong feelings, your take
chris-b1 changed the title
API: add dtype= option to python parser (WIP) API: add dtype= option to python parser
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as well
| Specifying ``dtype`` with ``engine`` other than 'c' raises a |
|---|
| ``ValueError``. |
| .. versionadded:: 0.20.0 support for the Python parser. |
| The ``dtype`` option is supported by the 'python' engine |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need a blank line here (to avoid warnings)
| ^^^^^^^^^^^^^^^^^^ |
|---|
| - ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would make a sub-section
| if conv: |
| if col_dtype is not None: |
| warnings.warn(("Both a converter and dtype were specified " |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a test for this? (IOW, this was prob tested for c-engine, but now need to test for all)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this was added here:e0d1606
@chris-b1 : I'm putting together a PR to patch #14712. As this directly relates to your PR, I would suggest waiting for mine to make sure the bug isn't an issue for you.
@chris-b1 can you rebase this? Then this can be merged I think (depending on the status of @gfyoung new PR)
@chris-b1 I merged #14717. This added a dtype test to c_parser_only, so added a commit to move this to the general dtype tests as well. Feel free to merge if you're ok with this and the tests are green.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|---|
| The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns |
| is now supported with the ``'python'`` engine. See the :ref:`io docs <io.dtypes>` for more information. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue references here
…into textreader-dtype
Conflicts: pandas/io/tests/parser/dtypes.py