API: add dtype= option to python parser by chris-b1 · Pull Request #14295 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation28 Commits24 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@chris-b1

@codecov-io

Current coverage is 85.22% (diff: 100%)

Merging #14295 into master will increase coverage by 0.01%

@@ master #14295 diff @@

Files 143 143
Lines 50804 50833 +29
Methods 0 0
Messages 0 0
Branches 0 0

Powered by Codecov. Last update 75b606a...3abb0bd

jorisvandenbossche

return result
def _convert_types(self, values, na_values, try_num_bool=True):
def _infer_types(self, values, na_values, try_num_bool=True):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are at it, can you add a docstring here?

jorisvandenbossche

not interpret dtype.
Use `str` or `object` to preserve and not interpret dtype.
If converters are specified, they will be applied AFTER
dtype conversion.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that exercises this assumption?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong, the c-parser actually ignores the dtype and just uses the converter - changed docstring and added matching test.

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice to me! (added two minor comments)

gfyoung

values, conv_f, na_values,
col_na_values, col_na_fvalues)
else:
try_num_bool = True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just this:

try_num_bool = cast_type and is_object_dtype(cast_type)

gfyoung

return result, na_count
def _cast_types(self, values, cast_type, column):
""" cast column to type specified in dtypes= param """

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I know this isn't done too much for internal functions, it would be good to start documenting these functions more thoroughly for development purposes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I know this isn't done too much for internal functions,

But we should do it more! (therefore my comment above) It makes familiarizing yourself with a module a lot easier.
BTW, a PR to go through a file and document some of the functions (eg things you haven been working on in recent PRs and so you know better) is always very welcome :-)

@jorisvandenbossche

@chris-b1 xref #14558. You say that dtype is ignored when converters is specified. In that case it would be nice to raise a warning about this if both are specified I think.

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, looks good to me!

.. ipython:: python
from io import StringIO

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not work on python 2. Should check how it is done in other places in the docs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from pandas.compat import StringIO is imported in a suppressed ipython code block at the top of the file, so you can just use StringIO

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the io module was backported to 2.6, but I will remove this, as it is already imported like you mentioned.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chris-b1 That's true, but then the input must be unicode strings, not plain python 2 strings (which I think is not the case in the docs, but not sure)

If converters are specified, they will be applied INSTEAD
of dtype conversion.
.. versionadded:: 0.20.0 support for the Python parser.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would leave this out of the docstring (this is already so long ..), but not strong feelings, your take

@chris-b1 chris-b1 changed the titleAPI: add dtype= option to python parser (WIP) API: add dtype= option to python parser

Nov 13, 2016

@chris-b1

gfyoung

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well

jreback

Specifying ``dtype`` with ``engine`` other than 'c' raises a
``ValueError``.
.. versionadded:: 0.20.0 support for the Python parser.
The ``dtype`` option is supported by the 'python' engine

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a blank line here (to avoid warnings)

jreback

^^^^^^^^^^^^^^^^^^
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would make a sub-section

jreback

if conv:
if col_dtype is not None:
warnings.warn(("Both a converter and dtype were specified "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a test for this? (IOW, this was prob tested for c-engine, but now need to test for all)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this was added here:e0d1606

@gfyoung

@chris-b1 : I'm putting together a PR to patch #14712. As this directly relates to your PR, I would suggest waiting for mine to make sure the bug isn't an issue for you.

@jorisvandenbossche

@chris-b1 can you rebase this? Then this can be merged I think (depending on the status of @gfyoung new PR)

@chris-b1

@jorisvandenbossche

@chris-b1 I merged #14717. This added a dtype test to c_parser_only, so added a commit to move this to the general dtype tests as well. Feel free to merge if you're ok with this and the tests are green.

jreback

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``dtype`` keyword argument in the :func:`read_csv` function for specifying the types of parsed columns
is now supported with the ``'python'`` engine. See the :ref:`io docs <io.dtypes>` for more information.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue references here

@chris-b1

…into textreader-dtype

Conflicts: pandas/io/tests/parser/dtypes.py

@chris-b1

@jorisvandenbossche

@jorisvandenbossche

Labels