ENH: Add skipna parameter to infer_dtype by cpcloud · Pull Request #17066 · pandas-dev/pandas

Conversation


cpcloud

@cpcloud cpcloud added this to the Next Major Release milestone

Jul 24, 2017

gfyoung

jreback

cpcloud

@cpcloud cpcloud deleted the add-skipna-for-infer-dtype branch

July 25, 2017 18:23

rs2 added a commit to rs2/pandas that referenced this pull request

Aug 30, 2017

@rs2

closes pandas-dev#13873

This reverts commit dfebd8a.

closes pandas-dev#16634

Deprecated back in 0.18.1

xref pandas-devgh-12882

kwargs which are meant for the opening of the HDFStore should be filtered out before passing the remaining kwargs to the select function to load the data.

closes pandas-dev#16734
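The filtering described above can be sketched as follows. This is a minimal illustration, not the pandas code: `STORE_OPEN_KWARGS` and `split_store_kwargs` are hypothetical names, and the set of store-opening keywords is an assumption chosen for the example.

```python
# Assumed (hypothetical) set of keyword names consumed when opening the store.
STORE_OPEN_KWARGS = {"mode", "complevel", "complib", "fletcher32"}


def split_store_kwargs(kwargs):
    """Split kwargs into (kwargs for opening the store, kwargs for select)."""
    open_kwargs = {k: v for k, v in kwargs.items() if k in STORE_OPEN_KWARGS}
    select_kwargs = {k: v for k, v in kwargs.items() if k not in STORE_OPEN_KWARGS}
    return open_kwargs, select_kwargs


# "mode" goes to the store; the remaining kwargs are left for select.
open_kwargs, select_kwargs = split_store_kwargs(
    {"mode": "r", "where": "index > 5", "columns": ["a"]}
)
```

Splitting once up front means the select call never sees a keyword it does not understand.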

Author: Pietro Battiston me@pietrobattiston.it

This patch had conflicts when merged, resolved by Committer: Jeff Reback jeff.reback@twosigma.com

Closes pandas-dev#16736 from toobaz/index_what_you_can and squashes the following commits:

f77e2b3 [Pietro Battiston] BUG: do not raise UnsortedIndexError if sorting is not required

Deprecated in 0.18.1

xref pandas-devgh-12591, pandas-devgh-12594

Deprecated in 0.18.0

xref pandas-devgh-12190

closes pandas-dev#16767

In Python3, reading a DataFrame with a PeriodIndex from an HDF file created in Python2 would incorrectly return a DataFrame with an Int64Index.

This reverts commit 09d8c22.

closes pandas-dev#16730

xref pandas-dev#16814

xref pandas-dev#16826

[ci skip]

It wasn't being picked up in our package data otherwise

xref pandas-dev#16826

Deprecated since 0.18.0

xref pandas-devgh-11834

xref pandas-dev#16826

We import it, set it as an attribute, and then don't use it.

For the SciPy sprint tomorrow, until the cause of the doc-building slowdown is fully identified.

Allow value labels to be read before the iterator has been used.

Fix issue where categorical data was incorrectly reformatted when write_index was False.

closes pandas-dev#16923

Better to ignore the warning from the bug, rather than assert the bug is still there

After this change, numpy/numpy#9412 could be backported to fix the bug

Closes pandas-devgh-16935

The parameter was not being respected for secondary_y.

Closes pandas-devgh-12565

closes pandas-dev#16416

Author: aernlund awe220@nyumc.org

Closes pandas-dev#16967 from aernlund/reset_index_docs and squashes the following commits:

3c6a4b6 [aernlund] DOC: added examples to reset_index
4838155 [aernlund] DOC: added examples to reset_index
2a51e2b [aernlund] DOC: added examples to reset_index

Closes pandas-devgh-9313

This reverts commit 9c096d2, as it was prematurely made.

closes pandas-dev#16917

Closes pandas-devgh-16893.

Closes pandas-devgh-16988.

Closes pandas-devgh-16773.

Deprecated since 0.11 and 0.12 respectively.

Closes pandas-devgh-16971.

Closes pandas-devgh-16772

Closes pandas-devgh-16957.

Closes pandas-devgh-16998.

closes pandas-dev#16012

Author: Morgan Stuart morgansstuart243@gmail.com

Closes pandas-dev#16969 from Morgan243/large_array_isin and squashes the following commits:

31cb4b3 [Morgan Stuart] Removed unneeded details from whatsnew description
4b59745 [Morgan Stuart] Linting errors; additional test clarification
186607b [Morgan Stuart] BUG pandas-dev#16012 - fix isin for large object arrays

closes pandas-dev#16770

Author: ri938 r_irv938@hotmail.com
Author: Jeff Reback jeff@reback.net
Author: Tuan tuan.d.tran@hotmail.com
Author: Forbidden Donut forbdonut@gmail.com

This patch had conflicts when merged, resolved by Committer: Jeff Reback jeff@reback.net

Closes pandas-dev#16820 from ri938/bug_issue16770 and squashes the following commits:

0e2d315 [ri938] Merge branch 'master' into bug_issue16770
9802288 [ri938] Update v0.20.3.txt
1f2865e [ri938] Update v0.20.3.txt
83fd749 [ri938] Update v0.20.3.txt
eab3192 [ri938] Merge branch 'master' into bug_issue16770
7acc09f [ri938] Minor correction to previous submit
6e8f1b3 [ri938] Minor corrections to previous submit (pandas-dev#16820)
9ed80f0 [ri938] Bring documentation into line with master branch.
26e1a60 [ri938] Move documentation of change to the next major release 0.21.0
59b17cd [Jeff Reback] BUG: rolling.cov with multi-index columns should presever the MI (pandas-dev#16825)
5362447 [Tuan] fix BUG: ValueError when performing rolling covariance on multi indexed DataFrame (pandas-dev#16814)
800b40d [ri938] BUG: render dataframe as html do not produce duplicate element id's (pandas-dev#16780) (pandas-dev#16801)
a725fbf [Forbidden Donut] BUG: Fix read of py3 PeriodIndex DataFrame HDF made in py2 (pandas-dev#16781) (pandas-dev#16790)
8f8e3d6 [ri938] TST: register slow marker (pandas-dev#16797)
0645868 [ri938] Add backticks in documentation
0a20024 [ri938] Minor correction to previous submit
69454ec [ri938] Minor corrections to previous submit (pandas-dev#16820)
3092bbc [ri938] BUG: reindex would throw when a categorical index was empty pandas-dev#16770

Empty Series initializes to float64, even when the data type is object for .isin, leading to an error with membership.

Closes pandas-devgh-16991.

Closes pandas-devgh-9313
Redo of pandas-devgh-16958

Follow-up to pandas-devgh-16978.

See current stable docs for the issue: https://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-melt

The double backtick is causing the entire paragraph to be fixed width until the next double backtick. This commit removes the extra "`".

COMPAT: safe_sort will only coerce list-likes to object, not a numpy string type

xref: pandas-dev#17003 (comment)

Cocumentation to Documentation

closes pandas-dev#17005

closes pandas-dev#16900

Author: Dave Willmer dave.willmer@gmail.com

This patch had conflicts when merged, resolved by Committer: Jeff Reback jeff@reback.net

Closes pandas-dev#16986 from dwillmer/cat_fix and squashes the following commits:

1ea1977 [Dave Willmer] Minor tweaks + comment
21a35a0 [Dave Willmer] Merge branch 'cat_fix' of https://github.com/dwillmer/pandas into cat_fix
04d5404 [Dave Willmer] Update tests
3cc5c24 [Dave Willmer] Merge branch 'master' into cat_fix
5e8e23b [Dave Willmer] Add whatsnew item
b82d117 [Dave Willmer] Lint fixes
a81933d [Dave Willmer] Remove unused import
218da66 [Dave Willmer] Generic solution to categorical problem
48e7163 [Dave Willmer] Test inner join
8843c10 [Dave Willmer] Fix TypeError when merging categorical dates

supersedes pandas-dev#14145
closes pandas-dev#14001

This is a reprise of pandas-dev#14145 & pandas-dev#16408.

This removes some code from the core structures & pushes it to internals, where the primitives are made more consistent.

This should allow us to be a bit more consistent for pandas2 type things.

closes pandas-dev#16402
supersedes pandas-dev#14145
closes pandas-dev#14001

CLN: remove unneeded code in internals; use split_and_operate when possible

Closes pandas-devgh-17046.

A continuation of pandas-dev#16178
closes pandas-dev#16112
closes pandas-dev#16178

Author: Kernc kerncece@gmail.com
Author: keitakurita kris337jbn@yahoo.co.jp

This patch had conflicts when merged, resolved by Committer: Jeff Reback jeff@reback.net

Closes pandas-dev#16892 from kernc/sparse-fillna and squashes the following commits:

c1cd33e [Kernc] fixup! BUG: Made SparseDataFrame.fillna() fill all NaNs
2974232 [Kernc] fixup! BUG: Made SparseDataFrame.fillna() fill all NaNs
4bc01a1 [keitakurita] BUG: Made SparseDataFrame.fillna() fill all NaNs

Fix a few locations where a parser's error_msg buffer is written to without having been previously allocated. This manifested as a double free during exception handling code making use of the error_msg. Additionally, use size_t/ssize_t where array indices or lengths will be stored. Previously, int32_t was used and would overflow on columns with very large amounts of data (i.e. greater than INT_MAX bytes).

xref pandas-dev#14696
closes pandas-dev#16798

Author: Jeff Knupp jeff.knupp@enigma.com
Author: Jeff Knupp jeff@jeffknupp.com

Closes pandas-dev#17040 from jeffknupp/16790-core-on-large-csv and squashes the following commits:

6a1ba23 [Jeff Knupp] Clear up prose
a5d5677 [Jeff Knupp] Fix linting issues
4380c53 [Jeff Knupp] Fix linting issues
7b1cd8d [Jeff Knupp] Fix linting issues
e3cb9c1 [Jeff Knupp] Add unit test plus '--high-memory' option, off by default.
2ab4971 [Jeff Knupp] Remove debugging code
2930eaa [Jeff Knupp] Fix line length to conform to linter rules
e4dfd19 [Jeff Knupp] Revert printf format strings; fix more comment alignment
3171674 [Jeff Knupp] Fix some leftover size_t references
0985cf3 [Jeff Knupp] Remove debugging code; fix type cast
669d99b [Jeff Knupp] Fix linting errors re: line length
1f24847 [Jeff Knupp] Fix comment alignment; add whatsnew entry
e04d12a [Jeff Knupp] Switch to use int64_t rather than size_t due to portability concerns.
d5c75e8 [Jeff Knupp] BUG: Use size_t to avoid array index overflow; add missing malloc of error_msg
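The overflow behind this parser fix can be illustrated with a small sketch. It uses Python's `ctypes` fixed-width integers to mimic C storage; the helper names `as_int32` and `as_int64` are hypothetical and this is not the parser code itself.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(n):
    """Store n in a 32-bit signed integer, wrapping on overflow like C storage."""
    return ctypes.c_int32(n).value


def as_int64(n):
    """Store n in a 64-bit signed integer."""
    return ctypes.c_int64(n).value


# A length one byte past the 32-bit limit wraps to a negative number,
# while the 64-bit type keeps the value intact.
length = INT32_MAX + 1
wrapped = as_int32(length)
kept = as_int64(length)
```

A negative length used as an array index or allocation size is the kind of value that leads to crashes on very large columns, which is why a wider type is used.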

TST: move highmemory test to proper location in c_parser_only

xref pandas-dev#16798

Follow-up to pandas-devgh-17057.

envars.sh doesn't exist anymore. In fact, it's been gone for a while.

Duplicate boolean validation check for sort_index in series/test_validate.py

Addresses pandas-devgh-17064.

Also add some additional build information when calling pd.show_versions

closes pandas-dev#14636

Author: Pietro Battiston me@pietrobattiston.it

Closes pandas-dev#16994 from toobaz/set_axis_inplace and squashes the following commits:

8fb9d0f [Pietro Battiston] REF: adapt NDFrame.set_axis() calls to new signature
409f502 [Pietro Battiston] ENH: provide "inplace" argument to set_axis(), change signature

Closes pandas-devgh-17063

closes pandas-dev#15001

Currently defaults to False for backwards compatibility. Will default to True in the future.

Closes pandas-devgh-17059.

The "expected" variable is unused at the end of a test in indexing/test_scalar.py

Looks like we just forgot about them. Oops.

closes pandas-dev#17013

This fix actually exposed an occurrence of pandas-dev#17035 in an existing test (as well as in one I added).

Author: Pietro Battiston me@pietrobattiston.it

Closes pandas-dev#17062 from toobaz/pivot_margin_int and squashes the following commits:

2737600 [Pietro Battiston] Removed now obsolete workaround
956c4f9 [Pietro Battiston] BUG: respect dtype when calling pivot_table with margins=True

"2< heuristic" --> "2 < heuristic"

Stray verbose print statement in parsers.pyx was bare without any parentheses.

xref pandas-dev#16972

Closes pandas-devgh-13279

closes pandas-dev#17115

Closes pandas-devgh-7757.

Author: Brock Mendel jbrockmendel@gmail.com

Closes pandas-dev#17185 from jbrockmendel/dont_import_nan and squashes the following commits:

ee260b8 [Brock Mendel] remove direct import of nan

closes pandas-dev#7175
closes pandas-dev#5904

Closes pandas-devgh-17170

closes pandas-dev#17193

Remove leading space from task-list so that tasks aren't nested.

Imports should succeed on all versions of Python that pandas supports.

closes pandas-dev#17228

Author: mattip matti.picus@gmail.com

Closes pandas-dev#17229 from mattip/getsizeof-unavailable and squashes the following commits:

d2623e4 [mattip] COMPAT: avoid calling getsizeof() on PyPy

Replaced %s syntax with .format in pandas.core.reshape. Additionally, made some of the existing positional .format code more explicit.
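As a rough illustration of this kind of conversion (the message text and variable names here are made up for the example, not taken from pandas.core.reshape), the three forms below produce the same string:

```python
name, count = "left", 3

# Old printf-style formatting.
old_style = "column %s has %d duplicates" % (name, count)

# Positional str.format, the direct replacement.
positional = "column {} has {} duplicates".format(name, count)

# Named fields, the "more explicit" variant: each placeholder says what it holds.
explicit = "column {name} has {count} duplicates".format(name=name, count=count)
```

Named fields make long messages easier to reorder or extend without miscounting arguments.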

Closes pandas-dev#13595

Closes pandas-devgh-17251

xref pandas-dev#17234

Test parameters with marks are updated according to the updated API of Pytest.
https://docs.pytest.org/en/latest/changelog.html#pytest-3-2-0-2017-07-30
https://docs.pytest.org/en/latest/parametrize.html

Previously we had a Float64Index, which is inconsistent with, e.g., the regular Index constructor.

Previously these worked around the return type by wrapping list-likes in np.array and relying on that to cast to float. These workarounds are no longer necessary.

This relied on NaN being a float and empty being a float. Not a necessary test anymore.

Stricter cutoffs for considering regressions

[ci skip]

Initializes unsupported and deprecated argument sets with set literals instead of the set constructor in pandas/io/parsers.py, as the former is slightly faster than the latter.
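A quick way to compare the two spellings is sketched below. The argument names are placeholders rather than the actual contents of pandas/io/parsers.py, and timings vary by interpreter and machine, so no particular numbers are claimed.

```python
import timeit

# Both spellings build the same set; only the construction path differs.
literal = {"arg_a", "arg_b"}
constructed = set(["arg_a", "arg_b"])

# The literal builds the set directly; the constructor first builds a list
# and then calls set() on it.
literal_time = timeit.timeit('{"arg_a", "arg_b"}', number=100000)
constructor_time = timeit.timeit('set(["arg_a", "arg_b"])', number=100000)

print("literal: {:.4f}s, constructor: {:.4f}s".format(literal_time, constructor_time))
```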

Patches several spelling errors and expands current doc to a proper doc-string.

In pandas-dev#17293 I messed up the syntax. I used a glob instead of a regex. According to the docs at http://asv.readthedocs.io/en/latest/asv.conf.json.html#regressions-thresholds we want to use a regex. I've actually manually tested this change and verified that it works.

[ci skip]
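The glob-versus-regex mix-up can be sketched with the standard library. The benchmark name and patterns below are hypothetical and only illustrate why the two syntaxes are not interchangeable; they are not the values from the asv configuration.

```python
import fnmatch
import re

benchmark = "reshape.melt.time_melt"  # hypothetical benchmark name

# A glob and a regex that both select every benchmark under "reshape".
glob_pattern = "reshape.*"
regex_pattern = r"reshape\..*"

glob_hit = fnmatch.fnmatch(benchmark, glob_pattern)
regex_hit = re.match(regex_pattern, benchmark) is not None

# A glob with a leading "*" is not even a valid regular expression.
try:
    re.compile("*melt")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False
```

In a glob, `*` matches any run of characters on its own, whereas in a regex it only repeats the preceding element, so a pattern written in one syntax can silently match the wrong names or fail to compile in the other.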

closes pandas-dev#17276

Author: Michael Gasvoda mgasvoda@mercatus.gmu.edu
Author: mgasvoda mgasvoda01@gmail.com

Closes pandas-dev#17288 from mgasvoda/master and squashes the following commits:

a1dbdf2 [mgasvoda] Merge branch 'master' into master
9333952 [Michael Gasvoda] Checking output of tests
4e0464e [Michael Gasvoda] fixing whatsnew text
c442040 [Michael Gasvoda] formatting fixes
7e23678 [Michael Gasvoda] formatting updates
781ea72 [Michael Gasvoda] whatsnew entry
d9627fe [Michael Gasvoda] adding clip tests
9aa0159 [Michael Gasvoda] Treating na values as none for clips

closes pandas-dev#15206, numpy >= 1.9
closes pandas-dev#15543, matplotlib >= 1.4.3
scipy >= 0.14.0

Replaced %s syntax with .format in missing.py, nanops.py, ops.py. Additionally, made some of the existing positional .format code more explicit.

closes pandas-dev#16843

closes pandas-dev#16842

Author: step4me prosikeffect@gmail.com

Closes pandas-dev#17244 from step4me/step4me-feature and squashes the following commits:

09d051d [step4me] BUG: Cannot use tz-aware origin in to_datetime (pandas-dev#16842)

Progress toward issue pandas-dev#16130. Converted old string formatting to new string formatting in core/indexing.py.

[ci skip]

Moves test_intersect_str_dates from tests/indexes/test_range.py to tests/indexes/test_base.py.

When the indexer is identical to the elements, we should still return duplicates when the indexer contains duplicates.

Closes pandas-devgh-17323.

closes pandas-dev#17344

Progress toward issue pandas-dev#16130. Converted old string formatting to new string formatting in io/formats/format.py.

Declaring build-time requirements: https://www.python.org/dev/peps/pep-0518/

Remove references to Panel

This removes the special case for MultiIndex constructors returning an Index if all the levels are length-1. Now this will return a MultiIndex with a single level.

This is a backwards incompatible change, with no clear method for deprecation, so we're making a clean break.

Closes pandas-dev#17178

alanbato pushed a commit to alanbato/pandas that referenced this pull request

Nov 10, 2017

@cpcloud @alanbato

Currently defaults to False for backwards compatibility. Will default to True in the future.

Closes pandas-devgh-17059.
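The effect of a skipna flag on type inference can be sketched with a toy function. This is a simplified, hypothetical stand-in (`toy_infer_dtype` and `is_missing` are not pandas functions, and the labels it returns are not the ones pandas uses); it only shows why skipping missing values changes the inferred type.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def toy_infer_dtype(values, skipna=False):
    """Return a single type label for values, or "mixed" if the types differ.

    skipna defaults to False here, mirroring the backwards-compatible default
    described above.
    """
    if skipna:
        values = [v for v in values if not is_missing(v)]
    kinds = {type(v).__name__ for v in values}
    if not kinds:
        return "empty"
    return kinds.pop() if len(kinds) == 1 else "mixed"


data = ["a", "b", float("nan")]
with_missing = toy_infer_dtype(data)  # NaN counts as a float alongside the strings
without_missing = toy_infer_dtype(data, skipna=True)  # only the strings remain
```

With the flag off, a single NaN is enough to make otherwise uniform string data look mixed, which is the case the new parameter addresses.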

This was referenced

Oct 30, 2018