BUG: DataFrame.append with timedelta64 by jbrockmendel · Pull Request #39574 · pandas-dev/pandas
Conversation
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
AFAICT several of the existing tests are just wrong; xref #39122, cc @jorisvandenbossche
Fixes (but haven't yet added a test for) #39037 (comment)
@@ -97,7 +97,7 @@ def _cast_to_common_type(arr: ArrayLike, dtype: DtypeObj) -> ArrayLike:
         return arr.astype(dtype, copy=False)

-def concat_compat(to_concat, axis: int = 0):
+def concat_compat(to_concat, axis: int = 0, pretend_axis1: bool = False):
why don't you just allow axis=None and handle that case? this is very odd naming here (appreciate the de-duplication that this allows); so I guess a better question is: is this temporary?
why don't you just allow axis=None and handle that case? this is very odd naming here

I want the naming to make it very clear that this is a don't-try-this-at-home kludge (need to update the docstring to that effect)
so I guess a better question is: is this temporary?

Inasmuch as it won't be needed once we have 2D EAs, yes.
ok, to the extent this is not removed anytime soon, can you come up with a better argument name or another way of doing this?
renamed + green
Can you summarize the behaviour that is changed / bug that is fixed?
Can you summarize the behaviour that is changed / bug that is fixed?
df = DataFrame({"a": pd.array([1], dtype="Int64")})
other = DataFrame({"a": [np.timedelta64("NaT", "ns")]})
result = df.append(other, ignore_index=True)
>>> result
a
0 1
1 <NA>
>>> result.dtypes
a Int64
dtype: object
df = DataFrame(columns=["a"]).astype("Int64")
other = DataFrame({"a": [np.timedelta64(1, "ns")]})
>>> result = df.append(other, ignore_index=True)
ValueError: Wrong number of dimensions. values.ndim != ndim [1 != 2]
I am personally a bit hesitant to change the all-NaN special case. That's longstanding behaviour (kept for practical reasons, too), and changing it will break code, I think.
@@ -72,6 +72,8 @@ def concat_compat(to_concat, axis: int = 0):
     ----------
     to_concat : array of arrays
     axis : axis to provide concatenation
+    ea_compat_axis : bool, default False
+        For ExtensionArray compat, behave as if axis == 1
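To make the docstring concrete: the flag only gates the step that drops zero-length arrays before concatenating. Below is a hypothetical sketch of that idea (not pandas' actual `concat_compat` implementation); keeping the empties means their dtypes still participate in result-dtype promotion.

```python
import numpy as np

def concat_compat_sketch(to_concat, axis: int = 0, ea_compat_axis: bool = False):
    # Hypothetical sketch, not pandas' real concat_compat: the
    # ea_compat_axis flag only controls whether zero-length arrays are
    # dropped before concatenating. Behaving "as if axis == 1" keeps
    # the empties, so their dtypes still influence the result dtype.
    if axis == 0 and not ea_compat_axis:
        non_empty = [x for x in to_concat if len(x)]
        if non_empty:
            to_concat = non_empty
    return np.concatenate(to_concat, axis=axis)

# Dropping an empty float64 array leaves a pure-int64 concat;
# keeping it promotes the result to float64.
a = np.array([], dtype="float64")
b = np.array([1, 2], dtype="int64")
print(concat_compat_sketch([a, b]).dtype)                       # int64
print(concat_compat_sketch([a, b], ea_compat_axis=True).dtype)  # float64
```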
Can you clarify what it means in practice to behave "as if axis == 1"? Because I assume the arrays are still concatenated along axis=0?
just matters for the dropping-empty check; will edit to clarify
@jorisvandenbossche helpful comments, thank you. I found a nice way to retain the None/NaN behaviour while fixing the original bug: supplementing JoinUnit.is_na with a check that the value is a compatible NA.
Comment on lines 239 to 243

if self.dtype == object:
    values = self.block.values
    return all(
        is_valid_nat_for_dtype(x, dtype) for x in values.ravel(order="K")
    )
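The "compatible NA" idea can be illustrated with a standalone sketch (a hypothetical helper, not pandas' actual `is_valid_nat_for_dtype`): a generic None/NaN counts as missing for any dtype, while a typed NaT only counts for its own kind of dtype.

```python
import numpy as np

def is_compatible_na(value, dtype: np.dtype) -> bool:
    # Hypothetical illustration of the "compatible NA" check discussed
    # above (pandas' real check is is_valid_nat_for_dtype).
    # None and float NaN are missing for any dtype...
    if value is None or (isinstance(value, float) and np.isnan(value)):
        return True
    # ...but a typed NaT only matches its own flavour of dtype.
    if isinstance(value, np.datetime64) and np.isnat(value):
        return np.issubdtype(dtype, np.datetime64)
    if isinstance(value, np.timedelta64) and np.isnat(value):
        return np.issubdtype(dtype, np.timedelta64)
    return False

print(is_compatible_na(np.nan, np.dtype("float64")))                      # True
print(is_compatible_na(np.timedelta64("NaT", "ns"), np.dtype("m8[ns]")))  # True
print(is_compatible_na(np.timedelta64("NaT", "ns"), np.dtype("M8[ns]")))  # False
```

The last case is exactly the bug scenario in this PR: a timedelta NaT should not be silently absorbed into a non-timedelta column.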
This is not only required for object dtype, I think. Float NaN is also considered "all NaN" when it comes to ignoring the dtype in concatting dataframes (and other dtypes as well, I think):
In [39]: pd.concat([pd.DataFrame({'a': [np.nan, np.nan]}), pd.DataFrame({'a': [pd.Timestamp("2012-01-01")]})])
Out[39]:
a
0 NaT
1 NaT
0 2012-01-01
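For reference, a runnable version of the snippet above (note that the resulting dtype may differ across pandas versions, since recent releases have been revisiting this dtype-ignoring path, so only the shape is a stable expectation):

```python
import numpy as np
import pandas as pd

# The longstanding special case discussed above: an all-NaN float
# column is treated as generically missing, so older pandas casts it
# to the other frame's datetime dtype when concatenating.
left = pd.DataFrame({"a": [np.nan, np.nan]})
right = pd.DataFrame({"a": [pd.Timestamp("2012-01-01")]})
out = pd.concat([left, right])
print(out)
print(out.dtypes)
```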
The non-object case is handled below on L245-246. Or do you have something else in mind?
Does my snippet above work with this PR?
(if so, then I don't fully understand why the changes to test_append_empty_frame_to_series_with_dateutil_tz are needed)
Does my snippet above work with this PR?
yes it does
(if so, then I don't fully understand why the changes to test_append_empty_frame_to_series_with_dateutil_tz are needed)
I think that's driven by something sketchy-looking in get_reindexed_values; will see if that can be addressed.
better?
Co-authored-by: Joris Van den Bossche jorisvandenbossche@gmail.com
tm.assert_frame_equal(result_b, expected)

# column order is different
expected = expected[["c", "d", "date", "a", "b"]]
result = df.append([s, s], ignore_index=True)
dtype = Series([date]).dtype
expected["date"] = expected["date"].astype(dtype)
Is this still needed? (might be a left-over from astyping it to object before)
you're right, updated
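Why the explicit astype was likely redundant: constructing a Series from a tz-aware timestamp already infers the tz-aware dtype, so astyping to that same dtype is a no-op. A small illustration (this `date` value is a stand-in for the dateutil-tz one used in the test):

```python
import pandas as pd
from datetime import timezone

# Stand-in for the tz-aware `date` used in the test above.
date = pd.Timestamp("2021-01-01", tz=timezone.utc)

# Series construction infers the tz-aware datetime dtype directly.
s = pd.Series([date])
print(s.dtype)  # datetime64[ns, UTC]
```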