DEPR: Deprecate NDFrame.as_matrix by topper-123 · Pull Request #18458 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation44 Commits3 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

topper-123

Deprecating NDFrame.as_matrix as per discussion in #18262.

@codecov

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Left some comments

@@ -82,7 +82,7 @@ Deprecations
~~~~~~~~~~~~
- ``Series.from_array`` and ``SparseSeries.from_array`` are deprecated. Use the normal constructor ``Series(..)`` and ``SparseSeries(..)`` instead (:issue:`18213`).
-
- ``NDFrame.as_matrix`` is deprecated. Use ``NDFrame.values`` instead (:issue:`18458`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use DataFrame instead of NDFrame (basically NDFrame should never show up in the docs, it is an internal implementation detail, although there are some places where it leaks through ..)

@@ -3735,6 +3735,9 @@ def _get_bool_data(self):
def as_matrix(self, columns=None):
"""
DEPRECATED: This method will be removed in a future version.
Use :meth:`NDFrame.values` instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here

@@ -3770,6 +3773,8 @@ def as_matrix(self, columns=None):
--------
pandas.DataFrame.values
"""
warnings.warn("This method will be removed in a future version. "
"Use ``.values`` instead.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a FutureWarning, and you also need to set a stacklevel (this is a top-level method, stacklevel=2 should be the correct one)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also mention as_matrix explicitly in the message (that is clearer as you don't always directly see what line or what method on a certain line is causing the warning).
And also follow a bit the typical message like "'as_matrix' is deprecated and will be removed in a future version. Use .."

@@ -243,9 +243,9 @@ def test_itertuples(self):
def test_len(self):
assert len(self.frame) == len(self.frame.index)
def test_as_matrix(self):
def test_values(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave one test (or now add a test_as_matrix_deprecated that copies eg the first case of this test) that uses as_matrix and that asserts it raises a warning and is the same as .values

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I've added in below test_repr_with_mi_nat

@jorisvandenbossche

General question: I am wondering if we shouldn't rather recommend to do np.asarray(df) instead of df.values ?

@topper-123

Interesting, does it conform to .values in all cases?.

Anyway, I suggest that that will be a separate issue from this one.

@topper-123

I've changed the PR according to the comments. Is this ok?

jreback

@@ -3791,7 +3796,10 @@ def values(self):
int32. By numpy.find_common_type convention, mixing int64 and uint64
will result in a flot64 dtype.
"""
return self.as_matrix()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just return .values

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the def of .values

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, then we should push this entirely down to the BlockManager, so pass the axis as well. The BM can then do the transpose, we don't like to do things in user code like this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should actually deprecate .values as well, in favor of .to_numpy() which is the future spelling anyhow.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already almost entirely pushed down, since it is calling directly BlockManager.as_matrix (but OK, the transpose could be done inside BlockManager.as_matrix). The consolidate_inplace needs to happens in NDFrame I think.

Deprecating .values is a no-go IMO, but let's deprecate that in another issue if you want

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, then we should push this entirely down to the BlockManager, so pass the axis as well. The BM can then do the transpose, we don't like to do things in user code like this.

I'm not too familiar with the BlockManager, except accessing it with ._data. How do I pass the axis to it, and how do I transpose in a BlockManager? E.g. it doesn't have a .transpose method.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at _take in pandas/core/generic.py, you just pass the get_block_manager_axis to it.

then add axis= to def as_matrix in pandas/core/internals and do the transpose there.

The reason for pushing this to internals is to avoid all of the wrapping layers (e.g. frame/series, etc.) to know about internal details like this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as_matrix doesn't have a axis parameter. Do you mean adding a new axis parameter there and transposing inside, if axis=0 (where I assume from my limited knowledge that BlockManager.axes[0] is always the same as dataFrame.axes[1], correct?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jreback, I've tried quite a bit today and I can't push this down while getting the tests to pass.

Could you look at my latest commit (not passing ATM) and give some advice?

@@ -369,6 +369,12 @@ def test_values(self):
self.frame.values[:, 0] = 5.
assert (self.frame.values[:, 0] == 5).all()
def test_as_matrix_deprecated(self):
with tm.assert_produces_warning(FutureWarning):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

jreback

if len(self.blocks) == 0:
return np.empty(self.shape, dtype=float)
other_axis = abs(axis-1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't do this

if items is not None:
mgr = self.reindex_axis(items, axis=0)
mgr = self.reindex_axis(items, axis=other_axis)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this alone

return self.as_matrix()
self._consolidate_inplace()
bm_axis = self._get_block_manager_axis(axis=1)
return self._data.as_matrix(axis=bm_axis)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, just pass transpose = self._AXIS_REVERSED in.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But _data.as_matrix has no transpose parameter, do you mean axis=self._AXIS_REVERSED?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I mean add the argument transpose=False rather than axis prob simpler for now.

else:
mgr = self
if self._is_single_block or not self.is_mixed_type:
return mgr.blocks[0].get_values()
arr = mgr.blocks[0].get_values()
else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to transpose instead. should be straightforward from here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, transpose is of course much better than axis.

The issue was actually in the if len(self.blocks) == 0: block, as the empty array also must be transposed.

Everything is green now locally and I've pushed that upstream.

@topper-123

A thought: BlockManager.as_matrix is a strange name as it implies a np.matrix is returned.

Should I not just change it to BlockManager.as_array? Or, do any external libraries rely on calling BlockManager directly, e.g. statsmodel?

@jreback

@jreback

Should I not just change it to BlockManager.as_array? Or, do any external libraries rely on calling BlockManager directly, e.g. statsmodel?

this is completely private and internal. as_array would be fine. while you are at it, can you add a doc-string.

ping when ready.

jreback

self._consolidate_inplace()
if self._AXIS_REVERSED:
return self._data.as_matrix(columns).T

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should just return self.values

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning self.values implies that columns=None which isn't necessary true for user code.

@@ -3842,7 +3848,7 @@ def as_blocks(self, copy=True):
.. deprecated:: 0.21.0
NOTE: the dtypes of the blocks WILL BE PRESERVED HERE (unlike in
as_matrix)
as_array)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this spelling, this is a top-level method name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@@ -3670,19 +3670,22 @@ def copy(self, deep=True, mgr=None):
return self.apply('copy', axes=new_axes, deep=deep,
do_integrity_check=False)
def as_matrix(self, items=None):
def as_array(self, transpose=False, items=None):
if len(self.blocks) == 0:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

jreback

@@ -3770,10 +3773,12 @@ def as_matrix(self, columns=None):
--------
pandas.DataFrame.values
"""
warnings.warn("method ``as_matrix`` will be removed in a future version. "
"Use ``values`` instead.", FutureWarning, stacklevel=2)
self._consolidate_inplace()
if self._AXIS_REVERSED:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so this should be the same as below (e.g. passing transpose=). make sure we still have a test on this (e.g. that passes columns).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and test_as_matrix_deprecated has been modified to a take columns param.

jreback

@@ -3770,10 +3773,11 @@ def as_matrix(self, columns=None):
--------
pandas.DataFrame.values
"""
warnings.warn("method ``as_matrix`` will be removed in a future version. "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"method method .as_matrix()...."
"Use .values instead"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I not sure I understand, assume you want the backticks gone. Change uploaded.

@topper-123

jreback

@jreback

giantZorg added a commit to giantZorg/multivariate_time_series_forecasting that referenced this pull request

Feb 4, 2020

@giantZorg

atangfan added a commit to atangfan/Deep-Portfolio-Management that referenced this pull request

May 11, 2020

@atangfan

See GH18458(pandas-dev/pandas#18458). Since as_matrix is deprecated for pd.DataFrame and pd.Series, use data.values instead.

jnirschl added a commit to jnirschl/risk-slim that referenced this pull request

Aug 10, 2020

@jnirschl