API: Restore implicit converter registration by TomAugspurger · Pull Request #18307 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation64 Commits26 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

TomAugspurger

Closes #18301

TODO:

@TomAugspurger

@TomAugspurger

I added the "special" section in the whatsnew for this and the changes we're making to NaN/empty sum.

@TomAugspurger

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

This caused problems for some users, so we're temporarily reverting that change;
pandas will again register the converters on import. Using the converters
without explicitly registering the formatters will cause a ``FutureWarning``:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use 'the formatters' as it actually points to the same as 'converters' but with another word, just making it confusing. I would either just leave it out or use 'them' to avoid repeating converters

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to use formatters converters everywhere. Fixing.

>>> from pandas.tseries import converter
>>> converter.register()
As the error message says, you'll need to register the converts if you intend to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converts -> converters

As the error message says, you'll need to register the converts if you intend to
use them with matplotlib plotting functions. Pandas plotting functions, such as
``Series.plot``, will register them for you. (:issue:`18301`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a bit more explicit like: "When using pandas plotting functions, such as Series.plot, the converters will be registered for you and you do not need to do this explicitly"

@@ -2352,7 +2354,6 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None,
>>> grouped = df.unstack(level='lvl1').groupby(level=0, axis=1)
>>> boxplot_frame_groupby(grouped, subplots=False)
"""
_setup()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one doesn't need to be replaced with _converter.WARN = False ?

plt = pytest.importorskip('matplotlib.pyplot')
def test_warns():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this duplicate with the other test file?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, forgot to remove it from my first commit.

@TomAugspurger

What are people's thoughts on moving the register import to pandas.plotting instead of pandas.tseries? I'm not sure I see the benefit, unless we're planning to remove pandas.tseries in the future.

@TomAugspurger

@codecov

@codecov

@jorisvandenbossche

Should we also add some language on how to avoid this warning without registering the converters by default? So if you in the future don't want to manually register them, but rely on the matplotlib ones (eg when working with datetime.datetime) ?

@TomAugspurger

I guess a pandas option?

pd.options.register_formatters : bool

And we check that before ever calling register?

@TomAugspurger

Though, future-proofing a bit, should that be spread across several options, one per type? We currently register for

    units.registry[lib.Timestamp] = DatetimeConverter()
    units.registry[Period] = PeriodConverter()
    units.registry[pydt.datetime] = DatetimeConverter()
    units.registry[pydt.date] = DatetimeConverter()
    units.registry[pydt.time] = TimeConverter()
    units.registry[np.datetime64] = DatetimeConverter()

In the ideal future, we would I think just do Timestamp and Period, and matplotlib would have datetime64, I don't know who wants to own python dates / times / datetimes.

cc @tacaswell if any of this interests you.

@TomAugspurger

@TomAugspurger

Added a rough implementation of the option in 16c33f1.

In [1]: import pandas as pd

In [2]: import matplotlib.pyplot as plt

In [3]: fig, ax = plt.subplots()

In [4]: s = pd.Series(range(12), index=pd.date_range('2017', periods=12))

In [5]: pd.options.plotting.matplotlib.register_formatters = False

In [6]: ax.plot(s) Out[6]: [<matplotlib.lines.Line2D at 0x1131ce588>]

@TomAugspurger

@jreback

What are people's thoughts on moving the register import to pandas.plotting instead of pandas.tseries? I'm not sure I see the benefit, unless we're planning to remove pandas.tseries in the future.

if its not too much trouble. I would like to at least deprecate pandas.tseries

@TomAugspurger

@tacaswell

jorisvandenbossche

As the error message says, you'll need to register the converters if you intend
to use them with matplotlib plotting functions. Pandas plotting functions, such
as ``Series.plot``, will register them for you calling ``converter.register()``

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"for you calling" -> "for you, and calling"

as ``Series.plot``, will register them for you calling ``converter.register()``
first is not necessary.
Finally, control the formatters, we've added a new option:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to" control

@jorisvandenbossche

Some other things related to the option:

jorisvandenbossche

converter.register()
with ctx:
with tm.assert_produces_warning(None) as w:
ax.plot(s.index, s.values)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you did set the option to False, it should not warn?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forget, that is what is being tested :-) (and they have been registered manually afterwards)

Should we test what happens if you do not register manually after you set the option?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean set the option to True or False? I think my latest commit does that.

After setting it to False, I think we should not warn even if they don't register manually (which is what my tests checks hopefully)

@TomAugspurger

It seems you kept the terminology of 'converters' (and not 'formatters')

Switched to formatters everywhere.

they will also not get registered when using pandas' plot methods ?

Correct. To be more specific, they're always registered when pandas is imported. The option toggles whether or not they're in the matplotlib.units.registry dict. True adds them, False removes them. That means that doing

In [8]: pd.options.plotting.matplotlib.register_formatters = False

In [9]: s.plot()

Will fall back to whatever matplotlib does with datetimes / periods (potentially an exception, e.g. for periods).

will also remove the converters for datetime.datetime, which are actually the default ones

Hmm, not sure about that. My registry is empty upon import:

In [1]: import matplotlib.units

In [2]: matplotlib.units.registry Out[2]: {}

Though maybe I'm misunderstanding. The intent is return the registry to the state before we mucked with it.

I would make the option maybe a bit shorter

Yeah, thoughts on pd.options.plotting.mpl.converters? I want the .mpl / matplotlib namespace for if / when we allow different engines for .plot.

@TomAugspurger

@TomAugspurger

@tacaswell

The unit registry gets filled on import. If you import matplotlib.dates they will be added. These imports are done in pyplot so most users have this import done.

@TomAugspurger

Ah thanks.

In [1]: import matplotlib.units

In [2]: import matplotlib.pyplot as plt

In [3]: matplotlib.units.registry Out[3]: {str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>, numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>, bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>, numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>, datetime.date: <matplotlib.dates.DateConverter at 0x10bf2c208>, datetime.datetime: <matplotlib.dates.DateConverter at 0x10bf2c240>}

In [4]: import pandas as pd

In [5]: matplotlib.units.registry Out[5]: {str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>, numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>, bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>, numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>, datetime.date: <pandas.plotting._converter.DatetimeConverter at 0x10bf2c240>, datetime.datetime: <pandas.plotting._converter.DatetimeConverter at 0x10eaa3828>, pandas._libs.tslib.Timestamp: <pandas.plotting._converter.DatetimeConverter at 0x10eaa3780>, pandas._libs.period.Period: <pandas.plotting._converter.PeriodConverter at 0x10eaa37b8>, datetime.time: <pandas.plotting._converter.TimeConverter at 0x10bf2c208>, numpy.datetime64: <pandas.plotting._converter.DatetimeConverter at 0x10eaac0f0>}

In [6]: pd.options.plotting.matplotlib.register_formatters = False

In [7]: matplotlib.units.registry Out[7]: {str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>, numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>, bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>, numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>}

So not great... We could of course maintain our own global state when we overwrite them... Though we'll have to make sure matplotlib.pyplot is imported first. Let me give that a shot.

@jorisvandenbossche

I have no problem with providing this optinon, but why would you make this True by default? it is conceptually false now.

It has always been conceptually True (up to 0.21.0), so if we revert, then the default would be True. And from 0.21.0 you could say that it is conceptually False, but that is also not really correct, as it is False on import but True once you do plotting with pandas. In this PR, the False option will also not register them when plotting datetimes (so that will raise an error with current matplotlib)

@TomAugspurger

Another issue for down the road matplotlib/matplotlib#9779 doesn't handle DatetimeIndexes:

screen shot 2017-11-29 at 10 09 36 am

Though hopefully this can be handled on the matplotlib side by getting the values.

import pandas as pd import matplotlib.pyplot as plt from matplotlib import units

x = pd.date_range("2017", periods=120, freq='H') y = range(120) z = x.tz_localize("US/Central") %matplotlib inline

plt.plot(x, y);

@jorisvandenbossche

Yes, I think that should be easily solvable in matplotlib. I was still planning to run all the different reported examples in pandas issues related to this with matplotlib master to see if it would solve those cases.

@TomAugspurger

@TomAugspurger

For MPL 2.2+, do we want to overwrite their converters with our own? I think no, but I'll have to check the differences.

@jorisvandenbossche

To summarize, I think in the long term we want those options:

Before 0.21.0, 'True' was the default, and in the long term we want 'auto' to become the default.

But as a transition period, I think we want a hybrid of True/'auto', which would be 'auto' + deprecation warning for implicitly used converters (i.e. register our converters but with warn=True). And indeed there I would only do this warning for those types that are in the meantime not supported by matplotlib itself, and thus not overwrite the ones matplotlib in the meantime added.
Not overwriting those can still mean some breaking differences for users, but much less as is the case now (the underlying plotted float data are different, but in principle a normal user never cares about that + the formatting of the axes will be a bit different)

But, my proposal it to now do a complete revert for 0.21.1, and start the transition period only in 0.22.0 (or the first release when matplotlib 2.2 is out).

@TomAugspurger

@TomAugspurger

@TomAugspurger

Sorry for the delay on this. I'm finally done with traveling and conferences for a bit.

This PR now

So, after MPL 2.2, we should just need to change the default in register to be True, in which case we'll warn on matplotlib plotting methods.

@TomAugspurger

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me, thanks a lot!

Some minor comments.
Should we add a test that currenlty on import they are already registered?

for regular ``matplotlib.pyplot`` plotting methods, so we're temporarily
reverting that change; pandas will again register the converters on import.
Pandas plotting functions, such as ``Series.plot``, will continue to register
them for you.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence now feels a bit out of place, as the previous one already said that they are registered again on import.

.. code-block:: python
>>> from pandas.plotting import register_matplotlib_converters
>>> register_matplotlib_converters()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I would show that here. Because people who glance over it might think they need to add that to their code, which is not the case.

def register():
from pandas.plotting._converter import register as register_
msg = ("'pandas.tseries.converter.register' has been moved and renamed to "
"'pandas.plotting.register_converters'. ")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register_converters -> register_matplotlib_converters()

@jreback

@TomAugspurger

Should we add a test that currenlty on import they are already registered?

I thought about that, but there are test-order issues with that. Do you know a clean way of running a single test in it's own process? I could use subprocess, but that seems fragile.

@TomAugspurger

@TomAugspurger

@jorisvandenbossche

@TomAugspurger

@TomAugspurger

Added ee7b457 with a test using subprocess. I worry that's fragile to PATH issues, but let's see.

tacaswell

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with this long term plan.

@jorisvandenbossche

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request

Dec 8, 2017

@TomAugspurger

@jorisvandenbossche

I opened #18720 as a follow-up issue about the 'auto' behaviour (only modifying matplotlibs units registry in pandas' plotting context)

TomAugspurger added a commit that referenced this pull request

Dec 11, 2017

@TomAugspurger

This was referenced

Jan 27, 2019