API: Restore implicit converter registration by TomAugspurger · Pull Request #18307 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation64 Commits26 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Closes #18301
TODO:
- find best home for the import
- update deprecation message
I added the "special" section in the whatsnew for this and the changes we're making to NaN/empty sum.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me!
This caused problems for some users, so we're temporarily reverting that change; |
pandas will again register the converters on import. Using the converters |
without explicitly registering the formatters will cause a ``FutureWarning``: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not use 'the formatters' as it actually points to the same as 'converters' but with another word, just making it confusing. I would either just leave it out or use 'them' to avoid repeating converters
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant to use formatters converters everywhere. Fixing.
>>> from pandas.tseries import converter |
---|
>>> converter.register() |
As the error message says, you'll need to register the converts if you intend to |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
converts -> converters
As the error message says, you'll need to register the converts if you intend to |
use them with matplotlib plotting functions. Pandas plotting functions, such as |
``Series.plot``, will register them for you. (:issue:`18301`) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a bit more explicit like: "When using pandas plotting functions, such as Series.plot, the converters will be registered for you and you do not need to do this explicitly"
@@ -2352,7 +2354,6 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None, |
---|
>>> grouped = df.unstack(level='lvl1').groupby(level=0, axis=1) |
>>> boxplot_frame_groupby(grouped, subplots=False) |
""" |
_setup() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one doesn't need to be replaced with _converter.WARN = False ?
plt = pytest.importorskip('matplotlib.pyplot') |
---|
def test_warns(): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this duplicate with the other test file?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, forgot to remove it from my first commit.
What are people's thoughts on moving the register
import to pandas.plotting
instead of pandas.tseries
? I'm not sure I see the benefit, unless we're planning to remove pandas.tseries
in the future.
Should we also add some language on how to avoid this warning without registering the converters by default? So if you in the future don't want to manually register them, but rely on the matplotlib ones (eg when working with datetime.datetime) ?
I guess a pandas option?
pd.options.register_formatters : bool
And we check that before ever calling register?
Though, future-proofing a bit, should that be spread across several options, one per type? We currently register for
units.registry[lib.Timestamp] = DatetimeConverter()
units.registry[Period] = PeriodConverter()
units.registry[pydt.datetime] = DatetimeConverter()
units.registry[pydt.date] = DatetimeConverter()
units.registry[pydt.time] = TimeConverter()
units.registry[np.datetime64] = DatetimeConverter()
In the ideal future, we would I think just do Timestamp and Period, and matplotlib would have datetime64, I don't know who wants to own python dates / times / datetimes.
cc @tacaswell if any of this interests you.
Added a rough implementation of the option in 16c33f1.
In [1]: import pandas as pd
In [2]: import matplotlib.pyplot as plt
In [3]: fig, ax = plt.subplots()
In [4]: s = pd.Series(range(12), index=pd.date_range('2017', periods=12))
In [5]: pd.options.plotting.matplotlib.register_formatters = False
In [6]: ax.plot(s) Out[6]: [<matplotlib.lines.Line2D at 0x1131ce588>]
What are people's thoughts on moving the register import to pandas.plotting instead of pandas.tseries? I'm not sure I see the benefit, unless we're planning to remove pandas.tseries in the future.
if its not too much trouble. I would like to at least deprecate pandas.tseries
As the error message says, you'll need to register the converters if you intend |
to use them with matplotlib plotting functions. Pandas plotting functions, such |
as ``Series.plot``, will register them for you calling ``converter.register()`` |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"for you calling" -> "for you, and calling"
as ``Series.plot``, will register them for you calling ``converter.register()`` |
---|
first is not necessary. |
Finally, control the formatters, we've added a new option: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to" control
Some other things related to the option:
- It seems you kept the terminology of 'converters' (and not 'formatters') throughout the whatsnew and code? (which is I think the terminology of matplotlib, as the converter calls a formatter). But the option is called 'register_formatters'. So we should choose here.
- When you set the
pd.options.plotting.matplotlib.register_formatters = False
, they will also not get registered when using pandas' plot methods ? I don't think that is the case, but shouldn't it be? - The "unregistering" (setting
pd.options.plotting.matplotlib.register_formatters = False
) will also remove the converters for datetime.datetime, which are actually the default ones. Ideally they are set back as well. @tacaswell Is there a way to restore the default units? (like with rcParams). Of course we can also save them when we overwrite them, to set it back properly. - I would make the option maybe a bit shorter, like
pd.options.plotting.register_matplotlib_formatters
(not shorter, but a '.' less so quicker to tab-complete) ?
converter.register() |
---|
with ctx: |
with tm.assert_produces_warning(None) as w: |
ax.plot(s.index, s.values) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you did set the option to False, it should not warn?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, forget, that is what is being tested :-) (and they have been registered manually afterwards)
Should we test what happens if you do not register manually after you set the option?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean set the option to True or False? I think my latest commit does that.
After setting it to False, I think we should not warn even if they don't register manually (which is what my tests checks hopefully)
It seems you kept the terminology of 'converters' (and not 'formatters')
Switched to formatters everywhere.
they will also not get registered when using pandas' plot methods ?
Correct. To be more specific, they're always registered when pandas is imported. The option toggles whether or not they're in the matplotlib.units.registry
dict. True
adds them, False
removes them. That means that doing
In [8]: pd.options.plotting.matplotlib.register_formatters = False
In [9]: s.plot()
Will fall back to whatever matplotlib does with datetimes / periods (potentially an exception, e.g. for periods).
will also remove the converters for datetime.datetime, which are actually the default ones
Hmm, not sure about that. My registry is empty upon import:
In [1]: import matplotlib.units
In [2]: matplotlib.units.registry Out[2]: {}
Though maybe I'm misunderstanding. The intent is return the registry to the state before we mucked with it.
I would make the option maybe a bit shorter
Yeah, thoughts on pd.options.plotting.mpl.converters
? I want the .mpl
/ matplotlib namespace for if / when we allow different engines for .plot
.
The unit registry gets filled on import. If you import matplotlib.dates
they will be added. These imports are done in pyplot
so most users have this import done.
Ah thanks.
In [1]: import matplotlib.units
In [2]: import matplotlib.pyplot as plt
In [3]: matplotlib.units.registry Out[3]: {str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>, numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>, bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>, numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>, datetime.date: <matplotlib.dates.DateConverter at 0x10bf2c208>, datetime.datetime: <matplotlib.dates.DateConverter at 0x10bf2c240>}
In [4]: import pandas as pd
In [5]: matplotlib.units.registry Out[5]: {str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>, numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>, bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>, numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>, datetime.date: <pandas.plotting._converter.DatetimeConverter at 0x10bf2c240>, datetime.datetime: <pandas.plotting._converter.DatetimeConverter at 0x10eaa3828>, pandas._libs.tslib.Timestamp: <pandas.plotting._converter.DatetimeConverter at 0x10eaa3780>, pandas._libs.period.Period: <pandas.plotting._converter.PeriodConverter at 0x10eaa37b8>, datetime.time: <pandas.plotting._converter.TimeConverter at 0x10bf2c208>, numpy.datetime64: <pandas.plotting._converter.DatetimeConverter at 0x10eaac0f0>}
In [6]: pd.options.plotting.matplotlib.register_formatters = False
In [7]: matplotlib.units.registry Out[7]: {str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>, numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>, bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>, numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>}
So not great... We could of course maintain our own global state when we overwrite them... Though we'll have to make sure matplotlib.pyplot
is imported first. Let me give that a shot.
I have no problem with providing this optinon, but why would you make this True by default? it is conceptually false now.
It has always been conceptually True
(up to 0.21.0), so if we revert, then the default would be True. And from 0.21.0 you could say that it is conceptually False, but that is also not really correct, as it is False on import but True once you do plotting with pandas. In this PR, the False
option will also not register them when plotting datetimes (so that will raise an error with current matplotlib)
Another issue for down the road matplotlib/matplotlib#9779 doesn't handle DatetimeIndexes
:
Though hopefully this can be handled on the matplotlib side by getting the values
.
import pandas as pd import matplotlib.pyplot as plt from matplotlib import units
x = pd.date_range("2017", periods=120, freq='H') y = range(120) z = x.tz_localize("US/Central") %matplotlib inline
plt.plot(x, y);
Yes, I think that should be easily solvable in matplotlib. I was still planning to run all the different reported examples in pandas issues related to this with matplotlib master to see if it would solve those cases.
For MPL 2.2+, do we want to overwrite their converters with our own? I think no, but I'll have to check the differences.
To summarize, I think in the long term we want those options:
- False: never register our converters (also not when pandas plotting is used)
- True: register our converters, so also available for pure matplotlib plots
- 'auto': only use our converters for plots made by our
.plot(..)
method. Here I think we still want to overwrite matplotlib's converters (but that's maybe up for discussion)
Before 0.21.0, 'True' was the default, and in the long term we want 'auto' to become the default.
But as a transition period, I think we want a hybrid of True/'auto', which would be 'auto' + deprecation warning for implicitly used converters (i.e. register our converters but with warn=True). And indeed there I would only do this warning for those types that are in the meantime not supported by matplotlib itself, and thus not overwrite the ones matplotlib in the meantime added.
Not overwriting those can still mean some breaking differences for users, but much less as is the case now (the underlying plotted float data are different, but in principle a normal user never cares about that + the formatting of the axes will be a bit different)
But, my proposal it to now do a complete revert for 0.21.1, and start the transition period only in 0.22.0 (or the first release when matplotlib 2.2 is out).
Sorry for the delay on this. I'm finally done with traveling and conferences for a bit.
This PR now
- Adds the options for easily adding / removing our converters
- Moved
pandas.tseries.register
topandas.plotting.register_matplotlib_converters
- Adds the scaffolding for warning in the next version about implicitly registered formatters, but doesn't actually use this yet, as the default is to not warn
So, after MPL 2.2, we should just need to change the default in register
to be True, in which case we'll warn on matplotlib plotting methods.
- actually switch the default to not warn
- We do overwrite matplotlib's formatters
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good to me, thanks a lot!
Some minor comments.
Should we add a test that currenlty on import they are already registered?
for regular ``matplotlib.pyplot`` plotting methods, so we're temporarily |
---|
reverting that change; pandas will again register the converters on import. |
Pandas plotting functions, such as ``Series.plot``, will continue to register |
them for you. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence now feels a bit out of place, as the previous one already said that they are registered again on import.
.. code-block:: python |
---|
>>> from pandas.plotting import register_matplotlib_converters |
>>> register_matplotlib_converters() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if I would show that here. Because people who glance over it might think they need to add that to their code, which is not the case.
def register(): |
---|
from pandas.plotting._converter import register as register_ |
msg = ("'pandas.tseries.converter.register' has been moved and renamed to " |
"'pandas.plotting.register_converters'. ") |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
register_converters -> register_matplotlib_converters()
Should we add a test that currenlty on import they are already registered?
I thought about that, but there are test-order issues with that. Do you know a clean way of running a single test in it's own process? I could use subprocess, but that seems fragile.
Added ee7b457 with a test using subprocess. I worry that's fragile to PATH issues, but let's see.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy with this long term plan.
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request
API: Restore implicit converter registration
Remove matplotlib from blacklist
fixup! Remove matplotlib from blacklist
Add option for toggling formatters
Remove move
Handle no matplotlib
Cleanup
Test no register
Restore original state
Added deregister
Doc, naming
Naming
Added deprecation
PEP8
Fix typos
Rename it all
Missed one
Check version
No warnings by default
Update release notes
Test fixup
- actually switch the default to not warn
- We do overwrite matplotlib's formatters
Doc update
Fix deprecation message
Test added by default
I opened #18720 as a follow-up issue about the 'auto' behaviour (only modifying matplotlibs units registry in pandas' plotting context)
TomAugspurger added a commit that referenced this pull request
API: Restore implicit converter registration
Remove matplotlib from blacklist
fixup! Remove matplotlib from blacklist
Add option for toggling formatters
Remove move
Handle no matplotlib
Cleanup
Test no register
Restore original state
Added deregister
Doc, naming
Naming
Added deprecation
PEP8
Fix typos
Rename it all
Missed one
Check version
No warnings by default
Update release notes
Test fixup
- actually switch the default to not warn
- We do overwrite matplotlib's formatters
Doc update
Fix deprecation message
Test added by default
This was referenced
Jan 27, 2019