CLN: clean color selection in _matplotlib/style by ivanovmg · Pull Request #37203 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation50 Commits26 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

ivanovmg

There was a comment in the top of the module that the code was a bit too dynamic.
The logic in the function was too complex.

I refactored the code

Note: there is one issue with mypy, which I do not know how to handle. Currently I suggested to ignore the error.

charlesdong1991

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def get_standard_colors(
num_colors: int, colormap=None, color_type: str = "default", color=None
num_colors: int,
colormap=None,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you are doing the refactor here, maybe can you annotate this as well?

num_colors: int,
colormap=None,
color_type: str = "default",
color=None,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also annotate this?

def _get_colors_from_colormap(
colormap: Union[str, "Colormap"],
num_colors: int,
) -> Collection[Color]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm, i have seen you use Collection in some places in this file, while based on the code, seems almost they are all List, should we just use List instead?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use Collection for arguments to be permissive where appropriate, but return type should be more precise, in this case List.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will adjust for return types. I remember exactly @simonjayhawkins suggesting to use more generic types. Thus, I used collection.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. that's correct for function arguments. return types should be precise.

def _get_colors_from_color(
color: Union[Color, Dict[str, Color], Collection[Color]],
) -> Union[Dict[str, Color], Collection[Color]]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change Collection to List?

) -> Union[Dict[str, Color], Collection[Color]]:
"""Get colors from user input color."""
if isinstance(color, Iterable) and len(color) == 0:
raise ValueError("Invalid color argument: {color}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need an f in front?

just to clarify: what was the reason to check ValueError here? seems original error message gives an empty string, while here is an empty iterable? is this on purpose?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this check to avoid try/except ZeroDivisionError (in the original code), which would be happening when color == "". I will make it an f-string, I missed that.

Comment on lines 106 to 122

if isinstance(color, dict):
return color
if isinstance(color, str):
if _is_single_color(color):
# GH #36972
return [color]
else:
return list(color)
# ignoring mypy error here
# error: Argument 1 to "list" has incompatible type
# "Union[Sequence[float], Collection[Union[str, Sequence[float]]]]";
# expected "Iterable[Union[str, Sequence[float]]]" [arg-type]
# A this point color may be sequence of floats or series of colors,
# all convertible to list
return list(color) # type: ignore [arg-type]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm, this seems to be a bit more complex than the original comprehension (IIUC, you are adding a patch to convert single color to list in this, right?) is it possible to also write in a comprehension expression?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original portion of the code was exactly quite dynamic. I decided to make it more explicit. I now understand that I missed one case here (which does not hit in tests). This case is when color is sequence of floats (apparently the internal code calls get_standard_colors only with color being sequence of sequence of floats). I will add this edge case and the corresponding test. Presumably, mypy errors will also be resolved.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly speaking, I kind of liked the original comprehension. The problem is that it was not compatible with typing.

return list(color) # type: ignore [arg-type]
def _get_colors_from_color_type(color_type: str, num_colors: int) -> Collection[Color]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List should be enough

raise ValueError("color_type must be either 'default' or 'random'")
def _get_default_colors(num_colors: int) -> Collection[Color]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List should be enough

colors = [c["color"] for c in list(plt.rcParams["axes.prop_cycle"])]
except KeyError:
colors = list(plt.rcParams.get("axes.color_cycle", list("bgrcmyk")))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the original code, seems there is a check of colors in this branch, so does it mean after this refactoring, this won't happen? or colors shouldn't be a str in the first place once reaching this part of code?

if isinstance(colors, str): colors = list(colors)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in this particular function colors will not be a string. This check is effectively carried out in _get_colors_from_color.

simonjayhawkins

def _get_colors_from_colormap(
colormap: Union[str, "Colormap"],
num_colors: int,
) -> Collection[Color]:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use Collection for arguments to be permissive where appropriate, but return type should be more precise, in this case List.

# expected "Iterable[Union[str, Sequence[float]]]" [arg-type]
# A this point color may be sequence of floats or series of colors,
# all convertible to list
return list(color) # type: ignore [arg-type]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ci failure could be fixed with

return list(color) # type: ignore [arg-type]
return list(color) # type: ignore[arg-type]

but please avoid using ignore instead.

@ivanovmg

@ivanovmg

It turns out that even if "axes.prop_cycle" is not defined, then it will be composed of CN colors (hex notation). However, in the current matplotlib there is no "axes.color_cycle" property in rcParams at all (not even allowed to setup).

@ivanovmg

@ivanovmg

@ivanovmg

@ivanovmg

@ivanovmg

Here is a problem. I want to install Cycler (https://matplotlib.org/cycler/) for testing purposes, to mimic matplotlib behavior on default color selection. When I add it into environment.yml, it is rejected by pre-commit check. Moreover, it seems that this package is never installed anyway, which leads to the failure of the tests requiring Cycler.

@ivanovmg

@ivanovmg

I have some weird problem with the new test module test_style.py. It works fine in my local virtual environment, but fails in CI (some importing problem).

_____________ ERROR collecting pandas/tests/plotting/test_style.py _____________
ImportError while importing test module '/home/vsts/work/1/s/pandas/tests/plotting/test_style.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../miniconda3/envs/pandas-dev/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
pandas/tests/plotting/test_style.py:5: in <module>
    from pandas.plotting._matplotlib.style import get_standard_colors
pandas/plotting/_matplotlib/__init__.py:3: in <module>
    from pandas.plotting._matplotlib.boxplot import (
pandas/plotting/_matplotlib/boxplot.py:5: in <module>
    from matplotlib.artist import setp
E   ModuleNotFoundError: No module named 'matplotlib'

@ivanovmg

All checks are good except Windows py38_np18, where tests related to numba failed. I guess it is irrelevant of my changes.

jreback

@jreback

@ivanovmg

I checked that we actually can use cycler.
But the problem is that tests must be changed.
Currently it is acceptable to return more colors that num_colors.

When I use cycler I would prefer to keep only required number of colors:

def _cycle_colors(colors: List[Color], num_colors: int) -> Iterator[Color]:
    import matplotlib.pyplot as plt

    cycle = plt.cycler(color=colors)
    for _, item in zip(range(num_colors), cycle()):
        yield item["color"]

This looks simple to understand, IMHO.
And it seems more reasonable that we return exactly num_colors colors, without any extra (although extra colors are ignored anyway).
But are we good that I go ahead and change some tests?
Or maybe leave it to another PR?

@jreback

simonjayhawkins

with suppress(ImportError):
from pandas.plotting._matplotlib.style import get_standard_colors
@td.skip_if_no_mpl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could just use pytest.importorskip see pandas\tests\plotting\test_converter.py or other test modules testing optional dependencies. e.g. pytables

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that. Please check if this is what you suggested.

@ivanovmg

charlesdong1991

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for late response, i was fallen ill in the past week.

@ivanovmg looks very good to me and original logics seem to be covered and unchanged! nice job!

only two minor suggestions on the tests to make the tests slightly shorter, also okay if not change.

(7, ["b", "g", "r", "y", "b", "g", "r"]),
],
)
def test_default_colors_named_from_prop_cycle_string(self, num_colors, expected):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you could combine this one with the test above, and add cycler_color to parameterize since those two tests look almost identical and test basically the same except how colors are defined

Comment on lines +102 to +107

(1, ["r", "g", "b", "k"]),
(2, ["r", "g", "b", "k"]),
(3, ["r", "g", "b", "k"]),
(4, ["r", "g", "b", "k"]),
(5, ["r", "g", "b", "k", "r"]),
(6, ["r", "g", "b", "k", "r", "g"]),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's nicer to just test the case where num_color </>/= color, so 3 parametrizations is enough. no need to change here though ^^

WillAyd

return [color]
if _is_floats_color(color):
color = cast(Sequence[float], color)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast seems unnecessary - is there an error this solves?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, cast is necessary.
At this point color is of type Union[Color, Collection[Color]]. And for some reason _is_floats_color check does not filter out Collection[Color]. So, instead of ignoring, I added cast here.

mypy error if removing cast.

pandas\plotting\_matplotlib\style.py:114: error: List item 0 has incompatible type "Union[Union[str, Sequence[float]], Collection[Union[str, Sequence[float]]]]"; expected "Union[str, Sequence[float]]"  [list-item]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions can't narrow types yet in mypy, but regardless this is pretty confusing. Can you try to refactor this to make things clearer?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know how to make it even clearer.
The logic is the following.

  1. Start with color being maybe
    a. a single string color
    b. or multiple string color (like 'rgbk')
    c. or single float color (0.1, 0.2, 0.3)
    d. or multiple float colors
  2. Address option a.
  3. Address option b.
  4. Address options c and d.

Is the logic confusing or the two cast statements are?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd, I updated the logic a little bit.

  1. Address one color (either string or sequence of floats)
  2. Address collection of colors

Casting is still there as there are no other options to make mypy happy.

Please let me know what you think about this implementation.

@ivanovmg

  1. Rename _is_single_color -> _is_single_string_color
  2. Extract new method _is_single_color, which checks whether the color provided is a single color, that can be either string or a sequence of floats.
  3. Allow integers to be a part of float sequence color, (1, 0.4, 0.2, 0.5), for example.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few minor comments, ping on green.

color_type: str,
num_colors: int,
) -> List[Color]:
"""Get colors from user input."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string.

also the current summary is not very descriptive, nor is the function name .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it into _derive_colors. I am not sure if you would consider it a better alternative. I thought of having a more verbose name (like _derive_colors_from_cmap_color_and_type, but that does not seem reasonable to me, although I added the explanation in the docstring.
Anyway derive seems a better verb, implying some logic underneath.

@ivanovmg

jreback

@jreback

jreback added a commit that referenced this pull request

Nov 13, 2020

… (#37655)

of test_frame.py to test_frame_groupby.py and test_frame_subplots.py

of test_frame.py to test_frame_subplots.py

of test_frame.py to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py

Co-authored-by: Jeff Reback jeff@reback.net

Co-authored-by: Jun Kudo jun-lab@junnoMacBook-Pro.local

"columns(s)" sounded odd, I believe it was supposed to be just "column(s)".

of test_frame.py to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py

pandas/tests/plotting/frame/test_frame.py

pandas/tests/plotting/frame/test_frame_color.py

pandas/tests/plotting/frame/test_frame_subplots.py

of test_frame.py to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py

of test_frame.py to test_frame_subplots.py

of test_frame.py to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py

Co-authored-by: jbrockmendel jbrockmendel@gmail.com Co-authored-by: Marco Gorelli m.e.gorelli@gmail.com Co-authored-by: Philip Cerles philip.cerles@gmail.com Co-authored-by: Joris Van den Bossche jorisvandenbossche@gmail.com Co-authored-by: Sven sven.schellenberg@paradynsystems.com Co-authored-by: Micael Jarniac micael@jarniac.com Co-authored-by: Andrew Wieteska 48889395+arw2019@users.noreply.github.com Co-authored-by: Maxim Ivanov 41443370+ivanovmg@users.noreply.github.com Co-authored-by: Erfan Nariman 34067903+erfannariman@users.noreply.github.com Co-authored-by: Fangchen Li fangchen.li@outlook.com Co-authored-by: patrick 61934744+phofl@users.noreply.github.com Co-authored-by: attack68 24256554+attack68@users.noreply.github.com Co-authored-by: Torsten Wörtwein twoertwein@users.noreply.github.com Co-authored-by: Jeff Reback jeff@reback.net Co-authored-by: Janus janus@insignificancegalore.net Co-authored-by: Joel Whittier rootbeerfriend@gmail.com Co-authored-by: taytzehao jtth95@gmail.com Co-authored-by: ma3da 34522496+ma3da@users.noreply.github.com Co-authored-by: junk juntrp0207@gmail.com Co-authored-by: Jun Kudo jun-lab@junnoMacBook-Pro.local Co-authored-by: Alex Kirko alexander.kirko@gmail.com Co-authored-by: Yassir Karroum ukarroum17@gmail.com Co-authored-by: Kaiqi Dong kaiqi@kth.se Co-authored-by: Richard Shadrach 45562402+rhshadrach@users.noreply.github.com Co-authored-by: Simon Hawkins simonjayhawkins@gmail.com