CLN: clean color selection in _matplotlib/style by ivanovmg · Pull Request #37203 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation50 Commits26 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- closes #xxxx
- tests added / passed
- passes
black pandas
- passes
git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry
There was a comment in the top of the module that the code was a bit too dynamic.
The logic in the function was too complex.
I refactored the code
- Extract functions
- Simplify logic
- Add type annotations
Note: there is one issue with mypy, which I do not know how to handle. Currently I suggested to ignore the error.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_standard_colors( |
num_colors: int, colormap=None, color_type: str = "default", color=None |
num_colors: int, |
colormap=None, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you are doing the refactor here, maybe can you annotate this as well?
num_colors: int, |
---|
colormap=None, |
color_type: str = "default", |
color=None, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also annotate this?
def _get_colors_from_colormap( |
---|
colormap: Union[str, "Colormap"], |
num_colors: int, |
) -> Collection[Color]: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, i have seen you use Collection
in some places in this file, while based on the code, seems almost they are all List
, should we just use List
instead?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use Collection for arguments to be permissive where appropriate, but return type should be more precise, in this case List.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will adjust for return types. I remember exactly @simonjayhawkins suggesting to use more generic types. Thus, I used collection.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. that's correct for function arguments. return types should be precise.
def _get_colors_from_color( |
color: Union[Color, Dict[str, Color], Collection[Color]], |
) -> Union[Dict[str, Color], Collection[Color]]: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change Collection
to List
?
) -> Union[Dict[str, Color], Collection[Color]]: |
---|
"""Get colors from user input color.""" |
if isinstance(color, Iterable) and len(color) == 0: |
raise ValueError("Invalid color argument: {color}") |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need an f
in front?
just to clarify: what was the reason to check ValueError here? seems original error message gives an empty string, while here is an empty iterable? is this on purpose?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this check to avoid try/except ZeroDivisionError (in the original code), which would be happening when color == "". I will make it an f-string, I missed that.
Comment on lines 106 to 122
if isinstance(color, dict): |
---|
return color |
if isinstance(color, str): |
if _is_single_color(color): |
# GH #36972 |
return [color] |
else: |
return list(color) |
# ignoring mypy error here |
# error: Argument 1 to "list" has incompatible type |
# "Union[Sequence[float], Collection[Union[str, Sequence[float]]]]"; |
# expected "Iterable[Union[str, Sequence[float]]]" [arg-type] |
# A this point color may be sequence of floats or series of colors, |
# all convertible to list |
return list(color) # type: ignore [arg-type] |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, this seems to be a bit more complex than the original comprehension (IIUC, you are adding a patch to convert single color to list in this, right?) is it possible to also write in a comprehension expression?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original portion of the code was exactly quite dynamic. I decided to make it more explicit. I now understand that I missed one case here (which does not hit in tests). This case is when color is sequence of floats (apparently the internal code calls get_standard_colors only with color being sequence of sequence of floats). I will add this edge case and the corresponding test. Presumably, mypy errors will also be resolved.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly speaking, I kind of liked the original comprehension. The problem is that it was not compatible with typing.
return list(color) # type: ignore [arg-type] |
---|
def _get_colors_from_color_type(color_type: str, num_colors: int) -> Collection[Color]: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List should be enough
raise ValueError("color_type must be either 'default' or 'random'") |
---|
def _get_default_colors(num_colors: int) -> Collection[Color]: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List should be enough
colors = [c["color"] for c in list(plt.rcParams["axes.prop_cycle"])] |
---|
except KeyError: |
colors = list(plt.rcParams.get("axes.color_cycle", list("bgrcmyk"))) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the original code, seems there is a check of colors
in this branch, so does it mean after this refactoring, this won't happen? or colors
shouldn't be a str in the first place once reaching this part of code?
if isinstance(colors, str): colors = list(colors)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, in this particular function colors will not be a string. This check is effectively carried out in _get_colors_from_color
.
def _get_colors_from_colormap( |
---|
colormap: Union[str, "Colormap"], |
num_colors: int, |
) -> Collection[Color]: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use Collection for arguments to be permissive where appropriate, but return type should be more precise, in this case List.
# expected "Iterable[Union[str, Sequence[float]]]" [arg-type] |
---|
# A this point color may be sequence of floats or series of colors, |
# all convertible to list |
return list(color) # type: ignore [arg-type] |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ci failure could be fixed with
return list(color) # type: ignore [arg-type] |
---|
return list(color) # type: ignore[arg-type] |
but please avoid using ignore instead.
It turns out that even if "axes.prop_cycle" is not defined, then it will be composed of CN colors (hex notation). However, in the current matplotlib there is no "axes.color_cycle" property in rcParams at all (not even allowed to setup).
Here is a problem. I want to install Cycler (https://matplotlib.org/cycler/) for testing purposes, to mimic matplotlib behavior on default color selection. When I add it into environment.yml
, it is rejected by pre-commit check. Moreover, it seems that this package is never installed anyway, which leads to the failure of the tests requiring Cycler.
I have some weird problem with the new test module test_style.py
. It works fine in my local virtual environment, but fails in CI (some importing problem).
_____________ ERROR collecting pandas/tests/plotting/test_style.py _____________
ImportError while importing test module '/home/vsts/work/1/s/pandas/tests/plotting/test_style.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../miniconda3/envs/pandas-dev/lib/python3.7/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
pandas/tests/plotting/test_style.py:5: in <module>
from pandas.plotting._matplotlib.style import get_standard_colors
pandas/plotting/_matplotlib/__init__.py:3: in <module>
from pandas.plotting._matplotlib.boxplot import (
pandas/plotting/_matplotlib/boxplot.py:5: in <module>
from matplotlib.artist import setp
E ModuleNotFoundError: No module named 'matplotlib'
All checks are good except Windows py38_np18, where tests related to numba failed. I guess it is irrelevant of my changes.
I checked that we actually can use cycler.
But the problem is that tests must be changed.
Currently it is acceptable to return more colors that num_colors
.
When I use cycler I would prefer to keep only required number of colors:
def _cycle_colors(colors: List[Color], num_colors: int) -> Iterator[Color]:
import matplotlib.pyplot as plt
cycle = plt.cycler(color=colors)
for _, item in zip(range(num_colors), cycle()):
yield item["color"]
This looks simple to understand, IMHO.
And it seems more reasonable that we return exactly num_colors
colors, without any extra (although extra colors are ignored anyway).
But are we good that I go ahead and change some tests?
Or maybe leave it to another PR?
with suppress(ImportError): |
---|
from pandas.plotting._matplotlib.style import get_standard_colors |
@td.skip_if_no_mpl |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just use pytest.importorskip see pandas\tests\plotting\test_converter.py or other test modules testing optional dependencies. e.g. pytables
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did that. Please check if this is what you suggested.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for late response, i was fallen ill in the past week.
@ivanovmg looks very good to me and original logics seem to be covered and unchanged! nice job!
only two minor suggestions on the tests to make the tests slightly shorter, also okay if not change.
(7, ["b", "g", "r", "y", "b", "g", "r"]), |
---|
], |
) |
def test_default_colors_named_from_prop_cycle_string(self, num_colors, expected): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you could combine this one with the test above, and add cycler_color
to parameterize since those two tests look almost identical and test basically the same except how colors are defined
Comment on lines +102 to +107
(1, ["r", "g", "b", "k"]), |
---|
(2, ["r", "g", "b", "k"]), |
(3, ["r", "g", "b", "k"]), |
(4, ["r", "g", "b", "k"]), |
(5, ["r", "g", "b", "k", "r"]), |
(6, ["r", "g", "b", "k", "r", "g"]), |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's nicer to just test the case where num_color </>/= color, so 3 parametrizations is enough. no need to change here though ^^
return [color] |
---|
if _is_floats_color(color): |
color = cast(Sequence[float], color) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast seems unnecessary - is there an error this solves?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, cast is necessary.
At this point color
is of type Union[Color, Collection[Color]]
. And for some reason _is_floats_color
check does not filter out Collection[Color]
. So, instead of ignoring, I added cast here.
mypy error if removing cast.
pandas\plotting\_matplotlib\style.py:114: error: List item 0 has incompatible type "Union[Union[str, Sequence[float]], Collection[Union[str, Sequence[float]]]]"; expected "Union[str, Sequence[float]]" [list-item]
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions can't narrow types yet in mypy, but regardless this is pretty confusing. Can you try to refactor this to make things clearer?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know how to make it even clearer.
The logic is the following.
- Start with color being maybe
a. a single string color
b. or multiple string color (like 'rgbk')
c. or single float color (0.1, 0.2, 0.3)
d. or multiple float colors - Address option a.
- Address option b.
- Address options c and d.
Is the logic confusing or the two cast statements are?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd, I updated the logic a little bit.
- Address one color (either string or sequence of floats)
- Address collection of colors
Casting is still there as there are no other options to make mypy happy.
Please let me know what you think about this implementation.
- Rename _is_single_color -> _is_single_string_color
- Extract new method _is_single_color, which checks whether the color provided is a single color, that can be either string or a sequence of floats.
- Allow integers to be a part of float sequence color, (1, 0.4, 0.2, 0.5), for example.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few minor comments, ping on green.
color_type: str, |
---|
num_colors: int, |
) -> List[Color]: |
"""Get colors from user input.""" |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string.
also the current summary is not very descriptive, nor is the function name .
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it into _derive_colors
. I am not sure if you would consider it a better alternative. I thought of having a more verbose name (like _derive_colors_from_cmap_color_and_type
, but that does not seem reasonable to me, although I added the explanation in the docstring.
Anyway derive seems a better verb, implying some logic underneath.
jreback added a commit that referenced this pull request
… (#37655)
Moving the file test_frame.py to a new directory
Сreated file test_frame_color.py
Transfer tests of test_frame.py to test_frame_color.py
PEP 8 fixes
Transfer tests
of test_frame.py to test_frame_groupby.py and test_frame_subplots.py
Removing unnecessary imports
PEP 8 fixes
Fixed class name
Transfer tests
of test_frame.py to test_frame_subplots.py
- Transfer tests
of test_frame.py to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py
Changed class names
Removed unnecessary imports
Removed import
catch FutureWarnings (#37587)
TST/REF: collect indexing tests by method (#37590)
REF: prelims for single-path setitem_with_indexer (#37588)
ENH: repr for 2D DTA/TDA (#37164)
CLN: de-duplicate _validate_where_value with _validate_setitem_value (#37595)
TST/REF: collect tests by method (#37589)
TST/REF: move remaining setitem tests from test_timeseries
TST/REF: rehome test_timezones test
move misplaced arithmetic test
collect tests by method
move misplaced file
REF: Categorical.is_dtype_equal -> categories_match_up_to_permutation (#37545)
CLN refactor non-core (#37580)
refactor core/computation (#37585)
TST/REF: share method tests between DataFrame and Series (#37596)
BUG: Index.where casting ints to str (#37591)
REF: IntervalArray comparisons (#37124)
regression fix for merging DF with datetime index with empty DF (#36897)
ERR: fix error message in Period for invalid frequency (#37602)
CLN: remove rebox_native (#37608)
TST/REF: tests.generic (#37618)
TST: collect tests by method (#37617)
TST/REF: collect test_timeseries tests by method
misplaced DataFrame.values tst
misplaced dataframe.values test
collect test by method
TST/REF: share tests across Series/DataFrame (#37616)
Gh 36562 typeerror comparison not supported between float and str (#37096)
docs: fix punctuation (#37612)
REGR: pd.to_hdf(..., dropna=True) not dropping missing rows (#37564)
parametrize set_axis tests (#37619)
CLN: clean color selection in _matplotlib/style (#37203)
DEPR: DataFrame/Series.slice_shift (#37601)
REF: re-use validate_setitem_value in Categorical.fillna (#37597)
PERF: release gil for ewma_time (#37389)
BUG: Groupy dropped nan groups from result when grouping over single column (#36842)
ENH: implement timeszones support for read_json(orient='table') and astype() from 'object' (#35973)
REF/BUG/TYP: read_csv shouldn't close user-provided file handles (#36997)
BUG/REF: read_csv shouldn't close user-provided file handles
get_handle: typing, returns is_wrapped, use dataclass, and make sure that all created handlers are returned
remove unused imports
added IOHandleArgs.close
added IOArgs.close
mostly comments
move memory_map from TextReader to CParserWrapper
moved IOArgs and IOHandles
more comments
Co-authored-by: Jeff Reback jeff@reback.net
more typing checks to pre-commit (#37539)
TST: 32bit dtype compat test_groupby_dropna (#37623)
BUG: Metadata propagation for groupby iterator (#37461)
BUG: read-only values in cython funcs (#37613)
CLN refactor core/arrays (#37581)
Fixed Metadata Propogation in DataFrame (#37381)
TYP: add Shape alias to pandas._typing (#37128)
DOC: Fix typo (#37630)
CLN: parametrize test_nat_comparisons (#37195)
dataframe dataclass docstring updated (#37632)
refactor core/groupby (#37583)
BUG: set index of DataFrame.apply(f) when f returns dict (#37544) (#37606)
BUG: to_dict should return a native datetime object for NumPy backed dataframes (#37571)
ENH: memory_map for compressed files (#37621)
DOC: add example & prose of slicing with labels when index has duplicate labels (#36814)
DOC: add example & prose of slicing with labels when index has duplicate labels #36251
DOC: proofread the sentence.
Co-authored-by: Jun Kudo jun-lab@junnoMacBook-Pro.local
- DOC: Fix typo (#37636)
"columns(s)" sounded odd, I believe it was supposed to be just "column(s)".
CI: troubleshoot win py38 builds (#37652)
TST/REF: collect indexing tests by method (#37638)
TST/REF: collect tests for get_numeric_data (#37634)
misplaced loc test
TST/REF: collect get_numeric_data tests
REF: de-duplicate _validate_insert_value with _validate_scalar (#37640)
CI: catch windows py38 OSError (#37659)
share test (#37679)
TST: match matplotlib warning message (#37666)
TST: match matplotlib warning message
TST: match full message
pd.Series.loc.getitem promotes to float64 instead of raising KeyError (#37687)
REF/TST: misplaced Categorical tests (#37678)
REF/TST: collect indexing tests by method (#37677)
CLN: only call _wrap_results one place in nanmedian (#37673)
TYP: Index._concat (#37671)
BUG: CategoricalIndex.equals casting non-categories to np.nan (#37667)
CLN: _replace_single (#37683)
REF: simplify _replace_single by noting regex kwarg is bool
Annotate
CLN: remove never-False convert kwarg
TYP: make more internal funcs keyword-only (#37688)
REF: make Series._replace_single a regular method (#37691)
REF: simplify cycling through colors (#37664)
REF: implement _wrap_reduction_result (#37660)
BUG: preserve fold in Timestamp.replace (#37644)
CLN: Clean indexing tests (#37689)
TST: fix warning for pie chart (#37669)
PERF: reverted change from commit 7d257c6 to solve issue #37081 (#37426)
DataFrameGroupby.boxplot fails when subplots=False (#28102)
ENH: Improve error reporting for wrong merge cols (#37547)
Transfer tests of test_frame.py to test_frame_color.py
PEP8
Fixes for linter
Сhange pd.DateFrame to DateFrame
Move inconsistent namespace check to pre-commit, fixup more files (#37662)
check for inconsistent namespace usage
doc
typos
verbose regex
use verbose flag
use verbose flag
match both directions
add test
don't import annotations from future
update extra couple of cases
🚚 rename
typing
don't use subprocess
don't type tests
use pathlib
REF: simplify NDFrame.replace, ObjectBlock.replace (#37704)
REF: implement Categorical.encode_with_my_categories (#37650)
REF: implement Categorical.encode_with_my_categories
privatize
BUG: unpickling modifies Block.ndim (#37657)
REF: dont support dt64tz in nanmean (#37658)
CLN: Simplify groupby head/tail tests (#37702)
Bug in loc raised for numeric label even when label is in Index (#37675)
REF: implement replace_regex, remove unreachable branch in ObjectBlock.replace (#37696)
TYP: Check untyped defs (except vendored) (#37556)
REF: remove ObjectBlock._replace_single (#37710)
Transfer tests of test_frame.py to test_frame_color.py
TST/REF: collect indexing tests by method (#37590)
PEP8
Сhange DateFrame to pd.DateFrame
Сhange pd.DateFrame to DateFrame
Removing imports
Bug fixes
Bug fixes
Fix incorrect merge
test_frame_color.py edit
Transfer tests
of test_frame.py to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py
Removing unnecessary imports
PEP8
Conflicts:
pandas/tests/plotting/frame/test_frame.py
pandas/tests/plotting/frame/test_frame_color.py
pandas/tests/plotting/frame/test_frame_subplots.py
Moving the file test_frame.py to a new directory
Transfer tests
of test_frame.py to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py
Removing unnecessary imports
PEP8
CLN: clean categorical indexes tests (#37721)
Fix merge error
PEP 8 fixes
Fix merge error
Removing unnecessary imports
PEP 8 fixes
Fixed class name
Transfer tests
of test_frame.py to test_frame_subplots.py
- Transfer tests
of test_frame.py to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py
Changed class names
Removed unnecessary imports
Removed import
TST/REF: collect indexing tests by method (#37590)
TST: match matplotlib warning message (#37666)
TST: match matplotlib warning message
TST: match full message
TST: fix warning for pie chart (#37669)
Transfer tests of test_frame.py to test_frame_color.py
PEP8
Fixes for linter
Сhange pd.DateFrame to DateFrame
Transfer tests of test_frame.py to test_frame_color.py
PEP8
Сhange DateFrame to pd.DateFrame
Сhange pd.DateFrame to DateFrame
Removing imports
Bug fixes
Bug fixes
Fix incorrect merge
test_frame_color.py edit
Fix merge error
Fix merge error
Removing unnecessary features
black fix
Co-authored-by: jbrockmendel jbrockmendel@gmail.com Co-authored-by: Marco Gorelli m.e.gorelli@gmail.com Co-authored-by: Philip Cerles philip.cerles@gmail.com Co-authored-by: Joris Van den Bossche jorisvandenbossche@gmail.com Co-authored-by: Sven sven.schellenberg@paradynsystems.com Co-authored-by: Micael Jarniac micael@jarniac.com Co-authored-by: Andrew Wieteska 48889395+arw2019@users.noreply.github.com Co-authored-by: Maxim Ivanov 41443370+ivanovmg@users.noreply.github.com Co-authored-by: Erfan Nariman 34067903+erfannariman@users.noreply.github.com Co-authored-by: Fangchen Li fangchen.li@outlook.com Co-authored-by: patrick 61934744+phofl@users.noreply.github.com Co-authored-by: attack68 24256554+attack68@users.noreply.github.com Co-authored-by: Torsten Wörtwein twoertwein@users.noreply.github.com Co-authored-by: Jeff Reback jeff@reback.net Co-authored-by: Janus janus@insignificancegalore.net Co-authored-by: Joel Whittier rootbeerfriend@gmail.com Co-authored-by: taytzehao jtth95@gmail.com Co-authored-by: ma3da 34522496+ma3da@users.noreply.github.com Co-authored-by: junk juntrp0207@gmail.com Co-authored-by: Jun Kudo jun-lab@junnoMacBook-Pro.local Co-authored-by: Alex Kirko alexander.kirko@gmail.com Co-authored-by: Yassir Karroum ukarroum17@gmail.com Co-authored-by: Kaiqi Dong kaiqi@kth.se Co-authored-by: Richard Shadrach 45562402+rhshadrach@users.noreply.github.com Co-authored-by: Simon Hawkins simonjayhawkins@gmail.com