Meaning of color argument in DataFrame.plot.scatter() · Issue #16485 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame.from_records([[1, 1, 'g'], [2, 2, 'r']], columns=['x', 'y', 'Color'])
df.plot.scatter(x='x', y='y', c='Color')
Problem description
It is not clear what the values in the column passed as the c argument of scatter should be. The example given at http://pandas.pydata.org/pandas-docs/stable/visualization.html#scatter-plot uses numerical values, but in this example I just want red and green dots. With matplotlib, you can supply the colors as a vector.
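For comparison, here is a minimal sketch of the matplotlib behavior referred to above, using the same two-row frame. It assumes matplotlib's `Axes.scatter` accepts a list of color names for `c`, one per point. The `Agg` backend line is only there so the sketch runs without a display and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame.from_records(
    [[1, 1, "g"], [2, 2, "r"]], columns=["x", "y", "Color"]
)

fig, ax = plt.subplots()
# Pass the per-point color names as a plain list, one entry per row.
points = ax.scatter(df["x"], df["y"], c=df["Color"].tolist())
ax.set_xlabel("x")
ax.set_ylabel("y")
```

This draws one green and one red point, which is the result I would like to get from the pandas call by naming the column.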
IMHO, the API should be consistent. You should be able to specify the column names corresponding to the value of x, y, and the color. This would be especially useful if you have a pattern such as:
df[df.x>=1].plot.scatter(x='x', y='y', c='Color')
where you produce a scatter plot of selected rows.
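To illustrate the kind of consistency I mean, here is a rough sketch of a wrapper that takes column names for x, y, and the color, and works on a filtered frame. `scatter_by_columns` is a hypothetical helper written for this report, not part of pandas or matplotlib, and it goes through matplotlib directly rather than through `DataFrame.plot.scatter`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_columns(frame, x, y, color, ax=None):
    """Hypothetical helper: scatter plot with x, y and color given as column names."""
    if ax is None:
        _, ax = plt.subplots()
    # Look up all three columns by name; colors go in as a list of color names.
    ax.scatter(frame[x], frame[y], c=frame[color].tolist())
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return ax


df = pd.DataFrame.from_records(
    [[1, 1, "g"], [2, 2, "r"]], columns=["x", "y", "Color"]
)

# Same row selection as in the pattern above; the colors follow the selected rows.
ax = scatter_by_columns(df[df.x >= 1], x="x", y="y", color="Color")
```

Because the color column is filtered together with x and y, the colors stay aligned with the selected rows without any extra bookkeeping.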
The code sample at the top of this report raises an error:
KeyError Traceback (most recent call last)
C:\Anaconda3\envs\py36\lib\site-packages\matplotlib\colors.py in to_rgba(c, alpha)
140 try:
--> 141 rgba = _colors_full_map.cache[c, alpha]
142 except (KeyError, TypeError): # Not in cache, or unhashable.
KeyError: ('o', None)
Expected Output
A plot with 2 points, one red and one green.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None