Bar charts

In addition to plotting numerical data on continuous ranges, you can also use Bokeh to plot categorical data on categorical ranges.

Basic categorical ranges are represented in Bokeh as sequences of strings. For example, a list of the four seasons:

seasons = ["Winter", "Spring", "Summer", "Fall"]

Bokeh can also handle hierarchical categories. For example, you can use nested sequences of strings to represent the individual months within each yearly quarter:

months_by_quarter = [ ("Q1", "Jan"), ("Q1", "Feb"), ("Q1", "Mar"), ("Q2", "Apr"), ("Q2", "May"), ("Q2", "Jun"), ("Q3", "Jul"), ("Q3", "Aug"), ("Q3", "Sep"), ("Q4", "Oct"), ("Q4", "Nov"), ("Q4", "Dec"), ]
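If the hierarchy already lives in a mapping, you can generate the nested factors instead of typing them out. The following is a minimal sketch, assuming a hypothetical `quarters` dictionary that is not part of the original example:

```python
# Hypothetical mapping from each quarter to its months (illustration only).
quarters = {
    "Q1": ["Jan", "Feb", "Mar"],
    "Q2": ["Apr", "May", "Jun"],
    "Q3": ["Jul", "Aug", "Sep"],
    "Q4": ["Oct", "Nov", "Dec"],
}

# Each factor is a (top level, sub-level) tuple; insertion order is preserved.
months_by_quarter = [
    (quarter, month)
    for quarter, months in quarters.items()
    for month in months
]
```

The resulting list has the same shape as the hand-written one above.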

Depending on the structure of your data, you can use different kinds of charts: bar charts, categorical heatmaps, jitter plots, and others. This chapter will present several kinds of common plot types for categorical data.

Bars#

One of the most common ways to handle categorical data is to present it in a bar chart. Bar charts have one categorical axis and one continuous axis. Bar charts are useful when there is one value to plot for each category.

The values associated with each category are represented by drawing a bar for that category. The length of this bar along the continuous axis corresponds to the value for that category.

Bar charts may also be stacked or grouped together according to hierarchical sub-categories. This section will demonstrate how to draw a variety of different categorical bar charts.

Basic#

To create a basic bar chart, use the hbar() (horizontal bars) or vbar() (vertical bars) glyph methods. The example below shows a sequence of simple 1-level categories.

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']

To assign these categories to the x-axis, pass this list as the x_range argument to figure().

p = figure(x_range=fruits, ... )

Doing so is a convenient shorthand for creating a FactorRange object. The equivalent explicit notation is:

p = figure(x_range=FactorRange(factors=fruits), ... )

This form is useful when you want to customize the FactorRange, for example, by changing the range or category padding.

Next, call vbar() with the list of fruit names as the x coordinate and the bar height as the top coordinate. You can also specify width or other optional properties.

p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)

Combining the above produces the following output:

from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

p = figure(x_range=fruits, height=350, title="Fruit Counts", toolbar_location=None, tools="")

p.vbar(x=fruits, top=counts, width=0.9)

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)

You can also assign the data to a ColumnDataSource and supply it as the source parameter to vbar() instead of passing the data directly as parameters. You will see this in later examples.

Sorting#

To order the bars of a given plot, sort the categories by value.

The example below sorts the fruit categories in ascending order based on counts and rearranges the bars accordingly.

from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

# sorting the bars means sorting the range factors

sorted_fruits = sorted(fruits, key=lambda x: counts[fruits.index(x)])

p = figure(x_range=sorted_fruits, height=350, title="Fruit Counts", toolbar_location=None, tools="")

p.vbar(x=fruits, top=counts, width=0.9)

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)
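The key function above looks up each fruit's position with fruits.index(), which works well for a short list of unique names. As an alternative sketch (not the approach the example uses), you can pair each fruit with its count and sort the pairs. Passing reverse=True gives descending order:

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

# Pair each fruit with its count, then sort the pairs by count (ascending).
pairs = sorted(zip(fruits, counts), key=lambda pair: pair[1])
sorted_fruits = [fruit for fruit, _ in pairs]

# Descending order: the largest bar comes first.
pairs_desc = sorted(zip(fruits, counts), key=lambda pair: pair[1], reverse=True)
sorted_desc = [fruit for fruit, _ in pairs_desc]
```

Either list can be passed as the x_range argument to figure().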

Filling#

Colors#

You can color the bars in several ways:

Supply all the colors along with the rest of the data to a ColumnDataSource and assign the name of the color column to the color argument of vbar().
from bokeh.models import ColumnDataSource
from bokeh.palettes import Bright6
from bokeh.plotting import figure, show
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts, color=Bright6))
p = figure(x_range=fruits, y_range=(0, 9), height=350, title="Fruit Counts",
           toolbar_location=None, tools="")

p.vbar(x='fruits', top='counts', width=0.9, color='color', legend_field="fruits", source=source)
p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
You can also use the color column with the line_color and fill_color arguments to change outline and fill colors, respectively.

Use the CategoricalColorMapper model to map bar colors in a browser. You can do this with the factor_cmap() function.
factor_cmap('fruits', palette=Spectral6, factors=fruits)
You can then pass the result of this function to the color argument of vbar() to achieve the same result:
from bokeh.models import ColumnDataSource
from bokeh.palettes import Bright6
from bokeh.plotting import figure, show
from bokeh.transform import factor_cmap
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
source = ColumnDataSource(data=dict(fruits=fruits, counts=counts))
p = figure(x_range=fruits, height=350, toolbar_location=None, title="Fruit Counts")
p.vbar(x='fruits', top='counts', width=0.9, source=source, legend_field="fruits",
       line_color='white', fill_color=factor_cmap('fruits', palette=Bright6, factors=fruits))

p.xgrid.grid_line_color = None
p.y_range.start = 0
p.y_range.end = 9
p.legend.orientation = "horizontal"
p.legend.location = "top_center"
show(p)
See Client-side color mapping for more information on using Bokeh’s color mappers.
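Conceptually, a categorical color mapper pairs each factor with the palette entry at the same position. The sketch below imitates that lookup in plain Python to show the idea. It is not Bokeh's implementation, and the hex colors and the `fallback` name are placeholders chosen for illustration:

```python
fruits = ['Apples', 'Pears', 'Nectarines']
palette = ['#1f77b4', '#ff7f0e', '#2ca02c']  # placeholder colors
fallback = 'gray'  # hypothetical color for values that are not among the factors

# Pair each factor with the color at the same index.
color_for = dict(zip(fruits, palette))


def lookup(value):
    """Return the color mapped to a category, or the fallback color."""
    return color_for.get(value, fallback)
```

With this picture in mind, the order of the factors list determines which palette color each category receives.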

Stacking#

To stack vertical bars, use the vbar_stack() function. The example below uses three sets of fruit data. Each set corresponds to a year. This example produces a bar chart for each set and stacks each fruit’s bar elements on top of each other.

from bokeh.palettes import HighContrast3
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

data = {'fruits' : fruits, '2015' : [2, 1, 4, 3, 2, 4], '2016' : [5, 3, 4, 2, 4, 6], '2017' : [3, 2, 4, 4, 5, 3]}

p = figure(x_range=fruits, height=250, title="Fruit Counts by Year", toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")

p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data, legend_label=years)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)
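To see what stacking means for the bar coordinates, the sketch below computes where each layer would start and end by keeping a running total per fruit. It only illustrates the arithmetic; vbar_stack() handles the stacking for you:

```python
years = ["2015", "2016", "2017"]
data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 4, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

running = [0] * len(data[years[0]])  # current stack height for each fruit
layers = {}
for year in years:
    bottom = list(running)  # this layer starts where the previous one ended
    top = [b + v for b, v in zip(bottom, data[year])]
    layers[year] = (bottom, top)
    running = top
```

After the loop, `running` holds the total height of each stack, which is useful when you choose the end of the y-range.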

You can also stack bars that represent positive and negative values:

from bokeh.models import ColumnDataSource
from bokeh.palettes import GnBu3, OrRd3
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

exports = {'fruits' : fruits,
           '2015'   : [2, 1, 4, 3, 2, 4],
           '2016'   : [5, 3, 4, 2, 4, 6],
           '2017'   : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
           '2015'   : [-1, 0, -1, -3, -2, -1],
           '2016'   : [-2, -1, -3, -1, -2, -2],
           '2017'   : [-1, -2, -1, 0, -2, -2]}

p = figure(y_range=fruits, height=350, x_range=(-16, 16), title="Fruit import/export, by year", toolbar_location=None)

p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports), legend_label=[f"{year} exports" for year in years])

p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports), legend_label=[f"{year} imports" for year in years])

p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "top_left"
p.axis.minor_tick_line_color = None
p.outline_line_color = None

show(p)
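The x_range in this example is set by hand. As a sketch of one way to derive a symmetric range from the data instead, total each stack and take the larger extent. The `stack_totals` helper and the `margin` value are assumptions made for illustration:

```python
years = ["2015", "2016", "2017"]

exports = {'2015': [2, 1, 4, 3, 2, 4],
           '2016': [5, 3, 4, 2, 4, 6],
           '2017': [3, 2, 4, 4, 5, 3]}
imports = {'2015': [-1, 0, -1, -3, -2, -1],
           '2016': [-2, -1, -3, -1, -2, -2],
           '2017': [-1, -2, -1, 0, -2, -2]}


def stack_totals(table):
    """Sum the yearly values for each fruit (hypothetical helper)."""
    return [sum(values) for values in zip(*(table[year] for year in years))]


# The longest stack on either side of zero decides the extent.
extent = max(max(stack_totals(exports)), -min(stack_totals(imports)))
margin = 3  # arbitrary breathing room, chosen for illustration
x_range = (-(extent + margin), extent + margin)
```

Keeping the range symmetric places the zero line in the middle of the plot, so imports and exports are drawn at the same scale.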

Tooltips#

Bokeh automatically sets the name property of each layer to its name in the data set. You can use the $name variable to display the names on tooltips. You can also use the @$name tooltip variable to retrieve values for each item in a layer from the data set.

The example below demonstrates both behaviors:

from bokeh.palettes import HighContrast3
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ["2015", "2016", "2017"]

data = {'fruits' : fruits, '2015' : [2, 1, 4, 3, 2, 4], '2016' : [5, 3, 4, 2, 4, 6], '2017' : [3, 2, 4, 4, 5, 3]}

p = figure(x_range=fruits, height=250, title="Fruit counts by year", toolbar_location=None, tools="hover", tooltips="$name @fruits: @$name")

p.vbar_stack(years, x='fruits', width=0.9, color=HighContrast3, source=data, legend_label=years)

show(p)

You can override the value of name by passing it manually to the vbar_stack() or hbar_stack() function. In this case, $name and @$name will correspond to the names you provide.

The hbar_stack and vbar_stack functions return a list of all the renderers (one per bar stack). You can use this list to customize the tooltips for each layer.

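from bokeh.models import HoverTool

# colors and source stand for a palette and a data source like those in the example above
renderers = p.vbar_stack(years, x='fruits', width=0.9, color=colors, source=source,
                         legend_label=years, name=years)

# add one hover tool per layer, restricted to that layer's renderer
for r in renderers:
    year = r.name
    hover = HoverTool(tooltips=[
        ("%s total" % year, "@%s" % year),
        ("index", "$index"),
    ], renderers=[r])
    p.add_tools(hover)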

Grouping#

Instead of stacking, you also have the option to group the bars. Depending on your use case, you can achieve this in two ways:

With nested categories
With visual offsets

Nested categories#

If you provide several subsets of data, Bokeh automatically groups the bars into labeled categories, tags each bar with the name of the subset it represents, and adds a separator between the categories.

The example below creates a sequence of fruit-year pairs (tuples) and groups the bars by fruit name with a single call to vbar().

from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits, '2015' : [2, 1, 4, 3, 2, 4], '2016' : [5, 3, 3, 2, 4, 6], '2017' : [3, 2, 4, 4, 5, 3]}

# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), height=350, title="Fruit Counts by Year", toolbar_location=None, tools="")

p.vbar(x='x', top='counts', width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)
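The sum(zip(...), ()) idiom interleaves the yearly columns into one flat tuple whose order matches x. The sketch below is an equivalent formulation that loops over years instead of naming each column, which may be easier to extend when the number of years changes:

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits': fruits,
        '2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

x = [(fruit, year) for fruit in fruits for year in years]

# For each fruit (by position), take its value from every year in turn,
# so counts[k] belongs to the factor x[k].
counts = tuple(data[year][i] for i in range(len(fruits)) for year in years)
```

Both approaches rely on the same nesting order (fruit first, then year) for x and counts.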

To apply different colors to the bars, use factor_cmap() for fill_color in the vbar() function call as follows:

p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",
       # use the palette to colormap based on the x[1:2] values
       fill_color=factor_cmap('x', palette=palette, factors=years, start=1, end=2))

The start=1 and end=2 in the call to factor_cmap() use the year in the (fruit, year) pair for color mapping.
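In plain Python terms, start=1 and end=2 behave like slicing each factor tuple before the color lookup. The following sketch imitates that slicing. The hex colors are placeholders, and the code mimics the idea rather than Bokeh's internals:

```python
years = ['2015', '2016', '2017']
palette = ['#c9d9d3', '#718dbf', '#e84d60']  # placeholder colors

# A few (fruit, year) factors, as in the x column above.
x = [('Apples', '2015'), ('Apples', '2016'), ('Pears', '2017')]

color_for_year = dict(zip(years, palette))

# factor[1:2] keeps only the year level, for example ('2015',).
colors = [color_for_year[factor[1:2][0]] for factor in x]
```

Slicing with start=0 and end=1 would select the fruit level instead, so the bars would be colored by fruit.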

Visual offset#

Take a scenario with separate sequences of (fruit, year) pairs instead of a single data table. You can plot the sequences with separate calls to vbar(). However, since every bar in each group belongs to the same fruit category, the bars will overlap. To avoid this behavior, use the dodge() function to provide an offset for each call to vbar().

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.transform import dodge

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits, '2015' : [2, 1, 4, 3, 2, 4], '2016' : [5, 3, 3, 2, 4, 6], '2017' : [3, 2, 4, 4, 5, 3]}

source = ColumnDataSource(data=data)

p = figure(x_range=fruits, y_range=(0, 10), title="Fruit Counts by Year", height=350, toolbar_location=None, tools="")

p.vbar(x=dodge('fruits', -0.25, range=p.x_range), top='2015', source=source, width=0.2, color="#c9d9d3", legend_label="2015")

p.vbar(x=dodge('fruits', 0.0, range=p.x_range), top='2016', source=source, width=0.2, color="#718dbf", legend_label="2016")

p.vbar(x=dodge('fruits', 0.25, range=p.x_range), top='2017', source=source, width=0.2, color="#e84d60", legend_label="2017")

p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"

show(p)
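The three offsets used above are spaced evenly around the category center. For a different number of groups, a small helper can compute them. This is only a sketch; `dodge_offsets` and its `step` parameter are hypothetical names, not part of Bokeh:

```python
def dodge_offsets(n_groups, step):
    """Return n_groups offsets that are `step` apart and centered on zero."""
    first = -step * (n_groups - 1) / 2
    return [first + i * step for i in range(n_groups)]


offsets = dodge_offsets(3, 0.25)  # [-0.25, 0.0, 0.25]
```

Each offset can then be passed as the second argument to dodge(). Keep the bar width smaller than the step so that neighboring bars do not overlap.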

Stacking and grouping#

You can also combine the above techniques to create plots of stacked and grouped bars. Here is an example that groups bars by quarter and stacks them by region:

from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

factors = [
    ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
    ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
    ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
    ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
]

regions = ['east', 'west']

source = ColumnDataSource(data=dict( x=factors, east=[ 5, 5, 6, 5, 5, 4, 5, 6, 7, 8, 6, 9 ], west=[ 5, 7, 9, 4, 5, 4, 7, 7, 7, 6, 6, 7 ], ))

p = figure(x_range=FactorRange(*factors), height=250, toolbar_location=None, tools="")

p.vbar_stack(regions, x='x', width=0.9, alpha=0.5, color=["blue", "red"], source=source, legend_label=regions)

p.y_range.start = 0
p.y_range.end = 18
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
p.legend.location = "top_center"
p.legend.orientation = "horizontal"

show(p)

Mixed factors#

You can use any level in a multi-level data structure to position glyphs.

The example below groups bars for each month into financial quarters and adds a quarterly average line at the group center coordinates from Q1 to Q4.

from bokeh.models import FactorRange
from bokeh.palettes import TolPRGn4
from bokeh.plotting import figure, show

quarters = ("Q1", "Q2", "Q3", "Q4")

months = ( ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"), ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"), ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"), ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"), )

fill_color, line_color = TolPRGn4[2:]

p = figure(x_range=FactorRange(*months), height=500, tools="", background_fill_color="#fafafa", toolbar_location=None)

monthly = [10, 13, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]
p.vbar(x=months, top=monthly, width=0.8,
       fill_color=fill_color, fill_alpha=0.8, line_color=line_color, line_width=1.2)

quarterly = [13, 9, 13, 14]
p.line(x=quarters, y=quarterly, color=line_color, line_width=3)
p.scatter(x=quarters, y=quarterly, size=10,
          line_color=line_color, fill_color="white", line_width=3)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)
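The quarterly values in this example match the mean of the monthly values in each quarter. A sketch of that calculation, grouping the monthly values by the first level of each factor:

```python
from collections import defaultdict
from statistics import mean

months = (
    ("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
    ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
    ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
    ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec"),
)
monthly = [10, 13, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]

# Collect the monthly values under their quarter (the first factor level).
by_quarter = defaultdict(list)
for (quarter, _month), value in zip(months, monthly):
    by_quarter[quarter].append(value)

quarters = tuple(by_quarter)
quarterly = [mean(values) for values in by_quarter.values()]
```

Because the line glyph uses only the top-level factors as x coordinates, each point sits at the center of its quarter's group of bars.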

Using pandas#

pandas is a powerful and popular tool for analyzing tabular and time series data in Python. While not necessary, it can make working with Bokeh easier.

For example, you can use the GroupBy objects offered by pandas to initialize a ColumnDataSource and automatically create columns for many statistical parameters, such as group mean and count. You can also pass these GroupBy objects as a range argument to figure.

from bokeh.palettes import Spectral5
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg as df
from bokeh.transform import factor_cmap

df.cyl = df.cyl.astype(str)
group = df.groupby('cyl')

cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))

p = figure(height=350, x_range=group, title="MPG by # Cylinders", toolbar_location=None, tools="")

p.vbar(x='cyl', top='mpg_mean', width=1, source=group, line_color=cyl_cmap, fill_color=cyl_cmap)

p.y_range.start = 0
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "some stuff"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None

show(p)

The example above groups data by the column 'cyl', which is why the ColumnDataSource includes this column. It also adds aggregate columns for the non-grouped columns such as 'mpg', providing, for instance, the mean miles per gallon in the 'mpg_mean' column.
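As a rough picture of where a column such as 'mpg_mean' comes from, the sketch below groups a few rows by 'cyl' and aggregates 'mpg' in plain Python. The rows are made-up illustration values, not the autompg data, and the code only mimics the naming pattern rather than what Bokeh and pandas do internally:

```python
from collections import defaultdict
from statistics import mean

rows = [  # hypothetical records, for illustration only
    {"cyl": "4", "mpg": 30.0},
    {"cyl": "4", "mpg": 26.0},
    {"cyl": "8", "mpg": 15.0},
    {"cyl": "8", "mpg": 13.0},
]

# Collect the mpg values of each cylinder group.
grouped = defaultdict(list)
for row in rows:
    grouped[row["cyl"]].append(row["mpg"])

# One entry per group, with the statistic appended to the column name.
columns = {
    "cyl": list(grouped),
    "mpg_mean": [mean(values) for values in grouped.values()],
    "mpg_count": [len(values) for values in grouped.values()],
}
```

Each aggregate column has one value per group, which is why the column names can be used directly in calls such as vbar().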

This also works with multi-level groups. The example below groups the same data by ('cyl', 'mfr') and displays it in nested categories distributed along the x-axis. Here, the index column name 'cyl_mfr' is made by joining the names of the grouped columns.

from bokeh.palettes import MediumContrast5
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap

df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)

group = df.groupby(['cyl', 'mfr'])

index_cmap = factor_cmap('cyl_mfr', palette=MediumContrast5, factors=sorted(df.cyl.unique()), end=1)

p = figure(width=800, height=300, title="Mean MPG by # Cylinders and Manufacturer", x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])

p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group, line_color="white", fill_color=index_cmap )

p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None

show(p)

Intervals#

You can use bars for more than just bar charts with a common baseline. If each category has both a start value and an end value, you can also use bars to represent intervals across a range for each category.

The example below supplies the hbar() function with both left and right properties to show the spread in times between gold and bronze medalists in Olympic sprinting over many years.

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.sampledata.sprint import sprint

df = sprint.copy() # since we are modifying sampledata

df.Year = df.Year.astype(str)
group = df.groupby('Year')
source = ColumnDataSource(group)

p = figure(y_range=group, x_range=(9.5, 12.7), width=400, height=550, toolbar_location=None,
           title="Time Spreads for Sprint Medalists (by Year)")
p.hbar(y="Year", left='Time_min', right='Time_max', height=0.4, source=source)

p.ygrid.grid_line_color = None
p.xaxis.axis_label = "Time (seconds)"
p.outline_line_color = None

show(p)