Interactivity#

The previous notebook showed all the steps required to get a Datashader rendering of your dataset, yielding raster images displayed using Jupyter’s “rich display” support. However, these bare images do not show the data ranges or axis labels, making them difficult to interpret. Moreover, they are only static images, and datasets often need to be explored at multiple scales, which is much easier to do in an interactive program.

To get axes and interactivity, the images generated by Datashader need to be embedded into a plot using an external library like Matplotlib or Bokeh. As we illustrate below, the most convenient way to make Datashader plots using these libraries is via the HoloViews high-level data-science API, using either Bokeh or Plotly. HoloViews encapsulates the Datashader pipeline in a way that lets you combine interactive datashaded plots easily with other plots without having to write explicit callbacks or event-processing code.

In this notebook, we will first look at the HoloViews API, then at Datashader’s new native Matplotlib support.

Embedding Datashader with HoloViews#

HoloViews (1.7 and later) is a high-level data analysis and visualization library that makes it simple to generate interactive Datashader-based plots. Here’s an illustration of how this all fits together when using HoloViews+Bokeh:

Datashader+Holoviews+Bokeh

HoloViews offers a data-centered approach for analysis, where the same tool can be used with small data (anything that fits in a web browser’s memory, which can be visualized with Bokeh directly), and large data (which is first sent through Datashader to make it tractable) and with several different plotting frontends. A developer willing to do more programming can do all the same things separately, using Bokeh, Matplotlib, and Datashader’s APIs directly, but with HoloViews it is much simpler to explore and analyze data. Of course, the previous notebook showed that you can also use datashader without either any plotting library at all (the light gray pathways above), but then you wouldn’t have interactivity, axes, and so on.

Most of this notebook will focus on HoloViews+Bokeh to support full interactive plots in web browsers, but HoloViews+Plotly works similarly for interactive plots, and we will also briefly illustrate the non-interactive HoloViews+Matplotlib approach, followed by a non-HoloViews Matplotlib approach at the end. Let’s start by importing some parts of HoloViews and setting some defaults:

import holoviews as hv
import holoviews.operation.datashader as hd
hd.shade.cmap=["lightblue", "darkblue"]
hv.extension("bokeh", "matplotlib") 

Next we’ll start with the same example from the previous notebook:

import pandas as pd
import numpy as np
import datashader as ds
import datashader.transfer_functions as tf

num=100000
np.random.seed(1)

dists = {cat: pd.DataFrame(dict([('x',np.random.normal(x,s,num)), 
                                 ('y',np.random.normal(y,s,num)), 
                                 ('val',val), 
                                 ('cat',cat)]))      
         for x,  y,  s,  val, cat in 
         [(  2,  2, 0.03, 10, "d1"), 
          (  2, -2, 0.10, 20, "d2"), 
          ( -2, -2, 0.50, 30, "d3"), 
          ( -2,  2, 1.00, 40, "d4"), 
          (  0,  0, 3.00, 50, "d5")] }

df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")

HoloViews+Bokeh#

Rather than starting out by specifying a figure or plot, in HoloViews you specify an Element object to contain your data, such as Points for a collection of 2D x,y points. To start, let’s define a Points object wrapping around a small dataframe with 10,000 random samples from the df above:

points = hv.Points(df.sample(10000))

points

As you can see, the points object visualizes itself as a Bokeh plot, where you can already see many of the problems that motivate datashader (overplotting of points, being unable to detect the closely spaced dense collections of points shown in red above, and so on). But this visualization is just the default representation of points, using Jupyter’s rich display support; the actual points object itself is merely a data container:

points.data.head()
x y val cat
184289 2.164107 -2.038032 20 d2
8258 1.997346 1.983239 10 d1
186900 2.176841 -2.070830 20 d2
161735 1.981356 -2.084261 20 d2
149948 2.018556 -2.000011 20 d2

HoloViews+Datashader+Matplotlib#

The default visualizations in HoloViews work well for small datasets, but larger ones will have overplotting issues as are already visible above, and will eventually either overwhelm the web browser (for the Bokeh frontend) or take many minutes to plot (for the Matplotlib backend). Luckily, HoloViews provides support for using Datashader to handle both of these problems:

hv.output(backend="matplotlib")
agg = ds.Canvas().points(df,'x','y')
hd.datashade(points)  +  hd.shade(hv.Image(agg)) + hv.RGB(np.array(tf.shade(agg).to_pil()), bounds=(-10,-10,10,10))

Here we asked HoloViews to plot df using Datashader+Matplotlib, in three different ways:

  • A: HoloViews aggregates and shades an image directly from the points object using its own datashader support, then passes the image to Matplotlib to embed into an appropriate set of axes.

  • B: HoloViews accepts a pre-computed datashader aggregate, reads out the metadata about the plot ranges that is stored in the aggregate array, and passes it to Matplotlib for colormapping and then embedding.

  • C: HoloViews accepts a PIL image computed beforehand and passes it to Matplotlib for embedding, along with information about what data bounds it covers (which isn’t needed in B because the aggregate array preserves that information).

As you can see, option A is the most convenient; you can simply wrap your HoloViews element with datashade and the rest will be taken care of. But if you want to have more control by computing the aggregate or the full RGB image yourself using the API from the previous notebook you are welcome to do so while using HoloViews+Matplotlib (or HoloViews+Bokeh, below) to embed the result into labelled axes.

HoloViews+Datashader+Bokeh#

The Matplotlib interface only produces a static plot, i.e., a PNG or SVG image, but the Bokeh and Plotly interfaces of HoloViews add the dynamic zooming and panning necessary to understand datasets across scales:

hv.output(backend="bokeh")
hd.datashade(points)