FAQ

When should I use Datashader?

Datashader is designed for working with large datasets, for cases where it is most crucial to faithfully represent the distribution of your data. Datashader can work easily with extremely large datasets, generating a fixed-size data structure (regardless of the original number of records) that gets transferred to your local browser for display. If you ever find yourself subsampling your data just so that you can plot it feasibly, or if you are forced for practical reasons to iterate over chunks of it rather than looking at all of it at once, then Datashader can probably help you.

When should I not use Datashader?

If you have a small number of data points (in the hundreds or thousands) or curves (up to a few dozen, each with hundreds or thousands of points), then conventional plotting packages like Bokeh may be more suitable. With conventional browser-based packages, all of the data points are passed directly to the browser for display, allowing specific interaction with each curve or point, including display of metadata, linking to sources, etc. This approach offers the most flexibility per point or per curve, but rapidly runs into limitations on how much data can be processed by the browser, and how much can be displayed on screen and resolved by the human visual system. If you are not having such problems, i.e., your data is easily handled by your plotting infrastructure and you can easily see and work with all your data onscreen already, then you probably don't need Datashader.

Is Datashader part of Bokeh or HoloViews?

Datashader is an independent project, focusing on generating aggregate arrays and representations of them as images. Bokeh and HoloViews are complementary projects, focusing on building browser-based visualizations and dashboards. These and other plotting packages can display images rendered by Datashader, providing axes, interactive zooming and panning, selection, legends, hover information, and so on. Sample Bokeh and HoloViews plotting code is provided with Datashader, and Plotly also provides support, while viewers for Matplotlib and other plotting tools are under development. Datashader can also be used on its own, without any external plotting package, to generate images that can be displayed directly or saved to disk, or to generate aggregate arrays suitable for further analysis.

What's the easiest way to use Datashader interactively?

HoloViews. HoloViews uses Bokeh or Plotly behind the scenes, but it offers a higher-level API that is well suited to the sorts of magic that allow interactive use of Datashader. For a given dataset, HoloViews can easily construct either a raw Bokeh/Plotly plot or a Bokeh/Plotly plot with server-side rendering from Datashader, hiding nearly all of the complexity involved.

How can I get legends and colorbars for my Datashader plot?

When used as a standalone library, Datashader can only generate images or bare arrays; it does not have any concept of axes, legends, or colorbars. But Datashader is designed to work well as a rendering engine for other plotting libraries that do offer those features. For the specific case of colorbars, just ensure that the separate plotting library does the colormapping, not Datashader, and you should get full support for colorbars. That is, use Datashader to aggregate the data into a fixed-size array of values, and then use Bokeh, Plotly, or HoloViews to colormap and render that array as pixels, which lets the plotting library construct a suitable colorbar. For instance, to get a colorbar in HoloViews, use rasterize() to invoke Datashader on the data and generate an array of values. Do not use datashade(), which calls rasterize() and then shade() to generate RGB pixel values; the plotting library then sees only the final RGB values and cannot report the mapping from value to color. Of course, if you let the plotting library do the colormapping, you will no longer be able to use Datashader-specific features like histogram equalization, which would then need to be implemented by the plotting library if you want colorbars for such cases.

What data libraries can I use with Datashader?

Datashader accepts various types of DataFrame:

  • Pandas: for every glyph type (points, lines, areas, trimesh, raster)
  • Dask: for most glyph types (points, lines, areas), using distributed, multi-core, and/or out-of-core computation
  • cuDF: for points, lines, or areas, with computation on single NVIDIA GPUs
  • Dask-cuDF: for points, lines, or areas, with computation on multiple NVIDIA GPUs
