Turns even the largest data into images, accurately.

Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly. Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. This approach allows accurate and effective visualizations to be produced automatically without trial-and-error parameter tuning, and also makes it simple for data scientists to focus on particular data and relationships of interest in a principled way.
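To illustrate what "explicit steps" with an intermediate representation means, here is a minimal plain-NumPy sketch. It is not Datashader's implementation, and the function names `project`, `aggregate_count`, and `shade_gray` are hypothetical. Points are first projected onto a pixel grid, then aggregated into a count array, and only then turned into an image, so the aggregate can be examined or transformed in between.

```python
import numpy as np


def project(x, y, x_range, y_range, width, height):
    """Step 1: map data coordinates to integer pixel indices, dropping out-of-range points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = ((x >= x_range[0]) & (x <= x_range[1]) &
              (y >= y_range[0]) & (y <= y_range[1]))
    x, y = x[inside], y[inside]
    # Scale into [0, width) and [0, height); clipping puts the upper edge in the last bin.
    cols = ((x - x_range[0]) / (x_range[1] - x_range[0]) * width).astype(int)
    rows = ((y - y_range[0]) / (y_range[1] - y_range[0]) * height).astype(int)
    return np.clip(cols, 0, width - 1), np.clip(rows, 0, height - 1)


def aggregate_count(cols, rows, width, height):
    """Step 2: build the intermediate representation, a (height, width) grid of counts."""
    agg = np.zeros((height, width), dtype=np.int64)
    np.add.at(agg, (rows, cols), 1)  # unbuffered add, so repeated pixels accumulate
    return agg


def shade_gray(agg):
    """Step 3: map counts linearly to 8-bit gray levels; empty pixels stay 0."""
    peak = agg.max()
    if peak == 0:
        return np.zeros(agg.shape, dtype=np.uint8)
    return (agg / peak * 255).astype(np.uint8)


rng = np.random.default_rng(0)
x, y = rng.normal(size=10_000), rng.normal(size=10_000)
cols, rows = project(x, y, (-3, 3), (-3, 3), width=40, height=30)
agg = aggregate_count(cols, rows, width=40, height=30)
# The aggregate is ordinary data, so it can be inspected or transformed before shading.
print(agg.sum(), (agg == 0).mean())
img = shade_gray(agg)
```

Because the count grid exists as its own array, a step such as thresholding or a different normalization can be swapped in before shading without touching the projection or aggregation code.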

Datashader's computation-intensive steps are written in Python but transparently compiled to machine code using Numba and flexibly distributed across cores and processors using Dask, providing a highly optimized rendering pipeline that makes it practical to work with extremely large datasets even on standard hardware.
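Datashader's own kernels are compiled with Numba and scheduled with Dask; the sketch below uses neither. It uses only NumPy, with array chunks standing in for Dask partitions, to show the general idea that makes per-pixel aggregation easy to distribute: each partition is aggregated independently and the partial grids are merged afterward. The names `aggregate_chunk` and `partitioned_mean` are hypothetical. A mean has to be carried as separate count and sum grids, because averaging per-partition means would weight the partitions incorrectly.

```python
import numpy as np


def aggregate_chunk(x, y, z, x_range, y_range, width, height):
    """Aggregate one chunk into per-pixel (count, sum of z) grids of shape (height, width)."""
    bins, extent = (height, width), (y_range, x_range)
    counts, _, _ = np.histogram2d(y, x, bins=bins, range=extent)
    sums, _, _ = np.histogram2d(y, x, bins=bins, range=extent, weights=z)
    return counts, sums


def partitioned_mean(x, y, z, n_partitions, **grid):
    """Aggregate each partition independently, then combine the partial grids."""
    parts = [aggregate_chunk(xs, ys, zs, **grid)
             for xs, ys, zs in zip(np.array_split(x, n_partitions),
                                   np.array_split(y, n_partitions),
                                   np.array_split(z, n_partitions))]
    # Counts and sums combine by elementwise addition; partial means would not.
    counts = sum(c for c, _ in parts)
    sums = sum(s for _, s in parts)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)  # NaN marks empty pixels


rng = np.random.default_rng(1)
x, y, z = rng.uniform(0, 1, size=(3, 100_000))
grid = dict(x_range=(0, 1), y_range=(0, 1), width=64, height=48)
mean_4 = partitioned_mean(x, y, z, n_partitions=4, **grid)
mean_1 = partitioned_mean(x, y, z, n_partitions=1, **grid)
```

Splitting into four partitions gives the same grid as aggregating everything at once (up to floating-point rounding), which is the property that lets the work be spread across cores or machines.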

To make it concrete, here’s an example of what datashader code looks like:

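>>> import datashader as ds
>>> import datashader.transfer_functions as tf
>>> import pandas as pd
>>> df = pd.read_csv('user_data.csv')  # hypothetical file with x_col, y_col, z_col columns

>>> cvs = ds.Canvas(plot_width=400, plot_height=400)
>>> agg = cvs.points(df, 'x_col', 'y_col', ds.mean('z_col'))
>>> img = tf.shade(agg, cmap=['lightblue', 'darkblue'], how='log')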


This code reads a data file into a Pandas dataframe `df`, and then projects the fields `x_col` and `y_col` onto the x and y dimensions of a 400x400 grid, aggregating the datapoints that fall into each grid cell by the mean of their `z_col` values. The results are rendered into an image where the minimum mean value is plotted in `lightblue`, the maximum in `darkblue`, with colors ranging logarithmically in between.
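For readers who want to see roughly what those two calls compute, here is a plain NumPy/pandas analogue of `ds.mean('z_col')` followed by `tf.shade(..., how='log')`. It is a sketch under stated assumptions, not Datashader's implementation: the helper names `mean_grid` and `shade_log` are hypothetical, the axis ranges are taken from the data extent, and the log scaling (shift the minimum to zero, then apply `log1p`) is a choice made for this example that may differ from Datashader's own normalization. The RGB triples are the standard CSS values for `lightblue` and `darkblue`.

```python
import numpy as np
import pandas as pd

# RGB values of the CSS/X11 named colors used in the example.
LIGHTBLUE = np.array([173, 216, 230], dtype=float)
DARKBLUE = np.array([0, 0, 139], dtype=float)


def mean_grid(df, x, y, z, width, height):
    """Per-pixel mean of column z (hypothetical helper; ranges taken from the data extent)."""
    extent = ((df[y].min(), df[y].max()), (df[x].min(), df[x].max()))
    counts, _, _ = np.histogram2d(df[y], df[x], bins=(height, width), range=extent)
    sums, _, _ = np.histogram2d(df[y], df[x], bins=(height, width), range=extent,
                                weights=df[z])
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)  # NaN marks empty pixels


def shade_log(agg):
    """Interpolate lightblue -> darkblue on a log scale; empty pixels stay transparent."""
    img = np.zeros(agg.shape + (4,), dtype=np.uint8)  # RGBA, initially all transparent
    filled = np.isfinite(agg)
    if not filled.any():
        return img
    vals = agg[filled]
    # Assumed scaling: shift so the minimum is 0 (means may be negative), then log1p.
    scaled = np.log1p(vals - vals.min())
    t = scaled / scaled.max() if scaled.max() > 0 else np.zeros_like(scaled)
    rgb = LIGHTBLUE + t[:, None] * (DARKBLUE - LIGHTBLUE)
    img[filled, :3] = np.round(rgb).astype(np.uint8)
    img[filled, 3] = 255
    return img


df = pd.DataFrame({"x_col": [0.0, 0.0, 1.0],
                   "y_col": [0.0, 0.0, 1.0],
                   "z_col": [1.0, 3.0, 10.0]})
agg = mean_grid(df, "x_col", "y_col", "z_col", width=2, height=2)
img = shade_log(agg)
```

With the 400x400 canvas from the example, the corresponding call would be `mean_grid(df, 'x_col', 'y_col', 'z_col', 400, 400)`. In the toy data, the two points in the lower-left pixel average to 2.0 and are drawn in `lightblue`, the single point in the upper-right pixel (10.0) is drawn in `darkblue`, and the empty pixels stay transparent.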

And here are some sample outputs for 300 million points of data (one per person in the USA) from the 2010 census, each constructed using Datashader code like the example above:

# Installation

Please follow the instructions on the GitHub repo if you want to reproduce the specific examples on this website, or the ones at PyViz.org if you want to try out Datashader together with related plotting tools.

# Other resources

You can watch a short talk about Datashader on YouTube: Datashader: Revealing the Structure of Genuinely Big Data. The video Visualizing Billions of Points of Data and its slides, from a February 2016 one-hour talk that first introduced Datashader, are also available, but they do not cover more recent extensions to the library.

Some of the original ideas for Datashader were developed under the name Abstract Rendering, which is described in a 2014 SPIE VDA paper.