Inspection Reductions

Each Datashader canvas function call accepts an agg argument, a Reduction that determines how the values falling in each pixel (histogram bin) are aggregated before being returned to the user. Each Reduction falls into one of two categories:

  1. Mathematical combination of data such as the count of data points per pixel or the mean of a column of the supplied dataset.

  2. Selection of data from a column of the supplied dataset, or the index of the corresponding row in the dataset.

This notebook explains how to use selection reductions.

1. first and last selection reductions

The simplest selection reduction is the first reduction. This returns, for each pixel in the canvas, the value of a particular column in the dataset corresponding to the first data point that maps to that pixel. This is best illustrated with an example.

First, create a sample dataset:

import datashader as ds
import pandas as pd

df = pd.DataFrame(dict(
    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],
    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],
    value = [ 9,  8,  7,  6,  2,  3,  4,  5],
    other = [11, 12, 13, 14, 15, 16, 17, 18],
))

There are 8 rows in the dataset with columns for x and y coordinates as well as a value and an other column.

Next, create a Datashader canvas with a height of 2 pixels and a width of 3 pixels:

canvas = ds.Canvas(plot_height=2, plot_width=3)

Two rows of the dataset map to each canvas pixel with the exception of pixels [0, 2] and [1, 1] which do not have any rows mapped to them.
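The mapping from rows to pixels can be checked by hand. The following is a minimal sketch in plain Python, not Datashader's implementation: the helper to_pixel is hypothetical and assumes a simple linear binning rule in which the upper bound of the range falls into the last pixel. Under that assumption it reproduces the counts described above.

```python
from collections import Counter

# Coordinates copied from the sample dataset.
x = [0, 0, 1, 1, 0, 0, 2, 2]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def to_pixel(coord, lo, hi, n):
    """Hypothetical helper: map a coordinate in [lo, hi] to a pixel index 0..n-1.

    Assumes linear binning with the upper bound placed in the last pixel.
    """
    return min(int((coord - lo) / (hi - lo) * n), n - 1)

width, height = 3, 2

# One (row, column) pixel per dataset row, matching the (y, x) array layout.
pixels = [
    (to_pixel(yi, min(y), max(y), height), to_pixel(xi, min(x), max(x), width))
    for xi, yi in zip(x, y)
]

counts = Counter(pixels)
empty = sorted(
    (r, c) for r in range(height) for c in range(width) if (r, c) not in counts
)

print(dict(counts))  # {(0, 0): 2, (0, 1): 2, (1, 0): 2, (1, 2): 2}
print(empty)         # [(0, 2), (1, 1)]
```

Pixels are written as (row, column) pairs here, so (0, 2) is the pixel referred to as [0, 2] in the text.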

Now call canvas.points using a first reduction:

canvas.points(df, 'x', 'y', ds.first('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 9.,  7., nan],
       [ 2., nan,  4.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)

The returned xarray.DataArray is the same shape as the canvas and contains values taken from the 'value' column corresponding to the first row that maps to each pixel. Pixels which do not have any rows mapped to them contain NaN values.

Here are the results using a last selection reduction:

canvas.points(df, 'x', 'y', ds.last('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 8.,  6., nan],
       [ 3., nan,  5.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
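To make the semantics of first and last concrete, here is a rough plain-Python sketch that walks the rows in dataset order and keeps either the first or the most recent value seen in each pixel. It is an illustration only, not how Datashader is implemented; the to_pixel and select helpers are hypothetical, and the linear binning rule is an assumption.

```python
import math

# Columns copied from the sample dataset.
x     = [0, 0, 1, 1, 0, 0, 2, 2]
y     = [0, 0, 0, 0, 1, 1, 1, 1]
value = [9, 8, 7, 6, 2, 3, 4, 5]

def to_pixel(coord, lo, hi, n):
    """Hypothetical helper: linear binning, upper bound goes in the last pixel."""
    return min(int((coord - lo) / (hi - lo) * n), n - 1)

def select(keep_first):
    """Return a 2x3 grid holding the first (or last) value seen per pixel."""
    grid = [[math.nan] * 3 for _ in range(2)]
    for xi, yi, v in zip(x, y, value):
        r, c = to_pixel(yi, 0, 1, 2), to_pixel(xi, 0, 2, 3)
        # 'first' only fills an empty pixel; 'last' always overwrites.
        if not keep_first or math.isnan(grid[r][c]):
            grid[r][c] = float(v)
    return grid

first_grid = select(keep_first=True)
last_grid = select(keep_first=False)

print(first_grid)  # [[9.0, 7.0, nan], [2.0, nan, 4.0]]
print(last_grid)   # [[8.0, 6.0, nan], [3.0, nan, 5.0]]
```

Both grids agree with the arrays returned by ds.first('value') and ds.last('value') above.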

2. max and min selection reductions

A max selection reduction returns, for each pixel in the canvas, the maximum value of the specified column of all rows that map to that pixel. For example:

canvas.points(df, 'x', 'y', ds.max('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 9.,  7., nan],
       [ 3., nan,  5.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)

The corresponding min selection reduction is:

canvas.points(df, 'x', 'y', ds.min('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 8.,  6., nan],
       [ 2., nan,  4.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)

3. first_n, last_n, max_n and min_n selection reductions

These provide the same functionality as first, last, max and min reductions except that they return multiple values per pixel. For example, the max_n reduction with n=3 returns the 3 largest values, in descending order, for each pixel:

canvas.points(df, 'x', 'y', ds.max_n('value', n=3))
<xarray.DataArray (y: 2, x: 3, n: 3)> Size: 144B
array([[[ 9.,  8., nan],
        [ 7.,  6., nan],
        [nan, nan, nan]],

       [[ 3.,  2., nan],
        [nan, nan, nan],
        [ 5.,  4., nan]]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
  * n        (n) int64 24B 0 1 2
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)

The returned xarray.DataArray has shape (ny, nx, n), which is (2, 3, 3) in this example. The third dimension contains the n largest values for each pixel in descending order. Where fewer than n values are available, the remaining entries are filled with NaN, just as empty pixels are in the single-value reductions.
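The NaN padding can be illustrated with a short plain-Python sketch that collects the values per pixel and takes the n largest with heapq.nlargest from the standard library. This is a sketch of the semantics under an assumed linear binning rule, not Datashader's algorithm; to_pixel and top_n are hypothetical helpers.

```python
import heapq
import math

# Columns copied from the sample dataset.
x     = [0, 0, 1, 1, 0, 0, 2, 2]
y     = [0, 0, 0, 0, 1, 1, 1, 1]
value = [9, 8, 7, 6, 2, 3, 4, 5]
n = 3

def to_pixel(coord, lo, hi, size):
    """Hypothetical helper: linear binning, upper bound goes in the last pixel."""
    return min(int((coord - lo) / (hi - lo) * size), size - 1)

# Gather every value that lands in each (row, column) pixel.
per_pixel = {}
for xi, yi, v in zip(x, y, value):
    key = (to_pixel(yi, 0, 1, 2), to_pixel(xi, 0, 2, 3))
    per_pixel.setdefault(key, []).append(float(v))

def top_n(values):
    """Return the n largest values in descending order, padded with NaN."""
    top = heapq.nlargest(n, values)
    return top + [math.nan] * (n - len(top))

# Nested list with shape (ny, nx, n) = (2, 3, 3).
max_n_grid = [[top_n(per_pixel.get((r, c), [])) for c in range(3)] for r in range(2)]

print(max_n_grid[0][0])  # [9.0, 8.0, nan]
print(max_n_grid[1][2])  # [5.0, 4.0, nan]
```

Each pixel holds at most two values in this dataset, so the third entry along n is always NaN.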

4. where selection reductions

A where reduction takes two arguments: a selector reduction and a lookup_column name. The selector reduction, such as first or max, determines which row of the dataset is chosen for each pixel. The value returned, however, is taken from the lookup_column of that row rather than from the column used by the selector.

Again this is best illustrated by an example:

canvas.points(df, 'x', 'y', ds.where(ds.max('value'), 'other'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[11., 13., nan],
       [16., nan, 18.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)

This returns, for each pixel, the value of the 'other' column corresponding to the maximum of the 'value' column of the data points that map to that pixel.
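The two-step logic (select a row by one column, report another column) can be sketched in plain Python as follows. This is only an illustration of the semantics; the to_pixel helper and its linear binning rule are assumptions, and Datashader does not necessarily work this way internally.

```python
import math

# Columns copied from the sample dataset.
x     = [0, 0, 1, 1, 0, 0, 2, 2]
y     = [0, 0, 0, 0, 1, 1, 1, 1]
value = [9, 8, 7, 6, 2, 3, 4, 5]
other = [11, 12, 13, 14, 15, 16, 17, 18]

def to_pixel(coord, lo, hi, n):
    """Hypothetical helper: linear binning, upper bound goes in the last pixel."""
    return min(int((coord - lo) / (hi - lo) * n), n - 1)

# Step 1 (selector): remember the row index holding the largest 'value' per pixel.
best_row = {}
for i, (xi, yi) in enumerate(zip(x, y)):
    key = (to_pixel(yi, 0, 1, 2), to_pixel(xi, 0, 2, 3))
    if key not in best_row or value[i] > value[best_row[key]]:
        best_row[key] = i

# Step 2 (lookup): report the 'other' column of the selected row, NaN if empty.
lookup_grid = [
    [float(other[best_row[(r, c)]]) if (r, c) in best_row else math.nan
     for c in range(3)]
    for r in range(2)
]

print(lookup_grid)  # [[11.0, 13.0, nan], [16.0, nan, 18.0]]
```

For pixel (1, 0), for example, rows 4 and 5 have values 2 and 3, so row 5 is selected and its 'other' value of 16 is reported.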

Although it is possible to use a first or last as a selector with a lookup_column, such as

ds.where(ds.first('value'), 'other')

this is unnecessary as it is identical to the simpler

ds.first('other')

5. where selection reductions returning a row index

The lookup_column argument to where is optional. If not specified, where defaults to returning the index of the row in the dataset corresponding to the selector for each pixel.

canvas.points(df, 'x', 'y', ds.where(ds.max('value')))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 0,  2, -1],
       [ 5, -1,  7]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)

There are 8 rows in the dataframe so row indices returned are in the range 0 to 7. An index of -1 is returned for pixels that do not have any data points mapped to them.
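One practical use of the returned indices is to fetch several columns of the selected row at once. The sketch below uses plain Python lists in place of the dataframe, with the index grid copied from the output above; row_at is a hypothetical helper, not part of Datashader.

```python
# Row indices copied from the ds.where(ds.max('value')) output above.
index_grid = [[0, 2, -1],
              [5, -1, 7]]

# Columns copied from the sample dataset.
columns = {
    'value': [9, 8, 7, 6, 2, 3, 4, 5],
    'other': [11, 12, 13, 14, 15, 16, 17, 18],
}

def row_at(r, c):
    """Hypothetical helper: return the selected row for a pixel, or None if empty."""
    i = index_grid[r][c]
    if i == -1:
        # Must be checked explicitly: -1 is a valid Python index (the last item).
        return None
    return {name: col[i] for name, col in columns.items()}

print(row_at(0, 0))  # {'value': 9, 'other': 11}
print(row_at(1, 0))  # {'value': 3, 'other': 16}
print(row_at(0, 2))  # None
```

The explicit check for -1 matters because indexing a Python sequence with -1 silently returns its last element, which would make an empty pixel look as if it contained the final row of the dataset.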

first and last can also be used as where reduction selectors that return row indices, for example:

canvas.points(df, 'x', 'y', ds.where(ds.first('value')))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 0,  2, -1],
       [ 4, -1,  6]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)

where reductions can also use a selector that is a first_n, last_n, max_n or min_n reduction, for example:

canvas.points(df, 'x', 'y', ds.where(ds.first_n('value', 3)))
<xarray.DataArray (y: 2, x: 3, n: 3)> Size: 144B
array([[[ 0,  1, -1],
        [ 2,  3, -1],
        [-1, -1, -1]],

       [[ 4,  5, -1],
        [-1, -1, -1],
        [ 6,  7, -1]]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
  * n        (n) int64 24B 0 1 2
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)