Plotting Pitfalls

Common plotting pitfalls that get worse with large data

When working with large datasets, visualizations are often the only practical way to understand the properties of the data; there are simply too many data points to examine each one! It is therefore important to be aware of some common plotting problems that are minor inconveniences with small datasets but become very serious problems with larger ones.

We'll cover:

  1. Overplotting
  2. Oversaturation
  3. Undersampling
  4. Undersaturation
  5. Underutilized range
  6. Nonuniform colormapping
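To see why the first of these problems grows with dataset size, here is a minimal NumPy sketch. It assumes a hypothetical setup where points are drawn uniformly in the unit square and each point is rendered as a single opaque pixel on a 100x100 grid. Under those assumptions, only one point per pixel stays visible, so the fraction of hidden points rises quickly as more points are added. The function name `hidden_fraction` and its parameters are illustrative only and are not part of HoloViews or datashader.

```python
import numpy as np

def hidden_fraction(n_points, width=100, height=100, seed=0):
    """Fraction of points hidden behind another point on the same pixel.

    Hypothetical setup for illustration: points are uniform in the unit
    square and each is drawn as one opaque pixel, so only one point per
    occupied pixel remains visible.
    """
    rng = np.random.default_rng(seed)
    xs = rng.random(n_points)
    ys = rng.random(n_points)
    # Map each point to an integer pixel column and row
    px = np.minimum((xs * width).astype(int), width - 1)
    py = np.minimum((ys * height).astype(int), height - 1)
    # Count distinct pixels that received at least one point
    occupied = np.unique(py * width + px).size
    # Every point beyond the first in a pixel is covered by another
    return (n_points - occupied) / n_points

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} points: {hidden_fraction(n):.3f} hidden")
```

With a million points and only 10,000 pixels, at least 99% of the points cannot be seen at all, which is why overplotting becomes the dominant problem for large data.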

You can skip to the end if you just want to see an illustration of these problems.

This notebook requires HoloViews, datashader, colorcet, and matplotlib, and optionally scikit-image, all of which can be installed with:

conda install holoviews datashader colorcet matplotlib scikit-image
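Before running the rest of the notebook, you can check which of these packages are importable. The sketch below uses the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. The helper name `missing_modules` and the package lists are assumptions for illustration; datashader is included because the `datashade` import below depends on it, and scikit-image is listed under its import name, `skimage`.

```python
import importlib.util

# Import names of the packages this notebook uses (illustrative lists)
REQUIRED = ["holoviews", "datashader", "colorcet", "matplotlib"]
OPTIONAL = ["skimage"]  # scikit-image is imported as "skimage"

def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing_required = missing_modules(REQUIRED)
    missing_optional = missing_modules(OPTIONAL)
    if missing_required:
        print("Missing required packages:", ", ".join(missing_required))
    if missing_optional:
        print("Missing optional packages:", ", ".join(missing_optional))
```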

We'll first load the plotting libraries and set up some defaults:

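import numpy as np
np.random.seed(42)  # fix the random seed so the generated data are reproducible

import holoviews as hv
from holoviews.operation.datashader import datashade  # requires datashader to be installed
from holoviews import opts, dim
hv.extension('matplotlib')  # render plots with the matplotlib backend

from colorcet import fire
# Default colormap for datashade: colorcet's "fire", skipping its first 50 (darkest) entries
datashade.cmap = fire[50:]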