Plotting Pitfalls

Common plotting pitfalls that get worse with large data

When working with large datasets, visualizations are often the only practical way to understand the properties of the data: statistical measures computed blindly can easily mislead, yet there are too many data points to examine each one individually. It is therefore important to be aware of some common plotting problems that are minor inconveniences with small datasets but serious problems with larger ones.
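As a rough illustration of how summary statistics can mislead, the sketch below builds two synthetic datasets that share the same mean and standard deviation yet have very different shapes. The data, the seed, the bin edges, and the `standardize` helper are all assumptions made up for this example; they are not part of the notebook's own code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 100_000

# Hypothetical data: one broad cluster vs. two well-separated clusters
unimodal = rng.normal(0.0, 1.0, n)
bimodal = np.concatenate([rng.normal(-2.0, 0.5, n // 2),
                          rng.normal(2.0, 0.5, n // 2)])

def standardize(x):
    """Rescale to zero mean and unit standard deviation (illustrative helper)."""
    return (x - x.mean()) / x.std()

unimodal, bimodal = standardize(unimodal), standardize(bimodal)

# The summary statistics are now indistinguishable...
print("unimodal mean/std:", unimodal.mean(), unimodal.std())
print("bimodal  mean/std:", bimodal.mean(), bimodal.std())

# ...but binned counts (what a histogram would draw) differ sharply near zero
bins = np.linspace(-3, 3, 13)  # 12 bins of width 0.5
counts_uni, _ = np.histogram(unimodal, bins=bins)
counts_bi, _ = np.histogram(bimodal, bins=bins)
print("counts in [-0.5, 0):", counts_uni[5], "vs", counts_bi[5])
```

Only the binned view shows that the second dataset has a gap where the first has its peak, which is the kind of structure a plot exposes and a mean or standard deviation hides.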

We’ll cover:

  1. Overplotting

  2. Oversaturation

  3. Undersampling

  4. Undersaturation

  5. Underutilized range

  6. Nonuniform colormapping

You can skip to the end if you just want to see an illustration of these problems.

In addition to Datashader itself (needed for the holoviews.operation.datashader import below), this notebook requires HoloViews, colorcet, and matplotlib, and optionally scikit-image, which can be installed with:

conda install holoviews colorcet matplotlib scikit-image

We’ll first load the plotting libraries and set up some defaults:

import numpy as np

import holoviews as hv
from holoviews.operation.datashader import datashade
from holoviews import opts, dim

from colorcet import fire