Plotting Pitfalls
Common plotting pitfalls that get worse with large data
When working with large datasets, visualizations are often the only practical way to understand the properties of the data: statistical measures computed blindly can easily mislead, yet there are far too many data points to examine individually. It is therefore important to be aware of some common plotting problems that are minor inconveniences with small datasets but serious problems with larger ones.
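As a small illustration of how summary statistics alone can mislead, the sketch below builds two made-up samples (the names `two_peaks` and `center_heavy` and both helper functions are hypothetical, not part of this notebook) that share the same mean and standard deviation, yet look nothing alike once binned into a crude text histogram. It uses only the Python standard library.

```python
import math
import statistics
from collections import Counter

# Two made-up samples, constructed to share the same mean and standard deviation.
two_peaks = [-1.0, 1.0] * 500                                 # no values near zero
center_heavy = [-math.sqrt(2), 0.0, 0.0, math.sqrt(2)] * 250  # half the values at zero


def summarize(values):
    """Return (mean, population standard deviation) of a sample."""
    return statistics.fmean(values), statistics.pstdev(values)


def text_histogram(values, bins=5, lo=-1.5, hi=1.5, width=40):
    """Render a crude fixed-range histogram as a list of '#' bars, one per bin."""
    step = (hi - lo) / bins
    # Assign each value to a bin index, clamping the top edge into the last bin.
    counts = Counter(min(int((v - lo) / step), bins - 1) for v in values)
    peak = max(counts.values())
    return ["#" * round(width * counts.get(i, 0) / peak) for i in range(bins)]


for name, sample in [("two_peaks", two_peaks), ("center_heavy", center_heavy)]:
    mean, std = summarize(sample)
    print(f"{name}: mean={mean:.3f} std={std:.3f}")
    for bar in text_histogram(sample):
        print("  |" + bar)
```

The two printed summaries agree, while the bars show one sample with an empty middle bin and the other with its tallest bin in the middle. Even a very rough plot exposes structure that the two numbers hide.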
We’ll cover:
Overplotting
Oversaturation
Undersampling
Undersaturation
Underutilized range
Nonuniform colormapping
You can skip to the end if you just want to see an illustration of these problems.
In addition to Datashader itself (needed by the holoviews.operation.datashader import below), this notebook requires HoloViews, colorcet, and matplotlib, and optionally scikit-image, which can be installed with:
conda install holoviews colorcet matplotlib scikit-image
We’ll first load the plotting libraries and set up some defaults:
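import numpy as np
import holoviews as hv
from holoviews import opts, dim
from holoviews.operation.datashader import datashade
from colorcet import fire

# Fix the random seed so that the examples are reproducible
np.random.seed(42)

# Render plots with the matplotlib backend
hv.extension('matplotlib')

# Use the fire colormap by default, dropping its first 50 (darkest) entries
datashade.cmap = fire[50:]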