Common plotting pitfalls that get worse with large data
When working with large datasets, visualizations are often the only practical way to understand the properties of the data: it is too easy to be fooled by statistical measures computed blindly, yet there are too many data points to examine each one! It is therefore very important to be aware of some common plotting problems that are minor inconveniences with small datasets but very serious problems with larger ones.
You can skip to the end if you just want to see an illustration of these problems.
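As a rough illustration of how summary statistics computed blindly can mislead, the sketch below builds two synthetic datasets that share the same mean and standard deviation but have very different shapes. It is not part of the original notebook: the helper `standardize`, the variable names, the seed, and the cluster parameters are all hypothetical choices made for this example, and it needs only NumPy.

```python
import numpy as np

# Seed and sample size are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
n = 100_000

def standardize(x):
    """Rescale x to have mean 0 and standard deviation 1 (hypothetical helper)."""
    return (x - x.mean()) / x.std()

# One dataset with a single central peak...
unimodal = standardize(rng.normal(size=n))

# ...and one with two well-separated clusters and almost nothing in between.
bimodal = standardize(np.concatenate([
    rng.normal(-3, 0.5, n // 2),
    rng.normal(3, 0.5, n // 2),
]))

# The usual summary statistics cannot tell the two datasets apart.
stats_match = bool(
    np.isclose(unimodal.mean(), bimodal.mean(), atol=1e-9)
    and np.isclose(unimodal.std(), bimodal.std())
)

# The fraction of points lying close to the mean differs drastically.
frac_unimodal = float(np.mean(np.abs(unimodal) < 0.5))
frac_bimodal = float(np.mean(np.abs(bimodal) < 0.5))

print("same mean and std:", stats_match)
print("fraction within 0.5 of the mean:", frac_unimodal, frac_bimodal)
```

A histogram or density plot of the two arrays would reveal the difference immediately, which is why the rest of this notebook concentrates on making such plots trustworthy.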
This notebook requires HoloViews, colorcet, and matplotlib, and optionally scikit-image, which can be installed with:
conda install holoviews colorcet matplotlib scikit-image
We’ll first load the plotting libraries and set up some defaults:
import numpy as np
np.random.seed(42)

import holoviews as hv
from holoviews.operation.datashader import datashade
from holoviews import opts, dim
hv.extension('matplotlib')

from colorcet import fire
datashade.cmap = fire[50:]