OpenSky flight trajectories

Flight path information for commercial flights is available for some regions of the USA and Europe from the crowd-sourced OpenSky Network . OpenSky collects data from a large number of users monitoring public air-traffic control information. Here we will use a subset of the data that was polled from their REST API at an interval of 1 minute over 4 days (September 5-13, 2016), using the scripts shown at the end of this notebook. Unfortunately, we are not allowed to redistribute this data (1.1GB as a database, 600MB in HDF5), but you can run the scripts at the end of this notebook to collect some yourself, or else you can contact Open Sky asking for a copy of the dataset.

We'll only use some of the fields provided by Open Sky, out of: icao24, callsign, origin, time_position, time_velocity, longitude, latitude, altitude, on_ground, velocity, heading, vertical_rate, sensors, timestamp

If you are able to get a copy of the data, you can create an environment with all the packages required to run this notebook using conda env create opensky.ipynb , and can then switch to it using source activate opensky so that you can launch jupyter notebook.

Here, we'll load the data and declare that some fields are categorical (which isn't information that HDF5 stores):

In [1]:
import pandas as pd

flightpaths = pd.read_hdf('../data/opensky.h5', 'flights')
flightpaths['origin']    = flightpaths.origin.astype('category')
flightpaths['on_ground'] = flightpaths.on_ground.astype('category')
flightpaths['ascending'] = flightpaths.ascending.astype('category')
CPU times: user 2.67 s, sys: 1.39 s, total: 4.06 s
Wall time: 4.39 s
In [2]:
longitude latitude origin on_ground ascending velocity
10227905 -8.845280e+06 4.553381e+06 0.0 True 262.14
10227906 -8.862735e+06 4.540946e+06 0.0 False 183.28
10227907 -8.876594e+06 4.530873e+06 0.0 False 258.15
10227908 -8.894316e+06 4.521176e+06 0.0 True 234.24
10227909 NaN NaN nan NaN False NaN

The default database has about 10 million points, with some metadata for each.

Now let's define a datashader-based processing pipeline to render images:

In [3]:
import datashader as ds
import datashader.transfer_functions as tf
from colorcet import fire
from matplotlib.colors import rgb2hex
from import get_cmap

import numpy as np
from cartopy import crs

plot_width  = 850
plot_height = 600
x_range = (-2.0e6, 2.5e6)
y_range = (4.1e6, 7.8e6)

def categorical_color_key(ncats,cmap):
    """Generate a color key from the given colormap with the requested number of colors"""
    mapper = get_cmap(cmap)
    return [str(rgb2hex(mapper(i))) for i in np.linspace(0, 1, ncats)]

def create_image(x_range=x_range, y_range=y_range, w=plot_width, h=plot_height, 
                 aggregator=ds.count(), categorical=None, black=False, cmap="blue"):
    if categorical and cmap:
        opts['color_key'] = categorical_color_key(len(flightpaths[aggregator.column].unique()),cmap)       

    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    agg = cvs.line(flightpaths, 'longitude', 'latitude',  aggregator)
    img = tf.shade(agg, cmap=cmap, **opts)
    if black: img = tf.set_background(img, 'black')
    return img

We can use this function to get a dump of all of the trajectory information:

In [4]:
create_image(aggregator=ds.count(), cmap=fire, black=True)
CPU times: user 1.29 s, sys: 86.3 ms, total: 1.37 s
Wall time: 1.37 s

This plot shows all of the trajectories in this database, overlaid in a way that avoids overplotting . With this "fire" color map, a single trajectory shows up as black, while increasing levels of overlap show up as brighter colors.

A static image on its own like this is difficult to interpret, but if we overlay it on a map we can see where these flights originate, and can zoom in to see detail in specific regions:

In [5]:
from datashader.bokeh_ext import InteractiveImage
from bokeh.plotting import figure, output_notebook
from bokeh.models.tiles import WMTSTileSource


def base_plot(tools='pan,wheel_zoom,reset',plot_width=plot_width, plot_height=plot_height,**plot_args):
    p = figure(tools=tools, plot_width=plot_width, plot_height=plot_height,
        x_range=x_range, y_range=y_range, outline_line_color=None,
        min_border=0, min_border_left=0, min_border_right=0,
        min_border_top=0, min_border_bottom=0, **plot_args)
    p.axis.visible = False
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    return p

Loading BokehJS ...
In [6]:
p = base_plot()
InteractiveImage(p, create_image, aggregator=ds.count())