TLViz + Plotly for interactive visualisations

In this example, we'll see how TLViz can be used together with Plotly Express to produce rich, interactive visualisations of your extracted components with just a few lines of code.

Imports and utilities

import plotly.express as px
from tensorly.decomposition import non_negative_parafac_hals

import tlviz

To fit CP models, we need to solve a non-convex optimisation problem, which may have several local minima. It is therefore beneficial to fit several models with the same number of components, each from a different random initialisation, and keep the best one.

def fit_many_nn_parafac(X, num_components, num_inits=5):
    return [non_negative_parafac_hals(X, num_components, n_iter_max=500, init="random") for _ in range(num_inits)]

Loading the data

bike_data = tlviz.data.load_oslo_city_bike()
bike_data
<xarray.DataArray 'Bike trips' (End station name: 270, Year: 2, Month: 12,
                                Day of week: 7, Hour: 24)>
array([[[[[ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          ...,
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

         [[ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          ...,
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

         [[ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  1.,  0.,  0.],
          [ 0.,  0.,  0., ...,  1.,  0.,  1.],
          ...,
...
          ...,
          [ 0.,  0.,  0., ...,  1.,  0.,  1.],
          [ 4.,  0.,  0., ...,  0.,  0.,  1.],
          [ 0.,  0.,  0., ...,  1.,  0.,  1.]],

         [[ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  2.,  0.,  1.],
          [ 0.,  0.,  0., ...,  3.,  0.,  0.],
          ...,
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 1.,  0.,  0., ...,  1.,  1.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.]],

         [[ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  1.],
          ...,
          [ 1.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  0.],
          [ 0.,  0.,  0., ...,  0.,  0.,  1.]]]]])
Coordinates:
  * End station name  (End station name) object '7 Juni Plassen' ... 'Økernve...
    lat               (End station name) float64 59.92 59.93 ... 59.93 59.92
    lon               (End station name) float64 10.73 10.75 ... 10.8 10.78
  * Hour              (Hour) int32 0 1 2 3 4 5 6 7 8 ... 16 17 18 19 20 21 22 23
  * Month             (Month) int32 1 2 3 4 5 6 7 8 9 10 11 12
  * Day of week       (Day of week) int32 0 1 2 3 4 5 6
  * Year              (Year) int32 2020 2021


We see that there are two additional (non-index) coordinates along the End station name dimension: lat and lon. These contain the latitude and longitude of each bike station.

Fitting the model

We fit five three-component non-negative CP (PARAFAC) model candidates to this data and select the one with the lowest error (to see why a three-component model is a good choice, see the split-half analysis example).

model_candidates = fit_many_nn_parafac(bike_data.data, 3, num_inits=5)
selected_cp = tlviz.multimodel_evaluation.get_model_with_lowest_error(model_candidates, bike_data)
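
To make "lowest error" concrete, here is a minimal NumPy sketch of choosing among candidate CP models by their relative sum of squared errors. The helper names (cp_to_tensor, relative_sse, pick_lowest_error) and the toy data are hypothetical, and the error measure is an assumption; get_model_with_lowest_error in tlviz may compute the error differently.

```python
import numpy as np


def cp_to_tensor(weights, factors):
    """Build a dense tensor from CP weights and factor matrices (hypothetical helper)."""
    shape = tuple(f.shape[0] for f in factors)
    tensor = np.zeros(shape)
    for r in range(weights.shape[0]):
        # Outer product of the r-th column of every factor matrix, scaled by the weight.
        component = weights[r]
        for f in factors:
            component = np.multiply.outer(component, f[:, r])
        tensor += component
    return tensor


def relative_sse(cp, X):
    """Squared reconstruction error divided by the squared norm of the data."""
    weights, factors = cp
    residual = X - cp_to_tensor(weights, factors)
    return np.sum(residual**2) / np.sum(X**2)


def pick_lowest_error(candidates, X):
    """Return the candidate with the smallest relative error, plus all errors."""
    errors = [relative_sse(cp, X) for cp in candidates]
    return candidates[int(np.argmin(errors))], errors


# Toy data: an exact two-component CP tensor and a perturbed copy of its factors.
rng = np.random.default_rng(0)
true_factors = [rng.random((n, 2)) for n in (4, 3, 5)]
true_cp = (np.ones(2), true_factors)
X = cp_to_tensor(*true_cp)
noisy_cp = (np.ones(2), [f + 0.1 * rng.standard_normal(f.shape) for f in true_factors])

best, errors = pick_lowest_error([noisy_cp, true_cp], X)
```

In this toy setting the unperturbed model reconstructs X exactly, so it is the one selected.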

Postprocessing

If we postprocess with the default settings, we get a labelled CP tensor. That is, the factor matrices are transformed into Pandas DataFrames with the same index as the main coordinates of the xarray DataArray. However, in this case, we also want to include the additional coordinates (i.e. the latitude and longitude) from this DataArray as metadata columns. To include these as well, we set the include_metadata flag in postprocess to True.

cp_with_metadata = tlviz.postprocessing.postprocess(selected_cp, bike_data, include_metadata=True)

weights, (end_station, year, month, day_of_week, hour) = cp_with_metadata

Converting the data to a tidy format

Plotly Express assumes that the data is tidy, but factor matrices are not. A factor matrix has one column per component, which makes it cumbersome to use with Plotly. We therefore have a factor_matrix_to_tidy function in tlviz, which converts a factor matrix (with potential metadata) into a tidy format. See tlviz.postprocessing.factor_matrix_to_tidy for more info.

tidy_end_station_data = tlviz.postprocessing.factor_matrix_to_tidy(end_station, value_name="Popularity")
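
To illustrate what a tidy conversion involves, the following sketch reshapes a small, made-up factor matrix with pandas' melt. It is not necessarily how factor_matrix_to_tidy is implemented; the station names, values and the integer component labels are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical two-component factor matrix with lat/lon metadata columns.
factor_matrix = pd.DataFrame(
    {0: [0.1, 0.4], 1: [0.3, 0.2], "lat": [59.92, 59.93], "lon": [10.73, 10.75]},
    index=pd.Index(["Station A", "Station B"], name="End station name"),
)

# Wide to long: one row per (station, component) pair, metadata repeated on each row.
tidy = factor_matrix.reset_index().melt(
    id_vars=["End station name", "lat", "lon"],
    var_name="Component",
    value_name="Popularity",
)
```

Each row of the result holds a single observation (one station, one component, one value), which is the layout that arguments such as color="Component" or animation_frame="Component" rely on.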

Density map of the end station components

Now, we use the density_mapbox function in Plotly Express to create a density map of the End station components. Setting the animation_frame argument to the Component column gives us a slider for switching between the components.

px.density_mapbox(
    tidy_end_station_data,
    lat="lat",
    lon="lon",
    z="Popularity",
    animation_frame="Component",
    hover_data=["End station name"],
    zoom=10.5,
    opacity=0.5,
    mapbox_style="carto-positron",
    width=600,
)


We see that there are three distinct patterns. Component 0 is spread across most of the city, while component 1 is focused on central areas. Component 2 is also spread widely, but has a strong signal at leisure spots such as the beach at Bygdøy.

Plotting the various time-components

Time of day

First, we look at the time-of-day components to see if these reveal what kind of patterns the different components represent.

px.line(
    tlviz.postprocessing.factor_matrix_to_tidy(hour, value_name="Popularity"),
    x="Hour",
    y="Popularity",
    color="Component",
    width=600,
)


We see that the first component shows a strong signal after 16:00, which is when a normal working day in Norway ends. Likewise, the second component shows the strongest signal at 08:00, which is when a normal working day in Norway starts. This indicates that the first two components represent trips from and to work, respectively. The third component shows activity during midday, which, combined with the map plot above, indicates that it represents leisure activities.

Weekday

Next, we look at the weekday components.

tidy_day_of_week = tlviz.postprocessing.factor_matrix_to_tidy(day_of_week, value_name="Popularity")

px.line(
    tidy_day_of_week, x="Day of week", y="Popularity", color="Component", width=600
)


Here, we see that the first two components are the most active on weekdays, which is when most people work. The morning component shows barely any signal on weekends, and the leisure component has its strongest signal on Saturdays.

Month plots

tidy_month = tlviz.postprocessing.factor_matrix_to_tidy(month, value_name="Popularity")

px.line(
    tidy_month, x="Month", y="Popularity", color="Component", width=600,
)


The month-mode components show reasonable patterns. People bike the most during summer, but the work components dip in July, when most people are on holiday.

Year

px.line(
    tlviz.postprocessing.factor_matrix_to_tidy(year, value_name="Popularity"),
    x="Year",
    y="Popularity",
    color="Component",
    width=600,
)


We see that there is overall less biking in 2021 than in 2020. Perhaps people biked more early in the pandemic when most public transport was closed?

A note on including factor metadata

By including factor metadata, we add additional columns to the factor matrices. As a consequence, some of the functions in tlviz will no longer work properly on CP tensors with metadata on the factor matrices. It can therefore be beneficial to leave out the metadata at first (this is the default behaviour of postprocess) and only add it at the end. Alternatively, it can be useful to keep both a normally postprocessed CP tensor and a CP tensor with metadata.
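
As a rough illustration of why the extra columns matter, the sketch below splits a made-up factor matrix with metadata into a purely numeric factor matrix and a separate metadata table. The column names lat and lon follow this example; the integer component labels, station names and values are assumptions.

```python
import pandas as pd

# Hypothetical factor matrix where metadata columns sit next to the component columns.
factor_with_metadata = pd.DataFrame(
    {0: [0.1, 0.4], 1: [0.3, 0.2], "lat": [59.92, 59.93], "lon": [10.73, 10.75]},
    index=pd.Index(["Station A", "Station B"], name="End station name"),
)

metadata_columns = ["lat", "lon"]

# Only the component columns remain, so the shape is (stations, components) again.
factor_only = factor_with_metadata.drop(columns=metadata_columns)

# The metadata can be kept separately and joined back on the shared index when plotting.
metadata = factor_with_metadata[metadata_columns]
```

Code that expects one column per component would see four columns in factor_with_metadata but the expected two in factor_only.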
