
# Data Science for Cycling - How to Visualize Gradient Ranges of a GPX Route

Part 5/6 - Visualize gradient ranges of a Strava route with Python and Plotly

It's been quite a while since the last article in the cycling series, I know. The good news is - the story continues today. We'll pick up where we left off, with gradient analysis and visualization. By now you know what gradients in cycling are, and how to calculate them from the elevation difference and the distance between two points.

Today we'll visualize gradient ranges, which means showing how much distance was covered in a particular gradient range, for example between 3% and 5%, and what share of the ride that represents. In the upcoming article, we'll include that visualization (and others) in an interactive Python dashboard.

## How to Read a Strava Route Dataset

We won't bother with the GPX route file today, as we already have a CSV file that contains data points, elevation, distance, and gradient data. To start, we'll import NumPy, Pandas, and Plotly, and then we'll read the dataset:

```python
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.offline as pyo

# Hypothetical path: point this at the CSV file produced earlier in the series
route_df = pd.read_csv('route_df_gradient.csv')
route_df.head()
```

Here's what it looks like:

We're particularly interested in the `gradient` column. To start the analysis, let's call the `describe()` method on it:

```python
route_df['gradient'].describe()
```

The route looks mostly flat (judging by the mean and median), with a minimum gradient of -29.2% and a maximum gradient of 17.5%. This is the key information we need for the next step - creating intervals (bins) for gradient ranges.

## How to Create Intervals with Pandas

We'll now group gradient values into bins. That way we can calculate statistics for every gradient range - for example, for all data points captured on a 3-5% gradient. To do so, we'll use the `IntervalIndex` class from Pandas. It allows us to create bins from tuples.

The values used in the interval index below are arbitrary. You're free to use different ones to accommodate your route file, as long as they cover the minimum and maximum gradient. The bins are also left-closed, which means the value on the left is included, but the one on the right isn't:

```python
bins = pd.IntervalIndex.from_tuples([
    (-30, -10),
    (-10, -5),
    (-5, -3),
    (-3, -1),
    (-1, 0),
    (0, 1),
    (1, 3),
    (3, 5),
    (5, 7),
    (7, 10),
    (10, 12),
    (12, 15),
    (15, 20)
], closed='left')
bins
```
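To make the left-closed behavior concrete, here's a minimal sketch on a toy set of bins. The `toy_bins` name and its three intervals are made up for illustration and aren't part of the route analysis:

```python
import pandas as pd

# Toy bins for illustration only, not the full list used for the route
toy_bins = pd.IntervalIndex.from_tuples([(0, 1), (1, 3), (3, 5)], closed='left')

# The left edge belongs to the interval, the right edge does not
first = toy_bins[0]
left_in = 0 in first
right_in = 1 in first

# get_loc() returns the position of the bin that contains a value,
# so a gradient of exactly 1 lands in the second bin, [1, 3)
loc_of_one = toy_bins.get_loc(1)
loc_of_three = toy_bins.get_loc(3)
```

In other words, a gradient of exactly 3% is counted in the bin that starts at 3, not in the one that ends at 3.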

Let's now add these bins to the dataset by using the `cut()` function from Pandas:

```python
route_df['gradient_range'] = pd.cut(route_df['gradient'], bins=bins)
route_df.head()
```

We now have 13 distinct groups stored in the `gradient_range` column. As the next step, we'll calculate a couple of statistics from it that will be useful for visualization.
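If you pick your own bins, it's worth checking that no gradient falls outside them, since `pd.cut()` assigns `NaN` to such values. Here's a small sketch on made-up data (`toy_df` and `toy_bins` are hypothetical names and values):

```python
import pandas as pd

# Toy bins and gradients for illustration only
toy_bins = pd.IntervalIndex.from_tuples([(-5, 0), (0, 5), (5, 10)], closed='left')
toy_df = pd.DataFrame({'gradient': [-2.5, 0.0, 4.9, 5.0, 12.0]})

# Assign each gradient to its bin
toy_df['gradient_range'] = pd.cut(toy_df['gradient'], bins=toy_bins)

# 12.0 is outside every bin, so it ends up as NaN
n_unbinned = int(toy_df['gradient_range'].isna().sum())

# Number of data points per bin, in bin order
counts = toy_df['gradient_range'].value_counts(sort=False)
```

A non-zero `n_unbinned` is a hint that the outer bin edges should be widened.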

## Calculate Statistics from Gradient Ranges

The goal now is to create a new DataFrame that will contain statistics for each gradient range, including:

• Distance traveled
• Percentage of the ride spent in this gradient range
• Elevation gained
• Elevation lost

We'll create it by iterating over each unique gradient range and subsetting the dataset - and calculating statistics from there:

```python
gradient_details = []

# For each unique gradient range (skipping rows that fell outside the bins)
for gr_range in route_df['gradient_range'].dropna().unique():
    # Keep that subset only
    subset = route_df[route_df['gradient_range'] == gr_range]

    # Statistics
    total_distance = subset['distance'].sum()
    pct_of_total_ride = (total_distance / route_df['distance'].sum()) * 100
    elevation_gain = subset[subset['elevation_diff'] > 0]['elevation_diff'].sum()
    elevation_lost = subset[subset['elevation_diff'] < 0]['elevation_diff'].sum()

    # Save results
    gradient_details.append({
        'gradient_range': gr_range,
        'total_distance': np.round(total_distance, 2),
        'pct_of_total_ride': np.round(pct_of_total_ride, 2),
        'elevation_gain': np.round(elevation_gain, 2),
        'elevation_lost': np.round(np.abs(elevation_lost), 2)
    })
```

Once done, convert the list to a DataFrame and sort it by the gradient range. The column holds intervals, so sorting orders the ranges from the steepest descent to the steepest ascent:

```python
gradient_details_df = pd.DataFrame(gradient_details).sort_values(by='gradient_range').reset_index(drop=True)
gradient_details_df
```

Here are a couple of interpretations:

• I've covered 442.96 meters in a gradient range of [-30%, -10%), and lost 68.58 meters of elevation along the way.
• Most of the ride is flat, in the [-1%, 1%) range - 71.56% of the route or 26 kilometers.
• I've ridden only 911 meters on gradients of 10% and above.
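As a side note, the same per-range statistics can also be computed without an explicit loop. The following is a sketch of that alternative using `groupby()`, run on made-up data. The `toy_df` values and the helper columns `gain` and `loss` are assumptions for illustration, while the `distance` and `elevation_diff` column names match the dataset:

```python
import pandas as pd

# Toy stand-in for route_df, values are made up
toy_df = pd.DataFrame({
    'gradient': [-4.0, -0.5, 0.5, 0.8, 4.0],
    'distance': [100.0, 200.0, 300.0, 100.0, 300.0],
    'elevation_diff': [-4.0, -1.0, 1.5, 0.8, 12.0],
})
toy_bins = pd.IntervalIndex.from_tuples([(-5, -1), (-1, 1), (1, 5)], closed='left')
toy_df['gradient_range'] = pd.cut(toy_df['gradient'], bins=toy_bins)

# Split elevation changes into gains (positive) and losses (absolute value of negative)
toy_df['gain'] = toy_df['elevation_diff'].clip(lower=0)
toy_df['loss'] = toy_df['elevation_diff'].clip(upper=0).abs()

# One row per gradient range, in bin order
summary = (
    toy_df.groupby('gradient_range', observed=False)
    .agg(
        total_distance=('distance', 'sum'),
        elevation_gain=('gain', 'sum'),
        elevation_lost=('loss', 'sum'),
    )
    .reset_index()
)

# Share of the total distance covered in each range
summary['pct_of_total_ride'] = (
    summary['total_distance'] / toy_df['distance'].sum() * 100
).round(2)
```

Grouping by a categorical column keeps the bins in their defined order, so no extra sorting step is needed here.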

Let's now visualize this data.

## Visualize Strava Gradient Ranges with Plotly

I've decided to use Plotly for visualizing data because it produces interactive charts by default. You're free to stick with Matplotlib or any other library.

To start, let's declare a list of colors for each gradient range - going from blue to red (descent to ascent):

``````colors = [
'#0d46a0', '#2f3e9e', '#2195f2', '#4fc2f7',
'#a5d6a7', '#66bb6a', '#fff59d', '#ffee58',
'#ffca28', '#ffa000', '#ff6f00', '#f4511e', '#bf360c'
]``````

We'll make a bar chart in which each bar represents a gradient range, and its height shows the distance traveled in kilometers. Each bar will also carry a text label with the range and the distance. Feel free to convert the values to miles if you're using the Imperial system:

```python
custom_text = [f'''<b>{gr}%</b> - {dst}km''' for gr, dst in zip(
    gradient_details_df['gradient_range'].astype(str),
    gradient_details_df['total_distance'].apply(lambda x: round(x / 1000, 2))
)]
```
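For readers on the Imperial system, here's a minimal sketch of the same labels in miles. The `meters_to_miles()` helper and the toy lists are hypothetical and stand in for the `gradient_range` and `total_distance` columns:

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition


def meters_to_miles(meters: float) -> float:
    """Convert meters to miles, rounded to two decimals."""
    return round(meters / METERS_PER_MILE, 2)


# Toy stand-ins for the gradient range labels and distances in meters
toy_ranges = ['[-30, -10)', '[-1, 0)']
toy_distances_m = [442.96, 1609.344]

# Same label format as above, with miles instead of kilometers
custom_text_mi = [
    f'<b>{gr}%</b> - {meters_to_miles(dst)}mi'
    for gr, dst in zip(toy_ranges, toy_distances_m)
]
```

If you go this route, remember to convert the bar heights and the y-axis title as well, so the labels and the axis agree.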

And finally, we'll create the figure:

```python
fig = go.Figure(
    data=[go.Bar(
        # One bar per gradient range, labeled by the interval
        x=gradient_details_df['gradient_range'].astype(str),
        # Bar height is the distance in kilometers
        y=gradient_details_df['total_distance'].apply(lambda x: round(x / 1000, 2)),
        marker_color=colors,
        text=custom_text
    )],
    layout=go.Layout(
        bargap=0,
        yaxis_title='Distance covered (km)',
        autosize=False,
        width=1440,
        height=800,
        template='simple_white'
    )
)
fig.show()
```