London-Edinburgh-London 2022

London-Edinburgh-London participants follow a route through the Fens of the south, the Wolds and Pennines of northern England and the Southern Uplands of Scotland, before turning at Edinburgh to make the return journey to London. Riders support themselves on the road, but the route passes through a series of control points where food and basic sleeping facilities are provided. The event is organised and run by a team of hard-working and enthusiastic volunteers, including many who travel from outside the UK to help out.

LEL 2022

The scheduled 2021 edition of LEL was delayed by a year because of the Covid pandemic. Just over 1500 riders started on August 7th 2022 from two locations: Guild Hall in central London and the main Debden control in north London. Riders could choose between two different challenges. The majority were aiming to beat the standard 128 hour 20 minute limit, and left in small groups at 15-minute intervals from 06:00 to 14:45 (start times allocated by ballot). Faster riders wanting a guaranteed early start could leave from Debden at 05:00, but with a 100-hour maximum time limit. Weather conditions were good, with no rain at any point and fairly moderate winds, but the summer heat proved a challenge for many, with temperatures nudging 35°C by the end of the week.

Data

Arrival times at controls were manually recorded on laptops to provide basic rider tracking for the organising team and people following LEL online. After the event, the organiser published this information in anonymised form (you can download it here) and invited people to make use of it.

Data cleaning

The manual recording system, plus riders arriving tired and distracted at controls, meant there were lots of opportunities for times to be recorded incorrectly or missed completely. As a result, there are a number of gaps and inconsistencies in the published results data. To clean up the published data, I scraped the source data from the LEL tracking page and spent some time filling in the gaps and correcting as many discrepancies as I could (supplemented with additional information from the official AUK LEL results).

Please note that all the charts and analysis on this page are based on this source data as published, and the numbers and outcomes summarised do not represent the official results of the event. The definitive record of every ride is the brevet card carried by each rider, and stamped by the controllers at each stop. These cards are manually scrutinised and approved by the organising team and Audax UK after the event to determine if a rider successfully completed the ride within the allotted time (Official AUK LEL 2022 Results).

LEL 2022 Charts

Summary outcomes

After tidying up some of the obvious problems in the data, I ended up with the following totals (DNF = did not finish and DNS = did not start):

The official Audax UK results list 889 riders homologated as successful finishers, compared with the 885 in-time finishers shown in the charts above. I did some cross-checking between the AUK results and the control data to see where the differences lay:

As mentioned in the introduction, my numbers are derived from the recorded control data, which includes lots of gaps and inconsistencies, and does not represent the official finish results based on the rider brevet cards scrutinised by the LEL team and AUK.

DNF outcomes

The most obvious feature of the chart above is the substantial number of riders who did not finish the event or who finished out of time, a testament to the challenge that LEL presents. The next chart shows where those riders got to:

A number of riders continued riding after logging a DNF and recorded times on their return journey. For the chart above, if the rider did not reach Dunfermline, I have taken the most northerly control reached as the DNF location. For riders that did pass through Dunfermline, the DNF location is the furthest control recorded.
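The DNF-location rule described above can be written as a small function. This is a minimal sketch, not the code actually used for the chart. The `ROUTE` list is a hypothetical, illustrative subset of the controls named on this page, with separate northbound and southbound entries, and `dnf_location` is a made-up name.

```python
# Illustrative route only: a subset of controls named in the text, in riding
# order, with separate entries for the northbound and southbound visits.
ROUTE = [
    "St Ives N", "Louth N", "Hessle N", "Malton N", "Moffat N",
    "Dunfermline",
    "Moffat S", "Malton S", "Hessle S", "Louth S", "Great Easton S", "Debden finish",
]


def dnf_location(recorded, route=ROUTE, turn="Dunfermline"):
    """Return the control used as a rider's DNF location.

    recorded: names of the controls where the rider has an arrival time.
    """
    positions = sorted(route.index(stop) for stop in recorded)
    turn_pos = route.index(turn)
    if turn_pos in positions:
        # Passed the turn: use the furthest control recorded on the route.
        return route[positions[-1]]
    # Never reached the turn: ignore any return-journey records and use the
    # most northerly control reached on the way out.
    northbound = [p for p in positions if p < turn_pos]
    return route[northbound[-1]] if northbound else None
```

For example, a rider who reached Malton, turned back and recorded a southbound time at Hessle would be counted as a DNF at Malton (northbound).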

Not unexpectedly, the hardest stages of the route north of the Humber (with the Howardian Hills and the big moorland climbs of the Pennines) appear to have ended many rides. There is another peak at the northern turning point of Dunfermline, which is the last opportunity to catch a train for the best part of 200 km on the way south.

Note that the chart above shows a small number of riders for Great Easton and Debden finish; these are riders who were specifically flagged in the rider tracking system as "Rider dropped out" despite recording an arrival time at these controls.

Here is a slightly different view of the same data, looking at the day when DNFs were recorded (by cumulative percentage of DNFs):

It is notable that 40% of the DNFs were recorded by the end of the second day, and 80% by Wednesday, so a substantial number of people appear to have encountered ride-ending problems relatively early in the event.
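One way to produce these cumulative percentages is to count DNFs per event day and keep a running total. The sketch below assumes each DNF has a recorded timestamp and numbers the days from the Sunday start (7 August 2022, day 0). The function name and the toy timestamps in any usage are hypothetical, not the real data.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, List


def cumulative_dnf_pct(dnf_times: List[datetime],
                       first_day: date = date(2022, 8, 7)) -> Dict[int, float]:
    """Cumulative percentage of all DNFs recorded by the end of each event day.

    Day 0 is the start day (Sunday 7 August 2022), day 1 is Monday, and so on.
    """
    per_day = Counter((t.date() - first_day).days for t in dnf_times)
    total = len(dnf_times)
    result, running = {}, 0
    for day in range(max(per_day) + 1):
        running += per_day[day]  # a Counter returns 0 for days with no DNFs
        result[day] = 100 * running / total
    return result
```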

Start time vs outcome

We can't tell whether those issues were heat, terrain, fitness, mechanical issues or something else, but I wondered if there was any relationship between starting time and outcomes. For example, could a late start disrupt performance by tempting riders to stop early on their first day, or conversely by encouraging them to ride longer through the night?

The charts above group riders by start time and show the relative percentage of successful and unsuccessful outcomes for each time bin. The chart on the left shows the original starting groups at 15-minute intervals, with lots of variation between groups. Reducing the number of bins in the centre chart smooths out some of the noise, but does not reveal an obvious pattern. The final chart divides all riders into two equal-sized groups, starting before and after 09:52. With this split, the percentage of riders finishing in time is identical for both groups, at 58%. Either most riders ended up with a start time that suited them or, more likely, there is no real relationship between starting time and outcome.
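Splitting riders into two equal-sized groups amounts to sorting by start time and cutting at the halfway point. A minimal sketch, assuming each rider is a hypothetical `(start_minutes_after_midnight, finished_in_time)` tuple. Because many riders share each 15-minute start slot, riders with the boundary start time can end up on either side of the cut.

```python
def split_by_start(riders):
    """Split riders into two equal-sized groups by start time.

    riders: list of (start_minutes_after_midnight, finished_in_time) tuples.
    Returns (start time at the split, % in time before, % in time after).
    """
    ordered = sorted(riders, key=lambda r: r[0])
    half = len(ordered) // 2
    early, late = ordered[:half], ordered[half:]

    def pct_in_time(group):
        # True counts as 1 and False as 0 when summed
        return 100 * sum(ok for _, ok in group) / len(group)

    return ordered[half][0], pct_in_time(early), pct_in_time(late)
```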

Stage 1 Speed

The Stage 1 average speed is one of the most interesting pieces of information in the data. Stage 1 is the only point where we can make a reasonable estimate of rider performance, because the majority of participants are likely to have ridden from Debden (or Guild Hall) to St Ives with minimal or no stops. After arrival time is recorded at St Ives, riders will spend hugely varying amounts of time at the controls they pass through, and riding speed can no longer be measured directly from the raw arrival times.
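The Stage 1 speed is just the stage distance divided by the elapsed time from the start to the St Ives arrival. In this sketch the distance is a parameter because it depends on the start location (Debden or Guild Hall). Any value passed in, such as a round 96 km, is a made-up example rather than the actual route distance.

```python
from datetime import datetime


def stage1_speed_kph(start: datetime, arrival: datetime, distance_km: float) -> float:
    """Average speed from the start to the first control (assumes no stops)."""
    hours = (arrival - start).total_seconds() / 3600
    return distance_km / hours
```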

The centre plot in the chart above shows the distribution of Stage 1 speed vs Finish time, for riders on the 128:20 time limit. Darker colours correspond to larger numbers of riders. The marginal plots above, and to the right, show the frequency distribution of speed and finish time respectively, and the red dashed line indicates the time limit. The coloured lines superimposed on the marginal plots are a smoothed representation of the distributions plotted in the histograms underneath. Note that the marginal plots are stacked columns, so the height of the column represents the total number of riders, with the different colours showing how many of each category are in that column (i.e. not three separate histograms layered on top of each other). DNFs don't appear in the right hand marginal or the main plot, because they don't have a finish time.

There is a clear peak on both axes, centred around a Stage 1 speed of 24 km/h and a finish time of 125 hours: lots of "full-value" rides on an event as demanding as LEL. The top plot shows the speed distributions, with the "DNF/out of time" group appearing to be slightly skewed to the left of the "in time" group. Are these actually two distinct populations, or just part of the same overall distribution of rider speeds? It's a very long time since I have done any statistics, but a quick Google search suggested a two-sample Kolmogorov-Smirnov (KS) test. The result of this test suggests that the difference is statistically significant: the Stage 1 speeds of the two groups are unlikely to come from the same underlying distribution.
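For readers who want to see what the KS test actually does, here is a self-contained sketch. The statistic D is the largest vertical gap between the two empirical cumulative distribution functions. The p-value comes from the asymptotic Kolmogorov distribution with a commonly used small-sample correction. In practice `scipy.stats.ks_2samp` does the same job and may use an exact method for smaller samples, so its p-values can differ slightly from this approximation. The function names here are illustrative.

```python
import math
from bisect import bisect_right


def kolmogorov_sf(lam):
    """P(K > lam) for the Kolmogorov distribution (asymptotic KS p-value)."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # For small lam, use the complement of the rapidly converging CDF series.
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
            for k in range(1, 6)
        )
        return 1.0 - cdf
    # For larger lam, the alternating survival-function series converges fast.
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                   for k in range(1, 6))


def ks_2samp(a, b):
    """Two-sample KS test returning (D statistic, approximate p-value)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs, checked at every
    # observed value (where the step functions jump).
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, min(max(kolmogorov_sf(lam), 0.0), 1.0)
```

Applied to this data, the two inputs would be the Stage 1 speeds of the "in time" riders and of the "DNF/out of time" riders.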

Here is the corresponding plot for the 100:00 time limit, with the centre chart in the form of a scatter plot because of the smaller numbers:

Estimating control times

The main limitation of the source data is that it only tells us when riders arrived at controls. We can calculate the time difference between a rider's arrival at one control and their arrival at the next, but we don't know how much of that time was actually spent at the first control. However, if we can estimate how long the rider took to cycle between the two controls, we can calculate the control time. This is where Stage 1 speed comes in: the idea is that Stage 1 speed can be used to estimate the riding time for other stages, and hence the stopping time for each rider at every control they visit.
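The stopping-time calculation boils down to subtracting the estimated riding time from the gap between two arrival times. A minimal sketch with hypothetical names. The result is clamped at zero because a speed estimate that is too low would otherwise produce a negative stop.

```python
from datetime import datetime, timedelta


def control_time(arrive_here: datetime, arrive_next: datetime,
                 stage_km: float, est_speed_kph: float) -> timedelta:
    """Estimated time spent at a control.

    The gap between consecutive arrivals covers the stop at this control plus
    the ride to the next one, so subtracting the estimated riding time leaves
    the stop.
    """
    gap = arrive_next - arrive_here
    riding = timedelta(hours=stage_km / est_speed_kph)
    # Clamp at zero: an underestimated speed could otherwise give a negative stop
    return max(gap - riding, timedelta(0))
```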

Experimenting with my own ride data (recorded by GPS) as a reference, I found that I could get a reasonable estimate of my real average speed by starting with a base of 80% of my Stage 1 speed and applying simple weighting factors to reduce the speed more for the harder and later stages. I used this formula to calculate average speeds for every rider on every stage, and then used those values to estimate the time spent by every rider at each control. This approach also allowed me to clean up the data a bit more. There were around 300 riders with one or more missing control times, so I used the estimated average stage speeds to fill in the gaps in the incomplete journeys. Very rough and ready, but good enough to plot some more charts...
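The speed estimate and the gap filling could be sketched as follows. The 80% base comes from the text, but the per-stage weighting factors are not given here, so `weight` is a hypothetical parameter. The gap-filling rule shares the known interval between two recorded arrivals in proportion to estimated riding time. This is one plausible assumption, not necessarily the method used for these charts. Missing arrivals after the last recorded control are left as `None`.

```python
from datetime import datetime
from typing import List, Optional

BASE_FRACTION = 0.8  # 80% of Stage 1 speed, as described in the text


def estimated_stage_speed(stage1_kph: float, weight: float) -> float:
    """Estimated average riding speed for a later stage.

    weight: hypothetical per-stage factor (1.0 = no extra reduction,
    lower values for harder or later stages).
    """
    return stage1_kph * BASE_FRACTION * weight


def fill_gaps(arrivals: List[Optional[datetime]],
              riding_hours: List[float]) -> List[Optional[datetime]]:
    """Fill missing arrival times (None) that lie between two known arrivals.

    riding_hours[i] is the estimated riding time from control i-1 to control i
    (riding_hours[0] is unused). Each known gap is shared out in proportion to
    the estimated riding time of the stages it spans.
    """
    filled = list(arrivals)
    known = [i for i, t in enumerate(arrivals) if t is not None]
    for lo, hi in zip(known, known[1:]):
        if hi - lo < 2:
            continue  # no missing controls between these two
        total = sum(riding_hours[lo + 1:hi + 1])
        span = arrivals[hi] - arrivals[lo]
        elapsed = 0.0
        for i in range(lo + 1, hi):
            elapsed += riding_hours[i]
            filled[i] = arrivals[lo] + span * (elapsed / total)
    return filled
```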

Controls

With arrival times and estimated departure times for every rider, it is possible to estimate how many riders are at a control at any particular time. The following charts show the numbers of northbound and southbound riders at every control over the duration of the event. The grey bars represent the hours of darkness (as measured at Malton), and the green bar on the Debden chart represents the time limit for riders on the 128:20 schedule. The limit is a band of time in this chart rather than a line because it represents a rolling cut-off for riders who started between 06:00 and 14:45. These mini charts are a bit small to see clearly, but you can click on the image to view a page with higher-resolution plots for all the controls.
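Counting riders at a control at a given moment only needs each rider's arrival and estimated departure times: a rider is present if the moment falls between the two. A sketch with hypothetical function names, sampling the count at an assumed 15-minute interval. Northbound and southbound visits would be passed in as separate lists to get the two series shown in each chart.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Visit = Tuple[datetime, datetime]  # (arrival, estimated departure)


def riders_at_control(visits: List[Visit], t: datetime) -> int:
    """Number of riders present at the control at time t."""
    return sum(1 for arrive, depart in visits if arrive <= t < depart)


def occupancy_series(visits: List[Visit], start: datetime, end: datetime,
                     step: timedelta = timedelta(minutes=15)):
    """Sample the rider count from start to end (inclusive) every `step`."""
    series = []
    t = start
    while t <= end:
        series.append((t, riders_at_control(visits, t)))
        t += step
    return series
```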

The control charts show the twin-peaked load on controls that serve both the outward and return journeys, and how the gap between the two peaks narrows the further north the control is. Hessle was by far the most popular first-night stop. It was bookended by very similar numbers of riders who either pushed on a bit further to Malton or didn't make it quite as far and opted for Louth. The numbers at Moffat and all points north are substantially lower, reflecting the significant number of DNFs and the stretching of the field.

Here is the same information, displayed as an animated bar graph:

The next chart shows the distribution of the control times as a Boxen plot:

In the Boxen plot, the central line represents the median value. The two boxes either side of it together contain the middle 50% of the data, the next pair of boxes (one on each side) contains a further 25% of the data, and so on out to the edges of the plot.
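The box edges in a letter-value ("boxen") plot sit at successively halved tail fractions: the quartiles (25% and 75%), then the eighths (12.5% and 87.5%), then the sixteenths, and so on. The sketch below computes those boundaries using linear interpolation between sorted values, which is the same default as NumPy's `percentile`. seaborn's `boxenplot`, which draws this kind of plot, may differ in details such as how many levels it shows. The function names are hypothetical.

```python
def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def letter_value_boxes(data, levels=3):
    """Median plus (lower, upper) edges for each nested box, innermost first."""
    xs = sorted(data)
    boxes = []
    for k in range(1, levels + 1):
        tail = 0.5 ** (k + 1)  # 0.25, 0.125, 0.0625, ...
        boxes.append((quantile(xs, tail), quantile(xs, 1 - tail)))
    return quantile(xs, 0.5), boxes
```

For the values 0 to 16, this gives a median of 8 and nested boxes of (4, 12), (2, 14) and (1, 15). Each level out adds half as much data as the one before it.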

So in the chart above, Malton shows a similar distribution for riders travelling south and north. The median stopping time is just under 2 hours, and 50% of riders stopped for between 1 and 4 hours. In contrast, Hessle shows much greater variation between the two directions. Half of the riders visiting on the way north spent between 1 and 4.5 hours at the control, but on the return journey the range for 50% of riders was only 1 to 2.5 hours.

To finish off the controls section, here is a 3D plot showing the different controls stacked against each other:

First sleep stop

Reading the Winter 2022/23 edition of Arrivée, I was struck by an anecdotal remark recounted in Nick Tickner's article about his experience of LEL 2022: 

"anyone who needs to sleep at Louth probably isn't going to make it".

Is this true? What does the data show? For the following charts, I looked for the first control where each rider stopped for at least 3 hours, and assumed this location was their first sleep stop. The "No sleep" group consists of riders who did not record any control stops of at least 3 hours.
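The first-sleep-stop rule is a simple scan along each rider's stops. A sketch, assuming each rider's stops are available as an ordered list of `(control, estimated_stop_duration)` pairs taken from the estimated control times. The names are hypothetical.

```python
from datetime import timedelta

SLEEP_THRESHOLD = timedelta(hours=3)  # threshold used in the text


def first_sleep_stop(stops):
    """stops: (control, estimated_stop_duration) pairs in riding order."""
    for control, duration in stops:
        if duration >= SLEEP_THRESHOLD:
            return control
    return "No sleep"
```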

It is true that among riders who stopped at Louth, in-time finishers were outnumbered by riders who were DNF or finished out of time. However, around 43% of those riders went on to finish in time, so stopping at Louth certainly didn't guarantee an unsuccessful ride. Success rates were clearly better for riders who reached controls further north on their first day. For riders stopping at Malton, in-time finishers outnumbered DNF/out-of-time finishers by almost 3 to 1.

The distance travelled on the first day only tells part of the story, so the second chart shows the elapsed time before the first stop of at least 3 hours. The plots are normalised as percentages here to compensate for the different sizes of the two groups. The distribution of first-day times looks very similar for both groups. This suggests that riders who were DNF or finished out of time put in just as big a shift on the first day as the in-time finishers; they just didn't get as far.
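Normalising to percentages divides each bin count by the size of its own group, so groups of very different sizes can be compared on the same axis. A minimal sketch with a hypothetical 2-hour bin width and a made-up function name. Each group, in-time and DNF/out-of-time, would be passed in separately.

```python
from collections import Counter


def pct_histogram(hours, bin_width=2):
    """Histogram of elapsed hours, normalised to % of the group.

    Keys are the lower edges of each bin.
    """
    counts = Counter(int(h // bin_width) * bin_width for h in hours)
    total = len(hours)
    return {start: 100 * n / total for start, n in sorted(counts.items())}
```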

Visualising the complete event

The method used to estimate the control times can also be extended to estimate the position of riders on the road stages between controls. The final chart uses this to bring everything together in an overview of the whole event: London Edinburgh London in one minute!
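Estimating a rider's position on a road stage can reuse the same ingredients: the estimated departure from one control and the recorded arrival at the next. The sketch below assumes a constant speed in between (linear interpolation), which is a simplifying assumption. The names are hypothetical.

```python
from datetime import datetime


def position_km(t: datetime, depart: datetime, arrive_next: datetime,
                km_at_control: float, stage_km: float) -> float:
    """Estimated distance along the route at time t on one road stage."""
    if t <= depart:
        return km_at_control  # still at (or before leaving) the control
    if t >= arrive_next:
        return km_at_control + stage_km  # already at the next control
    # Fraction of the riding time elapsed; timedelta / timedelta gives a float
    frac = (t - depart) / (arrive_next - depart)
    return km_at_control + frac * stage_km
```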

(The animation is a large file and may take a while to load when viewing this page.)