Tracking smartphone Wi-Fi signals reveals curious journeys on the London Underground

A trial by TfL to track the movements of people by logging Wi-Fi identities has revealed some very curious journey patterns on the tube network.

The aim of the trial was to track real-life movements of the public across the network in order to map journey patterns and find otherwise hidden congestion and quiet patches within stations and tunnels.

It’s not unusual for organisations to carry out such surveys, but they tend to suffer from participation bias: only those interested enough to carry a monitoring device or fill in a survey get tracked, leaving the behaviour of the vast majority of people to be inferred, with varying levels of accuracy.

TfL already logs where people enter and leave the network, thanks to ticket barriers, but the big mystery was what people did once they were in the network.

To answer that, TfL took advantage of the Wi-Fi coverage in its stations. Every smartphone emits its Media Access Control (MAC) address when it tries to connect to a local Wi-Fi hotspot, so by logging each point where a smartphone tried to connect to the Wi-Fi service, its journey could be mapped.
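As a rough illustration of how such a log could be turned into journeys, here is a minimal sketch in Python. The row layout, device IDs and timestamps are invented for the example and are not TfL's actual data format; the idea is simply to group sightings by device, sort them by time, and collapse repeated sightings at the same location.

```python
from collections import defaultdict

# Hypothetical log rows: (scrambled device ID, access point location, seconds since midnight).
events = [
    ("dev-a", "Waterloo", 28800),
    ("dev-a", "Oxford Circus", 29400),
    ("dev-b", "Waterloo", 28830),
    ("dev-a", "King's Cross St. Pancras", 29880),
    ("dev-b", "Leicester Square", 29300),
    ("dev-a", "Oxford Circus", 29460),
    ("dev-b", "King's Cross St. Pancras", 29900),
]


def reconstruct_journeys(rows):
    """Return {device ID: ordered list of locations} from unordered sightings."""
    by_device = defaultdict(list)
    for device, location, timestamp in rows:
        by_device[device].append((timestamp, location))

    journeys = {}
    for device, sightings in by_device.items():
        path = []
        # Sort by timestamp, then drop consecutive repeats of the same location.
        for _, location in sorted(sightings):
            if not path or path[-1] != location:
                path.append(location)
        journeys[device] = path
    return journeys


journeys = reconstruct_journeys(events)
for device, path in journeys.items():
    print(device, " > ".join(path))
```

In this toy data, the two devices make the same Waterloo to King's Cross trip by different interchanges, which is exactly the kind of detail the ticket barriers alone cannot see.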

In terms of privacy, the ID number of the smartphone isn’t directly linked to any personal information about the user of the phone, so it is just a seemingly random ID code. However, it has been suggested that, in theory, a less reputable organisation could link, for example, a mobile payment to a smartphone ID and hence learn who is making the journey.

TfL says that it double-scrambled the data to prevent anyone being able to infer any sort of identity of the people who journeyed on the network during the trial, which took place late last year.

In total, just under 510 million data points were collected from 5.6 million unique smartphones during the trial as people wandered around leaving a digital trace in their wake.

Not all tube stations with Wi-Fi were part of the trial, which was mainly concentrated within Zone 1 and some parts of the Jubilee and Northern lines out to Zone 4. After all, if a person gets on a tube train in Stanmore, and leaves at Wembley Park, there’s no mystery about the journey they took.

However, a person starting at Waterloo has many different routes to get to King’s Cross, and the trial found that many different routes are indeed being used.

Another aspect of the trial is that it shows hidden congestion that has long been anecdotally known, but never proven in actual numbers. For example, tube stations with several lines have a lot of traffic swapping between lines, but that would never show up in Oyster card tracking at the ticket gates.

In one example, Oxford Circus shows higher traffic flows than Oyster card data would suggest. While the swapping between lines was known, how many people did so was not.

It’s much easier to prioritise station upgrades when you have a clearer understanding of where pressure points exist.

This information could also be used to offer extra information to people planning a journey. Those who value comfort over speed could be offered alternative routes, which is good for them, and diverting them away from the busy route also releases more capacity on that route, for the benefit of the passengers who stay on it.

The same could be used within stations.

For example, the below image shows the number of people within Euston station on 30th November 2016, just before the station was closed due to overcrowding.

The overcrowding was likely due to an incident at King’s Cross, causing overcrowding on the northbound Northern line. However, lacking live granular data, the entire station was closed, rather than attempting to keep it open for southbound customers, where overcrowding wasn’t an issue.

This won’t be possible in many stations, but some stations could be selectively closed if Wi-Fi tracking showed that only part of the station was overcrowded.

The data collection carried out by TfL for the trial was not real-time, so while useful for later analysis, it’s no good for live monitoring. TfL expects that it will be able to upgrade to live passenger movement monitoring, and that opens up some interesting ideas.

For example, if they notice that people making a journey between two locations are taking longer than normal along Route A, they could alert people to switch to Route B instead.
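A minimal sketch of that kind of check, in Python, might compare recent journey times on a route against its usual times. The function name, the sample timings and the 25% tolerance are all assumptions made for illustration; the article does not say how TfL would detect a slow route.

```python
from statistics import median


def route_is_slow(recent_minutes, baseline_minutes, tolerance=1.25):
    """Return True when the recent median journey time exceeds the
    baseline median by more than the tolerance factor (assumed 25%)."""
    if not recent_minutes or not baseline_minutes:
        return False  # not enough data to judge
    return median(recent_minutes) > tolerance * median(baseline_minutes)


# Invented journey times, in minutes, for two routes between the same stations.
baseline_route_a = [14, 15, 15, 16, 17]
recent_route_a = [19, 21, 22, 24]
baseline_route_b = [17, 18, 18, 19, 20]
recent_route_b = [17, 18, 18, 19]

if route_is_slow(recent_route_a, baseline_route_a) and not route_is_slow(
    recent_route_b, baseline_route_b
):
    print("Route A is running slow, suggest Route B")
```

Using the median rather than the mean keeps a handful of lost or dawdling passengers from triggering an alert on their own.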

The data suggests that they can predict the crowding levels on individual trains, so it could be that when a busy train comes in, people can be reliably informed that the train behind is half-empty, and some people may choose to wait at the station for a couple of minutes in order to get a seat, or at least not be quite so squashed.

With the trial over and the results proven to be of use, TfL says that it is now working with the Information Commissioner’s Office, privacy campaigners and consumer groups on how the data collection could be undertaken on a permanent basis.

One question that some may ask is, now that TfL has all this journey data, can it send you a copy of your journeys? In simple terms, no it can’t. When the ID numbers from each smartphone were collected, they were encrypted in a way that makes reversing the encryption essentially impossible. TfL also added what is known as a “salt” to the ID numbers, so even if someone knew the encryption method, they would also need the salt, and that salt value was destroyed after the trial ended.
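To show the general idea of salting, here is a minimal sketch in Python. TfL's actual algorithm is not described here, so the use of SHA-256, the 32-byte salt and the `pseudonymise` function are all assumptions for illustration only.

```python
import hashlib
import secrets


def pseudonymise(mac_address, salt):
    """Return a one-way token for a MAC address, mixed with a secret salt.

    SHA-256 is an assumed choice; the article does not name the algorithm.
    """
    normalised = mac_address.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalised).hexdigest()


# A random salt generated for the trial and kept secret (hypothetical).
salt = secrets.token_bytes(32)

# The same phone always maps to the same token while the salt exists,
# so its sightings can still be joined up into a journey.
token_1 = pseudonymise("AA:BB:CC:DD:EE:FF", salt)
token_2 = pseudonymise("aa:bb:cc:dd:ee:ff", salt)

# With a different salt the token is unrelated, so once the original salt
# is destroyed, hashing a known MAC address no longer finds its records.
other_salt_token = pseudonymise("AA:BB:CC:DD:EE:FF", secrets.token_bytes(32))

print(token_1 == token_2)
print(token_1 == other_salt_token)
```

The salt matters because there are few enough possible MAC addresses that, without it, someone could hash every candidate address and match the results against the stored tokens.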

So, unless the spooks at GCHQ have something clever in their computers, it’s currently impossible to ask for a copy of someone’s journey across the TfL network based on their smartphone data.

At the moment, the privacy concerns that some people are expressing seem unlikely to be a significant issue.

Whether that will survive the government’s current desire to weaken data encryption is the big issue that could cause a privacy headache in the future.


17 comments on “Tracking smartphone Wi-Fi signals reveals curious journeys on the London Underground”
  1. GT says:

    I must admit that, looking at those journeys ( Waterloo – KX ) my first thought was “wtf?”
    I mean, it’s cross-platform @ Oxford Circus Victoria-Bakerloo lines & is the shortest.
    Why would anyone go any other way – even Jubilee/Victoria lines, given the interchange-length at Green Park.
    Uh?

    • KK says:

      That was my exact reaction. Why on earth!?

    • Joe says:

      The Vic to Jub interchange at Green Park isn’t very long at all. Simply up some stairs, along a short passage and straight down the escalators. It is all the other interchanges at Green Park that are long.

      • Road-hog123 says:

        As has been suggested by others, those unfamiliar with the network will go by sub-optimal routes.

        It’s well known that the tube map is not geographically accurate, including that it omits the lengths of changes. When I did KX to Waterloo recently I picked up a map at KX and picked the most direct route: Picc and Northern via Leicester Sq. – Oxford Circus and Green Park seem out-of-the-way, and there’s no indication that changing is any longer there.

        Warren Street and Euston I can see being used because those routes are “Up and Across”, nice and simple to navigate.

        Many of the small numbered ones I can see being as a result of people getting lost or meeting people without leaving the system, the latter especially those with many changes. London Bridge seems like a good contender for “got on a train going the wrong way” for example, but also for meeting a National Rail arrival. Similarly I could understand using Picc at Green Park as “getting the wrong blue line” and possibly Baker Street for meeting a Paddington arrival?

  2. Martin Hollands says:

    This is really interesting, taking the Waterloo to Kings Cross example, I would love to see the time slices to see who travels which routes at different times of the day.

    What this doesn’t show of course is Why! Do some people take the more circuitous routes as they are meeting other people, or they’re lost or maybe tourists just wanting to explore the network.

    Depending on how much time I have, I will often take odd routes between stations just to have a nose around and as I hate boring repetition.

  3. Matthew Malthouse says:

    I’d love to see if percentage of route choices for a given trip, such as the WAT>KGX illustrated, differ when taken in the reverse direction.

  4. Andrew says:

    My guess would be that a lot of people do not know or care much about the geography, or the intimate details of the transfers at each station: regular commuters find a route that works for them, with reasonable comfort in a reasonable time and low chance of disruption, and stick to it; less regular travellers will look at the map and choose a route that looks shortest, quickest or easiest, particularly if it links with a part of the network that feels familiar to them.

    For example, Waterloo to Kings Cross on Jubilee and Victoria via Green Park is 6 stops; on Bakerloo and Victoria via Oxford Circus is 7 stops. By that metric, routes via Leicester Square or Warren Street look OK too. But it is amazing that significant numbers of people will go via Bank, or even Baker Street or London Bridge. Why?

  5. David says:

    I suspect the route between Kings Cross and Waterloo that people take may well depend on which entrance to the station they use or the exit they want to be closest to.
    Both stations have multiple entrances and exits and people will choose the closest line when they enter the station, or if they know the journey well, the one that is quickest to exit from.

  6. Paul says:

    I wonder if it’s age dependent, and people mainly just use the route they have grown up with, which may have once been the best advice, but isn’t now?

    Waterloo – Kings Cross is a great example, and if I remember correctly back in the days of “follow the overhead coloured lights” the route was via the Northern line, and this was never changed to ‘via Oxford Circus’ for the many years that the lights remained in use.

  7. Graham says:

    I wonder if the routes in the 1% bracket are genuine, every day, or whether they might be as a result of disruption or having to meet up with a friend en route?

    • Mark says:

      Yes, some of these routes look like there was an aborted attempt to use a particular tube line involved. For example, I swap at Euston or Warren St if the Victoria line trains grind to a halt.

  8. Penny says:

    Follow the overhead lights – There were different coloured lights for different lines. My personal favourite was the red light for Piccadilly ;)

  9. Philip Richards says:

    One other factor that has a big influence, such as in this example, is the National Rail Journey Planner (plus other train operators’ sites that use the same data). For example, a journey from Peterborough to Basingstoke I’ve just searched for would take you via Kings Cross then onto Waterloo. Options for crossing London are 1. Northern then Victoria via Warren Street and 2. Piccadilly then Northern via Leicester Square. No suggestion of Victoria then Bakerloo via Oxford Circus!

  10. Rosemary says:

    Re the Green Park interchange: people who have difficulty with stairs would be likely to choose a Green Park interchange that looks weird to a totally able person because it can be done entirely on the level and the map shows this.

  11. Geoffrey says:

    Follow the coloured lights was a World War 2 system to help servicemen and women to find their way across London, particularly from southern termini to northbound trains and for leave times in the West End.
