A trial by TfL to track the movements of people by logging Wi-Fi identities has revealed some very curious journey patterns on the tube network.
The aim of the trial was to track real-life movements of the public across the network in order to map journey patterns and find otherwise hidden congestion and quiet patches within stations and tunnels.
It’s not unusual for organisations to carry out such surveys, but they tend to suffer from participation bias: only those interested enough to carry a monitoring device or fill in a survey are tracked, leaving the behaviour of the vast majority of people to be inferred, with varying levels of accuracy.
TfL already logs where people enter and leave the network, thanks to ticket barriers, but the big mystery was what people did once they were in the network.
To answer that, TfL took advantage of the Wi-Fi coverage in its stations. All smartphones emit a Media Access Control (MAC) address when trying to connect to a local Wi-Fi hotspot, and by logging each point where a smartphone tried to connect to the Wi-Fi service, its journey could be mapped.
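As a rough illustration of how a journey can be pieced together from connection logs, here is a minimal Python sketch. The record layout, device IDs, timestamps and station sightings are all invented for the example and are not taken from TfL's data; the function name reconstruct_journeys is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical log records: (scrambled device ID, seconds since midnight, station).
# Every value here is made up for illustration.
SAMPLE_LOG = [
    ("dev-a", 28800, "Waterloo"),
    ("dev-b", 28810, "Waterloo"),
    ("dev-a", 29100, "Oxford Circus"),
    ("dev-a", 29110, "Oxford Circus"),  # repeated sighting at the same station
    ("dev-b", 29200, "Westminster"),
    ("dev-a", 29700, "King's Cross"),
    ("dev-b", 29400, "Green Park"),
    ("dev-b", 29900, "King's Cross"),
]


def reconstruct_journeys(records):
    """Group sightings by device, order them by time, and collapse repeats."""
    sightings = defaultdict(list)
    for device_id, timestamp, station in records:
        sightings[device_id].append((timestamp, station))

    journeys = {}
    for device_id, seen in sightings.items():
        path = []
        for _, station in sorted(seen):  # sorted() orders by timestamp first
            # Skip consecutive sightings at the station already recorded.
            if not path or path[-1] != station:
                path.append(station)
        journeys[device_id] = path
    return journeys


journeys = reconstruct_journeys(SAMPLE_LOG)
for device_id, path in journeys.items():
    print(device_id, " -> ".join(path))
```

In this made-up sample, two devices both start at Waterloo and finish at King's Cross, but the intermediate sightings show they took different routes, which is exactly the detail that ticket barrier data alone cannot provide.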
In terms of privacy, the ID number of the smartphone isn’t directly linked to any personal information about the user of the phone, so it is just a seemingly random ID code. However, it has been suggested that, in theory, a less reputable organisation could link, for example, a mobile payment to a smartphone ID and hence learn who is making the journey.
TfL says that it double-scrambled the data to prevent anyone being able to infer any sort of identity of the people who journeyed on the network during the trial, which took place late last year.
In total just under 510 million data points were collected from 5.6 million unique smartphones during the trial as people wandered around leaving a digital trace in their wake.
Not all tube stations with Wi-Fi were part of the trial, which was mainly concentrated within Zone 1 and some parts of the Jubilee and Northern lines out to Zone 4. After all, if a person gets on a tube train in Stanmore, and leaves at Wembley Park, there’s no mystery about the journey they took.
However, a person starting at Waterloo has many different routes to get to King’s Cross, and the trial found that many different routes are indeed being used.
Another aspect of the trial is that it shows hidden congestion that has long been known anecdotally, but never proven in actual numbers. For example, tube stations with several lines have a lot of traffic swapping between lines, but that would never show up in Oyster card tracking at the ticket gates.
In one example, Oxford Circus shows higher traffic flows than Oyster card data would suggest. While the swapping between lines was known, how many people did so was not.
It’s much easier to prioritise station upgrades when you have a clearer understanding of where pressure points exist.
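To make the gap between gate data and Wi-Fi data concrete, the following Python sketch counts, for each station, how many journeys start or end there (roughly what ticket barriers record) and how many merely pass through mid-journey. The journeys and the function name are hypothetical, and the sketch assumes each journey is already available as an ordered list of stations. It also treats every mid-journey sighting as through traffic, which lumps together people changing lines and people sitting on a train that stops there.

```python
from collections import Counter


def gate_and_through_counts(journeys):
    """Count journeys touching each station at the gates versus mid-journey."""
    gate = Counter()
    through = Counter()
    for path in journeys:
        if not path:
            continue
        ends = {path[0], path[-1]}
        gate.update(ends)  # entry and exit stations, as barriers would log them
        # Stations seen strictly between entry and exit, counted once per journey.
        through.update(set(path[1:-1]) - ends)
    return gate, through


# Invented journeys for illustration only.
sample_journeys = [
    ["Waterloo", "Oxford Circus", "King's Cross"],
    ["Brixton", "Oxford Circus", "Bond Street"],
    ["Oxford Circus", "Holborn"],
]

gate, through = gate_and_through_counts(sample_journeys)
print("Oxford Circus at the gates:", gate["Oxford Circus"])
print("Oxford Circus mid-journey:", through["Oxford Circus"])
```

In this toy sample only one journey would register at the Oxford Circus barriers, while two more pass through the station unseen by the gates.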
This information could also be used to offer extra information to people planning a journey. Those who value comfort over speed could be offered alternative routes, which is good for them, and diverting them away from the busy route also frees up capacity there, for the benefit of the passengers who stay on it.
The same could be used within stations.
For example, the below image shows the number of people within Euston station on 30th November 2016, just before the station was closed due to overcrowding.
The overcrowding was likely due to an incident at King’s Cross, causing overcrowding on the northbound Northern line. However, lacking live granular data, the entire station was closed, rather than attempting to keep it open for southbound customers, where overcrowding wasn’t an issue.
This won’t be possible in many stations, but some stations could have selective closing, if Wi-Fi tracking showed that only part of the station was overcrowded.
The data collection carried out by TfL for the trial was not real-time, so while useful for later analysis, it’s no good for live monitoring. TfL expects to be able to upgrade to live passenger movement monitoring, and that opens up some interesting ideas.
For example, if they notice that people making a journey between two locations are taking longer than normal along Route A, they could alert people to switch to Route B instead.
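One way such an alert might work is sketched below in Python. It compares the median of recent journey times on each route with a historical baseline and suggests an alternative only when the current route looks slow. The 25% threshold, the route names, the timings and both function names are assumptions made for the example, not anything TfL has described.

```python
from statistics import median


def is_delayed(baseline, recent, threshold=1.25):
    """True if recent journeys are running well above the usual median time."""
    return median(recent) > threshold * median(baseline)


def suggest_alternative(baseline, recent, current_route):
    """Return a faster, undelayed route, or None if no switch is worthwhile."""
    if not is_delayed(baseline[current_route], recent[current_route]):
        return None
    candidates = [
        route
        for route in recent
        if route != current_route
        and not is_delayed(baseline[route], recent[route])
    ]
    if not candidates:
        return None
    # Prefer whichever undelayed route is currently quickest.
    return min(candidates, key=lambda route: median(recent[route]))


# Invented journey times in minutes, for illustration only.
baseline_times = {"Route A": [20, 21, 19, 22], "Route B": [24, 25, 23, 26]}
recent_times = {"Route A": [30, 33, 31], "Route B": [24, 26, 25]}

suggestion = suggest_alternative(baseline_times, recent_times, "Route A")
print("Suggested route:", suggestion)
```

Using the median rather than the mean keeps a single straggler from triggering an alert, though a live system would need far more care over sample sizes and time windows than this sketch shows.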
The data suggests that they can predict the crowding levels on individual trains, so it could be that when a busy train comes in, people can be reliably informed that the train behind is half-empty, and some people may choose to wait at the station for a couple of minutes in order to get a seat, or at least not be quite so squashed.
With the trial over and the results proven to be of use, TfL says that it is now working with the Information Commissioner’s Office, privacy campaigners and consumer groups on how the data collection could be undertaken on a permanent basis.
One question that some may ask is: now that TfL has all this journey data, can it send you a copy of your journeys? In simple terms, no, it can’t. When the ID numbers from each smartphone were collected, they were encrypted in a way that makes reversing the encryption essentially impossible. TfL also added what is known as a “salt” to the ID numbers, so even if someone knew the encryption, they would also need the salt, and that salt value was destroyed after the trial ended.
So, unless the spooks at GCHQ have something clever in their computers, it’s currently impossible to ask for a copy of someone’s journey across the TfL network based on their smartphone data.
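For readers curious about what salting looks like in practice, here is a minimal Python sketch. It assumes a one-way SHA-256 hash with a random salt prepended to the MAC address; the article does not say which algorithm TfL used or how the "double-scrambling" was done, so the function name, the algorithm and the example address are all illustrative assumptions.

```python
import hashlib
import secrets


def pseudonymise(mac_address: str, salt: bytes) -> str:
    """Return a one-way scrambled ID for a MAC address (assumed scheme)."""
    normalised = mac_address.strip().lower().encode("ascii")
    # The salt is mixed in before hashing, so the same MAC address
    # produces a different ID under a different salt.
    return hashlib.sha256(salt + normalised).hexdigest()


# A random secret generated for the trial and destroyed afterwards.
trial_salt = secrets.token_bytes(32)

mac = "AA:BB:CC:DD:EE:FF"  # made-up address
first = pseudonymise(mac, trial_salt)
second = pseudonymise(mac, trial_salt)
other = pseudonymise(mac, secrets.token_bytes(32))

print(first == second)  # same salt: the device can be followed across sightings
print(first == other)   # different salt: the IDs no longer match
```

The salt matters because a MAC address is only 48 bits long, so an unsalted hash could in principle be reversed by hashing every possible address and looking for a match. Without the salt, that approach no longer works.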
At the moment, the privacy concerns that some people are expressing seem unlikely to be a significant issue.
Whether that will survive the government’s current desire to weaken data encryption, though, is the big question, and one that could cause a privacy headache in the future.