Rough sleeping insights tool: Using machine learning to support decision-making across London
Background
Rough sleeping data in London is siloed, held in tens of disparate systems across the capital. This makes it difficult to view a person’s end-to-end journey, to judge how effective interventions have been, and to see whether common barriers are preventing people from accessing support to get off the streets. Faculty, in partnership with Beam, the London Office of Technology and Innovation (LOTI), London Councils, and the Greater London Authority (GLA), was commissioned to tie much of this data together. We built a tool, termed the Strategic Insights Tool (SIT), that helps users across all London borough rough sleeping teams understand the broader picture. The project to create the SIT began in 2022. The pilot phase began in June 2023, the Minimum Viable Product (MVP) launched in September 2023, and a full roll-out to all London boroughs and 14 service providers was delivered by February 2024.
The SIT combines rough sleeping data from different sources, including CHAIN (central rough sleeping database), In-Form (accommodation services data), and H-CLIC (housing application data), and uses probabilistic data matching techniques to generate insights that end users can draw on for future planning. For the first time, this enables users to: better map the history and journey of different groups of rough sleepers and their interaction with homelessness services; assess the effectiveness of support; and plan how to improve that support.
How was probabilistic data matching used?
The data science and engineering team at Faculty used Splink, an open source library for probabilistic record linkage, to identify matches between individual records across the different data sources uploaded to the tool. The algorithm detects whether an individual’s records in H-CLIC, where their statutory homelessness application would be held, relate to the same person as records in CHAIN or In-Form, where their rough sleeping outreach contacts are recorded. Records judged to belong to the same person are merged into a single rough sleeping ‘journey’. This process is not as straightforward as matching on the person’s name: typos, fake names, and other discrepancies can mean that names differ between systems. The matching algorithm therefore considers a number of factors, including “fuzzy matching” between names and other personal details such as contact details, to identify potential matches.
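To make the idea concrete, the sketch below scores a pair of records in the Fellegi-Sunter style that probabilistic linkage tools such as Splink are built on. It is an illustration only, not the SIT implementation: the field names, the `m`/`u` probabilities, the prior, and the 0.85 name-similarity cut-off are all hypothetical, and the standard library’s `difflib` stands in for whichever fuzzy comparison the real model uses.

```python
from difflib import SequenceMatcher

# Hypothetical parameters, per field and comparison level:
# m = P(level | same person), u = P(level | different people).
PARAMS = {
    "name": {"exact": (0.70, 0.001), "fuzzy": (0.25, 0.01), "differ": (0.05, 0.989)},
    "dob": {"exact": (0.90, 0.003), "differ": (0.10, 0.997)},
    "phone": {"exact": (0.60, 0.0001), "differ": (0.40, 0.9999)},
}
PRIOR = 0.001  # assumed chance that two randomly chosen records match


def name_level(a: str, b: str) -> str:
    """Classify two names as an exact, fuzzy, or non-match."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return "exact"
    # Assumed similarity cut-off for a "fuzzy" match.
    return "fuzzy" if SequenceMatcher(None, a, b).ratio() >= 0.85 else "differ"


def exact_level(a, b) -> str:
    return "exact" if a == b else "differ"


def match_probability(rec_a: dict, rec_b: dict) -> float:
    """Combine per-field evidence into a single match probability."""
    levels = {
        "name": name_level(rec_a["name"], rec_b["name"]),
        "dob": exact_level(rec_a["dob"], rec_b["dob"]),
        "phone": exact_level(rec_a["phone"], rec_b["phone"]),
    }
    odds = PRIOR / (1 - PRIOR)
    for field, level in levels.items():
        m, u = PARAMS[field][level]
        odds *= m / u  # each field multiplies the odds by its Bayes factor
    return odds / (1 + odds)


hclic = {"name": "Jonathan Smith", "dob": "1990-04-02", "phone": "07000000001"}
chain = {"name": "Jonathon Smith", "dob": "1990-04-02", "phone": "07000000001"}
other = {"name": "Maria Lopez", "dob": "1985-11-20", "phone": "07000000002"}

likely_match = match_probability(hclic, chain)
unlikely_match = match_probability(hclic, other)
```

Here the misspelt name still contributes positive evidence through the fuzzy level, while agreement on date of birth and phone number pushes the pair well above any reasonable threshold. In practice the `m` and `u` values are estimated from the data rather than set by hand.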
If conflicts arise between the original records, a prioritisation mechanism selects the information from the most reliable source. However, such conflicts have little impact on the final result, because they tend to affect fields such as name or phone number, which are of no use once matching is complete. The algorithm only associates records that meet an 85% probability threshold of being a match. This gives us a high level of confidence and keeps false positives (cases where we’ve accidentally matched two different people) in the final matched dataset to a minimum.
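A minimal sketch of these two steps might look as follows. The 85% threshold comes from the text above; the source ranking in `SOURCE_PRIORITY` and the record layout are assumptions made purely for illustration, since the actual reliability ordering is not described here.

```python
MATCH_THRESHOLD = 0.85  # threshold stated in the text

# Hypothetical reliability ranking: a lower number means a more trusted source.
SOURCE_PRIORITY = {"CHAIN": 0, "H-CLIC": 1, "In-Form": 2}


def accepted_links(scored_pairs):
    """Keep only the record pairs whose match probability meets the threshold."""
    return [(a, b) for a, b, prob in scored_pairs if prob >= MATCH_THRESHOLD]


def resolve_field(records, field):
    """Pick a field's value from the most trusted source that has one."""
    candidates = [r for r in records if r.get(field) not in (None, "")]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: SOURCE_PRIORITY[r["source"]])
    return best[field]


scored = [("hclic-1", "chain-7", 0.97), ("hclic-2", "chain-8", 0.85), ("hclic-3", "chain-9", 0.60)]
links = accepted_links(scored)

matched_records = [
    {"source": "In-Form", "nationality": "Polish", "gender": "F"},
    {"source": "CHAIN", "nationality": "", "gender": "M"},
]
gender = resolve_field(matched_records, "gender")
nationality = resolve_field(matched_records, "nationality")
```

The pair scored at 0.60 is discarded, and where the two sources disagree the value from the higher-ranked source wins. Where the higher-ranked source has no value at all, the function falls back to the next source rather than returning a blank.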
Once all of the matches between different datasets have been determined, the different journeys are aggregated and visualised within the SIT. This means we can ensure that data is safely and securely anonymised, while providing insight into rough sleeping journeys across many different systems.
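One common way to turn pairwise matches into journeys is to group linked records into connected clusters and then report only counts. The sketch below assumes that approach; the record identifiers and the `source_of` mapping are hypothetical, and it is not a description of how the SIT itself aggregates data.

```python
from collections import Counter


def cluster_links(record_ids, links):
    """Group records into journeys: records joined by any chain of links share a cluster."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), set()).add(r)
    return list(clusters.values())


def journey_counts(clusters, source_of):
    """Count journeys by the combination of systems they touch, dropping identifiers."""
    counts = Counter()
    for cluster in clusters:
        systems = tuple(sorted({source_of[r] for r in cluster}))
        counts[systems] += 1
    return dict(counts)


record_ids = ["hclic-1", "chain-7", "inform-3", "chain-9"]
source_of = {"hclic-1": "H-CLIC", "chain-7": "CHAIN", "inform-3": "In-Form", "chain-9": "CHAIN"}
links = [("hclic-1", "chain-7"), ("chain-7", "inform-3")]

journeys = cluster_links(record_ids, links)
summary = journey_counts(journeys, source_of)
```

The output of `journey_counts` contains no record identifiers or personal details, only how many journeys span each combination of systems, which is the kind of aggregate a dashboard can safely display.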
Who is using the tool and how?
The tool is now deployed across all 33 London boroughs and 14 service providers, with all organisations regularly providing up-to-date data either manually or via API. The SIT pools and presents the information in a user-friendly way, and gives decision-makers deeper insights into the effectiveness of various interventions which can be used to support future policy development, as well as service commissioning and delivery.
Generally, the tool is used to collaboratively identify issues and opportunities and to develop responses to them. The SIT is not used for identifying, or trying to identify, specific individuals. Critically, users do not have access to individual records, so identifying specific individuals is not possible by design. Examples of questions users might answer or investigate using the SIT outputs include:
- “I know that X% of my rough sleeping population are EEA nationals, but of those, how many are women under 25 who had a previous accommodation stay in my pathway?”
- “Why does such a high proportion of people sleep rough after having contact with my housing options team?”
- “I want to identify shared needs with neighbouring areas that might present opportunities for joint working”
What’s next?
The SIT continues to be used in operational settings across London, and is supported and maintained by Homeless Link (the organisation that also supports and maintains the CHAIN database). Now that the probabilistic data matching algorithm has been deployed, the possibilities for supplementing the novel matched dataset with additional data are almost endless! There are also new opportunities to use the data in cutting-edge applications of data science and artificial intelligence, such as predictive modelling to forecast upcoming demand for services across discrete components of the rough sleeping pathway.