The Home Office’s new “ChatGPT-style” LLM tool is riddled with mistakes that could mean life or death for people seeking asylum
By Donald Campbell
Recently, the UK Government briefed the media on what sounded like a success story: a new “ChatGPT-style” tool that would cut the time caseworkers spend on asylum claims by summarising information, saving “44 years of time”.
What they didn’t mention were some of the alarming findings contained in a Home Office evaluation of the tool, quietly published on their website on April 29, the same day Dame Angela Eagle, the Minister for Asylum and Border Security, told LBC: “We can cut nearly half the amount of time it takes for people to search the policy information notes, and we can cut by nearly a third, the amount of time it takes for cases to be summarised, and that means there are significant increases in productivity here.”
The pilot is pitched as a success — but the picture is less rosy when you look at the details. The tool being trialled uses a ‘Large Language Model’ (LLM) to “extract and summarise information from existing asylum interview transcript documents.”
The problem is that it made a substantial number of mistakes. Nearly one in ten of the summaries it produced were found to be inaccurate or to have missing information. These summaries were so faulty that they had to be removed from the study altogether.
The Home Office evaluation reads: “Technical specialists reviewed all summaries for accuracy prior to use in the pilot. A small proportion of summaries produced (9%) were deemed to be inaccurate or had missing information and were therefore removed from the pilot.”
Context reported last week that half of the caseworkers who tested the tool said it gave them incorrect information, with some users saying it did not provide references to the asylum seeker’s interview transcript. Nearly a quarter said they were not “fully confident” in the summaries provided.
The evaluation described the mistakes as a “small proportion”. But if this tool is unleashed on asylum claims — which can be a very real matter of life or death — then can faulty information in one in ten cases really be seen as ‘small’?
At the end of 2024, the government had 90,686 asylum cases awaiting an initial decision, official data showed.
Sadly, this isn’t our first experience of the UK Home Office seizing on dodgy tech ‘fixes’ for complex challenges.
Here at Foxglove — a non-profit which works to make tech fair for everyone — we successfully stopped the Home Office using a racist algorithm to help it process visas.
In 2020, after we brought a legal challenge against an algorithm which discriminated against people based on their nationality, then-Home Secretary Priti Patel backed down and agreed to withdraw it.
One thing we found was that the algorithm suffered from “feedback loop” problems known to plague many such automated systems — where past bias and discrimination, fed into a computer programme, reinforce future bias and discrimination.
The same concern applies to the ‘LLMs’ that are now hyped as an answer to just about everything. These models hoover up vast quantities of material from the internet as part of their ‘training’. They do so at such a scale that it is impossible to have meaningful quality control over what they ‘learn’ from.
In short, we already know that the internet is awash with racism, bigotry and discrimination. Given this is the source of training material for LLMs of the kind the Home Office is using, there are serious concerns that this toxic material will influence the way these models carry out their work.
We already know — thanks to the Home Office’s somewhat over-optimistic evaluation of this tool — that it makes mistakes at an unacceptable rate, and is not seen as trustworthy by a significant chunk of the caseworkers who tried it out.
But on top of that — and in a much more pernicious way — there is the risk that this tool may bring to bear the many biases and bigotries of the internet on the decisions it makes when generating material for busy, overwhelmed caseworkers.
And because there is so little insight and accountability around the training methods of these LLMs — they are frequently referred to as ‘black boxes’ because even their developers don’t fully understand how they come to certain conclusions — this is a very hard problem to detect, let alone fix.
To give the Home Office some credit, they have made incremental improvements in the past few years. They are now, at least, running trials and publishing assessments of these AI tools before pushing ahead with implementing them.
But these assessments only reinforce the need for extreme caution, and still leave a great many unanswered questions: how are the Home Office going to deal with the tool’s persistent tendency to make mistakes? What are they going to do to address the widespread — and, it seems, well-founded — concerns of their staff about using this tool? What LLM are they using, and how can they guarantee that it will not replicate biases from its training data when compiling summaries of crucial information?
Until these questions can be convincingly answered, there should be no prospect of unleashing this tool on the life and death process of deciding asylum claims.
This article was originally published by the Byline Times.