When patients with regional accents reported that an AI receptionist couldn't understand their names, the frustration was genuine and the criticism entirely legitimate. Voice recognition that fails a patient trying to book a GP appointment is not a minor inconvenience. For some people it is a barrier to care.
But understanding why this happens, and what is actually being done about it, matters more than the headline. Because the accent problem in AI is not a mystery, it is not unique to healthcare, and it is not permanent.
Every voice recognition system learns from data. The more speech it hears from a particular accent, dialect or regional variation, the better it becomes at understanding it. The less it hears, the more it struggles. This is not a design flaw in any single product. It is a reflection of how the training datasets that underpin most commercial voice AI were assembled, and whose voices dominated them.
Research published in 2024 confirmed what many users had long suspected: widely used automatic speech recognition systems perform significantly worse on regional accents, minority ethnic accents and non-native English speakers. American English and British Received Pronunciation dominated the datasets that trained most early commercial voice technology. The result was systems that worked well in certain parts of the country and considerably less well in others. The same pattern appears across smart speakers, voice assistants, and automated telephone systems. These are not hypothetical edge cases. They represent millions of people whose everyday speech was simply not well represented when the first generation of voice AI was built.
England is one of the most linguistically diverse countries in the world. Travel fifty miles in almost any direction and you will find accents that differ substantially in vowel patterns, intonation, rhythm and consonant use. Add to that the millions of patients for whom English is a second language, those with speech differences caused by health conditions, and older patients whose speech patterns differ from younger norms, and the picture becomes clear. Building a voice system that works for NHS patients means building one that works for an extraordinarily wide range of human speech.
This is not a problem unique to healthcare AI. Academic research on speech recognition across regional British accents has found that systems trained primarily on southern English speech can cut their error rates on other regional accents by almost 20% when accent-specific pronunciation models are introduced. That is a significant and entirely addressable gap. The issue has never been whether it can be fixed. The issue has been whether the developers deploying these systems in public-facing services are doing the work required to fix it.
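To make concrete what an accent-specific pronunciation model involves, here is a minimal sketch: a lexicon that stores several accent-tagged phoneme sequences per word, so the recogniser accepts a northern vowel as readily as a southern one. The words, tags and phoneme strings below are illustrative assumptions, not taken from the research above or from any deployed system.

```python
# Minimal sketch of an accent-aware pronunciation lexicon.
# All identifiers and phoneme strings are illustrative only.

from collections import defaultdict

# word -> list of (accent_tag, phoneme sequence)
# "bath" and "grass" show the classic trap/bath split between
# southern and northern English accents.
LEXICON = defaultdict(list)
LEXICON["bath"] = [("southern", "b aa th"), ("northern", "b a th")]
LEXICON["grass"] = [("southern", "g r aa s"), ("northern", "g r a s")]

def pronunciations(word: str, accents: set[str] | None = None) -> list[str]:
    """Return the phoneme sequences the recogniser should accept for a word.

    With no accent filter, every known variant is allowed. That is the
    simplest way to make a system accent-inclusive: a valid northern
    vowel is no longer penalised as an error.
    """
    return [
        phones
        for accent, phones in LEXICON[word]
        if accents is None or accent in accents
    ]

print(pronunciations("bath"))                # both variants accepted
print(pronunciations("grass", {"northern"}))  # filtered to one accent
```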
Digital health inclusion research has consistently found that patients with limited English proficiency experience worse outcomes from digital healthcare tools, including higher rates of failed interactions and greater likelihood of seeking care via emergency departments instead. NHS England's own digital inclusion framework acknowledges that health technology risks entrenching existing inequalities rather than alleviating them when it is not designed with the full diversity of its users in mind.
It is worth pausing on something that rarely gets acknowledged in this debate. Accent misunderstanding did not arrive with AI. Human receptionists mishear names, ask patients to repeat themselves, struggle on bad phone lines, and occasionally misroute calls because of communication difficulties. Patients with strong regional accents or limited English have navigated these moments for years, absorbing them as everyday friction in an already pressured system.
The difference is that when a person does it, it is accepted as an unfortunate but human reality. When an AI does it, it becomes evidence that the technology is fundamentally flawed. That is not a consistent standard. It is not a fair comparison. And it matters enormously when we are trying to have an honest conversation about whether AI in primary care is actually making things better or worse.
The honest comparison is not AI versus perfection. It is AI versus the reality of what already existed: a phone system with bad lines, overstretched receptionists working under enormous pressure, and no shortage of miscommunication. Measured against that baseline, the picture looks considerably more nuanced than the headlines suggest.
The crucial point, and the one that gets lost in coverage of AI failures, is that accent recognition is an engineering problem with engineering solutions. It is not a ceiling.
The path to improvement is straightforward in principle, even if it requires sustained effort in practice. Systems need to be trained on more diverse voice data. They need real-world exposure to the full range of accents they will encounter in deployment. They need to be tested specifically on regional and non-native speech before they go live. And they need feedback loops that surface failures quickly so they can be corrected.
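What the pre-deployment testing in that list could look like, in deliberately simplified form: score the system's transcripts against reference transcripts separately for each accent group, and gate go-live on the worst-performing group rather than the overall average. The accent labels, transcripts and word-error-rate harness below are illustrative assumptions, not any vendor's actual test suite.

```python
# Sketch of pre-deployment testing stratified by accent group.
# The labels and transcripts are made up; a real test set would pair
# recorded audio with reference transcripts.

from collections import defaultdict

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via edit distance between word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(ref), 1)

# Each test case: (accent group, reference transcript, system output).
test_set = [
    ("geordie", "book me an appointment with doctor shaw",
     "book me an appointment with doctor shore"),
    ("rp", "book me an appointment with doctor shaw",
     "book me an appointment with doctor shaw"),
]

by_accent = defaultdict(list)
for accent, ref, hyp in test_set:
    by_accent[accent].append(wer(ref, hyp))

# A go-live gate might require every group to clear the same bar,
# not just the overall average.
for accent, scores in by_accent.items():
    print(f"{accent}: mean WER {sum(scores) / len(scores):.2%}")
```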
Quantum Loop AI's EMMA system provides a practical illustration of how this works. When EMMA was first deployed in GP surgeries, accent recognition was one of the areas where performance fell short of what patients needed. The system encountered its first real-world regional voices not in a lab but in actual NHS surgeries across different parts of England. That exposure, at genuine scale, across genuinely diverse patient populations, is what has driven measurable improvement over the past two years. There is no shortcut to this. You cannot build an accent-inclusive voice system without exposing it to real accents, at volume, in real conditions.
The phonetic alphabet requirement that drew criticism was a workaround introduced precisely because name recognition was not yet reliable enough without it. The decision to remove it came when the underlying capability had improved sufficiently to make it unnecessary. That sequence of identifying a failure, working on the capability, and removing the workaround when it is no longer needed is what responsible iteration looks like.
None of this is an argument for patience without accountability. Patients using NHS services have a right to expect systems that work for them, and that right does not vary by postcode or accent.
What it does mean is that the conversation needs to move beyond simply noting that something failed, to asking what is being done about it. The questions worth putting to any AI system deployed in NHS primary care are these: Is the developer actively measuring performance across different accent groups? Is there a clear feedback route when the system fails a patient? Is the system being updated based on real-world performance data? And is there always a route to a human when the AI cannot help?
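For the feedback-route and real-world-update questions, a minimal sketch of the plumbing might look like the following: every failure is captured with enough context to be triaged into the next evaluation and training cycle. Every field name and value here is a hypothetical illustration, not a description of any deployed system.

```python
# Sketch of a failure feedback loop. All fields are illustrative.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FailureReport:
    call_id: str
    failure_type: str            # e.g. "name_not_recognised"
    recogniser_confidence: float
    escalated_to_human: bool
    # Optional and consent-dependent: lets engineers see whether
    # failures cluster by accent group once reports are aggregated.
    self_reported_accent: str | None = None
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

REVIEW_QUEUE: list[FailureReport] = []

def record_failure(report: FailureReport) -> None:
    """File a report so it feeds the next evaluation and training cycle."""
    REVIEW_QUEUE.append(report)

record_failure(FailureReport(
    call_id="c-1042",
    failure_type="name_not_recognised",
    recogniser_confidence=0.31,
    escalated_to_human=True,
    self_reported_accent="scouse",
))
```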
NHS England's digital inclusion framework is clear that alternative contact methods must remain available for those who need them. That is not a concession to AI failure. It is a design principle that should be non-negotiable in any healthcare technology deployment, and one that responsible developers already build in as standard.
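As a sketch of that design principle in code: the AI path is one branch of the routing logic, and a human receptionist is always reachable, whether by explicit request or automatic failover. The thresholds and channel names are assumptions for illustration only, not any system's real configuration.

```python
# Sketch of confidence-based call routing with a guaranteed human route.
# Thresholds and channel names are illustrative assumptions.

CONFIDENCE_FLOOR = 0.75   # below this, do not act on the AI's guess
MAX_RETRIES = 2           # do not trap callers in a repeat-yourself loop

def route_call(confidence: float, attempts: int, caller_opted_out: bool) -> str:
    """Decide whether the AI handles the call or a person does."""
    if caller_opted_out:
        return "human_receptionist"   # alternative route on request
    if confidence < CONFIDENCE_FLOOR and attempts >= MAX_RETRIES:
        return "human_receptionist"   # fail over, don't fail the patient
    if confidence < CONFIDENCE_FLOOR:
        return "ai_retry"             # one clarifying attempt
    return "ai_booking_flow"

assert route_call(0.42, attempts=2, caller_opted_out=False) == "human_receptionist"
assert route_call(0.91, attempts=0, caller_opted_out=False) == "ai_booking_flow"
```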
The honest position is that AI voice systems in GP surgeries are at an early but genuinely improving stage in their ability to serve all patients. The gap between how well they serve patients whose accents dominated the training data and how well they serve those with regional or non-native speech is real, documented, and actively being closed by developers who are embedded in enough real NHS environments to know exactly where their systems fall short.
That real-world presence, at scale and over time, is not a risk to patients. It is the only mechanism through which genuine, lasting improvement happens. The technology will get there. The question is whether the feedback, accountability and inclusive design standards are in place to make it happen as fast as patients need it to.