Imagine a doctor dictating a prescription, a nurse recording discharge instructions, or a pharmacist noting recommendations for a patient – all using voice tools with automatic speech-to-text transcription. It sounds convenient, but what happens if the system hears «Humira» and writes down something completely different? In medicine, such a mistake is more than just a typo.
This is precisely the question researchers at AssemblyAI asked: how accurately do modern speech recognition systems handle pharmaceutical names? The results were mixed and deserve the attention of everyone working at the intersection of medicine and technology.
Why Drugs Are a Special Case
Drug names are one of the most challenging word categories for any speech recognition system. They don't follow the usual logic of language: they are artificially created words, often similar in sound but completely different in action. It's easy to confuse «Celebrex» and «Cerebyx» during transcription, yet the first is used for arthritis, while the second is an anticonvulsant.
Add to this the diversity of accents, professional jargon, and background noise in a clinic, and the task becomes truly non-trivial. Transcription systems are trained on vast amounts of general text and speech, but pharmaceutical vocabulary is sparsely represented in this data. The model simply hasn't «seen» these words often enough to reproduce them confidently.
How the Test Was Conducted
The researchers took 50 widely used pharmaceutical drugs – both brand names («Lipitor», «Viagra», «Adderall») and generic ones («atorvastatin», «sildenafil», «amphetamine»). For each name, they recorded audio clips with several pronunciation variations and under different recording conditions.
These recordings were then run through several popular transcription systems. Accuracy was measured using a standard metric – the word error rate (WER): the number of substituted, omitted, and inserted words divided by the number of words actually spoken. Simply put: how often did the system write something different from what was said?
Additionally, they tested whether a so-called custom vocabulary helps, that is, the ability to provide the model with a list of specific words to consider during transcription.
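For readers who want to see the metric concretely, here is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance between what was said and what was transcribed, divided by the number of words said. The article does not describe the exact text normalization the researchers used, so the lowercasing and whitespace splitting below are assumptions, and the example sentence is invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one misrecognized drug name in a seven-word sentence.
reference = "start humira forty milligrams every two weeks"
hypothesis = "start humera forty milligrams every two weeks"
print(word_error_rate(reference, hypothesis))
```

A single wrong drug name in a seven-word sentence yields a WER of only one in seven, which is why a low overall error rate can still hide the one mistake that matters.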
What the Study Showed
The overall picture is this: all tested systems made errors on drug names significantly more often than on ordinary speech. However, the difference between the models was substantial.
The best result was achieved by AssemblyAI's Best model, which reached an accuracy of about 80% for pharmaceutical names without any additional settings. This is noticeably higher than what its competitors achieved in their default modes.
When using a custom vocabulary, the model's accuracy increased to 90% and above. In other words, if you «prompt» the system in advance with the words it might encounter, it performs significantly better.
For comparison, other tested systems in their default modes showed accuracy ranging from 40% to 60% on the same data. This means that almost every second drug name could have been transcribed incorrectly.
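To get a feel for what these per-name figures mean for a whole document, the following back-of-the-envelope sketch estimates the chance that a note contains at least one wrongly transcribed drug name. It assumes that errors on different names are independent and that the note mentions five drug names. Both are simplifications chosen for illustration, not findings of the study.

```python
def prob_at_least_one_error(per_name_accuracy: float, names_in_document: int) -> float:
    """Chance that at least one of the names is wrong, assuming independent errors."""
    return 1.0 - per_name_accuracy ** names_in_document


# Accuracy levels taken from the figures quoted in the article.
for accuracy in (0.40, 0.60, 0.80, 0.90):
    risk = prob_at_least_one_error(accuracy, names_in_document=5)
    print(f"accuracy {accuracy:.0%}: chance of at least one wrong name in 5 = {risk:.0%}")
```

Under these assumptions, even 90% per-name accuracy leaves roughly a 41% chance that a five-drug note contains an error somewhere.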
Brand vs. Generic: Is There a Difference?
Yes, and quite a noticeable one. Generic (international nonproprietary) names – such as «metformin» or «amoxicillin» – appear more frequently in texts and have a more predictable structure. Models handle them slightly better.
Brand names – «Zyprexa», «Nexium», «Xarelto» – are far more unpredictable. They can sound like made-up words because, for the most part, they are. A speech recognition system that hasn't encountered such a word in its training data often picks the closest-sounding familiar alternative. Sometimes this is just funny; other times, it's dangerous.
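One possible way to catch such substitutions after the fact is to compare every transcribed word against a list of expected drug names and flag anything that is close to a listed name without matching it exactly. The sketch below uses Python's standard difflib module, which compares spelling rather than sound, so it is only a rough proxy for "sounds similar". The lexicon, the 0.7 cutoff, and the function name are illustrative assumptions, not part of the study.

```python
import difflib

# Hypothetical lexicon; in practice this would be your own formulary.
DRUG_LEXICON = ["zyprexa", "nexium", "xarelto", "celebrex", "cerebyx"]


def flag_suspect_words(transcript: str, lexicon=DRUG_LEXICON, cutoff: float = 0.7):
    """Return (word, candidate_names) pairs for near-misses of lexicon entries."""
    flagged = []
    for raw in transcript.split():
        word = raw.strip(".,;:!?").lower()
        if not word or word in lexicon:
            continue  # exact matches are not near-misses
        candidates = difflib.get_close_matches(word, lexicon, n=3, cutoff=cutoff)
        if candidates:
            flagged.append((word, candidates))
    return flagged


print(flag_suspect_words("Patient was switched to zarelto last month."))
```

Note the limitation: if the system replaces one real drug name with another real one, as in the «Celebrex» and «Cerebyx» case, the word matches the lexicon exactly and is not flagged. A check like this narrows down what a human has to look at, but it does not replace review.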
Why This Matters Beyond the Clinic
Medicine is the obvious context. But pharmaceutical names also appear outside of it: in insurance documents, telemedicine consultations, audio recordings from pharmaceutical reps, educational materials, and health podcasts.
Wherever there is voice input or automatic transcription, there is a risk of error in a drug's name. And the higher the stakes, the more important it is to know how much you can trust the system.
This isn't a call to abandon AI transcription in a medical context, but rather a reminder: tools must be chosen deliberately, with an understanding of their limitations.
What to Do in Practice
If you use or plan to use voice transcription in a context where drug names appear, consider a few practical observations from the study:
- Custom vocabularies work. If your system supports providing a list of specific terms, use it: in the study, it raised the best model's accuracy from about 80% to 90% and above.
- Baseline accuracy varies greatly between systems. Don't choose a tool blindly – it makes sense to test it specifically on the vocabulary that is important to you.
- Generic names are recognized more reliably. If you have a choice between a brand and a generic name when dictating, the latter is more likely to be recognized correctly.
- Human review remains crucial. Even 90% accuracy means that roughly one drug name in ten is transcribed incorrectly. In a medical document, this can be critical.
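The advice to test a tool on your own vocabulary can be turned into a small scoring script. The sketch below assumes you have already collected, for each test recording, the drug name that was spoken, its category, and the text returned by the system you are evaluating. The sample data is made up for illustration and is not from the study. A name counts as correct if it appears as a whole word in the transcript, which is an assumed and fairly strict criterion.

```python
import re
from collections import defaultdict


def name_accuracy_by_category(results):
    """results: iterable of (expected_name, category, transcript) tuples.

    Returns the share of recordings per category in which the expected
    name appears as a whole word in the transcript (case-insensitive).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expected, category, transcript in results:
        words = re.findall(r"[a-z0-9]+", transcript.lower())
        totals[category] += 1
        if expected.lower() in words:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Made-up sample: replace with transcripts from the system under test.
sample_results = [
    ("Lipitor", "brand", "take lipitor at night"),
    ("Xarelto", "brand", "continue zarelto daily"),
    ("metformin", "generic", "metformin twice a day"),
    ("amoxicillin", "generic", "amoxicillin for ten days"),
]
print(name_accuracy_by_category(sample_results))
```

Running the same script over the output of several systems, with and without a custom vocabulary, gives a like-for-like comparison on the terms that matter to you. Multi-word or hyphenated names would need a different matching rule than the single-word check used here.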
An Open Question
The study covers 50 drugs – a sufficiently representative sample, but far from the entire pharmaceutical lexicon. A real clinical environment is much richer: rare drugs, new brand names, regional pronunciation variations, and abbreviations.
Furthermore, the test was conducted under relatively controlled conditions. How these systems will perform with real recordings from a noisy clinic, with the tired voice of a doctor on call, or with a non-standard accent is a separate question that the study doesn't answer.
Nevertheless, even in its current form, the work provides a useful benchmark: not all systems are created equal, the gap between the best and the average is significant, and customization options such as custom vocabularies really do impact the results.
If you work with voice data in a medical or pharmaceutical environment, this study is worth keeping in mind when choosing your tools.