Using Speech Recognition Software

Hardware and software advances make it faster, easier and more accurate

By Robert F. Jones, MD

Speech recognition software became widely available in the mid-1990s. If you tried this technology and abandoned it because it was too slow, inaccurate and cumbersome, now might be the time to reconsider.

Today’s continuous speech software lets you dictate notes rapidly without pauses, and its accuracy rate can be better than that of a human transcriptionist. This software can eliminate transcription costs and turnaround time, improve practice and physician efficiency, and enable instantaneous review of your patient notes by everyone on your office network.

How does it work?

Speech recognition software generates a statistical model of the user’s speech and usage, and uses that model to transform voice audio input into text. Dragon NaturallySpeaking (DNS), distributed by ScanSoft Inc., is the speech recognition “engine” that has persevered as the standard in this market.

The current version, 7.3, combined with an up-to-date computer, provides what is essentially automated transcription with minimal discernible delay.

Getting started is relatively easy. After loading the software on the computer, the new user is prompted to read a text selection. This takes about 20 minutes. The software then uses this information to optimize itself for that particular user’s speech characteristics. Automated transcription can then begin immediately. The user’s speech file is updated with each use, and accuracy improves as recognition errors that occur are corrected. After a few hours of use, the software can have an accuracy rate as high as 98 percent.

Transcribing voice to text is not the only magic that this software can perform. DNS can be instructed to execute ‘macros’ (a series of steps that accomplish a specific task, like inserting standardized text or a signature in a document) by saying just a few words. DNS can input text or carry out commands in a wide variety of Windows-based programs.
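To make the idea of a macro concrete, here is a minimal Python sketch of phrase-triggered text insertion. It is not the DNS macro language or any DNS interface; the names MACROS and expand_macros, and the sample phrases and stored text, are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration of a voice macro: a short spoken phrase
# that expands into a longer block of standardized text.
# This is NOT the DNS macro language; every name here is an assumption.

MACROS = {
    "insert normal knee exam": (
        "Knee examination: no effusion, full range of motion, "
        "ligaments stable."
    ),
    "insert signature": "Electronically signed by the dictating provider.",
}


def expand_macros(recognized_phrases):
    """Replace any phrase that matches a macro name with its stored text."""
    output = []
    for phrase in recognized_phrases:
        key = phrase.strip().lower()
        # Phrases that are not macro names pass through as dictated text.
        output.append(MACROS.get(key, phrase))
    return " ".join(output)
```

In this sketch, a dictated sentence is left untouched, while a phrase such as "Insert signature" is swapped for the stored signature line. A real macro can also carry out program commands, which this example does not attempt.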

There’s another challenge that faces providers who seek to implement this type of system in their office. To achieve maximum efficiency, basic patient information (like name, date of birth, referring physician) must be quickly and accurately inserted into preformatted documents or a database, and the transcribed text must be easily stored and accessed.

Several electronic medical record (EMR) packages on the market now accept text from DNS and address these issues.

Practice recommendations and cost

I started using DNS as a solo practitioner in 1998. As the practice grew, we developed a simple document creation, storage and retrieval program that has enabled us to complete patient encounters as fast as or faster than with a handheld recorder. We haven’t used a transcriptionist for the past six years. The practice has grown to four providers, so we save about $4,000 each month in transcription expenses. We’ve used this system successfully for 15,000 patients and have created about 60,000 documents.

Our experience has taught us the following:

We dedicate one desktop workstation to each provider, and typically work in our offices to minimize background noise. Each of these workstations, with the appropriate software, sound card and microphone, costs less than $2,500.

These ‘dictation’ workstations are connected to our office network, and the documents created are stored on the server. This gives both providers and staff instant access to the records.


Speech-to-text accuracy is of paramount importance. Many hospital transcription departments accept an accuracy rate of 98 percent (one error per 50 words) in the documents they handle. In our practice, however, we found that this number of “speak-os” (speech typographical errors) disrupts the meaning and flow of the documents enough to render them potentially inaccurate and certainly unprofessional. While using the early versions of DNS, we thoroughly proofread every document, which took additional provider and staff time. With advances in hardware and software, we no longer find this necessary.

To determine our current baseline, we had a professional transcriptionist review 500 consecutive patient encounter documents that were generated over the course of about one month earlier this year. Using the system described above, the accuracy rate was 99.7 percent, dictating at slightly more than 100 words/minute (including any error corrections). This is equivalent to about one error every 300 words.
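The relationship between an accuracy percentage and the spacing of errors is simple arithmetic, sketched below in Python. The function names are illustrative, and the 500-word note length used in the usage remark is an assumption, not a figure from our practice.

```python
def words_per_error(accuracy_percent):
    """Average number of words between errors at a given accuracy rate."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    if error_rate <= 0:
        raise ValueError("accuracy must be below 100 percent")
    return 1.0 / error_rate


def expected_errors(accuracy_percent, word_count):
    """Expected number of errors in a document of word_count words."""
    return word_count * (100.0 - accuracy_percent) / 100.0
```

At 98 percent, words_per_error gives about 50 words between errors; at 99.7 percent it gives roughly 333, which is the "about one error every 300 words" quoted above. For a hypothetical 500-word note, expected_errors works out to about 10 errors at 98 percent versus about 1.5 at 99.7 percent.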

A final perspective

Speech recognition requires some effort to be truly practical in a modern medical office. Potential users must be willing to learn to enunciate more clearly than they normally do and tinker with the macros to achieve maximum efficiency. A moderate comfort level with computers is certainly helpful, or a consultant can be hired to assist in setup and training.

Appropriate use of the technology can result in huge transcription cost savings (about $50,000 a year for us) without an increase in the time providers spend generating notes. This is one step closer to a true electronic, paperless medical record.
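For readers who want to run the numbers for their own practice, the savings figures can be checked with a short Python calculation. This is a rough sketch only: it uses the $4,000 monthly savings, four providers and the under-$2,500 workstation cost mentioned earlier, treats $2,500 as an upper bound, and ignores training time, consulting fees and any EMR software costs. The function names are illustrative.

```python
MONTHLY_TRANSCRIPTION_SAVINGS = 4000  # dollars per month, from the article
PROVIDERS = 4                         # one workstation per provider
WORKSTATION_COST_CEILING = 2500       # dollars; each costs less than this


def annual_savings(monthly_savings):
    """Yearly transcription savings from a monthly figure."""
    return monthly_savings * 12


def payback_months(workstations, cost_each, monthly_savings):
    """Months of savings needed to cover the workstation hardware."""
    return workstations * cost_each / monthly_savings
```

With these inputs, annual_savings returns $48,000, which is the "about $50,000 a year" cited above, and payback_months shows the four workstations paying for themselves in at most two and a half months.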

Robert F. Jones, MD, is an orthopaedist in private practice in Leominster, Mass. He can be reached at

