Human Transcription Services vs. ASR: Which is Right for You?

Kat Hounsell

If you need a transcript and you’ve ruled out doing the transcription yourself (smart choice, by the way), there are two main types of transcription to choose from. The first is human transcription, where a professional transcriber (a transcriptionist in the US) turns your audio or video file into text. The second is automatic speech recognition (or automated speech recognition, or voice recognition), where a machine converts the speech into text.

Jumping straight to automatic speech recognition (ASR) can be very tempting. After all, artificial intelligence has come on in leaps and bounds, the technology is always improving, and it’s quick and cheap, which is ever so enticing. But human transcription services are still very much in demand, and for good reason. So, before you upload your file, we’ll walk you through the key factors to consider so that you get the right transcription service.

There are two key considerations when it comes to choosing between ASR and human transcription services:

  1. The quality of your audio file and the type of recording
  2. The type of output you need

Human transcription services vs automatic speech recognition

We’ve produced this handy infographic for you to make the process of choosing between ASR and human transcription services as painless as possible.

Now, on to making a decision…

Transcription Decision Tree

Step 1: Is your recording good quality?

As you might expect, a machine can only do so much. Accuracy levels can drop drastically if your audio has background noise or people speaking over each other. Unless you have crystal-clear audio, human or hybrid transcription (where a person corrects a machine draft) is going to give you a better output and remove the need for you to correct and edit the transcript.

If you do have a high-quality recording, there are some other elements to take into account before deciding whether ASR or human transcription is the way to go.

Step 2: Does your content have multiple speakers?

If you’ve just got one speaker, you can jump straight to step 3.

Usually, if you have multiple speakers, you’ll want to distinguish between them in your transcript. Otherwise, the text can feel jumbled and it’s hard to work out who said what. Imagine paying for a transcript and then having to go through it meticulously yourself to identify the speakers. No, thank you. Avoid that analysis headache by using human transcription services, which can accurately identify and label the speakers for you. This is perfect for interviews or market research focus groups.

If you don’t need the speakers identified, it’s time for step 3.

Step 3: Does the audio contain regional accents?

If you’re anything like us, you’ll love a regional accent. ASR seems a little less keen. It’s not personal, but machines can struggle with the wonderful array of dialects out there.

Most automatic speech recognition software has been developed using a standard US accent: the generic kind of American accent you might hear on television, one that isn’t easy to place in a specific location. As a result, many other dialects and accents may see much lower accuracy rates, especially if the participants are talking quickly. If your recording features strong regional accents, human transcription is likely to be the safer choice.

Step 4: What type of transcription do you need?

This all comes down to you, what you need the transcription for, and whether you’ve got time to make edits to get the final output you need.

ASR will transcribe every single word. And we mean Every. Single. Word. On the surface this might not sound like an issue – after all, that’s what you’re paying for, to have the speech converted to text, right?

“But, have you erm, thought like, what all the err, you know, speech, would actually, um look like when it’s it’s written out.” 

Probably not exactly what you’re looking for! 

ASR will deliver the full verbatim, but if you don’t have time to go through and edit the transcript, human transcription services can remove stutters, repetitions and filler words for you. ASR services are improving in this area but consistency and accuracy vary.

So, if you want a clean transcript that’s ready for you to use, human or hybrid transcription services will be the best option.

Step 5: How accurate does it need to be?

Accuracy is what separates a good transcript from a bad one.

If you need 99%+ accuracy, human and hybrid transcription services will deliver the most accurate transcript. Most services will provide you with a guarantee and your transcript will be proofread, so you can be confident that what you receive will be of a high quality.

If you just need a rough idea of what people are saying and you don’t need high levels of accuracy, ASR is a good budget option. But remember that its accuracy is likely to be around 85% at best, and it will drop significantly if the audio quality isn’t great, if there are multiple speakers, or if participants have strong accents.
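To put those percentages into perspective, here is a rough sketch of what they could mean in practice. It assumes accuracy is measured per word, and the 1,000-word transcript length is a made-up example; the function name is ours, not part of any transcription service.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Approximate number of wrong words at a given word-level accuracy.

    Assumption: accuracy is the share of words transcribed correctly,
    expressed as a fraction between 0 and 1.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(word_count * (1.0 - accuracy))


words = 1000  # hypothetical transcript length
print(expected_errors(words, 0.85))  # 150 words to fix at 85% accuracy
print(expected_errors(words, 0.99))  # 10 words to fix at 99% accuracy
```

On these assumptions, the gap between 85% and 99% is the difference between correcting roughly one word in seven and one word in a hundred.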

Wait. What about timescales, isn’t an automated transcription service quicker?

If you need your transcript in a hurry, ASR can look like the clear winner on the surface. Many services can turn content around almost instantly, a speed that human transcription services simply can’t match, no matter how many transcribers they have or how they split your file. However, as you’ve probably guessed, there is a big but.

You also need to factor in any time spent cleaning and editing your file to get it into a usable format. And, unfortunately, that can take much longer than you think. 
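As a back-of-the-envelope illustration, the real turnaround is the processing time plus the time you spend fixing mistakes. Every figure in this sketch is hypothetical: the five-minute processing time, the 1,350 errors (roughly what a 9,000-word transcript at 85% accuracy might contain) and the 20 seconds per correction are placeholders to swap for your own numbers.

```python
def total_turnaround_minutes(
    processing_minutes: float,
    errors_to_fix: int,
    seconds_per_fix: float,
) -> float:
    """Processing time plus manual clean-up time, in minutes.

    All inputs are estimates supplied by the caller; nothing here is
    measured from a real transcription service.
    """
    editing_minutes = errors_to_fix * seconds_per_fix / 60.0
    return processing_minutes + editing_minutes


# Hypothetical example values, purely for illustration.
minutes = total_turnaround_minutes(
    processing_minutes=5, errors_to_fix=1350, seconds_per_fix=20
)
print(minutes)  # 455.0
```

With these made-up inputs, an "instant" transcript ends up costing about seven and a half hours once the editing is included.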

If you need real-time captions or transcriptions, ASR is likely the option for you. Professional human transcriptionists are pretty quick, but they can’t compete with ASR on speed. If speed is your priority, ASR can deliver, as long as you’re OK with lower accuracy levels.
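If it helps to see the whole decision in one place, the five steps and the speed question can be sketched as a short function. This is only our reading of the guidance above, not a formal tool: the field names are invented for the example, and the recommendation for strong accents is inferred from the accuracy warning rather than stated outright.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Answers to the questions in the article (field names are hypothetical)."""
    clear_audio: bool             # Step 1: good-quality recording?
    needs_speaker_labels: bool    # Step 2: multiple speakers to tell apart?
    strong_accents: bool          # Step 3: regional accents present?
    needs_clean_transcript: bool  # Step 4: fillers and stutters removed?
    needs_high_accuracy: bool     # Step 5: 99%+ accuracy required?
    needs_real_time: bool = False  # live captions or instant output?


def recommend(rec: Recording) -> tuple[str, str]:
    """Return (recommended service, reason), checking each step in order."""
    if rec.needs_real_time:
        return "ASR", "only a machine can deliver in real time"
    if not rec.clear_audio:
        return "human or hybrid", "noisy or overlapping audio hurts ASR accuracy"
    if rec.needs_speaker_labels:
        return "human", "speakers need to be identified and labelled"
    if rec.strong_accents:
        # Inferred from the article: accents lower ASR accuracy.
        return "human", "regional accents lower ASR accuracy"
    if rec.needs_clean_transcript:
        return "human or hybrid", "fillers and repetitions need removing"
    if rec.needs_high_accuracy:
        return "human or hybrid", "99%+ accuracy is required"
    return "ASR", "clear audio and a rough transcript is enough"


# Example: a clear one-to-one interview where speakers must be labelled.
interview = Recording(
    clear_audio=True,
    needs_speaker_labels=True,
    strong_accents=False,
    needs_clean_transcript=True,
    needs_high_accuracy=True,
)
print(recommend(interview)[0])  # human
```

The checks run in the same order as the steps, so the first "yes" that points away from ASR decides the outcome.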

For accurate transcription with guaranteed turnaround times, upload your audio or video files today.