Stream Audio in Real-time

Last updated 10/20/2021

To process audio in real time, you stream your audio to Deepgram as it is captured. Deepgram returns interim transcripts, which are corrected and finalized as additional words are spoken.

Deepgram gives you streamlined access to automatic transcription from its off-the-shelf and custom-trained speech recognition models. The service is fast, understands nearly every audio format, and is customizable.

When you stream audio in real time, Deepgram sends back both live transcriptions and corrections to earlier transcripts. Over the course of a stream, you receive multiple response messages as new transcripts become available and earlier ones are corrected.
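Because interim transcripts can be revised, a client usually keeps finalized text separate from the latest interim text and redraws the display on every message. The sketch below assumes each response is a dict with an `is_final` flag and the transcript at `channel.alternatives[0].transcript`; these field names are assumptions modeled on Deepgram's live response format and should be checked against the API reference. `TranscriptAssembler` and `make_message` are hypothetical helpers, not part of any Deepgram SDK.

```python
from typing import Any, Dict, List


class TranscriptAssembler:
    """Hypothetical helper: joins finalized segments with the latest interim one."""

    def __init__(self) -> None:
        self.final_segments: List[str] = []
        self.interim: str = ""

    def handle(self, message: Dict[str, Any]) -> str:
        # Assumed message shape (not confirmed by this page):
        # {"is_final": bool, "channel": {"alternatives": [{"transcript": str}]}}
        text = message["channel"]["alternatives"][0]["transcript"]
        if message.get("is_final", False):
            # A final result replaces the pending interim text for this segment.
            if text:
                self.final_segments.append(text)
            self.interim = ""
        else:
            # A newer interim result supersedes (corrects) the previous one.
            self.interim = text
        return self.display()

    def display(self) -> str:
        # Text to show the user: everything finalized, then the current guess.
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


def make_message(text: str, is_final: bool) -> Dict[str, Any]:
    """Hypothetical helper that builds a message in the assumed shape."""
    return {"is_final": is_final, "channel": {"alternatives": [{"transcript": text}]}}


if __name__ == "__main__":
    assembler = TranscriptAssembler()
    for msg in [
        make_message("call mom at", False),
        make_message("call mom at six", False),  # interim text grows
        make_message("call Mom at six", True),   # segment finalized, corrected
        make_message("tonight", False),          # next segment begins
    ]:
        print(assembler.handle(msg))
```

Run as a script, this prints the display after each message, ending with `call Mom at six tonight`. Keeping interim text in a single slot means each correction simply overwrites the previous guess instead of appending to it.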

Use Cases

Deepgram's real-time streaming is ideal for use cases involving live audio streams that need to be analyzed and transcribed as words are being spoken.

Real-time streaming differs from batch transcription, which is fast, low-latency, and optimized for pre-recorded audio. If you already have complete audio data on disk or in memory, batch mode will give you the best performance.

Command and Control

You need to dictate a message to your phone. You want to be able to see the words appear as soon as they are spoken, so that you can check the message and make sure that it is correct. As you speak, words begin to appear on your screen. As you continue talking, more words appear (as new transcripts become available). Eventually, phrases that start out wrong correct themselves (as old transcripts are corrected) until you have a final message.

Real-time Agent Assist

You manage a call center. You want all calls taken by your agents to be transcribed in real time, so that you can give each agent timely information that helps them provide more effective customer service.

How It Works

Real-time streaming uses WebSockets, a communications protocol that enables full-duplex communication: you can stream new audio to Deepgram while the latest transcription results stream back to you over the same connection. Third-party WebSocket client libraries are available for a wide range of languages and production environments, which makes integration easier.
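Full-duplex use typically means running a sender and a receiver concurrently on one connection rather than alternating between them. The minimal sketch below shows that pattern with `asyncio`. To keep it runnable without network access or an API key, `FakeDuplexConnection` is a hypothetical in-memory stand-in for a WebSocket, and its "server" just acknowledges each chunk; a real client would replace it with a WebSocket client library's connection object and its own send and receive operations.

```python
import asyncio
from typing import List, Optional


class FakeDuplexConnection:
    """Hypothetical stand-in for a WebSocket: independent channels in each direction."""

    def __init__(self) -> None:
        self._to_server: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
        self._to_client: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    async def send(self, chunk: Optional[bytes]) -> None:
        await self._to_server.put(chunk)

    async def recv(self) -> Optional[str]:
        return await self._to_client.get()

    async def run_fake_server(self) -> None:
        # Pretend to transcribe: reply to each chunk as soon as it arrives.
        count = 0
        while True:
            chunk = await self._to_server.get()
            if chunk is None:  # client signalled end of audio
                await self._to_client.put(None)
                return
            count += 1
            await self._to_client.put(f"result {count} ({len(chunk)} bytes)")


async def send_audio(conn: FakeDuplexConnection, chunks: List[bytes], log: List[str]) -> None:
    for chunk in chunks:
        await conn.send(chunk)
        log.append(f"sent {len(chunk)} bytes")
        await asyncio.sleep(0.01)  # simulate audio being captured in real time
    await conn.send(None)  # hypothetical end-of-stream marker


async def receive_results(conn: FakeDuplexConnection, log: List[str]) -> None:
    # Runs concurrently with send_audio, so results arrive mid-stream.
    while True:
        message = await conn.recv()
        if message is None:
            return
        log.append(f"received {message}")


async def stream(chunks: List[bytes]) -> List[str]:
    conn = FakeDuplexConnection()
    log: List[str] = []
    await asyncio.gather(
        conn.run_fake_server(),
        send_audio(conn, chunks, log),
        receive_results(conn, log),
    )
    return log


if __name__ == "__main__":
    # 3200 bytes = 100 ms of 16 kHz, 16-bit mono PCM silence.
    for line in asyncio.run(stream([b"\x00" * 3200] * 3)):
        print(line)
```

In the resulting log, "sent" and "received" entries interleave: the first result is logged before the last chunk is sent, which is the behavior full-duplex streaming allows.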