Last updated 10/20/2021

Deepgram provides a library of beginner speech recognition models that you can use to start unlocking your audio data. To increase accuracy and adapt to complex use cases, you can train intermediate or advanced models powered by machine learning.

Have a complex use case or prefer not to train a model yourself? Deepgram AI experts are available to train an expert model tailored to your needs and deliver it in weeks.

Collect Data

Deepgram accepts audio in over 40 formats. The audio you provide establishes the foundation from which a model learns reliable patterns.

For best results, provide data that is as similar as possible to the real audio your model will process.

For optimal accuracy, the data you provide must be labeled so that a model can organize it into patterns that support accurate predictions.


Data labeling is the process of detecting and tagging data to provide a learning basis for future data processing. The more labeled data you have, the greater the accuracy gains you can see over time.

Labeling is both a science and an art. Labels used to identify data features must be informative, discriminating, and independent in order to produce a quality algorithm. A properly labeled dataset provides a ground truth that a model uses to check its predictions for accuracy and to continue refining its algorithm. Errors in data labeling impair the quality of the training dataset and the performance of any predictive models for which it’s used.

With Deepgram, you can upload pre-labeled files or label audio data as needed using the state-of-the-art labeling tools we provide, all while keeping your data on-premises. Alternatively, you can leverage our in-house transcription team to convert your audio or transcripts into training-ready datasets.
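Before uploading pre-labeled files, a quick sanity check can catch some of the labeling errors described above. The sketch below assumes a hypothetical manifest of (audio file name, transcript) pairs; that layout and the `find_label_problems` helper are illustrative only and are not a Deepgram file format or tool.

```python
def find_label_problems(manifest):
    """Return (audio_name, problem) pairs for entries that look mislabeled."""
    problems = []
    seen = set()
    for audio_name, transcript in manifest:
        # The same audio file listed twice usually means conflicting labels.
        if audio_name in seen:
            problems.append((audio_name, "duplicate audio file"))
        seen.add(audio_name)
        if not transcript.strip():
            problems.append((audio_name, "empty transcript"))
        elif transcript != " ".join(transcript.split()):
            problems.append((audio_name, "irregular whitespace"))
    return problems


# Hypothetical manifest used only to exercise the checks.
manifest = [
    ("a.wav", "hello world"),
    ("b.wav", "  "),
    ("a.wav", "hello again"),
    ("c.wav", "too  many spaces"),
]
for audio_name, problem in find_label_problems(manifest):
    print(f"{audio_name}: {problem}")
```

Checks like these only catch mechanical mistakes. Whether a transcript actually matches its audio still needs human review.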


When your data is properly labeled, you can use it to incrementally improve your model’s ability to predict speech. This process is iterative: each cycle updates the weights and biases in the model, moving its predictions closer to the ground truth.
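The iterative cycle can be illustrated with a toy example. The sketch below is not Deepgram's training procedure. It fits a single weight and bias to a handful of labeled points with plain gradient descent, only to show how each pass nudges the parameters closer to the labels. The `train` function and its `learning_rate` and `epochs` parameters are illustrative names.

```python
def train(examples, learning_rate=0.05, epochs=500):
    """Fit y = w * x + b to labeled (x, y) pairs with gradient descent."""
    w, b = 0.0, 0.0
    n = len(examples)
    losses = []
    for _ in range(epochs):
        # Measure how far the current predictions are from the labels.
        errors = [(w * x + b) - y for x, y in examples]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, examples)) / n
        grad_b = 2 * sum(errors) / n
        # Each cycle moves the weight and bias a small step toward the labels.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses


# Toy labeled data generated from y = 2x + 1.
labeled = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, losses = train(labeled)
print(f"w={w:.3f}, b={b:.3f}, first loss={losses[0]:.3f}, last loss={losses[-1]:.6f}")
```

A deep neural network has far more parameters and a different loss, but the loop has the same shape: predict, compare against the labels, adjust.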

Unlike traditional automatic speech recognition systems, which are trained by meticulously editing sub-components of a data pipeline, our deep neural network improves with each dataset it receives. Continuously train your model on the voices of your customers, and it will get better at identifying sounds and, in turn, the words in newly submitted audio.

Periodically during the training process, you should test your model against data that has been labeled but never used for training, and evaluate how it performs on audio it has not yet seen. During this process, you can tune parameters, including the learning rate, which affect how accurate the model can become and how long training takes. Thanks to our GPU supercomputer data centers and patented DNN inference architecture, you can analyze thousands of audio streams simultaneously, train multiple models, and compare the results.
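As a rough illustration of held-out evaluation, the sketch below sets aside a fraction of the labeled items and scores a transcript with word error rate (WER), a common speech recognition metric. WER is used here as an assumption, since the text above does not say which metric Deepgram reports, and `split_holdout` and `word_error_rate` are hypothetical helper names.

```python
import random


def split_holdout(items, holdout_fraction=0.2, seed=0):
    """Shuffle labeled items and set aside a fraction that is never trained on."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[cut:], shuffled[:cut]  # (training set, held-out set)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical labeled items: (audio file name, transcript).
labeled = [(f"call_{i}.wav", "thank you for calling") for i in range(10)]
train_set, heldout_set = split_holdout(labeled)
print(len(train_set), len(heldout_set))
print(word_error_rate("thank you for calling", "thank you for falling"))
```

Scoring each trained model on the same held-out set keeps comparisons between models fair, because none of them has seen that audio during training.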


Once you’re happy with your trained model, you can use it to transcribe audio via our APIs, which Deepgram makes available in the cloud or on-premises. To learn more, see Deployment Models.
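A request to a transcription API might be prepared along the following lines. This is a sketch under stated assumptions, not a reference implementation: the endpoint URL, the `Token` authorization scheme, and the `model` query parameter reflect one understanding of the hosted API and should be confirmed against the current API reference. An on-premises deployment would use its own host. The `build_transcription_request` helper and the model name are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for the hosted API; confirm against the API reference.
API_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(audio_bytes, api_key, model=None, content_type="audio/wav"):
    """Prepare (but do not send) a POST request carrying raw audio bytes."""
    url = API_URL
    if model:
        # Assumption: a trained model is selected by name via a query parameter.
        url += "?" + urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        url,
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


# Placeholder bytes and credentials; nothing is sent over the network here.
request = build_transcription_request(b"RIFF....", "YOUR_API_KEY", model="my-trained-model")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and parse the response body. The response format is not shown here because it depends on the API version in use.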


Continue to upload audio, label data, train and improve models, and deploy from one location using our console and APIs. Deepgram also provides ways to monitor usage and request API keys.