audio-trascription model | Clarifai

audio-transcription

Deepgram Nova-2 sets a new benchmark in speech-to-text with 30% lower error rates and unmatched speed, making it the superior choice in automatic speech recognition.


Notes

Introduction

Deepgram Nova-2 is the latest advancement in speech-to-text (STT) technology. It is designed to set a new gold standard for automatic speech recognition (ASR) performance, offering groundbreaking accuracy, speed, and cost-effectiveness.

Nova-2 Speech-to-Text Model

Nova-2 is the most powerful speech-to-text model globally, boasting an average 30% reduction in word error rate (WER) compared to competitors, delivering superhuman transcription performance.
The model features an 18.4% reduction in WER from Nova-1, demonstrating the effectiveness of speech-specific optimizations, advanced data curation, and a multi-stage training methodology.
Nova-2 employs a novel Transformer-based architecture, enhancing accuracy in pre-recorded and streaming transcription of entities, punctuation, and capitalization.

Run Deepgram STT with an API

Running the API with Clarifai's Python SDK

You can run the Deepgram STT Model API using Clarifai’s Python SDK.

Export your personal access token (PAT) as an environment variable, then import and initialize the API client.

Find your PAT in your security settings.

export CLARIFAI_PAT={your personal access token}

from clarifai.client.model import Model
from pydub import AudioSegment
from scipy.io import wavfile

# files
wav_file = "test.wav"
AUDIO_FILE_LOCATION = 'record_out+(3).mp3'

# convert any audio file format to .wav audio file
sound = AudioSegment.from_file(AUDIO_FILE_LOCATION)
sound.export(wav_file, format="wav")
samplerate, data = wavfile.read(wav_file)

# read the converted .wav file as raw bytes
with open(wav_file, "rb") as f:
    file_bytes = f.read()

# parameters forwarded to the model along with the audio
inference_params = dict(sample_rate=samplerate, punctuate=True, model="nova-2")

transcription_model = Model("https://clarifai.com/deepgram/transcribe/models/audio-trascription", pat="YOUR PAT")
model_prediction = transcription_model.predict_by_bytes(file_bytes, "audio", inference_params=inference_params)

# Print the transcribed output
print("Output: ", model_prediction.outputs[0].data.text.raw)
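The script above can be wrapped into a small reusable helper. The following is a minimal sketch under stated assumptions, not official Clarifai or Deepgram usage: the names `load_wav`, `build_inference_params`, and `transcribe` are hypothetical, the sample rate is read with Python's standard `wave` module (which handles PCM .wav files) instead of SciPy, and the model object is assumed to expose `predict_by_bytes` with the same arguments and response shape as in the script above.

```python
import wave


def load_wav(path):
    """Return (raw file bytes, sample rate in Hz) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
    with open(path, "rb") as f:
        return f.read(), sample_rate


def build_inference_params(sample_rate, punctuate=True, model="nova-2"):
    """Build the same parameter dict the script above passes to the model."""
    return dict(sample_rate=sample_rate, punctuate=punctuate, model=model)


def transcribe(model, wav_path):
    """Send a .wav file to `model` and return the transcribed text.

    Assumption: `model` behaves like the clarifai `Model` used above.
    """
    file_bytes, sample_rate = load_wav(wav_path)
    params = build_inference_params(sample_rate)
    prediction = model.predict_by_bytes(file_bytes, "audio", inference_params=params)
    return prediction.outputs[0].data.text.raw


# Usage (requires the clarifai package and the exported CLARIFAI_PAT variable):
#   import os
#   from clarifai.client.model import Model
#   model = Model("https://clarifai.com/deepgram/transcribe/models/audio-trascription",
#                 pat=os.environ["CLARIFAI_PAT"])
#   print("Output: ", transcribe(model, "test.wav"))
```

Passing the model in as an argument keeps the helper independent of how the client is created, so it can be exercised with a stand-in object before any API call is made.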

You can also run Deepgram STT using other Clarifai client libraries, such as Java, cURL, NodeJS, and PHP.

Use Cases

Nova-2 suits a wide range of voice applications, providing exceptional accuracy and speed across contexts.
It excels in applications such as podcast transcription, video and media captioning, meeting notes, and phone call transcription.

Evaluation

Accuracy

Based on a comprehensive benchmarking methodology, Nova-2 demonstrates a 30% lower word error rate (WER) than competitors.
In pre-recorded inference mode, Nova-2 achieves a median WER of 8.4% across diverse audio domains, outperforming competitors by an average of 30%.
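For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that standard definition; it is not Deepgram's benchmarking code, the function names are hypothetical, and real benchmarks usually normalize casing and punctuation before scoring. The 12% baseline in the usage lines is an invented figure chosen only to show what a 30% relative reduction looks like.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One substitution in five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown dog jumps"))  # 0.2

# Hypothetical baseline: a WER of 8.4% is about 30% below a WER of 12%.
print(round(relative_reduction(0.12, 0.084), 2))  # 0.3
```

Note that the 30% figure is a relative reduction: it compares two error rates to each other rather than subtracting 30 percentage points.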

Speed

Nova-2 is the fastest model, with a median inference time of 29.8 seconds per hour of diarized audio, making it 5 to 40 times faster than competitors.
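The quoted figure can be restated as a real-time factor (processing time divided by audio duration). The short calculation below is only arithmetic on the 29.8-second median quoted above; the function names are hypothetical, and the batch estimate assumes processing time scales linearly with audio length, which the source does not state.

```python
SECONDS_PER_HOUR = 3600


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of audio; lower is faster."""
    return processing_seconds / audio_seconds


def speedup_over_real_time(processing_seconds, audio_seconds):
    """How many times faster than playback speed the audio is processed."""
    return audio_seconds / processing_seconds


rtf = real_time_factor(29.8, SECONDS_PER_HOUR)
speedup = speedup_over_real_time(29.8, SECONDS_PER_HOUR)
print(f"Real-time factor: {rtf:.4f}")               # Real-time factor: 0.0083
print(f"Speed-up over real time: {speedup:.1f}x")   # Speed-up over real time: 120.8x

# Assumption: linear scaling. Ten hours of audio at the median rate.
estimated_seconds = 10 * 29.8
print(f"Estimated time for 10 hours: {estimated_seconds:.0f} s")  # 298 s
```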

Dataset

Nova-2 was trained on the largest, most diverse dataset in Deepgram's history, curated from nearly 6 million resources and 47 billion tokens and incorporating an extensive library of high-quality human transcriptions.
Training uses a multi-stage methodology for superior performance.

Advantages

Accuracy: Nova-2 offers a 30% reduction in WER, surpassing competitors and achieving superhuman transcription performance.
Speed: With a median inference time of 29.8 seconds per hour of audio, Nova-2 is the fastest model on the market.

Limitations

Limited Languages: Nova-2 may not fully support, in every language, the capabilities currently available in English.
Dependency on Data Quality: Performance may be impacted by the quality of input data, especially in real-world scenarios.

Disclaimer

Please be advised that this model utilizes wrapped Artificial Intelligence (AI) provided by Deepgram (the "Vendor"). These AI models may collect, process, and store data as part of their operations. By using our website and accessing these AI models, you hereby consent to the data practices of the Vendor. We do not have control over the data collection, processing, and storage practices of the Vendor. Therefore, we cannot be held responsible or liable for any data handling practices, data loss, or breaches that may occur. It is your responsibility to review the privacy policies and terms of service of the Vendor to understand their data practices. You can access the Vendor's privacy policy and terms of service at https://deepgram.com/privacy.

We disclaim all liability with respect to the actions or omissions of the Vendor, and we encourage you to exercise caution and to ensure that you are comfortable with these practices before utilizing the AI models hosted on our site.

Model Type ID: Audio To Text
Input Type: audio
Output Type: text
Description: Deepgram Nova-2 sets a new benchmark in speech-to-text with 30% lower error rates and unmatched speed, making it the superior choice in automatic speech recognition.
Last Updated: Oct 17, 2024
Privacy: PUBLIC