audio-transcription
Deepgram Nova-2 sets a new benchmark in speech-to-text with 30% lower error rates and unmatched speed, making it the superior choice in automatic speech recognition.
Introduction
Deepgram Nova-2 is the latest advancement in speech-to-text (STT) technology. It is designed to set a new gold standard for automatic speech recognition (ASR) performance, offering groundbreaking accuracy, speed, and cost-effectiveness.
Nova-2 Speech-to-Text Model
- Nova-2 is the most powerful speech-to-text model globally, boasting an average 30% reduction in word error rate (WER) compared to competitors, delivering superhuman transcription performance.
- The model features an 18.4% reduction in WER from Nova-1, demonstrating the effectiveness of speech-specific optimizations, advanced data curation, and a multi-stage training methodology.
- Nova-2 employs a novel Transformer-based architecture, enhancing accuracy in pre-recorded and streaming transcription of entities, punctuation, and capitalization.
Run Deepgram STT with an API
Running the API with Clarifai's Python SDK
You can run the Deepgram STT Model API using Clarifai’s Python SDK.
Export your PAT as an environment variable. Then, import and initialize the API Client.
Find your PAT in your security settings.
export CLARIFAI_PAT={your personal access token}
from clarifai.client.model import Model
from pydub import AudioSegment
from scipy.io import wavfile
# files
wav_file = "test.wav"
AUDIO_FILE_LOCATION = 'record_out+(3).mp3'
# convert any audio file format to .wav audio file
sound = AudioSegment.from_file(AUDIO_FILE_LOCATION)
sound.export(wav_file, format="wav")
samplerate, data = wavfile.read(wav_file)
with open(wav_file, "rb") as f:
    file_bytes = f.read()
# inference parameters passed to the model
inference_params = dict(sample_rate=samplerate, punctuate=True, model='nova-2')
transcription_model = Model("https://clarifai.com/deepgram/transcribe/models/audio-trascription", pat="YOUR PAT")
model_prediction = transcription_model.predict_by_bytes(file_bytes, "audio", inference_params=inference_params)
# Print the transcribed output
print("Output: ", model_prediction.outputs[0].data.text.raw)
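The snippet above passes the token as a literal string, while the setup step exports it as CLARIFAI_PAT. A minimal sketch of reading the token from the environment instead is shown below. The helper name `load_pat` is hypothetical and is not part of the Clarifai SDK; it only wraps a standard-library lookup.

```python
import os


def load_pat(env=os.environ, var_name="CLARIFAI_PAT"):
    """Return the personal access token stored in `var_name`.

    `load_pat` is a hypothetical helper for illustration only.
    Raises RuntimeError if the variable is missing or blank.
    """
    pat = env.get(var_name, "").strip()
    if not pat:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the script."
        )
    return pat
```

The returned value can then be passed as the `pat` argument when constructing `Model(...)`, which keeps the token out of the source file.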
You can also run Deepgram STT using other Clarifai client libraries, such as Java, cURL, NodeJS, and PHP.
Use Cases
- Nova-2 is suitable for a wide range of voice applications, providing exceptional accuracy and speed across different contexts.
- Typical applications include podcast transcription, video and media captioning, meeting notes, and phone call transcription.
Evaluation
Accuracy
- Based on a comprehensive benchmarking methodology, Nova-2 demonstrates a 30% lower word error rate (WER) than competitors.
- In pre-recorded inference mode, Nova-2 achieves a median WER of 8.4% across diverse audio domains, outperforming competitors by an average of 30%.
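For readers unfamiliar with the metric, WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below illustrates that standard definition and how a relative reduction such as "30% lower" is calculated. It is not Deepgram's benchmarking code, the lowercase-and-split normalization is a simplifying assumption, and the numbers in the final example are illustrative only, not reported results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One substituted word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))

# Illustrative values only: a drop from 10% to 7% WER is a 30% relative reduction.
print(round(relative_reduction(0.10, 0.07), 2))
```

Note that a relative reduction is measured against the baseline's error rate, so "30% lower WER" does not mean the absolute error rate drops by 30 percentage points.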
Speed
- Nova-2 is the fastest model, with a median inference time of 29.8 seconds per hour of diarized audio, which is 5 to 40 times faster than competitors.
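To put the speed figure in perspective, the short calculation below converts 29.8 seconds per hour of audio into a real-time factor and a rough processing-time estimate. It assumes processing time scales linearly with audio length, which the source does not state, and the function names are illustrative. The figure is a median, so actual times will vary.

```python
SECONDS_PER_HOUR = 3600
MEDIAN_SECONDS_PER_AUDIO_HOUR = 29.8  # figure quoted above


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


def estimated_processing_seconds(
    audio_hours: float,
    seconds_per_audio_hour: float = MEDIAN_SECONDS_PER_AUDIO_HOUR,
) -> float:
    """Rough estimate, assuming time scales linearly with audio length."""
    return audio_hours * seconds_per_audio_hour


rtf = real_time_factor(MEDIAN_SECONDS_PER_AUDIO_HOUR, SECONDS_PER_HOUR)
speedup = 1 / rtf

print(f"real-time factor: {rtf:.5f}")
print(f"speed relative to real time: about {speedup:.1f}x")
print(f"estimate for 10 hours of audio: {estimated_processing_seconds(10):.0f} s")
```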
Dataset
- Trained on the largest, most diverse dataset in Deepgram's history, curated from nearly 6 million resources and 47 billion tokens, and incorporating an extensive library of high-quality human transcriptions.
- Utilizes a multi-stage training methodology for superior performance.
Advantages
- Accuracy: Nova-2 offers a 30% reduction in WER, surpassing competitors and achieving superhuman transcription performance.
- Speed: With a median inference time of 29.8 seconds per hour, Nova-2 is the fastest model in the market.
Limitations
- Limited Languages: Nova-2 may not yet support every language, and capabilities available for English may not be fully available in other languages.
- Dependency on Data Quality: Performance may be impacted by the quality of input data, especially in real-world scenarios.
Disclaimer
Please be advised that this model utilizes wrapped Artificial Intelligence (AI) provided by Deepgram (the "Vendor"). These AI models may collect, process, and store data as part of their operations. By using our website and accessing these AI models, you hereby consent to the data practices of the Vendor. We do not have control over the data collection, processing, and storage practices of the Vendor. Therefore, we cannot be held responsible or liable for any data handling practices, data loss, or breaches that may occur. It is your responsibility to review the privacy policies and terms of service of the Vendor to understand their data practices. You can access the Vendor's privacy policy and terms of service at https://deepgram.com/privacy.
We disclaim all liability with respect to the actions or omissions of the Vendor, and we encourage you to exercise caution and to ensure that you are comfortable with these practices before utilizing the AI models hosted on our site.
- Name: audio-trascription
- Model Type ID: Audio To Text
- Last Updated: Oct 17, 2024
- Privacy: PUBLIC