Notes
This model performs well on audio with a sample rate of 44.1 kHz or 48 kHz.
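As a quick way to check an input file against the sample rates mentioned in the note, the following sketch reads the rate from a WAV header using Python's standard `wave` module. The helper names `wav_sample_rate` and `is_preferred_rate` are illustrative, not part of any Whisper or platform API, and the sketch only handles uncompressed WAV files.

```python
import wave

# Sample rates called out in the note above, in Hz.
PREFERRED_RATES = (44_100, 48_000)


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_preferred_rate(path):
    """Return True if the file's sample rate is one of the rates in the note."""
    return wav_sample_rate(path) in PREFERRED_RATES
```

A file that fails this check is not necessarily unusable; the note only states which rates are known to work well.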
Introduction
Whisper-Large is a speech recognition model developed by OpenAI. It is trained with large-scale weak supervision to predict transcripts of audio collected from the internet. The model generalizes well to standard benchmarks without dataset-specific training, is often competitive with prior fully supervised results, and approaches human accuracy and robustness.
Whisper
Whisper-Large is part of the Whisper family of models and has 32 layers, a width of 1280, 20 attention heads, and 1550M parameters. The model is briefly fine-tuned on the subset of transcripts that do not include speaker annotations, which stops it from inserting guessed speaker names into its output.
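The figures above can be sanity-checked with a back-of-the-envelope count. The sketch below is only an estimate under stated assumptions that do not come from this page: the encoder and decoder each have 32 layers, the feed-forward expansion factor is 4, the vocabulary has 51,865 tokens, and biases, layer norms, positional embeddings, and the convolutional front end are ignored.

```python
# Figures quoted in the text above.
LAYERS = 32
WIDTH = 1280
HEADS = 20

# Assumptions, not stated in the text above.
MLP_RATIO = 4        # assumed feed-forward expansion factor
VOCAB_SIZE = 51_865  # assumed multilingual vocabulary size


def head_dim(width=WIDTH, heads=HEADS):
    """Per-head dimension: the model width split evenly across heads."""
    return width // heads


def estimate_params(layers=LAYERS, width=WIDTH, mlp_ratio=MLP_RATIO, vocab=VOCAB_SIZE):
    """Rough weight count for an encoder-decoder Transformer (weights only)."""
    attn = 4 * width * width             # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * width * width  # two linear layers in the feed-forward block
    encoder_block = attn + mlp           # self-attention + MLP
    decoder_block = 2 * attn + mlp       # self-attention + cross-attention + MLP
    embedding = vocab * width            # token embedding, assumed shared with the output layer
    return layers * (encoder_block + decoder_block) + embedding


if __name__ == "__main__":
    print(head_dim())
    print(f"{estimate_params() / 1e6:.0f}M")
```

Under these assumptions each head is 64-dimensional and the estimate comes to about 1534M weights, roughly 1% below the quoted 1550M; the omitted terms account for the rest of the difference.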
Use Cases
Whisper-Large can be used for a range of speech recognition tasks, including transcribing audio recordings, recognizing voice commands, and translating speech into English text. It handles many languages and accents, which makes it useful for multilingual applications.
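The transcription and translation use cases can be sketched with the open-source `openai-whisper` Python package. This is an assumption about tooling: the hosted model on this page may be called through a different API. The helpers `build_options` and `run_whisper` are hypothetical names introduced for illustration; `whisper.load_model` and `model.transcribe` are the package's own functions.

```python
VALID_TASKS = ("transcribe", "translate")


def build_options(task="transcribe", language=None):
    """Build the decoding options passed to Whisper's transcribe call."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    options = {"task": task}
    if language is not None:
        # Language code of the spoken audio, e.g. "fr"; detected automatically if omitted.
        options["language"] = language
    return options


def run_whisper(audio_path, task="transcribe", language=None, model_name="large"):
    """Transcribe (or translate into English) one audio file and return the text."""
    # Imported lazily: assumes `pip install openai-whisper` and ffmpeg are available.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_options(task, language))
    return result["text"]
```

With the package installed, `run_whisper("interview.mp3")` would return a transcript in the spoken language, and `run_whisper("entretien.mp3", task="translate", language="fr")` would return an English translation of French speech. Both file names are placeholders.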
Dataset
Whisper-Large is trained on a large-scale weakly supervised dataset of 680,000 hours of audio. Of this, 117,000 hours cover 96 languages other than English, and 125,000 hours are X→en translation data. Models trained on this dataset transfer well to existing datasets zero-shot, removing the need for dataset-specific fine-tuning to achieve high-quality results.
Advantages
Whisper-Large has several advantages over traditional speech recognition models. Because it is trained on a large-scale weakly supervised dataset, it reduces the need for expensive and time-consuming manual annotation. It handles many languages and accents, making it useful for multilingual applications. Finally, its brief fine-tuning on transcripts without speaker annotations stops the model from guessing speaker names in its output.
Limitations
Whisper-Large's performance is limited by the quality and quantity of its training data. The model still struggles with many languages and accents, particularly those with little training data. Remaining errors include getting stuck in repeat loops on long recordings. Accuracy also depends on the quality of the audio recording.
Disclaimer
Please be advised that this model utilizes wrapped Artificial Intelligence (AI) provided by OpenAI (the "Vendor"). These AI models may collect, process, and store data as part of their operations. By using our website and accessing these AI models, you hereby consent to the data practices of the Vendor. We do not have control over the data collection, processing, and storage practices of the Vendor. Therefore, we cannot be held responsible or liable for any data handling practices, data loss, or breaches that may occur. It is your responsibility to review the privacy policies and terms of service of the Vendor to understand their data practices. You can access the Vendor's privacy policy and terms of service at https://openai.com/policies/privacy-policy. We disclaim all liability with respect to the actions or omissions of the Vendor, and we encourage you to exercise caution and to ensure that you are comfortable with these practices before utilizing the AI models hosted on our site.
- Name: whisper
- Model Type ID: Audio To Text
- Description: Audio transcription model for converting speech audio to text
- Last Updated: Oct 17, 2024
- Privacy: PUBLIC
- License