speech-synthesis

Create realistic speech and voices using the robust Text to Speech and Voice Cloning model

Notes

Currently, this model can only be used via API. The “try your own input” and “Use model” buttons on this screen will NOT work.

Introduction

ElevenLabs' generative AI model provides high-quality Text to Speech (TTS) that takes context and emotion into account and supports customization. It is suited to storytelling, content creation, and other uses of AI-generated audio.

ElevenLabs Text to Speech Model

The Text to Speech model developed by ElevenLabs leverages a powerful generative AI framework to convert written text into lifelike audio with remarkable accuracy. By understanding the context in which each sentence is situated, the model ensures a natural and convincing delivery, enhancing the overall audio experience.

Run the ElevenLabs Text-to-Speech Model with an API

Running the API with Clarifai's Python SDK

You can run the ElevenLabs text-to-speech model API using Clarifai’s Python SDK.

Export your personal access token (PAT) as an environment variable, then import and initialize the API client. You can find your PAT in your security settings.

export CLARIFAI_PAT={your personal access token}

from clarifai.client.model import Model

# Text to synthesize (renamed from `input` to avoid shadowing the built-in).
text = "I love your product very much"

# Placeholder: replace with your own ElevenLabs API key.
api_key = "YOUR_ELEVENLABS_API_KEY"

# "voice-id" contains a hyphen, so it must be a string key, not a keyword argument.
inference_params = {
    "voice-id": "EXAVITQu4vr4xnSDxMaL",
    "model_id": "eleven_multilingual_v2",
    "stability": 0.5,
    "similarity_boost": 0.5,
    "style": 0,
    "use_speaker_boost": True,
    "api_key": api_key,
}

# Model Predict
model_url = "https://clarifai.com/eleven-labs/audio-generation/models/speech-synthesis"
model_prediction = Model(model_url).predict_by_bytes(text.encode(), input_type="text", inference_params=inference_params)

# The audio payload is returned as bytes.
output_base64 = model_prediction.outputs[0].data.audio.base64

with open('audio_file3.wav', 'wb') as f:
    f.write(output_base64)
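
The example above writes the returned bytes to a `.wav` file. This page does not state which audio container the model actually returns, so the following is a minimal sketch, not part of the Clarifai SDK, that inspects the leading bytes to guess the format before choosing a file extension. The names `guess_audio_extension` and `save_audio`, and the `.bin` fallback, are assumptions made for illustration.

```python
def guess_audio_extension(data: bytes) -> str:
    """Guess a file extension from the first bytes of an audio payload.

    Hypothetical helper: it only checks common WAV and MP3 signatures,
    and other formats that share the MPEG sync pattern may be misclassified.
    """
    # WAV files start with "RIFF", a 4-byte size, then "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    # MP3 files often start with an ID3v2 tag...
    if data[:3] == b"ID3":
        return ".mp3"
    # ...or directly with an MPEG audio frame sync (11 set bits).
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return ".mp3"
    # Unknown signature: keep the bytes but do not claim a format.
    return ".bin"


def save_audio(data: bytes, stem: str) -> str:
    """Write the payload to `stem` plus the guessed extension; return the path."""
    path = stem + guess_audio_extension(data)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With the prediction from the example above, `save_audio(output_base64, "audio_file3")` would write the file and return the path it chose.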

You can also run the ElevenLabs text-to-speech (TTS) API using other Clarifai client libraries, such as Java, cURL, NodeJS, and PHP.

Inference Parameters

voice-id: ID of the voice to use. Call https://api.elevenlabs.io/v1/voices to list all available voices.
model_id: Identifier of the model to use: one of eleven_multilingual_v2, eleven_multilingual_v1, or eleven_monolingual_v1.
stability: Determines how stable the voice is and how much randomness there is between generations. Lowering this parameter gives the voice a broader emotional range, although the result is also heavily influenced by the original voice. Setting it too low may produce odd, overly random performances and cause the character to speak too quickly. Setting it too high can lead to a monotonous voice with limited emotion.
similarity_boost: Dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and this value is set too high, the AI may reproduce artifacts or background noise that were present in the original recording.
style: Attempts to amplify the style of the original speaker. Any value other than 0 consumes additional computational resources and might increase latency.
use_speaker_boost: Boosts the similarity to the original speaker. This setting requires a slightly higher computational load, which in turn increases latency.
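
The parameters above can be assembled with a small helper. This is a sketch under stated assumptions: the function name `build_inference_params` is hypothetical, and the 0 to 1 range check for `stability`, `similarity_boost`, and `style` reflects ElevenLabs' voice settings as I understand them, not anything stated on this page. Because `voice-id` contains a hyphen, it cannot be passed as a Python keyword argument and has to be set as a string key.

```python
# Model identifiers listed in the parameter description above.
ALLOWED_MODELS = (
    "eleven_multilingual_v2",
    "eleven_multilingual_v1",
    "eleven_monolingual_v1",
)


def build_inference_params(
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",
    stability: float = 0.5,
    similarity_boost: float = 0.5,
    style: float = 0.0,
    use_speaker_boost: bool = True,
) -> dict:
    """Return an inference_params dict, rejecting obviously invalid values."""
    if model_id not in ALLOWED_MODELS:
        raise ValueError(f"model_id must be one of {ALLOWED_MODELS}, got {model_id!r}")

    # Assumption: these three settings are expected to lie in [0, 1].
    for name, value in (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    return {
        "voice-id": voice_id,  # hyphenated key, as the model expects
        "model_id": model_id,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "use_speaker_boost": use_speaker_boost,
        "api_key": api_key,
    }
```

The returned dictionary can be passed as `inference_params` to `predict_by_bytes`. Keeping `style` at 0 by default follows the note above that other values may increase latency.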

Disclaimer

Please be advised that this model utilizes wrapped Artificial Intelligence (AI) provided by ElevenLabs (the "Vendor"). These AI models may collect, process, and store data as part of their operations. By using our website and accessing these AI models, you hereby consent to the data practices of the Vendor. We do not have control over the data collection, processing, and storage practices of the Vendor. Therefore, we cannot be held responsible or liable for any data handling practices, data loss, or breaches that may occur. It is your responsibility to review the privacy policies and terms of service of the Vendor to understand their data practices. You can access the Vendor's privacy policy and terms of service at https://elevenlabs.io/privacy.

We disclaim all liability with respect to the actions or omissions of the Vendor, and we encourage you to exercise caution and to ensure that you are comfortable with these practices before utilizing the AI models hosted on our site.

Model Type ID: Text To Audio
Input Type: text
Output Type: audio
Last Updated: Oct 17, 2024
Privacy: PUBLIC