cohere-embed-multilingual-v3_0 model | Clarifai

cohere-embed-multilingual-v3_0

Cohere Embed-multilingual-v3 is a versatile embedding model designed for multilingual applications, offering state-of-the-art performance across various languages.


Notes

Cohere Embed-v3

Cohere Embed-v3 is the latest embedding model by Cohere. It comes in both English and multilingual versions and produces 1024-dimensional embeddings. As of October 2023, Embed-v3 demonstrates state-of-the-art performance on the Massive Text Embedding Benchmark (MTEB) and excels in zero-shot dense retrieval on the BEIR dataset.

Run the Cohere Embed-multilingual-v3 Model with an API

Running the API with Clarifai's Python SDK

You can run the Cohere Embed-multilingual-v3 Model API using Clarifai’s Python SDK.

Export your PAT as an environment variable. Then, import and initialize the API Client.

Find your PAT in your security settings.

export CLARIFAI_PAT={your personal access token}

from clarifai.client.model import Model

text = '''In India Green Revolution commenced in the early 1960s that led to an increase in food grain production, especially in Punjab, Haryana, and Uttar Pradesh. Major milestones in this undertaking were the development of high-yielding varieties of wheat. The Green revolution is revolutionary in character due to the introduction of new technology, new ideas, the new application of inputs like HYV seeds, fertilizers, irrigation water, pesticides, etc. As all these were brought suddenly and spread quickly to attain dramatic results thus it is termed as a revolution in green agriculture.
'''

# Run the prediction; the second argument names the input modality ("text")
model_prediction = Model("https://clarifai.com/cohere/embed/models/cohere-embed-multilingual-v3_0").predict_by_bytes(text.encode(), "text")

# Embedding vector for the first (and only) input
embeddings = model_prediction.outputs[0].data.embeddings[0].vector

# Number of dimensions reported for that embedding
num_dimensions = model_prediction.outputs[0].data.embeddings[0].num_dimensions

You can also run the Cohere Embed-multilingual-v3 API using other Clarifai client libraries, such as Java, cURL, NodeJS, and PHP.

Using cURL to Make a Direct HTTP Call

To make a direct HTTP call to the Cohere Embed-multilingual-v3 API using cURL, you can use the following command:

curl -X POST "https://api.clarifai.com/v2/users/cohere/apps/embed/models/cohere-embed-multilingual-v3_0/versions/e2dd848faf454fbda85c26cf89c4926e/outputs" \
-H "Authorization: Key YOUR_PAT_HERE" \
-H "Content-Type: application/json" \
-d '{
"inputs": [
{
"data": {
"text": {
"raw": "Give me an exotic yet tasty recipe for some noodle dish"
}
}
}
],
"model": {
"model_version": {
"output_info": {
"params": {
"input_type":"search_query"
}
}
}
}
}'
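
The same request can be sketched in Python using only the standard library. The endpoint, header format, and request body mirror the cURL command above. The helper names build_payload and embed_text are illustrative, and the response path used to pull out the vector is an assumption carried over from the SDK example, not something confirmed here for the raw JSON response.

```python
import json
import os
import urllib.request

# Endpoint copied from the cURL command above.
API_URL = (
    "https://api.clarifai.com/v2/users/cohere/apps/embed/models/"
    "cohere-embed-multilingual-v3_0/versions/"
    "e2dd848faf454fbda85c26cf89c4926e/outputs"
)


def build_payload(text, input_type="search_query"):
    """Build the JSON body used in the cURL example (hypothetical helper)."""
    return {
        "inputs": [{"data": {"text": {"raw": text}}}],
        "model": {
            "model_version": {
                "output_info": {"params": {"input_type": input_type}}
            }
        },
    }


def embed_text(text, input_type="search_query", pat=None):
    """POST the payload and return the embedding vector (hypothetical helper).

    Assumes the JSON response nests the vector the same way the SDK object
    does: outputs[0].data.embeddings[0].vector.
    """
    pat = pat or os.environ["CLARIFAI_PAT"]
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, input_type)).encode("utf-8"),
        headers={
            "Authorization": f"Key {pat}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["outputs"][0]["data"]["embeddings"][0]["vector"]
```

Calling embed_text("Give me an exotic yet tasty recipe for some noodle dish") with CLARIFAI_PAT exported would send the same request as the cURL command. No error handling is included; a real client would likely check the HTTP status and the status field of the response before reading the vector.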

Input Type

The v3 models require an "input_type" parameter. It defaults to search_document and can be set to one of the following values:

search_document: For texts (documents) intended to be stored in your vector database.
search_query: For search queries to find the most relevant documents in your vector database.
classification: If you use the embeddings as input for a classification system.
clustering: If you use the embeddings for text clustering.
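
A small sketch of how a client might validate this parameter before sending a request. The constant and function names are hypothetical; only the four values and the search_document default come from the list above.

```python
# The four values listed above; search_document is the stated default.
VALID_INPUT_TYPES = (
    "search_document",
    "search_query",
    "classification",
    "clustering",
)
DEFAULT_INPUT_TYPE = "search_document"


def input_type_params(input_type=DEFAULT_INPUT_TYPE):
    """Return the params dict for a request, rejecting unknown values early."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(
            f"input_type must be one of {VALID_INPUT_TYPES}, got {input_type!r}"
        )
    return {"input_type": input_type}


# Documents are embedded once for storage; queries are embedded at search time.
index_params = input_type_params()
query_params = input_type_params("search_query")
```

The returned dict corresponds to the "params" object in the cURL request body. Whether the Python SDK accepts the same dict (for example through an inference_params argument) is an assumption to verify against the SDK version in use.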

Improvements

Cohere Embed-multilingual-v3 introduces several key improvements:

Enhanced Query Matching: The model evaluates how well a query matches a document's topic and also assesses the overall quality of the content, so the highest-quality documents rank at the top. This is particularly useful for noisy datasets.
Compression-Aware Training: The model is trained with a compression-aware method, which significantly reduces the cost of running a vector database. This allows billions of embeddings to be handled without a large increase in cloud infrastructure expenses.
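
To make the storage claim concrete, here is a rough back-of-the-envelope calculation. It assumes the compression means storing each dimension in fewer bytes (for example int8 or single bits instead of float32). The text above does not name the compression scheme, so the byte widths and the one-billion-vector count are illustrative assumptions, and index and metadata overhead are ignored.

```python
DIMENSIONS = 1024  # embedding size stated for Embed-v3 in this document

# Bytes per dimension for a few storage formats (illustrative assumptions).
BYTES_PER_DIM = {"float32": 4.0, "int8": 1.0, "binary": 1 / 8}


def storage_bytes(num_vectors, dtype, dimensions=DIMENSIONS):
    """Raw vector storage in bytes, ignoring index and metadata overhead."""
    return int(num_vectors * dimensions * BYTES_PER_DIM[dtype])


for dtype in BYTES_PER_DIM:
    gigabytes = storage_bytes(1_000_000_000, dtype) / 1e9
    print(f"{dtype:>8}: {gigabytes:,.0f} GB")
```

Under these assumptions, one billion vectors take about 4,096 GB as float32, 1,024 GB as int8, and 128 GB as one bit per dimension, which is where most of the potential saving in vector database cost would come from.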

Use Cases

Cohere Embed-multilingual-v3 is highly versatile and can be used in various applications, including but not limited to:

Retrieval-Augmented Generation (RAG) Systems: The model can improve retrieval for RAG systems, allowing them to provide comprehensive and relevant information by retrieving and augmenting data from relevant conversations.
Improving Search Applications: It is beneficial for enhancing search applications that deal with real-world, noisy data.
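
The retrieval step in such systems can be sketched as ranking stored document embeddings by their similarity to a query embedding. The three-dimensional vectors below are made-up stand-ins for real 1024-dimensional embeddings, and cosine_similarity and top_k are illustrative helpers, not part of any Clarifai or Cohere API. Cosine similarity is a common choice here, not one this page prescribes.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = (
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors; in practice documents would be embedded with
# input_type="search_document" and the query with "search_query".
documents = {
    "noodles": [0.9, 0.1, 0.0],
    "wheat": [0.1, 0.9, 0.1],
    "irrigation": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

results = top_k(query, documents)
print([doc_id for doc_id, _ in results])
```

With these toy vectors the query ranks "noodles" first and "wheat" second. A production system would typically delegate this search to a vector database; the brute-force scan is only meant to show the ranking idea.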

Evaluation

The model's performance is evaluated using several metrics and benchmarks:

Massive Text Embedding Benchmark (MTEB): The Cohere Embed Multilingual v3 model is ranked first among multilingual models. All evaluation results can be found in the embed v3.0 evaluation spreadsheet.

MIRACL (Semantic Search Across 100+ Languages): The multilingual version of Embed-v3 is highly performant across 100+ languages, making it suitable for applications involving multiple languages, such as semantic search and content moderation.

Disclaimer

Please be advised that this model utilizes wrapped Artificial Intelligence (AI) provided by Cohere (the "Vendor"). These AI models may collect, process, and store data as part of their operations. By using our website and accessing these AI models, you hereby consent to the data practices of the Vendor. We do not have control over the data collection, processing, and storage practices of the Vendor. Therefore, we cannot be held responsible or liable for any data handling practices, data loss, or breaches that may occur. It is your responsibility to review the privacy policies and terms of service of the Vendor to understand their data practices. You can access the Vendor's privacy policy and terms of service at https://cohere.city/privacy-policy.

We disclaim all liability with respect to the actions or omissions of the Vendor, and we encourage you to exercise caution and to ensure that you are comfortable with these practices before utilizing the AI models hosted on our site.

Model Type ID: Text Embedder
Input Type: text
Output Type: embeddings
Last Updated: Oct 17, 2024
Privacy: PUBLIC