cohere-embed-english-v3_0 model | Clarifai

cohere-embed-english-v3_0

Cohere Embed-v3 is a state-of-the-art embedding model that excels in semantic search and retrieval-augmented generation (RAG) systems, with enhanced content-quality assessment and efficiency.


Notes

Cohere Embed-v3

Cohere Embed-v3 is the latest embedding model from Cohere. Its English version produces 1024-dimensional embeddings. As of October 2023, Embed-v3 demonstrates state-of-the-art performance on the Massive Text Embedding Benchmark (MTEB) and excels in zero-shot dense retrieval on the BEIR benchmark.

Run the Cohere Embedding English v3 Model with an API

Running the API with Clarifai's Python SDK

You can run the Cohere Embedding English v3 model API using Clarifai's Python SDK.

Export your personal access token (PAT) as an environment variable; you can find your PAT in your Clarifai security settings. Then import and initialize the API client.

export CLARIFAI_PAT={your personal access token}

from clarifai.client.model import Model

text = '''In India Green Revolution commenced in the early 1960s that led to an increase in food grain production, especially in Punjab, Haryana, and Uttar Pradesh. Major milestones in this undertaking were the development of high-yielding varieties of wheat. The Green revolution is revolutionary in character due to the introduction of new technology, new ideas, the new application of inputs like HYV seeds, fertilizers, irrigation water, pesticides, etc. As all these were brought suddenly and spread quickly to attain dramatic results thus it is termed as a revolution in green agriculture.
'''

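# Model Predict: embed the text (input_type defaults to search_document).
# Model() reads your PAT from the CLARIFAI_PAT environment variable.
model_prediction = Model("https://clarifai.com/cohere/embed/models/cohere-embed-english-v3_0").predict_by_bytes(text.encode(), "text")

# Extract the embedding vector (a sequence of floats) and its dimensionality (1024 for this model)
embeddings = model_prediction.outputs[0].data.embeddings[0].vector
num_dimensions = model_prediction.outputs[0].data.embeddings[0].num_dimensions
print(num_dimensions, len(embeddings))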

You can also call the Cohere Embedding English v3 API with Clarifai's other client libraries, such as Java, Node.js, and PHP, or with cURL.

Using cURL to Make a Direct HTTP Call

To make a direct HTTP call to the Cohere Embedding English v3 API using cURL, you can use the following command:

curl -X POST "https://api.clarifai.com/v2/users/cohere/apps/embed/models/cohere-embed-english-v3_0/versions/e2dd848faf454fbda85c26cf89c4926e/outputs" \
    -H "Authorization: Key YOUR_PAT_HERE" \
    -H "Content-Type: application/json" \
    -d '{
    "inputs": [
        {
            "data": {
                "text": {
                    "raw": "Give me an exotic yet tasty recipe for some noodle dish"
                }
            }
        }
    ],
    "model": {
        "model_version": {
            "output_info": {
                "params": {
                    "input_type":"search_query"
                }
            }
        }
    }
}'
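If you work in Python but prefer not to install the Clarifai SDK, the same request can be assembled with the standard library. This is a minimal sketch, not an official client: `build_embed_request` is a hypothetical helper, and packing several texts into the `inputs` array assumes the endpoint accepts more than one input per call. The function only builds the request; sending it requires network access and a valid PAT.

```python
import json
import urllib.request

# Endpoint copied from the cURL example; the version ID is specific to this model.
URL = ("https://api.clarifai.com/v2/users/cohere/apps/embed/models/"
       "cohere-embed-english-v3_0/versions/e2dd848faf454fbda85c26cf89c4926e/outputs")

def build_embed_request(texts, pat, input_type="search_query"):
    """Hypothetical helper: build (but do not send) a POST mirroring the cURL call.

    Each string in `texts` becomes one entry in "inputs" (assumes batching is
    accepted); `input_type` goes into model.model_version.output_info.params.
    """
    body = {
        "inputs": [{"data": {"text": {"raw": t}}} for t in texts],
        "model": {"model_version": {"output_info": {"params": {"input_type": input_type}}}},
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Key {pat}", "Content-Type": "application/json"},
        method="POST",
    )

req = build_embed_request(["Give me an exotic yet tasty recipe for some noodle dish"], "YOUR_PAT_HERE")
# To send it (requires network access and a valid PAT):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect and test before any network call is made.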

Input Type

The v3 models take an "input_type" parameter that tells the model how the embedding will be used. It can be set to one of the following values; the default is search_document:

search_document: For texts (documents) intended to be stored in your vector database.
search_query: For search queries to find the most relevant documents in your vector database.
classification: If you use the embeddings as input for a classification system.
clustering: If you use the embeddings for text clustering.
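The search_document and search_query types are meant to be used together. A minimal sketch of the search step: stored documents are embedded with search_document, the incoming query with search_query, and documents are ranked by cosine similarity, a common choice for comparing embeddings. The function names and the toy 3-dimensional vectors are illustrative assumptions; real vectors from this model have 1024 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Hypothetical helper: return (doc_index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors stand in for real embeddings. In practice, doc_vecs would come from
# requests with input_type="search_document" and query_vec from a request with
# input_type="search_query".
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
query_vec = [1.0, 0.0, 0.0]
ranking = rank_documents(query_vec, doc_vecs)
```

A vector database performs the same comparison at scale, usually with an approximate nearest-neighbor index instead of this exhaustive loop.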

Improvements

Cohere Embed-english-v3 introduces several key improvements:

Enhanced Query Matching: The model evaluates both how well a query matches a document's topic and the overall quality of the content, so it ranks the highest-quality documents at the top. This is particularly useful for noisy datasets.
Compression-Aware Training: The model is trained with a compression-aware method that reduces the cost of running a vector database, making it practical to handle billions of embeddings without a large increase in cloud infrastructure costs.
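The cost argument is easiest to see with some arithmetic. The sketch below computes the raw storage needed for one billion 1024-dimensional vectors at 32, 8, and 1 bit per dimension. It assumes float32 as the uncompressed baseline and ignores index overhead and metadata; the 8-bit and 1-bit levels are generic illustrations of compression, not a description of Cohere's training method. `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors, dims=1024, bits_per_dim=32):
    """Hypothetical helper: raw vector storage only (no index overhead or metadata)."""
    return num_vectors * dims * bits_per_dim // 8

ONE_BILLION = 1_000_000_000
float32_size = index_size_bytes(ONE_BILLION, bits_per_dim=32)  # 4,096,000,000,000 bytes (about 4.1 TB)
int8_size = index_size_bytes(ONE_BILLION, bits_per_dim=8)      # 1,024,000,000,000 bytes (about 1.0 TB)
binary_size = index_size_bytes(ONE_BILLION, bits_per_dim=1)    # 128,000,000,000 bytes (128 GB)
```

Each halving of bits per dimension halves storage, which is why embeddings that tolerate compression translate directly into lower vector-database costs.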

Use Cases

Cohere Embed-english-v3 is versatile and can be used in many applications, including:

Retrieval-Augmented Generation (RAG) Systems: The model can improve the retrieval step of RAG systems, helping them provide comprehensive and relevant answers by retrieving relevant data and using it to augment the generated response.
Improving Search Applications: It can improve search applications that work with real-world, noisy data.
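In a RAG system the embedding model handles retrieval, and the retrieved passages are then passed to a generator model. A minimal sketch of that hand-off, assuming similarity scores have already been computed (for example, between a search_query embedding and search_document embeddings): `build_rag_prompt`, the prompt wording, and the example scores are all hypothetical.

```python
def build_rag_prompt(question, scored_passages, k=3):
    """Hypothetical helper: keep the k best passages and place them before the question.

    scored_passages: list of (passage_text, similarity_score) pairs.
    """
    top = sorted(scored_passages, key=lambda p: p[1], reverse=True)[:k]
    context = "\n".join(f"[{i + 1}] {text}" for i, (text, _) in enumerate(top))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Made-up scores for illustration only.
passages = [
    ("The Green Revolution in India began in the early 1960s.", 0.82),
    ("Noodles are a staple food in many cuisines.", 0.11),
    ("High-yielding wheat varieties were a major milestone.", 0.74),
]
prompt = build_rag_prompt("When did India's Green Revolution start?", passages, k=2)
```

Better retrieval means the irrelevant passage is dropped before it reaches the generator, which is where the embedding model's quality shows up in RAG output.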

Evaluation

The model's performance is evaluated using several metrics and benchmarks:

Massive Text Embedding Benchmark (MTEB): Cohere Embed-english-v3 achieves state-of-the-art performance among 90+ models on MTEB, which assesses classification, clustering, pair classification, re-ranking, retrieval, STS (semantic textual similarity), and summarization across 56 datasets. All evaluation results can be found in the embed v3.0 evaluation spreadsheet.

BEIR (Out-of-Domain Information Retrieval): BEIR measures out-of-domain information retrieval, a critical capability for embedding models, and the model performs strongly on it. Results are reported on 14 publicly available BEIR datasets. All results can be viewed in the BEIR evaluation spreadsheet.
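BEIR's headline metric is nDCG@10, which rewards placing relevant documents near the top of the ranking. The sketch below computes it for a single query; a dataset score averages it over all queries. It is a minimal illustration, not the official BEIR evaluation code: the helper names are hypothetical, and it uses linear gain (some implementations use 2**rel - 1 instead).

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k results, using linear gain."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """Hypothetical helper: nDCG@k for one query.

    ranked_relevances: relevance grade of each retrieved document, in ranked order.
    judged_relevances: grades of all judged documents for the query, used to
    build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Rank 1 is irrelevant; the two relevant documents appear at ranks 2 and 3.
score = ndcg_at_k([0, 1, 1], judged_relevances=[1, 1])
```

Because the ideal ranking is built from all judged documents, a relevant document that the model never retrieves still lowers the score.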

Disclaimer

Please be advised that this model utilizes wrapped Artificial Intelligence (AI) provided by Cohere (the "Vendor"). These AI models may collect, process, and store data as part of their operations. By using our website and accessing these AI models, you hereby consent to the data practices of the Vendor. We do not have control over the data collection, processing, and storage practices of the Vendor. Therefore, we cannot be held responsible or liable for any data handling practices, data loss, or breaches that may occur. It is your responsibility to review the privacy policies and terms of service of the Vendor to understand their data practices. You can access the Vendor's privacy policy and terms of service at https://cohere.city/privacy-policy/.

We disclaim all liability with respect to the actions or omissions of the Vendor, and we encourage you to exercise caution and to ensure that you are comfortable with these practices before utilizing the AI models hosted on our site.

Model Type ID: Text Embedder
Input Type: text
Output Type: embeddings
Last Updated: Oct 17, 2024
Privacy: PUBLIC