Introduction
text-embedding-3-large is OpenAI's larger text embedding model, designed to represent concepts within content such as natural language or code. It generates embeddings with up to 3072 dimensions and performs better than its predecessor, text-embedding-ada-002.
text-embedding-3-large Model
- Dimensions: Up to 3072 dimensions.
- Performance Improvement: On MIRACL, the average score has increased from 31.4% to 54.9%, and on MTEB, the average score has increased from 61.0% to 64.6% compared to text-embedding-ada-002.
Run the OpenAI Embedding Model with an API
Running the API with Clarifai's Python SDK
You can run the text-embedding-3-large model API using Clarifai's Python SDK.
Export your PAT as an environment variable. Then, import and initialize the API Client.
Find your PAT in your security settings.
```shell
export CLARIFAI_PAT={your personal access token}
```

```python
from clarifai.client.model import Model

# Sample input text to embed
text = '''In India Green Revolution commenced in the early 1960s that led to an increase in food grain production, especially in Punjab, Haryana, and Uttar Pradesh. Major milestones in this undertaking were the development of high-yielding varieties of wheat. The Green revolution is revolutionary in character due to the introduction of new technology, new ideas, the new application of inputs like HYV seeds, fertilizers, irrigation water, pesticides, etc. As all these were brought suddenly and spread quickly to attain dramatic results thus it is termed as a revolution in green agriculture.
'''

# The number of dimensions the resulting output embedding should have (up to 3072)
inference_params = dict(dimensions=1024)

# Model predict: send the text as raw bytes with input type "text"
model_url = "https://clarifai.com/openai/embed/models/text-embedding-3-large"
model_prediction = Model(model_url).predict_by_bytes(text.encode(), "text", inference_params=inference_params)

# The embedding vector and its reported length
embeddings = model_prediction.outputs[0].data.embeddings[0].vector
num_dimensions = model_prediction.outputs[0].data.embeddings[0].num_dimensions
print(num_dimensions)
```
You can also run the OpenAI embedding API with Clarifai's other client libraries, such as Java, cURL, Node.js, and PHP.
Use Cases
Embeddings are commonly used for:
- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)
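Most of these use cases reduce to comparing embedding vectors. Below is a minimal sketch of two of them, search ranking and nearest-label classification, using cosine similarity in plain Python. The helper names (`cosine_similarity`, `rank_by_relevance`, `classify`) and the 3-dimensional toy vectors are hypothetical stand-ins; in practice each vector would be an embedding returned by the model, such as the `embeddings` list from the SDK call above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_relevance(
    query: Sequence[float], docs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Search: return (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def classify(text_vec: Sequence[float], label_vecs: Dict[str, Sequence[float]]) -> str:
    """Classification: pick the label whose embedding is most similar."""
    return rank_by_relevance(text_vec, label_vecs)[0][0]


# Hypothetical toy vectors standing in for real model embeddings
docs = {
    "green_revolution": [0.9, 0.1, 0.0],
    "space_race": [0.0, 0.2, 0.9],
    "crop_yields": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
labels = {"agriculture": [1.0, 0.0, 0.0], "astronomy": [0.0, 0.0, 1.0]}

print(rank_by_relevance(query, docs))
print(classify([0.1, 0.1, 1.0], labels))
```

OpenAI documents its embeddings as normalized to unit length, so for real model output a plain dot product should give the same ranking; computing the full cosine keeps the sketch correct for arbitrary vectors like the toy ones here.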
Evaluation
text-embedding-3-large has been evaluated against text-embedding-ada-002 on two benchmarks, MIRACL and MTEB:
- MIRACL: the average score rose from 31.4% to 54.9%.
- MTEB: the average score rose from 61.0% to 64.6%.
Eval Benchmark Scores
| Model | MIRACL average (%) | MTEB average (%) |
|---|---|---|
| ada v2 (text-embedding-ada-002) | 31.4 | 61.0 |
| text-embedding-3-small | 44.0 | 62.3 |
| text-embedding-3-large | 54.9 | 64.6 |
Advantages
- High-Dimensional Embeddings: Offers up to 3072 dimensions, providing richer and more nuanced text representations.
- Improved Performance: Scores higher than both text-embedding-ada-002 and text-embedding-3-small on the MIRACL and MTEB averages reported above.
- Flexible Usage: Developers can adjust the dimensionality of embeddings to balance between performance and resource constraints.
Native Support for Shortening Embeddings
text-embedding-3-large introduces a novel feature that allows developers to shorten embeddings without significantly losing the concept-representing properties. This is particularly useful in scenarios where resource constraints are a concern, such as when dealing with vector data stores with dimensional limitations.
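The `dimensions` inference parameter in the Python SDK example asks the model for a shortened vector directly. If you already hold a full 3072-dimension vector, OpenAI's embeddings guide describes shortening it yourself by keeping the leading values and re-normalizing to unit length. The sketch below shows that step, assuming the same truncate-then-normalize approach applies to vectors returned through Clarifai; `shorten_embedding` is a hypothetical helper, not part of the Clarifai SDK.

```python
import math
from typing import List, Sequence


def shorten_embedding(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale them to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


# Toy vector (hypothetical); a real one would have 3072 values
full = [3.0, 4.0, 12.0]
short = shorten_embedding(full, 2)
print(short)
```

Re-normalizing matters because cosine-based comparisons and dot-product indexes generally assume unit-length vectors; simply slicing the list would leave each shortened vector with a different length.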
Embedding Shortening Performance
| Model | Embedding size (dimensions) | Average MTEB score |
|---|---|---|
| ada v2 (text-embedding-ada-002) | 1536 | 61.0 |
| text-embedding-3-small | 512 | 61.6 |
| text-embedding-3-large | 256 | 62.0 |
| text-embedding-3-large | 1024 | 64.1 |
| text-embedding-3-large | 1536 | 64.6 |
| text-embedding-3-large | 3072 | 64.6 |
This flexibility enables developers to use text-embedding-3-large even in environments with dimensional constraints, ensuring access to the model's high performance while managing resource usage effectively.
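One rough way to see the resource side of this trade-off is raw storage size. The sketch below assumes each value is stored as a 4-byte float32 and ignores index and metadata overhead, which vary by vector database; `raw_vector_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats


def raw_vector_bytes(dimensions: int, num_vectors: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw bytes needed for the vector values alone (no index overhead)."""
    return dimensions * num_vectors * bytes_per_value


for dims in (256, 1024, 1536, 3072):
    gb = raw_vector_bytes(dims, 1_000_000) / 1e9
    print(f"{dims} dims: {gb:.3f} GB per million vectors")
# 256 dims: 1.024 GB per million vectors
# 1024 dims: 4.096 GB per million vectors
# 1536 dims: 6.144 GB per million vectors
# 3072 dims: 12.288 GB per million vectors
```

Read together with the table, a 256-dimension text-embedding-3-large vector (MTEB 62.0) needs one sixth of the raw storage of a 1536-dimension ada v2 vector (MTEB 61.0).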
- Name: text-embedding-3-large
- Model Type ID: Text Embedder
- Description: text-embedding-3-large is a high-performance, flexible text embedding model with up to 3072 dimensions, outperforming its predecessor.
- Last Updated: Oct 17, 2024
- Privacy: PUBLIC