
text-embedding-3-small

text-embedding-3-small is an efficient, flexible embedding model that outperforms its predecessor, text-embedding-ada-002, on a range of natural language processing tasks.

Notes

Introduction

Embeddings represent the concepts in content such as natural language or code as numbers that machine learning models can process. text-embedding-3-small is an efficient embedding model designed to outperform its predecessor, text-embedding-ada-002, which was released in December 2022.

text-embedding-3-small Model

The text-embedding-3-small model converts text into a sequence of numbers (a vector) that captures the underlying concepts, so that related texts map to nearby vectors. It is a substantial upgrade over its predecessor, the text-embedding-ada-002 model.

Run the OpenAI Embedding Model with an API

Running the API with Clarifai's Python SDK

You can run the text-embedding-3-small model API using Clarifai’s Python SDK.

Export your PAT as an environment variable. Then, import and initialize the API Client.

Find your PAT in your security settings.

export CLARIFAI_PAT={your personal access token}

from clarifai.client.model import Model

text = '''In India Green Revolution commenced in the early 1960s that led to an increase in food grain production, especially in Punjab, Haryana, and Uttar Pradesh. Major milestones in this undertaking were the development of high-yielding varieties of wheat. The Green revolution is revolutionary in character due to the introduction of new technology, new ideas, the new application of inputs like HYV seeds, fertilizers, irrigation water, pesticides, etc. As all these were brought suddenly and spread quickly to attain dramatic results thus it is termed as a revolution in green agriculture.
'''

# The number of dimensions the resulting output embeddings should have
inference_params = dict(dimensions=1024)

# Run the prediction on the UTF-8 encoded text
model_prediction = Model("https://clarifai.com/openai/embed/models/text-embedding-3-small").predict_by_bytes(
    text.encode(), "text", inference_params=inference_params
)

# The embedding vector for the input text
embeddings = model_prediction.outputs[0].data.embeddings[0].vector

# The length of the returned vector
num_dimensions = model_prediction.outputs[0].data.embeddings[0].num_dimensions

You can also run the OpenAI embedding API using other Clarifai client libraries, such as Java, cURL, NodeJS, and PHP.

Comparison with text-embedding-ada-002

text-embedding-3-small outperforms the older text-embedding-ada-002 model on standard benchmarks. Its average score rises from 31.4% to 44.0% on MIRACL, a benchmark for multi-language retrieval, and from 61.0% to 62.3% on MTEB, a benchmark for English tasks.

Performance Comparison

Eval Benchmark    ada v2    text-embedding-3-small    text-embedding-3-large
MIRACL average    31.4      44.0                      54.9
MTEB average      61.0      62.3                      64.6
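
To put the two gains on a common footing, the short sketch below computes the absolute and relative improvement of text-embedding-3-small over ada v2 from the averages in the table above. The helper name `improvement` is hypothetical and exists only for this illustration.

```python
# Benchmark averages copied from the table above:
# (ada v2, text-embedding-3-small)
scores = {
    "MIRACL average": (31.4, 44.0),
    "MTEB average": (61.0, 62.3),
}


def improvement(old, new):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = new - old
    relative = 100.0 * absolute / old
    return round(absolute, 1), round(relative, 1)


for name, (old, new) in scores.items():
    points, percent = improvement(old, new)
    print(f"{name}: +{points} points ({percent}% relative)")
```

The MIRACL gain is far larger in relative terms than the MTEB gain, which suggests the improvement is concentrated in multi-language retrieval.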

Use Cases

Embeddings are commonly used for:

  • Search (where results are ranked by relevance to a query string)
  • Clustering (where text strings are grouped by similarity)
  • Recommendations (where items with related text strings are recommended)
  • Anomaly detection (where outliers with little relatedness are identified)
  • Diversity measurement (where similarity distributions are analyzed)
  • Classification (where text strings are classified by their most similar label)
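
As a rough illustration of the search use case, the sketch below ranks documents by cosine similarity to a query vector. It assumes cosine similarity as the relatedness measure and uses toy 3-dimensional vectors in place of real embeddings; the function names and document IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_relevance(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration).
docs = {
    "wheat-yields": [0.9, 0.1, 0.0],
    "irrigation": [0.6, 0.6, 0.1],
    "football": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]

ranking = rank_by_relevance(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The same similarity function can serve classification: embed each label, then pick the label whose vector is most similar to the text's vector.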

Evaluation

The model was evaluated on the MIRACL and MTEB benchmarks, where it improved on its predecessor's average scores, most markedly on MIRACL.

Benchmark         text-embedding-ada-002    text-embedding-3-small
MIRACL average    31.4                      44.0
MTEB average      61.0                      62.3

Advantages

  • Stronger Performance: text-embedding-3-small scores higher than text-embedding-ada-002 on both the MIRACL and MTEB benchmark averages.
  • Native Support for Shortening Embeddings: The model allows developers to adjust the embedding size (dimensions) without significantly losing conceptual representation.

Native Support for Shortening Embeddings

text-embedding-3-small lets developers shorten embeddings by requesting fewer dimensions, without the embedding significantly losing its concept-representing properties. This is useful when resources are constrained, for example when a vector data store limits the number of dimensions it accepts.
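
The `dimensions` inference parameter in the SDK example asks the model for shorter vectors directly, which is the simplest route. When full-length vectors are already stored, one common alternative is to truncate them and re-normalize to unit length. The sketch below assumes that approach; the helper name `shorten_embedding` is hypothetical, and this page does not confirm that the result matches what the model returns when `dimensions` is set.

```python
import math


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` values and rescale them to unit L2 norm."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(truncated)
    return [x / norm for x in truncated]


# Toy 4-dimensional vector standing in for a real embedding.
full = [0.5, 0.5, 0.5, 0.5]
short = shorten_embedding(full, 2)
print(len(short), sum(x * x for x in short))
```

Re-normalizing matters because similarity measures such as the dot product assume unit-length vectors; a truncated vector is generally shorter than unit length.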

Embedding Shortening Performance

Model                     Embedding Size    Average MTEB Score
ada v2                    1536              61.0
text-embedding-3-small    512               61.6
text-embedding-3-large    1536              64.6

This flexibility enables developers to use the text-embedding-3 models, including text-embedding-3-large, even in environments with dimensional constraints, retaining most of the models' performance while keeping resource usage in check.
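
To make the resource saving concrete, the sketch below estimates raw vector storage at the two embedding sizes from the table above. It assumes each value is stored as a 4-byte float32 and ignores index overhead; the corpus size of one million vectors and the function name are hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: values are stored as 32-bit floats


def index_size_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for the vectors alone, excluding any index overhead."""
    return num_vectors * dimensions * bytes_per_value


for dims in (1536, 512):
    size_gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dimensions: {size_gb:.3f} GB for 1,000,000 vectors")
```

Under these assumptions, dropping from 1536 to 512 dimensions cuts raw storage to one third.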

Limitations

  • Trade-off Between Size and Accuracy: Shorter embeddings cost less to store and compare but score lower. For example, text-embedding-3-small averages 61.6 on MTEB at 512 dimensions versus 62.3 at full size.
  • Name
    text-embedding-3-small
  • Model Type ID
    Text Embedder
  • Last Updated
    Oct 17, 2024
  • Privacy
    PUBLIC