Explore The World's AI™

Use any open and closed source models and workflows from leading partners and the community.

🔥 Trending Models

general-image-recognition

Identifies a variety of concepts in images and video including objects, themes, and more. Trained with over 10,000 concepts and 20M images.
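
A minimal sketch of calling a model like this one, assuming the Clarifai Python SDK (`pip install clarifai`) and a personal access token in a `CLARIFAI_PAT` environment variable; the model URL follows the platform's usual pattern but is an assumption here.

```python
import os
from clarifai.client.model import Model

# Point the client at the hosted model (URL pattern assumed from the listing).
model = Model(
    url="https://clarifai.com/clarifai/main/models/general-image-recognition",
    pat=os.environ["CLARIFAI_PAT"],  # personal access token
)

# Classify a publicly reachable image and print the top predicted concepts.
prediction = model.predict_by_url(
    "https://samples.clarifai.com/metro-north.jpg", input_type="image"
)
for concept in prediction.outputs[0].data.concepts[:5]:
    print(f"{concept.name}: {concept.value:.3f}")
```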

Llama-3_2-3B-Instruct

Llama 3.2 (3B) is a multilingual, instruction-tuned LLM by Meta, optimized for dialogue, retrieval, and summarization. It uses an autoregressive transformer with SFT and RLHF for improved alignment and outperforms many industry models.
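
Text models on the page follow the same call shape, with the prompt passed as bytes. A hedged sketch, reusing the SDK and token from above; the model URL here is guessed from the listing name and is an assumption.

```python
import os
from clarifai.client.model import Model

# URL inferred from the listing name above; an assumption, not confirmed.
llm = Model(
    url="https://clarifai.com/meta/Llama-3/models/Llama-3_2-3B-Instruct",
    pat=os.environ["CLARIFAI_PAT"],
)

# Text models take the prompt as raw bytes and return generated text.
response = llm.predict_by_bytes(
    b"Summarize retrieval-augmented generation in two sentences.",
    input_type="text",
)
print(response.outputs[0].data.text.raw)
```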

gemma-3-4b-it

Gemma 3 (4B) is a multilingual, multimodal open model by Google, handling text and image inputs with a 128K context window. It excels in tasks like QA and summarization while being efficient for deployment on limited-resource devices.

MiniCPM-o-2_6-language

MiniCPM-o-2_6-language is the latest series of end-side multimodal LLMs (MLLMs) upgraded from MiniCPM-V. The models can now take images, video, text, and audio as inputs and provide high-quality text output in an end-to-end fashion.

Qwen2_5-VL-7B-Instruct

Qwen2.5-VL is a vision-language model designed for AI agents, finance, and commerce. It excels in visual recognition, reasoning, long video analysis, object localization, and structured data extraction.

Qwen2_5-Coder-7B-Instruct-vllm

Qwen2.5-Coder is a code-specific LLM series (0.5B–32B) with improved code generation, reasoning, and fixing. Trained on 5.5T tokens, the 32B model rivals GPT-4o in coding capabilities.

phi-4

Phi-4 is a state-of-the-art open model trained on high-quality synthetic, public, and academic data for advanced reasoning. It uses fine-tuning and preference optimization for precise instruction adherence and safety.

MiniCPM3-4B

MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and it is comparable with many recent 7B–9B models.

QwQ-32B-AWQ

QwQ is the reasoning model of the Qwen series, designed for enhanced problem-solving and downstream task performance. QwQ-32B competes with top reasoning models like DeepSeek-R1 and o1-mini.

phi-4-mini-instruct

Phi-4-mini-instruct is a lightweight open model from the Phi-4 family, optimized for reasoning with high-quality data. It supports a 128K context window and uses fine-tuning for precise instruction adherence and safety.

DeepSeek-R1-Distill-Qwen-7B

DeepSeek-R1-Distill-Qwen-7B is a 7B-parameter dense model distilled from DeepSeek-R1 based on Qwen-7B.

gpt-4o

GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images, offering rapid response times and improved performance across languages and tasks, while incorporating advanced safety features.

claude-3_5-sonnet

Claude 3.5 Sonnet is a high-speed, advanced AI model excelling in reasoning, knowledge, coding, and visual tasks, ideal for complex applications.

pixtral-12b

Pixtral 12B is a natively multimodal model excelling in multimodal reasoning, instruction following, and text benchmarks, with a 12B-parameter architecture supporting variable image sizes and long-context inputs.

got-ocr-2_0

The OCR-2.0 model (GOT) is a versatile and efficient optical character recognition system designed to handle diverse tasks, including text, formulas, and charts, through a unified end-to-end architecture.
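
Image-to-text models such as this one can be sketched with the same client: an image goes in, recognized text comes back. The model URL and the assumption that the transcription is returned as raw text are both unconfirmed.

```python
import os
from clarifai.client.model import Model

# URL assumed from the listing name; adjust to the model's actual page.
ocr = Model(
    url="https://clarifai.com/stepfun-ai/ocr/models/got-ocr-2_0",
    pat=os.environ["CLARIFAI_PAT"],
)

# Send a local image containing printed text and print the transcription.
with open("receipt.png", "rb") as f:
    result = ocr.predict_by_bytes(f.read(), input_type="image")
print(result.outputs[0].data.text.raw)
```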

o4-mini

o4-mini is a fast, cost-efficient reasoning model from OpenAI.

o3

o3 is OpenAI's multimodal reasoning model.

grok-3

Grok 3 is a large language model from xAI.

Large Language Models


llama-3_2-11b-vision-instruct

Llama-3.2-11B-Vision-Instruct is a multimodal LLM by Meta designed for visual reasoning, image captioning, and VQA tasks, supporting text and image inputs with 11B parameters.

DeepSeek-R1-Distill-Qwen-32B

DeepSeek-R1-Distill-Qwen-32B is a 32B-parameter dense model distilled from DeepSeek-R1 based on Qwen-32B.

llama-3_3-70b-instruct

Llama 3.3 (70B) is a multilingual instruction-tuned LLM optimized for dialogue, trained on 15T+ tokens, supporting 8 languages, and incorporating strong safety measures.

deepseek-coder-33b-instruct

DeepSeek-Coder-33B-Instruct is a SOTA 33-billion-parameter code generation model, fine-tuned on 2 billion tokens of instruction data, offering superior performance in code completion and infilling across more than 80 programming languages.

minicpm-o-2_6

MiniCPM-o is the latest series of end-side multimodal LLMs (MLLMs) upgraded from MiniCPM-V. The models can now take images, video, text, and audio as inputs and provide high-quality text and speech outputs in an end-to-end fashion.

DeepSeek-R1-Distill-Qwen-1_5B

DeepSeek-R1-Distill-Qwen-1.5B is a 1.5B-parameter dense model distilled from DeepSeek-R1 based on Qwen-1.5B.

Vision Language Models


gemini-2_0-flash

Gemini 2.0 Flash is a fast, low-latency multimodal model with enhanced performance and new capabilities.

gemini-2_0-flash-lite

Gemini 2.0 Flash-Lite is Google's fastest and most cost-efficient Flash model. It's an upgrade path for 1.5 Flash users who want better quality for the same price and speed.

llama-3_2-11b-vision-instruct

Llama-3.2-11B-Vision-Instruct is a multimodal LLM by Meta designed for visual reasoning, image captioning, and VQA tasks, supporting text and image inputs with 11B parameters.

minicpm-o-2_6

MiniCPM-o is the latest series of end-side multimodal LLMs (MLLMs) upgraded from MiniCPM-V. The models can now take images, video, text, and audio as inputs and provide high-quality text and speech outputs in an end-to-end fashion.

llava-1_5-7b

LLaVA-1.5 is a state-of-the-art vision-language model that represents a significant advancement in multimodal artificial intelligence.

florence-2-large

Florence-2-large is a lightweight, versatile vision-language model by Microsoft, excelling in multiple tasks using a unified representation and the extensive FLD-5B dataset.

Popular Workflows


General

A general image workflow that combines detection, classification, and embedding to identify general concepts including objects, themes, moods, etc.
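
Workflows chain several models behind one endpoint. A minimal sketch using the SDK's Workflow client; the workflow URL is an assumption based on the listing name above.

```python
import os
from clarifai.client.workflow import Workflow

# Workflow URL assumed from the listing name; an unconfirmed guess.
workflow = Workflow(
    url="https://clarifai.com/clarifai/main/workflows/General",
    pat=os.environ["CLARIFAI_PAT"],
)

prediction = workflow.predict_by_url(
    "https://samples.clarifai.com/metro-north.jpg", input_type="image"
)

# Each model in the chain contributes one output; print concepts where present.
for output in prediction.results[0].outputs:
    for concept in output.data.concepts[:3]:
        print(f"{output.model.id}: {concept.name} ({concept.value:.2f})")
```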

rag-agent-gpt4-turbo-React-few-shot

RAG Agent uses the GPT-4 Turbo LLM with ReAct prompting, optimizing dynamic reasoning and action planning.

Face-Sentiment

Multi-model workflow that combines face detection with sentiment classification across eight concepts: anger, disgust, fear, neutral, happiness, sadness, contempt, and surprise.

Demographics

Multi-model workflow that detects, crops, and recognizes demographic characteristics of faces, visually classifying age, gender, and multicultural appearance.
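
Detection-style workflows like the two above return per-face regions rather than whole-image concepts. A hedged sketch of reading them; the workflow URL, the local image path, and the placement of regions on the final output are assumptions.

```python
import os
from clarifai.client.workflow import Workflow

# Workflow URL assumed from the listing name above.
faces = Workflow(
    url="https://clarifai.com/clarifai/main/workflows/Face-Sentiment",
    pat=os.environ["CLARIFAI_PAT"],
)

# Any local photo containing faces (hypothetical filename).
with open("group-photo.jpg", "rb") as f:
    prediction = faces.predict_by_bytes(f.read(), input_type="image")

# Assume the final model's output carries one region per detected face,
# each with its own bounding box and sentiment concepts.
for region in prediction.results[0].outputs[-1].data.regions:
    box = region.region_info.bounding_box
    top = max(region.data.concepts, key=lambda c: c.value)
    print(f"face at ({box.left_col:.2f}, {box.top_row:.2f}): {top.name}")
```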