Explore The World's AI™ Community

Everything you need to build your AI-powered apps. Discover, build, and share AI models, workflows, and app components, powered by Clarifai's low-code, no-code platform.

Image Recognition

Identifies a variety of concepts in images and video including objects, themes, and more. Trained with over 10,000 concepts and 20M images.

Model

gemma-3-4b-it

Gemma 3 (4B) is a multilingual, multimodal open model by Google, handling text and image inputs with a 128K context window. It excels in tasks like QA and summarization while being efficient for deployment on limited-resource devices.

Model

MiniCPM-o-2_6-language

MiniCPM-o-2_6-language belongs to the latest series of end-side multimodal LLMs (MLLMs), upgraded from MiniCPM-V. The models can now take images, video, text, and audio as inputs and provide high-quality text output in an end-to-end fashion.

Model

Qwen2_5-VL-7B-Instruct

Qwen2.5-VL is a vision-language model designed for AI agents, finance, and commerce. It excels in visual recognition, reasoning, long video analysis, object localization, and structured data extraction.

Model

Qwen2_5-Coder-7B-Instruct-vllm

Qwen2.5-Coder is a code-specific LLM series (0.5B–32B) with improved code generation, reasoning, and fixing. Trained on 5.5T tokens, the 32B model rivals GPT-4o in coding capabilities.

Model

phi-4

Phi-4 is a state-of-the-art open model trained on high-quality synthetic, public, and academic data for advanced reasoning. It uses fine-tuning and preference optimization for precise instruction adherence and safety.

Model

MiniCPM3-4B

MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125 and is comparable with many recent 7B to 9B models.

Model

QwQ-32B-AWQ

QwQ is the reasoning model of the Qwen series, designed for enhanced problem-solving and downstream task performance. QwQ-32B competes with top reasoning models like DeepSeek-R1 and o1-mini.

Model

Phi-4-mini-instruct

Phi-4-mini-instruct is a lightweight open model from the Phi-4 family, optimized for reasoning with high-quality data. It supports a 128K context window and uses fine-tuning for precise instruction adherence and safety.

Model

Llama-3_2-3B-Instruct

Llama 3.2 (3B) is a multilingual, instruction-tuned LLM by Meta, optimized for dialogue, retrieval, and summarization. It uses an autoregressive transformer with SFT and RLHF for improved alignment and outperforms many industry models.

Model

DeepSeek-R1-Distill-Qwen-7B

DeepSeek-R1-Distill-Qwen-7B is a 7B-parameter dense model distilled from DeepSeek-R1 and built on the Qwen2.5-Math-7B base model.

Model

gpt-4o

GPT-4o is a multimodal AI model that excels in processing and generating text, audio, and images. It offers rapid response times and improved performance across languages and tasks, and it incorporates advanced safety features.

Model

claude-3_5-sonnet

Claude 3.5 Sonnet is a high-speed, advanced AI model excelling in reasoning, knowledge, coding, and visual tasks, ideal for complex applications.

Model

llama-3_2-11b-vision-instruct

Llama-3.2-11B-Vision-Instruct is an 11B-parameter multimodal LLM by Meta designed for visual reasoning, image captioning, and VQA tasks, supporting text and image inputs.

Model

pixtral-12b

Pixtral 12B is a natively multimodal 12B-parameter model that excels in multimodal reasoning, instruction following, and text benchmarks. It supports variable image sizes and long context inputs.

Model

got-ocr-2_0

The OCR-2.0 model (GOT) is a versatile and efficient optical character recognition system designed to handle diverse tasks, including text, formulas, and charts, through a unified end-to-end architecture.

Featured Workflows

Workflow

Demographics

Multi-model workflow that detects, crops, and recognizes demographic characteristics of faces. Visually classifies age, gender, and multi-culture characteristics.

Workflow

Face-Sentiment

Multi-model workflow that combines face detection and sentiment classification of 8 concepts: anger, disgust, fear, neutral, happiness, sadness, contempt, and surprise.

Workflow

General

A general image workflow that combines detection, classification, and embedding to identify general concepts including objects, themes, moods, etc.

Workflow

rag-agent-gpt4-turbo-React-few-shot

RAG Agent uses the GPT-4 Turbo LLM with ReAct prompting to optimize dynamic reasoning and action planning.
