Qwen2_5-VL-7B-Instruct

Qwen2.5-VL is a vision-language model designed for AI agents, finance, and commerce. It excels in visual recognition, reasoning, long video analysis, object localization, and structured data extraction.

Input

Prompt:

Press Ctrl + Enter to submit
The maximum number of tokens to generate. Shorter token lengths will provide faster performance.
A decimal number that determines the degree of randomness in the response
An alternative to sampling with temperature, where the model considers the results of the tokens with top_p probability mass.

Output

Notes

  • ID
  • Model Type ID
    Multimodal To Text
  • Input Type
    image
  • Output Type
    text
  • Description
    Qwen2.5-VL is a vision-language model designed for AI agents, finance, and commerce. It excels in visual recognition, reasoning, long video analysis, object localization, and structured data extraction.
  • Last Updated
    Mar 14, 2025
  • Privacy
    PUBLIC
  • License
  • Share
  • Badge
    Qwen2_5-VL-7B-Instruct