Fuyu-8B is an open-source, simplified multimodal architecture with a decoder-only transformer, supporting arbitrary image resolutions, and excelling in diverse applications, including question answering and complex visual understanding
The maximum number of tokens to generate. Shorter token lengths will provide faster performance.
A decimal number that determines the degree of randomness in the response
ResetModel loading...
Output
Notes
ID
Model Type ID
Multimodal To Text
Input Type
image
Output Type
text
Description
Fuyu-8B is an open-source, simplified multimodal architecture with a decoder-only transformer, supporting arbitrary image resolutions, and excelling in diverse applications, including question answering and complex visual understanding