Imagen is a text-to-image diffusion model that combines the power of large transformer language models with high-fidelity diffusion models to generate photorealistic images aligned with the input text. Developed by Google Research, Brain Team, the model uses a frozen T5-XXL encoder, pretrained on text-only corpora, to map input text into a sequence of embeddings. Imagen is designed for use in a variety of applications, including creative image generation and editing.
Imagen Model
Imagen is a text-to-image diffusion model that generates images from input text. The model comprises a frozen T5-XXL encoder that maps input text into a sequence of embeddings, a 64x64 image diffusion model, and two super-resolution diffusion models (64x64 to 256x256, then 256x256 to 1024x1024) for generating high-resolution images. The key finding behind Imagen is that text embeddings from large language models, pretrained on text-only corpora, are remarkably effective for text-to-image synthesis.
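The cascade described above can be sketched schematically with stand-in components. This is not the actual implementation: the embedding dimension (4096, T5-XXL's model width), the zero-valued placeholder outputs, and the nearest-neighbor upsampling are assumptions used only to show how the stages connect and how the resolution grows from 64x64 to 1024x1024.

```python
import numpy as np

def text_encoder(prompt):
    # Stand-in for the frozen T5-XXL encoder: one embedding per token.
    # 4096 is T5-XXL's hidden size; the zeros are placeholders.
    tokens = prompt.split()
    return np.zeros((len(tokens), 4096))

def base_diffusion(text_embeddings):
    # Stand-in for the 64x64 text-conditional diffusion model.
    return np.zeros((64, 64, 3))

def super_resolution(image, factor=4):
    # Stand-in for a super-resolution diffusion model; nearest-neighbor
    # upsampling here only models the change in spatial resolution.
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)

def imagen_pipeline(prompt):
    emb = text_encoder(prompt)
    img = base_diffusion(emb)      # 64x64
    img = super_resolution(img)    # 64x64  -> 256x256
    img = super_resolution(img)    # 256x256 -> 1024x1024
    return img
```

Each super-resolution stage is itself text-conditioned in the real model; that conditioning is omitted here for brevity.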
Use Cases with Input Examples:
Creative Image Generation: Imagen can be used to generate creative and photorealistic images from textual descriptions.
Input: “A beautiful sunset over the ocean with palm trees in the foreground”
Product Design: Imagen can be used to generate images of products that do not yet exist.
Input: “A sleek and modern electric car with a panoramic sunroof and a range of 500 miles”
Virtual Reality: Imagen can be used to generate images for virtual reality environments.
Input: “A medieval castle with a moat and a drawbridge”
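Prompts like the ones above would be sent to the hosted model through a Vertex AI prediction request. The sketch below builds such a request with only the standard library; the regional endpoint path, the `imagegeneration` model name, and the `instances`/`parameters` field names are assumptions based on the general shape of the Vertex AI predict API, so verify them against the current GCP documentation before use.

```python
import json
from urllib import request

# Assumed regional endpoint; confirm against the Vertex AI docs.
API_ROOT = "https://us-central1-aiplatform.googleapis.com/v1"

def build_predict_request(project, prompt, sample_count=1):
    """Build the URL and JSON body for an image-generation predict call.

    The model name and field names are assumptions, not a confirmed schema.
    """
    url = (f"{API_ROOT}/projects/{project}/locations/us-central1/"
           f"publishers/google/models/imagegeneration:predict")
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count},
    }
    return url, json.dumps(body)

def send(url, body, access_token):
    # Requires valid GCP credentials; not exercised here.
    req = request.Request(
        url,
        data=body.encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
    return request.urlopen(req)
```

A typical call would be `build_predict_request("my-project", "A medieval castle with a moat and a drawbridge")` followed by `send(...)` with an OAuth access token.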
Dataset Information:
The training data for Imagen was drawn from several pre-existing datasets of image and English alt-text pairs, including the publicly available LAION-400M dataset. A subset of this data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language. However, a recent audit of LAION-400M uncovered a wide range of inappropriate content, including pornographic imagery, racist slurs, and harmful social stereotypes.
Evaluation:
Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, Google Research introduces DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, Imagen is compared with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, GLIDE, and DALL-E 2, and human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.
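The FID score cited above compares the statistics of Inception-network features extracted from generated and reference images, modeling each set as a Gaussian. The sketch below implements only the closed-form Gaussian distance given those statistics; the Inception feature extraction, which produces the means and covariances, is omitted, and the symmetric matrix square root via eigendecomposition is one of several valid ways to compute the cross term.

```python
import numpy as np

def psd_sqrt(m):
    # Matrix square root of a symmetric positive semi-definite matrix
    # via eigendecomposition; eigenvalues are clipped at zero to guard
    # against small negative values from floating-point error.
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(mu1, cov1, mu2, cov2):
    # Frechet distance between two Gaussians:
    # ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)),
    # with the cross term computed as sqrt(C1)^T C2 sqrt(C1) to stay symmetric.
    s1 = psd_sqrt(cov1)
    covmean = psd_sqrt(s1 @ cov2 @ s1)
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(cov1 + cov2 - 2.0 * covmean))
```

Identical feature statistics give a score of zero; lower scores on COCO statistics correspond to the 7.27 result reported above.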
Limitations:
Dataset Bias: Imagen's training data was drawn from pre-existing datasets of image and English alt-text pairs, which may contain biases and stereotypes, including the inappropriate content uncovered by the LAION-400M audit described above. This dataset bias may lead the model to reproduce harmful associations, causing significant representational harm that would disproportionately impact individuals and communities already experiencing marginalization, discrimination, and exclusion within society.
Limited Image Fidelity for People: Imagen exhibits serious limitations when generating images depicting people. Human evaluations found that Imagen obtains significantly higher preference rates on images that do not portray people, indicating a degradation in image fidelity when people are depicted. This limitation may restrict Imagen's use in certain applications.
Social and Cultural Biases: Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes. Even when focusing generations away from people, Imagen encodes a range of social and cultural biases when generating images of activities, events, and objects. This limitation may lead to the model reproducing and perpetuating harmful stereotypes and biases.
Disclaimer
Please be advised that this model utilizes a wrapped Artificial Intelligence (AI) service provided by Google Cloud Platform ("GCP", the "Vendor"). These AI models may collect, process, and store data as part of their operations. By using our website and accessing these AI models, you hereby consent to the data practices of the Vendor.
We do not have control over the data collection, processing, and storage practices of the Vendor. Therefore, we cannot be held responsible or liable for any data handling practices, data loss, or breaches that may occur.
It is your responsibility to review the privacy policies and terms of service of the Vendor to understand their data practices. You can access the Vendor's privacy policy and terms of service at https://cloud.google.com/privacy.
We disclaim all liability with respect to the actions or omissions of the Vendor, and we encourage you to exercise caution and to ensure that you are comfortable with these practices before utilizing the AI models hosted on our site.