A developer’s guide to getting started with Imagen 3 on Vertex AI

Over the past few months, early users put Imagen 3 on Vertex AI through its paces and shared valuable insights with us. It’s clear that users want an AI model that both generates stunning visuals and supports their practical creative applications. From their feedback, we identified three common themes:

  • Demand for unparalleled quality across diverse artistic styles and formats

  • Desire for strong prompt adherence and fast image generation

  • Controls to protect and build trust with SynthID watermarking and advanced safety filters

Throughout this post, we will walk you through each of these themes in depth. We will also provide code examples and prompting best practices so you can get the most out of Imagen 3.

Uncompromising quality and versatility

Imagen 3 sets a new standard in quality and control over your generated images. This text-to-image model produces photorealistic visuals with exceptional composition, sharpness, color accuracy, and resolution. With Imagen 3, you can explore a wider spectrum of artistic styles and formats. From photorealistic masterpieces to whimsical claymation scenes, the model's expanded range of styles and formats provides the tools to express your unique artistic vision. 

To demonstrate these photorealistic capabilities, let’s walk through an example of creating image mockups for a new cookbook cover. The following prompt generates an image with incredible detail, composition, and photorealism.

```python
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

# TODO(developer): Replace with your Google Cloud project ID
PROJECT_ID = "PROJECT_ID"

vertexai.init(project=PROJECT_ID, location="us-central1")

generation_model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")

prompt = """
A photorealistic image of a cookbook laying on a wooden kitchen table, the cover facing forward featuring a smiling family sitting at a similar table, soft overhead lighting illuminating the scene, the cookbook is the main focus of the image.
"""

image = generation_model.generate_images(
    prompt=prompt,
    number_of_images=1,
    aspect_ratio="1:1",
    safety_filter_level="block_some",
    person_generation="allow_all",
)

# OPTIONAL: View the generated image in a notebook
# image[0].show()
```
[Image: generated cookbook cover]

Text rendering 

Imagen 3 also brings new possibilities when it comes to rendering text within images. A fun way to play around with this feature is to generate images of greeting cards, posters, and social media posts with captions in various fonts and colors. Using it is as easy as adding to the prompt a short description of the text you would like to see. Let’s say you would like to add a title and regenerate the cookbook cover.

```python
prompt = """
A photorealistic image of a cookbook laying on a wooden kitchen table, the cover facing forward featuring a smiling family sitting at a similar table, soft overhead lighting illuminating the scene, the cookbook is the main focus of the image.

Add a title to the center of the cookbook cover that reads, "Everyday Recipes" in orange block letters.
"""

image = generation_model.generate_images(
    prompt=prompt,
    number_of_images=1,
    aspect_ratio="1:1",
    safety_filter_level="block_some",
    person_generation="allow_all",
)

# OPTIONAL: View the generated image in a notebook
# image[0].show()
```
[Image: generated cookbook cover with title]

Closer to your intent

Imagen 3's prompt comprehension translates your natural language descriptions, no matter how nuanced, into closely matched visuals. You can specify everything from specific camera angles to types of lenses to image compositions in your description. Imagen 3 adheres closely to the prompt, which helps close the gap between your mental picture and the final image. You can provide the model with simple subject-action-setting prompts or intricate, multi-layered descriptions, and the model adapts to your creative process to enable a broad range of styles.

Since Imagen 3 does well with elaborate prompts, providing robust details usually yields higher quality and more precise results. Below are a few options to consider when crafting your prompts:

  • Arrangement: Direct the scene by specifying where you want subjects positioned.

  • Lighting: Create atmosphere with soft or harsh lighting, and control its direction and focus.

  • Angles & lenses: Add depth and perspective with camera angles and lens choices.

  • Styles: Go beyond photorealism and generate digital art, cinematic, vintage, minimalist images, and more.
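One way to keep these options organized is to assemble the prompt from named parts. The following is a minimal sketch, not part of the Vertex AI SDK: `PromptSpec` and `build_prompt` are hypothetical names introduced here for illustration, and the comma-separated layout is just one reasonable convention.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """Hypothetical container for the prompt ingredients listed above."""
    subject: str
    arrangement: str = ""
    lighting: str = ""
    camera: str = ""
    style: str = ""


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty parts into one comma-separated description."""
    style = f"{spec.style.strip()} style" if spec.style.strip() else ""
    parts = [spec.subject, spec.arrangement, spec.lighting, spec.camera, style]
    return ", ".join(p.strip() for p in parts if p.strip())


spec = PromptSpec(
    subject="a garden salad in a wooden bowl",
    arrangement="centered on a white marble table",
    lighting="soft natural light from the left",
    camera="shot from a 45-degree angle with a 50mm lens",
    style="photorealistic",
)
prompt = build_prompt(spec)
print(prompt)
```

The resulting string can be passed as the `prompt` argument of `generate_images`. Parts left empty are simply skipped, so the same helper covers both simple subject-only prompts and more layered descriptions.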

Reduced latency

While Imagen 3 is our highest quality model to date, we are also offering Imagen 3 Fast, which is optimized for generation speed. Imagen 3 Fast is suitable for creating brighter, higher contrast images, and compared to Imagen 2 it offers a 40% decrease in latency. To compare the two models, you can generate an image from each with the same prompt. Let’s generate two options for a photo of a salad to add to the cookbook from earlier.

```python
generation_model_fast = ImageGenerationModel.from_pretrained(
    "imagen-3.0-fast-generate-001"
)

prompt = """
A photorealistic image of a garden salad overflowing with colorful vegetables like bell peppers, cucumbers, tomatoes, and leafy greens, sitting in a wooden bowl in the center of the image on a white marble table. Natural light illuminates the scene, casting soft shadows and highlighting the freshness of the ingredients.
"""

# Imagen 3 Fast image generation
fast_image = generation_model_fast.generate_images(
    prompt=prompt,
    number_of_images=1,
    aspect_ratio="1:1",
    safety_filter_level="block_some",
    person_generation="allow_all",
)

# OPTIONAL: View the generated image in a notebook
# fast_image[0].show()
```

Image generated by Imagen 3 Fast

```python
prompt = """
A photorealistic image of a garden salad overflowing with colorful vegetables like bell peppers, cucumbers, tomatoes, and leafy greens, sitting in a wooden bowl in the center of the image on a white marble table. Natural light illuminates the scene, casting soft shadows and highlighting the freshness of the ingredients.
"""

# Imagen 3 image generation
image = generation_model.generate_images(
    prompt=prompt,
    number_of_images=1,
    aspect_ratio="1:1",
    safety_filter_level="block_some",
    person_generation="allow_all",
)

# OPTIONAL: View the generated image in a notebook
# image[0].show()
```

Image generated by Imagen 3
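If you want to compare the latency of the two models yourself, a small timing wrapper is enough. This is a rough sketch under stated assumptions: `timed` is a hypothetical helper, and `fake_generate` is a stand-in for a real `generate_images` call so that the snippet runs without a Google Cloud project.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run call() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def fake_generate() -> str:
    """Stand-in for a generate_images call so the sketch runs offline."""
    time.sleep(0.01)
    return "image"


result, seconds = timed(fake_generate)
print(f"Generation took {seconds:.3f}s")
```

With the models defined earlier, you could pass `lambda: generation_model_fast.generate_images(prompt=prompt, number_of_images=1)` and the equivalent call on `generation_model`. A single request is a noisy measurement, so averaging several runs per model gives a fairer comparison.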

Protect your work and create responsibly

Imagen 3 has built-in safeguards that let you focus on your artistic vision without compromising control. In partnership with Google DeepMind, Imagen 3 utilizes SynthID, a technology that embeds an invisible watermark at the pixel level. By default, a digital watermark is added to all images generated by Imagen 3, and you can also enable this feature explicitly with the add_watermark parameter. You can also use the API to verify whether an image was generated using Imagen. This verification confirms the authenticity of your AI-generated images, providing transparency and helping to safeguard your work from misuse.

With Imagen 3's advanced safety filters, you can also control the types of images generated to make sure they meet your brand values or principles. To configure safety filter thresholds for generated images, modify the safety_filter_level parameter, which can be set to "block_most", "block_some", or "block_few". To change the safety setting that controls the type of people generated, set person_generation to "allow_all", "allow_adult", or "dont_allow".

```python
# Imagen 3 image generation
image = generation_model.generate_images(
    prompt=prompt,
    number_of_images=1,
    aspect_ratio="1:1",
    safety_filter_level="block_some",
    person_generation="allow_all",
    add_watermark=True,
)
```
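Because both settings are plain strings, a typo only surfaces when the request is sent. A small validation step can catch it earlier. The sketch below assumes the allowed values are exactly those listed in this post; `safety_settings` is a hypothetical helper, not an SDK function, so check the API reference for the values your SDK version accepts.

```python
# Allowed values as listed in this post (an assumption for this sketch).
SAFETY_FILTER_LEVELS = ("block_most", "block_some", "block_few")
PERSON_GENERATION_MODES = ("allow_all", "allow_adult", "dont_allow")


def safety_settings(safety_filter_level: str, person_generation: str) -> dict:
    """Hypothetical helper: validate both settings and return them as kwargs."""
    if safety_filter_level not in SAFETY_FILTER_LEVELS:
        raise ValueError(
            f"safety_filter_level must be one of {SAFETY_FILTER_LEVELS}, "
            f"got {safety_filter_level!r}"
        )
    if person_generation not in PERSON_GENERATION_MODES:
        raise ValueError(
            f"person_generation must be one of {PERSON_GENERATION_MODES}, "
            f"got {person_generation!r}"
        )
    return {
        "safety_filter_level": safety_filter_level,
        "person_generation": person_generation,
    }


settings = safety_settings("block_most", "allow_adult")
print(settings)
```

The returned dictionary can be unpacked into the call, for example `generation_model.generate_images(prompt=prompt, **settings)`.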

What’s next?

Imagen 3 is now generally available with an allowlist. We’re currently prioritizing access to Imagen 3 on Vertex AI for developers at businesses with well-defined use cases. You can sign up for access through this form. We'll review your application and get back to you as soon as possible. 

In the meantime, you can learn more about Imagen 3 and integrate its capabilities into your applications by checking out the resources below!
