
Image Generation

Generate and edit images from text prompts using diffusion-style models.

Category

creative

Provider

computer agents

Code Files

1

name: image-generation
description: Generate and edit images using AI. Use when asked to create, generate, draw, or edit images, illustrations, diagrams, comics, pictures, artwork, logos, or any visual content. Also use for image editing, style transfer, adding elements to photos, or combining multiple images.

Image Generation & Editing

Generate and edit images using Gemini 3 Pro Image, a state-of-the-art model for professional image creation.

When to Use

Use this skill when you need to:

  • Generate images from text descriptions
  • Edit existing images (add/remove/modify elements)
  • Combine multiple images into new compositions
  • Apply style transfers to images
  • Create visual assets, illustrations, or diagrams
  • Generate images with text/logos (high-fidelity text rendering)

Usage

Generate from Text (Text-to-Image)


Edit an Existing Image


Combine Multiple Images


Specify Output Options


Options

Option          Short  Default                                         Description
--output        -o     Auto-generated in /workspace/generated_images/  Output file path
--input         -i     None                                            Input image for editing (can specify multiple)
--aspect-ratio  -a     1:1                                             Output aspect ratio
--resolution    -r     1K                                              Output resolution (1K, 2K, or 4K)

By default, images are saved to /workspace/generated_images/ with timestamped filenames like image_20250120_143052_your_prompt.png.
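
The default naming scheme described above could be reproduced roughly as follows. This is a minimal sketch, not the skill's actual code: the helper names `slugify` and `default_output_path` are hypothetical, and the slug rules (lowercasing, replacing non-alphanumeric runs with underscores, truncating to 40 characters) are assumptions inferred from the single example filename.

```python
import re
from datetime import datetime
from pathlib import Path

# Default output directory stated in the Options section.
DEFAULT_DIR = Path("/workspace/generated_images")


def slugify(prompt, max_len=40):
    """Turn a prompt into a filename-safe fragment (assumed rules)."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Fall back to a fixed word if the prompt has no usable characters.
    return slug[:max_len].rstrip("_") or "image"


def default_output_path(prompt, now=None):
    """Build image_<YYYYMMDD>_<HHMMSS>_<slug>.png under DEFAULT_DIR."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return DEFAULT_DIR / f"image_{stamp}_{slugify(prompt)}.png"
```

Passing `now` explicitly keeps the function deterministic, which is convenient when the path needs to be predicted or tested.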

Aspect Ratios

1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

Resolutions

  • 1K - ~1024px (default, fastest)
  • 2K - ~2048px (higher quality)
  • 4K - ~4096px (highest quality, slower)
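
The options, aspect ratios, and resolutions listed above map naturally onto a command-line parser. The sketch below shows one way such a parser might be declared with `argparse`; it is an illustration under stated assumptions, not the skill's actual script. In particular, the positional `prompt` argument and the function name `build_parser` are hypothetical.

```python
import argparse

# Values taken from the Aspect Ratios and Resolutions lists.
ASPECT_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]
RESOLUTIONS = ["1K", "2K", "4K"]


def build_parser():
    parser = argparse.ArgumentParser(description="Generate or edit an image from a prompt.")
    # Assumption: the prompt is passed as a single positional argument.
    parser.add_argument("prompt", help="Text description or editing instruction")
    parser.add_argument("-o", "--output", default=None,
                        help="Output file path (auto-generated when omitted)")
    # action="append" lets -i be repeated to supply several input images.
    parser.add_argument("-i", "--input", action="append", default=None,
                        help="Input image for editing (can specify multiple)")
    parser.add_argument("-a", "--aspect-ratio", choices=ASPECT_RATIOS, default="1:1",
                        help="Output aspect ratio")
    parser.add_argument("-r", "--resolution", choices=RESOLUTIONS, default="1K",
                        help="Output resolution")
    return parser
```

Using `choices` means an unsupported ratio or resolution is rejected before any API call is made.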

Examples

Text-to-Image Generation


Image Editing


Multi-Image Composition


Capabilities

Gemini 3 Pro Image Features

  • High-resolution output: 1K, 2K, and 4K generation
  • Advanced text rendering: Legible, stylized text for logos, diagrams, and marketing material
  • Thinking mode: Model reasons through complex prompts for better results
  • Up to 14 reference images: Mix images for composition (up to 5 of them can be people rendered with high fidelity)
  • Semantic masking: Edit specific parts without explicit masks

Requirements

  • GEMINI_API_KEY environment variable must be set
  • Python 3.10+ with google-genai and Pillow packages installed
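
The requirements above can be checked before any generation is attempted. The following is a minimal preflight sketch using only the standard library; the function name `missing_requirements` is hypothetical, and the check only confirms that the packages are importable, not that their versions are suitable.

```python
import importlib.util
import os
import sys


def missing_requirements(env=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY environment variable is not set")
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # Map importable module names to the pip package names from the list above.
    for module, package in (("google.genai", "google-genai"), ("PIL", "Pillow")):
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # Raised when the parent package (here "google") is itself missing.
            found = False
        if not found:
            problems.append(f"{package} package is not installed")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the real environment; passing a dictionary as `env` makes the API-key check easy to exercise in isolation.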

Tips for Better Results

For Generation

  • Be descriptive: "A photorealistic close-up portrait with soft golden hour lighting" beats "a portrait"
  • Specify style: Include art style references (minimalist, photorealistic, watercolor, etc.)
  • Add camera details: Mention lens type, lighting setup, camera angle for photorealistic images
  • Work step by step: For complex scenes, describe the background first, then the foreground elements
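
One way to apply these generation tips consistently is to assemble the prompt from labeled pieces. The helper below is a hypothetical illustration, not part of the skill: the name `build_generation_prompt` and the "Style:", "Camera:", "Background:", and "Foreground:" labels are assumptions, and the model does not require this exact wording.

```python
def build_generation_prompt(subject, style=None, camera=None,
                            background=None, foreground=()):
    """Compose a descriptive prompt: subject, style, camera, then background before foreground."""
    sentences = [subject.strip().rstrip(".") + "."]
    if style:
        sentences.append(f"Style: {style}.")
    if camera:
        sentences.append(f"Camera: {camera}.")
    # Background is described first, then each foreground element in order.
    if background:
        sentences.append(f"Background: {background}.")
    for element in foreground:
        sentences.append(f"Foreground: {element}.")
    return " ".join(sentences)
```

For example, `build_generation_prompt("A close-up portrait", style="photorealistic", camera="85mm lens, soft golden hour lighting")` yields a single prompt string that covers subject, style, and camera details in one pass.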

For Editing

  • Be specific about what to preserve: "Keep the woman's face unchanged, only add..."
  • Describe the integration: "The hat should look naturally placed, matching the lighting"
  • Use semantic descriptions: Instead of "mask the sofa", say "change only the sofa"
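
The editing tips can likewise be folded into a small prompt template. This sketch is hypothetical (the function name `build_edit_prompt` and the phrasing of each sentence are assumptions); it simply makes the three tips explicit: state the change semantically, list what to preserve, and describe how the result should blend in.

```python
def build_edit_prompt(change, preserve=(), integration=None):
    """Compose an editing instruction from a change, items to keep, and an integration note."""
    sentences = [f"Change only the following: {change.strip().rstrip('.')}."]
    if preserve:
        sentences.append("Keep unchanged: " + ", ".join(preserve) + ".")
    if integration:
        sentences.append(integration.strip().rstrip(".") + ".")
    return " ".join(sentences)
```

A call such as `build_edit_prompt("add a straw hat", preserve=["the woman's face"], integration="The hat should look naturally placed, matching the lighting")` produces one instruction that names the edit, the protected region, and the desired blending.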

For Text in Images

  • Specify font style descriptively: "clean, bold, sans-serif" or "elegant script"
  • Place text explicitly: "text at the top center of the image"