Image Generation
Generate and edit images from text prompts with Gemini 3 Pro Image.
name: image-generation
description: Generate and edit images using AI. Use when asked to create, generate, draw, or edit images, illustrations, diagrams, comics, pictures, artwork, logos, or any visual content. Also use for image editing, style transfer, adding elements to photos, or combining multiple images.
Image Generation & Editing
Generate and edit images using Gemini 3 Pro Image - a state-of-the-art model for professional image creation.
When to Use
Use this skill when you need to:
- Generate images from text descriptions
- Edit existing images (add/remove/modify elements)
- Combine multiple images into new compositions
- Apply style transfers to images
- Create visual assets, illustrations, or diagrams
- Generate images with text/logos (high-fidelity text rendering)
Usage
Generate from Text (Text-to-Image)
Edit an Existing Image
Combine Multiple Images
Specify Output Options
Options
| Option | Short | Default | Description |
|---|---|---|---|
| `--output` | `-o` | Auto-generated in `/workspace/generated_images/` | Output file path |
| `--input` | `-i` | None | Input image for editing (can specify multiple) |
| `--aspect-ratio` | `-a` | `1:1` | Output aspect ratio |
| `--resolution` | `-r` | `1K` | Output resolution (`1K`, `2K`, or `4K`) |
By default, images are saved to /workspace/generated_images/ with timestamped filenames like image_20250120_143052_your_prompt.png.
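The default naming pattern can be approximated in a few lines of Python. This is a minimal sketch, not the skill's actual code: the helper name `default_output_name`, the five-word limit, and the rule for turning a prompt into a slug are all assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import Optional


def default_output_name(prompt: str, now: Optional[datetime] = None, max_words: int = 5) -> str:
    """Build a filename shaped like image_20250120_143052_your_prompt.png."""
    now = now or datetime.now()
    # Assumed slug rule: lowercase the prompt and keep the first few alphanumeric words.
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:max_words]
    slug = "_".join(words) or "untitled"
    return f"image_{now:%Y%m%d_%H%M%S}_{slug}.png"
```

Passing an explicit `now` makes the result deterministic, which is convenient when a script needs to predict where an image will be written.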
Aspect Ratios
1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Resolutions
- 1K: ~1024px (default, fastest)
- 2K: ~2048px (higher quality)
- 4K: ~4096px (highest quality, slower)
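When calling the skill from a script, it can help to check the aspect ratio and resolution before spending a generation request. The sketch below only mirrors the values listed above; the function name `validate_options` and the case-insensitive handling of the resolution are assumptions, and the pixel figures are the approximate sizes quoted in this document, not exact output dimensions.

```python
# Values copied from the lists above.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
APPROX_PIXELS = {"1K": 1024, "2K": 2048, "4K": 4096}


def validate_options(aspect_ratio: str = "1:1", resolution: str = "1K") -> tuple:
    """Return the normalized (aspect_ratio, resolution) pair or raise ValueError."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio!r}")
    # Assumption: accept lowercase input such as "2k" and normalize it.
    resolution = resolution.upper()
    if resolution not in APPROX_PIXELS:
        raise ValueError(f"Unsupported resolution: {resolution!r}")
    return aspect_ratio, resolution
```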
Examples
Text-to-Image Generation
Image Editing
Multi-Image Composition
Capabilities
Gemini 3 Pro Image Features
- High-resolution output: 1K, 2K, and 4K generation
- Advanced text rendering: Legible, stylized text for logos, diagrams, marketing
- Thinking mode: Model reasons through complex prompts for better results
- Up to 14 reference images: Mix images for composition (including up to 5 people rendered with high fidelity)
- Semantic masking: Edit specific parts without explicit masks
Requirements
- `GEMINI_API_KEY` environment variable must be set
- Python 3.10+ with `google-genai` and `Pillow` packages installed
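These requirements can be checked up front with the standard library. The following is a rough sketch rather than part of the skill: `missing_requirements` is a hypothetical helper, and it assumes the packages are importable as `google.genai` and `PIL`, which are the usual import names for `google-genai` and `Pillow`.

```python
import importlib.util
import os
import sys
from typing import Mapping, Optional


def missing_requirements(env: Optional[Mapping[str, str]] = None) -> list:
    """Return human-readable problems; an empty list means everything was found."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # Assumed import names for the google-genai and Pillow packages.
    for module, package in (("google.genai", "google-genai"), ("PIL", "Pillow")):
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # Raised when the parent package (here "google") is absent.
            found = False
        if not found:
            problems.append(f"{package} is not installed")
    return problems
```

Calling `missing_requirements()` with no arguments inspects the real environment; passing a dictionary makes the key check easy to exercise in isolation.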
Tips for Better Results
For Generation
- Be descriptive: "A photorealistic close-up portrait with soft golden hour lighting" beats "a portrait"
- Specify style: Include art style references (minimalist, photorealistic, watercolor, etc.)
- Add camera details: Mention lens type, lighting setup, camera angle for photorealistic images
- Work step by step: For complex scenes, describe the background first, then the foreground elements
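One way to apply these tips consistently is to assemble the prompt from labeled pieces, background first. This is an illustrative sketch only: `build_prompt` and its "Background / Foreground / Style / Camera" labels are assumptions, not a format the model requires.

```python
def build_prompt(subject: str, background: str = "", style: str = "", camera: str = "") -> str:
    """Join the non-empty pieces into one prompt, background before foreground."""
    pieces = [
        ("Background", background),
        ("Foreground", subject),
        ("Style", style),
        ("Camera", camera),
    ]
    sentences = []
    for label, text in pieces:
        text = text.strip().rstrip(".")
        if text:
            sentences.append(f"{label}: {text}.")
    return " ".join(sentences)


prompt = build_prompt(
    "a close-up portrait of a violinist",
    background="a dim concert hall",
    style="photorealistic",
    camera="85mm lens, soft golden hour lighting",
)
```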
For Editing
- Be specific about what to preserve: "Keep the woman's face unchanged, only add..."
- Describe the integration: "The hat should look naturally placed, matching the lighting"
- Use semantic descriptions: Instead of "mask the sofa", say "change only the sofa"
For Text in Images
- Specify font style descriptively: "clean, bold, sans-serif" or "elegant script"
- Place text explicitly: "text at the top center of the image"