name: gemini-web-generate
description: Generate images via Gemini web interface using a headless browser automation CLI. Supports text-to-image, image-to-image, multi-image reference, multi-turn conversations, and session management. Use when: (1) User wants to generate images with Gemini, (2) User says 'generate with Gemini' or 'Gemini 生图', (3) Image-to-image or style transfer tasks, (4) Continuing an existing Gemini image conversation.

Gemini Web Image Generation

Generate images through Gemini's web interface using a bundled Puppeteer-based CLI that automates the full workflow: navigation, image upload, prompt submission, download, and cleanup.

Architecture

Component     Path
CLI entry     scripts/cli.js
Node binary   /home/dazhi/.nvm/versions/node/v22.22.0/bin/node
Browser CDP   http://127.0.0.1:9223 (managed by browser tool)

Quick Start

All commands use absolute paths for reliability:

NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
CLI="<skill-dir>/scripts/cli.js"
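
Before running any command it can help to confirm that both paths resolve. A minimal sketch, assuming a POSIX shell; `check_prereqs` is a hypothetical helper name and not part of the CLI:

```shell
# check_prereqs NODE_PATH CLI_PATH
# Hypothetical helper (not part of the skill's CLI): returns 0 only if the
# Node binary is executable and the CLI script is a readable regular file.
check_prereqs() {
  if [ ! -x "$1" ]; then
    echo "node binary not executable: $1" >&2
    return 1
  fi
  if [ ! -f "$2" ] || [ ! -r "$2" ]; then
    echo "CLI script not readable: $2" >&2
    return 1
  fi
  return 0
}

# Example: check_prereqs "$NODE" "$CLI" && "$NODE" "$CLI" sessions
```

A failed check points at a wrong Node version directory or an unresolved `<skill-dir>` before any browser work starts.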

Text-to-Image (single mode, the recommended default)

$NODE $CLI generate --prompt "a cute cat" --mode single

Image-to-Image

$NODE $CLI generate --prompt "convert to watercolor style" --image /path/to/ref.png --mode single

Multi-image Reference

$NODE $CLI generate --prompt "blend these images" --images "/path/a.png,/path/b.png" --mode single
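
The --images value is a single comma-separated string, so a path that itself contains a comma or a space would make the list ambiguous. A small sketch of a joining helper, assuming a POSIX shell; `join_images` is a hypothetical name, and whether the CLI trims or escapes entries is not stated in this document:

```shell
# join_images PATH...
# Hypothetical helper: prints the arguments joined by commas, in the form
# passed to --images above. Rejects any path containing a comma or a space.
join_images() {
  joined=""
  for p in "$@"; do
    case "$p" in
      *,*|*" "*)
        echo "unsupported character in path: $p" >&2
        return 1
        ;;
    esac
    if [ -z "$joined" ]; then
      joined="$p"
    else
      joined="$joined,$p"
    fi
  done
  printf '%s\n' "$joined"
}

# Example:
# $NODE $CLI generate --prompt "blend these images" --images "$(join_images a.png b.png)" --mode single
```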

Multi-turn Conversation

# First turn (session stays open)
$NODE $CLI generate --prompt "draw a sunset"
# Continue
$NODE $CLI generate --session <id> --prompt "add a boat"
# Last turn (auto-close)
$NODE $CLI generate --session <id> --prompt "make it night" --mode single
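
The pattern above (no --mode on intermediate turns, --mode single on the last) can be sketched as a dry-run planner. This is an illustration only: `plan_turns` is a hypothetical helper, it prints commands rather than running them, and the session id is taken as given because this document does not show how the CLI reports it.

```shell
# plan_turns SESSION_ID PROMPT...
# Hypothetical dry-run helper: prints one command line per follow-up prompt
# for an existing session. Only the final line carries --mode single, so the
# session closes after the last turn. Nothing is executed.
plan_turns() {
  sid="$1"
  shift
  remaining=$#
  for prompt in "$@"; do
    remaining=$((remaining - 1))
    if [ "$remaining" -eq 0 ]; then
      printf '$NODE $CLI generate --session %s --prompt "%s" --mode single\n' "$sid" "$prompt"
    else
      printf '$NODE $CLI generate --session %s --prompt "%s"\n' "$sid" "$prompt"
    fi
  done
}

# Example: plan_turns <id> "add a boat" "make it night"
```

Prompts containing double quotes would need escaping before the printed lines are pasted into a shell.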

Core Workflow

1. Ensure Browser is Running

browser action=start
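
To confirm the browser is reachable from the shell, one option is to probe the CDP endpoint listed under Architecture. A hedged sketch, assuming curl is installed; `cdp_ready` is a hypothetical helper, and `/json/version` is the standard Chrome DevTools Protocol discovery path rather than anything specific to this skill:

```shell
# cdp_ready [BASE_URL]
# Hypothetical helper: returns 0 if a Chrome DevTools Protocol endpoint
# answers at BASE_URL (default: the address from the Architecture table).
# Returns 2 if curl is unavailable, otherwise curl's own exit status.
cdp_ready() {
  base="${1:-http://127.0.0.1:9223}"
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl not found" >&2
    return 2
  fi
  # --fail turns HTTP errors into a non-zero exit; --noproxy keeps the
  # request local even when proxy environment variables are set.
  curl --silent --fail --max-time 3 --noproxy '*' "$base/json/version" >/dev/null
}

# Example: cdp_ready || echo "browser not reachable; run: browser action=start"
```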

2. Execute Generation

Pass --mode single by default so the tab closes automatically after download. Omit --mode only for multi-turn sessions.

$NODE $CLI generate --prompt "..." --mode single

The CLI handles: open tab → navigate to Gemini → paste reference images → type prompt → submit → wait for generation → download to output/originals/ → close tab (single mode only).

Default timeouts: generation 300 s (5 min), download 120 s.

3. Handle Results

On success: move the downloaded image and clean up:

# Move the latest generated image (newest file by modification time)
SRC="<skill-dir>/scripts/output/originals"
DEST="$HOME/.openclaw/workspace/media/generated"
LATEST=$(ls -t "$SRC" | head -1)
[ -n "$LATEST" ] && mv "$SRC/$LATEST" "$DEST/"

# Clean up originals that already exist in the destination.
# The globs are anchored to $SRC so they do not depend on the current directory.
for f in "$SRC"/Gemini_Generated_Image_*.png "$SRC"/generated-*.png; do
  name=$(basename "$f")
  if [ -f "$DEST/$name" ]; then
    rm -f "$f"
  fi
done
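
Before moving a file, its name and size can be checked, as the Critical Rules require. A minimal sketch, assuming a POSIX shell; `verify_image` is a hypothetical helper and the 1024-byte default threshold is an arbitrary assumption, not a value from the CLI:

```shell
# verify_image FILE [MIN_BYTES]
# Hypothetical helper: succeeds if FILE is a regular file of at least
# MIN_BYTES bytes (default 1024, an arbitrary assumption) and prints its
# path and size. Fails with a message on stderr otherwise.
verify_image() {
  min="${2:-1024}"
  if [ ! -f "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -lt "$min" ]; then
    echo "too small ($size bytes): $1" >&2
    return 1
  fi
  printf '%s %s bytes\n' "$1" "$size"
}

# Example: verify_image "<skill-dir>/scripts/output/originals/$LATEST" && mv ...
```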

On timeout or error: Check status:

$NODE $CLI status --session <id> --wait

If status returns done, run download:

$NODE $CLI download --session <id>

Add --screenshot for diagnostics: when errors or timeouts are likely, this flag captures a screenshot automatically on failure.

4. Deliver to User

Use the message tool with its media parameter to send the final image.

Secondary Workflows

Continue from Chat URL

$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "new instruction"

List Active Sessions

$NODE $CLI sessions

Find Lost Session

$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open

Close a Session

$NODE $CLI close --session <id>

Error Handling

Status                       Action
CLI exits success            Move image from output/originals/
CLI exits timeout            Run status --session <id> --wait to check if still generating
Status returns done          Run download --session <id>
Status returns error         Report error, suggest retry with modified prompt
Status returns generating    Continue waiting with --wait
Session lost                 Use find_session --chatUrl <url> --open to recover
Browser not started          Run browser action=start first
Downloaded image not found   Check output/originals/ directory, verify filename and size
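
The three status rows of the table above can be read as a small dispatch. A sketch under stated assumptions: `next_action` is a hypothetical helper, the words done, generating, and error come from the table, and how the CLI actually prints them is not shown in this document.

```shell
# next_action STATUS
# Hypothetical helper: maps a status word to the follow-up listed in the
# error-handling table. Patterns are quoted because "done" is also a shell
# reserved word. Unknown words fail with a message on stderr.
next_action() {
  case "$1" in
    "done")       echo 'download --session <id>' ;;
    "generating") echo 'status --session <id> --wait' ;;
    "error")      echo 'report error and suggest a modified prompt' ;;
    *)
      echo "unknown status: $1" >&2
      return 1
      ;;
  esac
}

# Example: next_action generating
```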

Critical Rules

  1. Browser management: Use browser tool for lifecycle; CLI only handles Gemini interaction
  2. Always use absolute paths for the Node binary and the CLI script
  3. Default to --mode single: Auto-close after download unless in multi-turn conversation
  4. Multi-turn continuation: Do NOT add --mode when continuing; end the session by passing --mode single on the last turn or by running the close command
  5. Reference image path handling:
    • Never use absolute paths with spaces — shell splits them
    • cd to reference image directory first, then use relative paths
    • Use ls | grep to dynamically get filenames
  6. Avoid numbered reference images (e.g., 01-xxx.jpg) — Gemini may rate-limit repeated use
  7. Stop after 3 consecutive failures — investigate root cause instead of retrying
  8. Always move downloaded images to ~/.openclaw/workspace/media/generated/ and clean up originals
  9. Verify downloads: Check filename and file size in originals before moving
  10. Use --screenshot for diagnostic capture on errors
  11. No --json needed: Read CLI text output directly
  12. Don't use screenshot previews: CLI downloads original images directly
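
Rule 7 (stop after 3 consecutive failures) can be enforced with a bounded retry wrapper. A minimal sketch, assuming a POSIX shell; `run_with_limit` is a hypothetical helper and adds no delay between attempts:

```shell
# run_with_limit MAX CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds, giving up after MAX
# consecutive failures. Returns 0 on success, 1 once the limit is reached.
run_with_limit() {
  max="$1"
  shift
  failures=0
  while [ "$failures" -lt "$max" ]; do
    if "$@"; then
      return 0
    fi
    failures=$((failures + 1))
    echo "attempt $failures of $max failed" >&2
  done
  echo "stopping after $max consecutive failures; investigate the root cause" >&2
  return 1
}

# Example: run_with_limit 3 "$NODE" "$CLI" generate --prompt "a cute cat" --mode single
```

Passing the command as separate arguments rather than a single string keeps each argument intact without needing eval.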