| name | description |
|---|---|
| gemini-web-generate | Generate images via Gemini web interface using a headless browser automation CLI. Supports text-to-image, image-to-image, multi-image reference, multi-turn conversations, and session management. Use when: (1) User wants to generate images with Gemini, (2) User says 'generate with Gemini' or 'Gemini 生图', (3) Image-to-image or style transfer tasks, (4) Continuing an existing Gemini image conversation. |
Gemini Web Image Generation
Generate images through Gemini's web interface using a bundled Puppeteer-based CLI that automates the full workflow: navigation, image upload, prompt submission, download, and cleanup.
Architecture
| Component | Path |
|---|---|
| CLI entry | scripts/cli.js |
| Node binary | /home/dazhi/.nvm/versions/node/v22.22.0/bin/node |
| Browser CDP | http://127.0.0.1:9223 (managed by browser tool) |
Quick Start
All commands use absolute paths for reliability:
NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
CLI="<skill-dir>/scripts/cli.js"
Text-to-Image (default: single mode)
$NODE $CLI generate --prompt "a cute cat" --mode single
Image-to-Image
$NODE $CLI generate --prompt "convert to watercolor style" --image /path/to/ref.png --mode single
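When the reference image lives in a directory whose name contains spaces, one way to follow the cd-then-relative-path rule from Critical Rules is a small wrapper. This is a minimal sketch: run_in_image_dir is a hypothetical helper, not part of the CLI, and it assumes $NODE and $CLI are absolute paths so they still resolve after the directory change.

```shell
# Hypothetical helper: run a command from the reference image's directory
# and append --image with the bare file name, so no spaced path is passed.
run_in_image_dir() (
  image_path="$1"
  shift
  dir=$(dirname -- "$image_path")
  base=$(basename -- "$image_path")
  # The function body is a subshell, so this cd does not affect the caller.
  cd -- "$dir" || exit 1
  "$@" --image "$base"
)

# Usage (not executed here):
# run_in_image_dir "/path/with spaces/ref.png" "$NODE" "$CLI" generate --prompt "convert to watercolor style" --mode single
```

Because the helper runs in a subshell, the caller's working directory is unchanged afterwards.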
Multi-image Reference
$NODE $CLI generate --prompt "blend these images" --images "/path/a.png,/path/b.png" --mode single
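The comma-separated value for --images can be built from a list of paths rather than typed by hand. A minimal sketch, assuming none of the paths contain a comma; join_images is a hypothetical helper name.

```shell
# Hypothetical helper: join file paths into the comma-separated form
# shown for --images. Paths containing commas are not supported.
join_images() (
  # "$*" joins the arguments with the first character of IFS.
  IFS=,
  printf '%s\n' "$*"
)

# Usage (not executed here):
# $NODE $CLI generate --prompt "blend these images" --images "$(join_images /path/a.png /path/b.png)" --mode single
```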
Multi-turn Conversation
# First turn (session stays open)
$NODE $CLI generate --prompt "draw a sunset"
# Continue
$NODE $CLI generate --session <id> --prompt "add a boat"
# Last turn (auto-close)
$NODE $CLI generate --session <id> --prompt "make it night" --mode single
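The continuation pattern above (no --mode until the final turn) can be sketched as a dry-run planner. It only prints the argument lines and runs nothing; plan_turns and the session id abc123 are hypothetical, and the real id must be taken from the CLI.

```shell
# Hypothetical dry-run planner: print one "generate" argument line per
# continuation prompt; only the final prompt gets --mode single.
plan_turns() (
  session="$1"
  shift
  total=$#
  i=0
  for prompt in "$@"; do
    i=$((i + 1))
    if [ "$i" -eq "$total" ]; then
      printf 'generate --session %s --prompt "%s" --mode single\n' "$session" "$prompt"
    else
      printf 'generate --session %s --prompt "%s"\n' "$session" "$prompt"
    fi
  done
)

# Usage (prints two lines, the second ending in --mode single):
# plan_turns abc123 "add a boat" "make it night"
```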
Core Workflow
1. Ensure Browser is Running
browser action=start
2. Execute Generation
Default to --mode single (auto-close after download). Only omit --mode for multi-turn sessions.
$NODE $CLI generate --prompt "..." --mode single
CLI handles: open tab → navigate to Gemini → paste reference images → type prompt → submit → wait for generation → download to output/originals/ → close tab (single mode).
Default timeouts: generation 300s (5min), download 120s.
3. Handle Results
On success: Move downloaded image and clean up:
# Move the latest generated image
ORIG="<skill-dir>/scripts/output/originals"
DEST="$HOME/.openclaw/workspace/media/generated"
LATEST=$(ls -t "$ORIG" | head -1)
mv "$ORIG/$LATEST" "$DEST/"
# Remove originals that already exist in the destination
for f in "$ORIG"/Gemini_Generated_Image_*.png "$ORIG"/generated-*.png; do
  name=$(basename "$f")
  if [ -f "$DEST/$name" ]; then
    rm "$f" 2>/dev/null
  fi
done
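Before moving a file, its name and size can be checked, as the Critical Rules require. A minimal sketch under stated assumptions: verify_image is a hypothetical helper, the two name patterns are the ones used in the cleanup loop, and the 1024-byte default is an arbitrary threshold, not a value the CLI defines.

```shell
# Hypothetical check: succeed only if the file exists, matches one of the
# expected name patterns, and is at least min_bytes large.
verify_image() (
  file="$1"
  min_bytes="${2:-1024}"   # arbitrary default threshold (assumption)
  [ -f "$file" ] || { echo "missing: $file" >&2; exit 1; }
  case "$(basename "$file")" in
    Gemini_Generated_Image_*.png|generated-*.png) ;;
    *) echo "unexpected file name: $file" >&2; exit 1 ;;
  esac
  # Strip the padding some wc implementations add before the count.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  [ "$size" -ge "$min_bytes" ] || { echo "too small ($size bytes): $file" >&2; exit 1; }
)

# Usage (not executed here):
# verify_image "$ORIG/$LATEST" && mv "$ORIG/$LATEST" "$DEST/"
```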
On timeout or error: Check status:
$NODE $CLI status --session <id> --wait
If status returns done, run download:
$NODE $CLI download --session <id>
Diagnostics: when errors or timeouts are likely, add --screenshot to capture a screenshot automatically on failure.
4. Deliver to User
Use the message tool with its media parameter to send the final image.
Secondary Workflows
Continue from Chat URL
$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "new instruction"
List Active Sessions
$NODE $CLI sessions
Find Lost Session
$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open
Close a Session
$NODE $CLI close --session <id>
Error Handling
| Status | Action |
|---|---|
| CLI exits success | Move image from output/originals/ |
| CLI exits timeout | Run status --session <id> --wait to check if still generating |
| Status returns done | Run download --session <id> |
| Status returns error | Report error, suggest retry with modified prompt |
| Status returns generating | Continue waiting with --wait |
| Session lost | Use find_session --chatUrl <url> --open to recover |
| Browser not started | Run browser action=start first |
| Downloaded image not found | Check output/originals/ directory, verify filename and size |
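The status rows of the table amount to a small decision mapping, sketched here. next_action is a hypothetical helper; it assumes the caller has already read the status keyword (done, error, or generating) out of the CLI's text output, whose exact format is not specified here.

```shell
# Hypothetical mapping from a status keyword to the follow-up in the table.
# Patterns are quoted so that "done" is never parsed as a shell keyword.
next_action() {
  case "$1" in
    "done")       echo "download" ;;  # run: download --session <id>
    "generating") echo "wait" ;;      # run: status --session <id> --wait
    "error")      echo "report" ;;    # report and suggest a modified prompt
    *)            echo "unknown" ;;   # unrecognized status: inspect manually
  esac
}
```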
Critical Rules
- Browser management: Use the browser tool for lifecycle; the CLI only handles Gemini interaction
- Always use absolute paths for the node binary and the CLI script
- Default to --mode single: Auto-close after download unless in a multi-turn conversation
- Multi-turn continuation: Do NOT add --mode when continuing; close with --mode single on the last turn or with the close command
- Reference image path handling:
  - ❌ Never use absolute paths with spaces, because the shell splits them
  - ✅ cd to the reference image directory first, then use relative paths
  - ✅ Use ls | grep to get filenames dynamically
- Avoid numbered reference images (e.g., 01-xxx.jpg), as Gemini may rate-limit repeated use
- Stop after 3 consecutive failures and investigate the root cause instead of retrying
- Always move downloaded images to ~/.openclaw/workspace/media/generated/ and clean up originals
- Verify downloads: Check filename and file size in originals before moving
- Use --screenshot for diagnostic capture on errors
- No --json needed: Read CLI text output directly
- Don't use screenshot previews: CLI downloads original images directly
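The three-failure rule can be tracked with a small counter. A minimal sketch: record_attempt, should_stop, and the variable names are hypothetical, and it assumes the CLI exits non-zero on failure, which this document does not state. Call both functions in the current shell (not inside $(...)), since they share a variable.

```shell
# The limit of 3 comes from the rule above; the names are hypothetical.
MAX_CONSECUTIVE_FAILURES=3
consecutive_failures=0

# Call with the exit code of each generation attempt.
# A success resets the streak; a failure extends it.
record_attempt() {
  if [ "$1" -eq 0 ]; then
    consecutive_failures=0
  else
    consecutive_failures=$((consecutive_failures + 1))
  fi
}

# Returns 0 once the limit is reached and retrying should stop.
should_stop() {
  [ "$consecutive_failures" -ge "$MAX_CONSECUTIVE_FAILURES" ]
}

# Usage (not executed here):
# $NODE $CLI generate --prompt "..." --mode single; record_attempt $?
# if should_stop; then echo "3 consecutive failures: investigate root cause" >&2; fi
```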