---
name: gemini-web-generate
description: "Generate images via Gemini web interface using a headless browser automation CLI. Supports text-to-image, image-to-image, multi-image reference, multi-turn conversations, and session management. Use when: (1) User wants to generate images with Gemini, (2) User says 'generate with Gemini' or 'Gemini 生图', (3) Image-to-image or style transfer tasks, (4) Continuing an existing Gemini image conversation."
---

# Gemini Web Image Generation

Generate images through Gemini's web interface using a bundled Puppeteer-based CLI that automates the full workflow: navigation, image upload, prompt submission, download, and cleanup.

## Architecture

| Component | Path |
|------|------|
| CLI entry | `scripts/cli.js` |
| Node binary | `/home/dazhi/.nvm/versions/node/v22.22.0/bin/node` |
| Browser CDP | `http://127.0.0.1:9223` (managed by `browser` tool) |

## Quick Start

All commands use absolute paths for reliability:

```bash
NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
CLI="/scripts/cli.js"
```

### Text-to-Image (default: single mode)

```bash
$NODE $CLI generate --prompt "a cute cat" --mode single
```

### Image-to-Image

```bash
$NODE $CLI generate --prompt "convert to watercolor style" --image /path/to/ref.png --mode single
```

### Multi-image Reference

```bash
$NODE $CLI generate --prompt "blend these images" --images "/path/a.png,/path/b.png" --mode single
```

### Multi-turn Conversation

```bash
# First turn (session stays open)
$NODE $CLI generate --prompt "draw a sunset"

# Continue
$NODE $CLI generate --session --prompt "add a boat"

# Last turn (auto-close)
$NODE $CLI generate --session --prompt "make it night" --mode single
```

## Core Workflow

### 1. Ensure Browser is Running

```
browser action=start
```

### 2. Execute Generation

Default to `--mode single` (auto-close after download). Only omit `--mode` for multi-turn sessions.

```bash
$NODE $CLI generate --prompt "..." --mode single
```

The CLI handles: open tab → navigate to Gemini → paste reference images → type prompt → submit → wait for generation → download to `output/originals/` → close tab (single mode).

Default timeouts: generation 300s (5 min), download 120s.

### 3. Handle Results

**On success**: Move the downloaded image and clean up:

```bash
# Move latest generated image
LATEST=$(ls -t /scripts/output/originals/ | head -1)
mv "/scripts/output/originals/$LATEST" ~/.openclaw/workspace/media/generated/

# Clean up originals that already exist in the destination
for f in /scripts/output/originals/Gemini_Generated_Image_*.png /scripts/output/originals/generated-*.png; do
  name=$(basename "$f")
  if [ -f "/home/dazhi/.openclaw/workspace/media/generated/$name" ]; then
    rm "$f" 2>/dev/null
  fi
done
```

**On timeout or error**: Check status:

```bash
$NODE $CLI status --session --wait
```

If status returns `done`, run download:

```bash
$NODE $CLI download --session
```

**Add `--screenshot` for diagnostics**: When errors or timeouts are expected, add `--screenshot` to auto-capture a screenshot on failure.

### 4. Deliver to User

Use the `message` tool with the `media` parameter to send the final image.
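
The move-and-verify part of step 3 (Handle Results) can be sketched as a small shell function. This is a minimal sketch, not part of the bundled CLI: the name `move_latest` is hypothetical, both directories are passed as arguments, and it assumes filenames contain no newlines. It refuses to move an empty file, which covers the "verify filename and size" check before delivery.

```bash
# Hypothetical helper (not part of the CLI): move the newest non-empty file
# from SRC_DIR to DEST_DIR and print its name.
# Usage: move_latest SRC_DIR DEST_DIR
move_latest() {
  src_dir=$1
  dest_dir=$2

  # Newest entry first; assumes filenames contain no newlines.
  latest=$(ls -t "$src_dir" 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no files found in $src_dir" >&2
    return 1
  fi

  # -s is true only when the file exists and is larger than zero bytes.
  if [ ! -s "$src_dir/$latest" ]; then
    echo "refusing to move empty file: $latest" >&2
    return 1
  fi

  mkdir -p "$dest_dir" || return 1
  mv "$src_dir/$latest" "$dest_dir/" || return 1
  echo "$latest"
}
```

Called with the originals directory and the generated-media directory from step 3, it prints the moved filename on success and returns non-zero when there is nothing usable to move.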

## Secondary Workflows

### Continue from Chat URL

```bash
$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "new instruction"
```

### List Active Sessions

```bash
$NODE $CLI sessions
```

### Find Lost Session

```bash
$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open
```

### Close a Session

```bash
$NODE $CLI close --session
```

## Error Handling

| Status | Action |
|------|------|
| CLI exits `success` | Move image from `output/originals/` |
| CLI exits `timeout` | Run `status --session --wait` to check if still generating |
| Status returns `done` | Run `download --session` |
| Status returns `error` | Report error, suggest retry with modified prompt |
| Status returns `generating` | Continue waiting with `--wait` |
| Session lost | Use `find_session --chatUrl --open` to recover |
| Browser not started | Run `browser action=start` first |
| Downloaded image not found | Check `output/originals/` directory, verify filename and size |

## Critical Rules

1. **Browser management**: Use the `browser` tool for lifecycle; the CLI only handles Gemini interaction
2. **Always use absolute paths** for the node binary and the CLI script
3. **Default to `--mode single`**: Auto-close after download unless in a multi-turn conversation
4. **Multi-turn continuation**: Do NOT add `--mode` when continuing; close with `--mode single` on the last turn or with the `close` command
5. **Reference image path handling**:
   - ❌ Never use absolute paths with spaces: the shell splits them
   - ✅ `cd` to the reference image directory first, then use relative paths
   - ✅ Use `ls | grep` to dynamically get filenames
6. **Avoid numbered reference images** (e.g., `01-xxx.jpg`): Gemini may rate-limit repeated use
7. **Stop after 3 consecutive failures**: investigate the root cause instead of retrying
8. **Always move downloaded images** to `~/.openclaw/workspace/media/generated/` and clean up originals
9. **Verify downloads**: Check filename and file size in originals before moving
10. **Use `--screenshot`** for diagnostic capture on errors
11. **No `--json` needed**: Read CLI text output directly
12. **Don't use screenshot previews**: The CLI downloads original images directly
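
The failure cap in rule 7 can be enforced mechanically rather than by memory. The following is a minimal sketch under stated assumptions: `run_with_failure_cap` is a hypothetical wrapper, not a CLI feature, and it treats any non-zero exit status of the wrapped command as a failure.

```bash
# Hypothetical wrapper (not part of the CLI): run a command until it succeeds,
# but stop once it has failed MAX_FAILURES times in a row.
# Usage: run_with_failure_cap MAX_FAILURES command [args...]
run_with_failure_cap() {
  max_failures=$1
  shift
  failures=0

  while [ "$failures" -lt "$max_failures" ]; do
    # Any non-zero exit status counts as one failure.
    if "$@"; then
      return 0
    fi
    failures=$((failures + 1))
    echo "attempt $failures of $max_failures failed" >&2
  done

  echo "stopping after $failures consecutive failures; investigate the root cause" >&2
  return 1
}
```

For example, `run_with_failure_cap 3 $NODE $CLI generate --prompt "a cute cat" --mode single` would give up after the third failed attempt and return non-zero, at which point the root cause should be investigated before any further run.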