---
name: gemini-web-generate
description: "Generate images via Gemini web interface using a headless browser automation CLI. Supports text-to-image, image-to-image, multi-image reference, multi-turn conversations, and session management. Use when: (1) User wants to generate images with Gemini, (2) User says 'generate with Gemini' or 'Gemini 生图', (3) Image-to-image or style transfer tasks, (4) Continuing an existing Gemini image conversation."
---
# Gemini Web Image Generation
Generate images through Gemini's web interface using a bundled Puppeteer-based CLI that automates the full workflow: navigation, image upload, prompt submission, download, and cleanup.
## Architecture
| Component | Location |
|------|------|
| CLI entry | `scripts/cli.js` |
| Node binary | `/home/dazhi/.nvm/versions/node/v22.22.0/bin/node` |
| Browser CDP | `http://127.0.0.1:9223` (managed by `browser` tool) |
## Quick Start
All commands use absolute paths for reliability. Replace `<skill-dir>` with the absolute path of this skill's directory:
```bash
NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
CLI="<skill-dir>/scripts/cli.js"
```
### Text-to-Image (default: single mode)
```bash
$NODE $CLI generate --prompt "a cute cat" --mode single
```
### Image-to-Image
```bash
$NODE $CLI generate --prompt "convert to watercolor style" --image /path/to/ref.png --mode single
```
### Multi-image Reference
```bash
$NODE $CLI generate --prompt "blend these images" --images "/path/a.png,/path/b.png" --mode single
```
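The `--images` value is a single comma-separated string. As a small sketch, a hypothetical `join_images` helper (not part of the CLI) can build that string from separate arguments. It assumes no filename contains a comma.

```bash
# Hypothetical helper, not part of the CLI: join all arguments with commas.
join_images() {
  local IFS=,
  echo "$*"
}

# Example: prints /path/a.png,/path/b.png
join_images /path/a.png /path/b.png
```

The result can then be passed as `--images "$(join_images /path/a.png /path/b.png)"`.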
### Multi-turn Conversation
```bash
# First turn (session stays open)
$NODE $CLI generate --prompt "draw a sunset"
# Continue
$NODE $CLI generate --session <id> --prompt "add a boat"
# Last turn (auto-close)
$NODE $CLI generate --session <id> --prompt "make it night" --mode single
```
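The turn sequence above can be sketched as a loop. `run_turns` is a hypothetical helper that sends every prompt to an existing session and adds `--mode single` only on the last one. It assumes the session id is already known and that the CLI exits nonzero on failure, which this document does not confirm.

```bash
# NODE and CLI are the Quick Start variables; the defaults here are placeholders.
NODE="${NODE:-node}"
CLI="${CLI:-scripts/cli.js}"

# Hypothetical helper: run_turns <session-id> <prompt>...
run_turns() {
  local session="$1"
  shift
  local total=$#
  local i=0
  local prompt
  for prompt in "$@"; do
    i=$((i + 1))
    if [ "$i" -eq "$total" ]; then
      # Last turn: --mode single closes the tab after download
      "$NODE" "$CLI" generate --session "$session" --prompt "$prompt" --mode single || return 1
    else
      # Intermediate turn: no --mode, so the session stays open
      "$NODE" "$CLI" generate --session "$session" --prompt "$prompt" || return 1
    fi
  done
}
```

For example, `run_turns <id> "add a boat" "make it night"` would issue the second and third commands from the block above.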
## Core Workflow
### 1. Ensure Browser is Running
```
browser action=start
```
### 2. Execute Generation
Default to `--mode single` (auto-close after download). Only omit `--mode` for multi-turn sessions.
```bash
$NODE $CLI generate --prompt "..." --mode single
```
CLI handles: open tab → navigate to Gemini → paste reference images → type prompt → submit → wait for generation → download to `output/originals/` → close tab (single mode).
Default timeouts: generation 300s (5min), download 120s.
### 3. Handle Results
**On success**: Move downloaded image and clean up:
```bash
# Move latest generated image
SRC="<skill-dir>/scripts/output/originals"
DEST="$HOME/.openclaw/workspace/media/generated"
LATEST=$(ls -t "$SRC" | head -1)
mv "$SRC/$LATEST" "$DEST/"
# Clean up originals that already exist in the destination
for f in "$SRC"/Gemini_Generated_Image_*.png "$SRC"/generated-*.png; do
  name=$(basename "$f")
  if [ -f "$DEST/$name" ]; then
    rm "$f" 2>/dev/null
  fi
done
```
**On timeout or error**: Check status:
```bash
$NODE $CLI status --session <id> --wait
```
If status returns `done`, run download:
```bash
$NODE $CLI download --session <id>
```
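The status-then-download sequence can be wrapped in one function. This is a minimal sketch: `recover_session` is a hypothetical helper, and it assumes the text printed by `status` contains the literal word `done` when generation has finished. The exact output format is not documented here, so adjust the match to what the CLI actually prints.

```bash
# NODE and CLI are the Quick Start variables; the defaults here are placeholders.
NODE="${NODE:-node}"
CLI="${CLI:-scripts/cli.js}"

# Hypothetical helper: recover_session <session-id>
recover_session() {
  local session="$1"
  local status_output
  status_output=$("$NODE" "$CLI" status --session "$session" --wait 2>&1)
  echo "$status_output"
  case "$status_output" in
    *done*)
      # Assumed marker: status text contains "done"
      "$NODE" "$CLI" download --session "$session"
      ;;
    *)
      echo "session $session not finished; skipping download" >&2
      return 1
      ;;
  esac
}
```

A nonzero return means the download was skipped, so the caller can decide whether to wait again or report the error.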
**Add `--screenshot` for diagnostics**: When errors/timeouts are expected, add `--screenshot` to auto-capture a screenshot on failure.
### 4. Deliver to User
Use `message` tool with `media` parameter to send the final image.
## Secondary Workflows
### Continue from Chat URL
```bash
$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "new instruction"
```
### List Active Sessions
```bash
$NODE $CLI sessions
```
### Find Lost Session
```bash
$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open
```
### Close a Session
```bash
$NODE $CLI close --session <id>
```
## Error Handling
| Condition | Action |
|------|------|
| CLI exits `success` | Move image from `output/originals/` |
| CLI exits `timeout` | Run `status --session <id> --wait` to check if still generating |
| Status returns `done` | Run `download --session <id>` |
| Status returns `error` | Report error, suggest retry with modified prompt |
| Status returns `generating` | Continue waiting with `--wait` |
| Session lost | Use `find_session --chatUrl <url> --open` to recover |
| Browser not started | Run `browser action=start` first |
| Downloaded image not found | Check `output/originals/` directory, verify filename and size |
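For the last row, the check can be sketched as a small function. `verify_latest` is a hypothetical helper that prints the newest entry in a directory only if it exists and is non-empty. The directory to pass is the `output/originals/` path used earlier.

```bash
# Hypothetical helper: verify_latest <directory>
# Prints the path of the newest entry if it is a non-empty file.
verify_latest() {
  local dir="$1"
  local latest
  latest=$(ls -t "$dir" 2>/dev/null | head -1)
  if [ -z "$latest" ]; then
    echo "no files found in $dir" >&2
    return 1
  fi
  if [ ! -s "$dir/$latest" ]; then
    echo "$latest is empty" >&2
    return 1
  fi
  echo "$dir/$latest"
}
```

Running this before the `mv` step avoids moving a zero-byte or missing file.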
## Critical Rules
1. **Browser management**: Use `browser` tool for lifecycle; CLI only handles Gemini interaction
2. **Always use absolute paths** for the Node binary and the CLI script
3. **Default to `--mode single`**: Auto-close after download unless in multi-turn conversation
4. **Multi-turn continuation**: Do NOT add `--mode` when continuing; close with `--mode single` on last turn or `close` command
5. **Reference image path handling**:
   - ❌ Never use absolute paths with spaces; the shell splits them
   - ✅ `cd` to the reference image directory first, then use relative paths
   - ✅ Use `ls | grep` to dynamically get filenames
6. **Avoid numbered reference images** (e.g., `01-xxx.jpg`) — Gemini may rate-limit repeated use
7. **Stop after 3 consecutive failures** — investigate root cause instead of retrying
8. **Always move downloaded images** to `~/.openclaw/workspace/media/generated/` and clean up originals
9. **Verify downloads**: Check filename and file size in originals before moving
10. **Use `--screenshot`** for diagnostic capture on errors
11. **No `--json` needed**: Read CLI text output directly
12. **Don't use screenshot previews**: CLI downloads original images directly
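
Rule 7 can be enforced mechanically. The sketch below uses a hypothetical `try_generate` wrapper that runs any command up to three times and then stops. It assumes the wrapped command exits nonzero on failure, which this document does not confirm for the CLI.

```bash
# Hypothetical wrapper: try_generate <command> [args...]
# Stops after MAX_FAILURES consecutive failures instead of retrying forever.
MAX_FAILURES=3

try_generate() {
  local failures=0
  while [ "$failures" -lt "$MAX_FAILURES" ]; do
    if "$@"; then
      return 0
    fi
    failures=$((failures + 1))
    echo "attempt $failures failed" >&2
  done
  echo "stopping after $MAX_FAILURES consecutive failures; investigate the root cause" >&2
  return 1
}
```

A call would look like `try_generate "$NODE" "$CLI" generate --prompt "a cute cat" --mode single --screenshot`.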