From 4b611d4f22fef9974235fbb5347eff39d0d7e86c Mon Sep 17 00:00:00 2001
From: liaoxin
Date: Thu, 14 May 2026 22:28:36 +0800
Subject: [PATCH] =?UTF-8?q?refine:=20gemini-web-generate=20SKILL=20?=
 =?UTF-8?q?=E6=94=B9=E4=B8=BA=E4=B8=AD=E6=96=87=EF=BC=8C=E8=A1=A5=E5=85=A8?=
 =?UTF-8?q?=E7=AE=A1=E9=81=93/=E6=96=87=E4=BB=B6=E6=8F=90=E7=A4=BA?=
 =?UTF-8?q?=E8=AF=8D=E7=94=A8=E6=B3=95?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 skills/gemini-web-generate/SKILL.md           | 243 ++++++++++--------
 skills/gemini-web-generate/scripts/.gitignore |   1 +
 2 files changed, 139 insertions(+), 105 deletions(-)

diff --git a/skills/gemini-web-generate/SKILL.md b/skills/gemini-web-generate/SKILL.md
index 58ddaa2..4fd1c76 100644
--- a/skills/gemini-web-generate/SKILL.md
+++ b/skills/gemini-web-generate/SKILL.md
@@ -1,166 +1,199 @@
 ---
 name: gemini-web-generate
-description: "Generate images via Gemini web interface using a headless browser automation CLI. Supports text-to-image, image-to-image, multi-image reference, multi-turn conversations, and session management. Use when: (1) User wants to generate images with Gemini, (2) User says 'generate with Gemini' or 'Gemini 生图', (3) Image-to-image or style transfer tasks, (4) Continuing an existing Gemini image conversation."
+description: "通过 Gemini 网页版生图。支持文生图、图生图(单张/多张参考图)、多轮对话生图、会话管理。底层使用 Puppeteer 驱动的 CLI 脚本自动化全流程。使用场景:(1) 用户要求用 Gemini 生成图片,(2) 用户说「Gemini 生图」,(3) 图生图 / 风格转换,(4) 继续已有 Gemini 生图对话。"
 ---

-# Gemini Web Image Generation
+# Gemini 网页版生图

-Generate images through Gemini's web interface using a bundled Puppeteer-based CLI that automates the full workflow: navigation, image upload, prompt submission, download, and cleanup.
+通过 Puppeteer 驱动的 CLI 脚本自动化 Gemini 网页生图全流程:打开标签页 → 导航 → 粘贴参考图 → 输入提示词 → 发送 → 等待生成 → 下载原图 → 清理。

-## Architecture
+## 组件

-| Component | Path |
-|------|------|
-| CLI entry | `scripts/cli.js` |
-| Node binary | `/home/dazhi/.nvm/versions/node/v22.22.0/bin/node` |
-| Browser CDP | `http://127.0.0.1:9223` (managed by `browser` tool) |
+| 组件 | 说明 |
+|:---|:---|
+| CLI 入口 | `scripts/cli.js` |
+| Node 路径 | `/home/dazhi/.nvm/versions/node/v22.22.0/bin/node` |
+| 浏览器 CDP | `http://127.0.0.1:9223`(由 `browser` 工具管理) |

-## Quick Start
+## 部署

-All commands use absolute paths for reliability:
+首次使用前需安装依赖:
+
+```bash
+cd /scripts && npm install
+```
+
+## 命令速查
+
+以下 `` 指本 skill 的安装根目录。

 ```bash
 NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
-CLI="/scripts/cli.js"
+CLI="/scripts/cli.js"
 ```

-### Text-to-Image (default: single mode)
+### 文生图

 ```bash
-$NODE $CLI generate --prompt "a cute cat" --mode single
+$NODE $CLI generate --prompt "一只可爱的猫,水彩风格" --mode single
 ```

-### Image-to-Image
+### 图生图

 ```bash
-$NODE $CLI generate --prompt "convert to watercolor style" --image /path/to/ref.png --mode single
+# 单张参考图
+$NODE $CLI generate --prompt "换成水墨风格" --image /path/to/ref.png --mode single
+
+# 多张参考图(最多 10 张,逗号分隔)
+$NODE $CLI generate --prompt "融合这些图片的风格" --images "/path/a.png,/path/b.png" --mode single
 ```

-### Multi-image Reference
+### 从文件或管道读取提示词

 ```bash
-$NODE $CLI generate --prompt "blend these images" --images "/path/a.png,/path/b.png" --mode single
+# 从文件读取(支持换行)
+$NODE $CLI generate --prompt-file /path/to/prompt.txt --mode single
+
+# 从管道读取
+echo "复杂的多行提示词..." | $NODE $CLI generate --prompt stdin --mode single
 ```

-### Multi-turn Conversation
+### 多轮对话生图

 ```bash
-# First turn (session stays open)
-$NODE $CLI generate --prompt "draw a sunset"
-# Continue
-$NODE $CLI generate --session --prompt "add a boat"
-# Last turn (auto-close)
-$NODE $CLI generate --session --prompt "make it night" --mode single
+# 首轮(不加 --mode,标签页保持打开)
+$NODE $CLI generate --prompt "画一幅日落风景"
+
+# 续次(不加 --mode,可传入新的参考图)
+$NODE $CLI generate --session --prompt "在画面中加入一艘小船"
+$NODE $CLI generate --session --prompt "换成夜晚风格" --image /path/to/ref.png
+
+# 末轮(加 --mode single,自动关闭标签页)
+$NODE $CLI generate --session --prompt "最终调整" --mode single
 ```

-## Core Workflow
+⚠️ 续次不要加 `--mode`,保持标签页打开。关闭方式二选一:末轮加 `--mode single` 自动关,或用 `close` 命令手动关。

-### 1. Ensure Browser is Running
+### 通过对话链接继续
+
+```bash
+$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "换成水彩风格"
+```
+
+### 状态与诊断
+
+```bash
+# 立即检查状态
+$NODE $CLI status --session
+
+# 持续轮询直到完成(适用于 CLI 超时后确认是否后台还在生成)
+$NODE $CLI status --session --wait
+
+# 带截图诊断
+$NODE $CLI status --session --wait --screenshot
+```
+
+状态返回值:`idle`(空闲)、`generating`(生成中)、`done`(完成)、`error`(异常)、`page_error`(页面崩溃)、`not_logged_in`(未登录)。
+
+### 下载
+
+```bash
+# 下载新生成的所有图片
+$NODE $CLI download --session
+
+# 下载指定索引(从 0 开始)
+$NODE $CLI download --session --index 2
+```
+
+### 会话管理
+
+```bash
+# 列出所有活跃会话
+$NODE $CLI sessions
+
+# 通过对话链接找回丢失的 session
+$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open
+
+# 关闭指定会话
+$NODE $CLI close --session
+```
+
+## 标准生图流程
+
+### 1. 确保浏览器已启动

 ```
 browser action=start
 ```

-### 2. Execute Generation
+### 2. 执行生图

-Default to `--mode single` (auto-close after download). Only omit `--mode` for multi-turn sessions.
+默认使用 `--mode single`(生成后自动关闭标签页)。仅多轮对话时不加 `--mode`。

 ```bash
-$NODE $CLI generate --prompt "..." --mode single
+NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
+CLI="/scripts/cli.js"
+
+$NODE $CLI generate --prompt "提示词" --mode single
 ```

-CLI handles: open tab → navigate to Gemini → paste reference images → type prompt → submit → wait for generation → download to `output/originals/` → close tab (single mode).
+CLI 自动完成:创建标签页 → 导航 Gemini → 粘贴参考图 → 输入提示词 → 发送 → 等待生成 → 下载到 `output/originals/` → 关闭标签页(single 模式)。

-Default timeouts: generation 300s (5min), download 120s.
+超时默认值:生成 300 秒(5 分钟),下载 120 秒。

-### 3. Handle Results
+推荐加 `--screenshot`:出错或超时时自动截图保存,便于诊断。

-**On success**: Move downloaded image and clean up:
+### 3. 处理结果
+
+**成功时**:移动图片到最终目录并清理:

 ```bash
-# Move latest generated image
-LATEST=$(ls -t /scripts/output/originals/ | head -1)
-mv "/scripts/output/originals/$LATEST" ~/.openclaw/workspace/media/generated/
+# 移动最新生成的图片
+LATEST=$(ls -t /scripts/output/originals/ | head -1)
+mv "/scripts/output/originals/$LATEST" ~/.openclaw/workspace/media/generated/

-# Clean up moved files
+# 清理原目录中已移动的文件
 for f in Gemini_Generated_Image_*.png generated-*.png; do
-  if [ -f "/home/dazhi/.openclaw/workspace/media/generated/$f" ]; then
-    rm "/scripts/output/originals/$f" 2>/dev/null
-  fi
+  [ -f "~/.openclaw/workspace/media/generated/$f" ] && rm "/scripts/output/originals/$f" 2>/dev/null
 done
 ```

-**On timeout or error**: Check status:
+**超时或失败时**:用 status 命令检查:

 ```bash
 $NODE $CLI status --session --wait
 ```

-If status returns `done`, run download:
+- 返回 `done` → 执行 `$NODE $CLI download --session ` 手动下载
+- 返回 `generating` → 继续等待(`--wait` 会自动轮询)
+- 返回 `error` → 报告错误,建议修改提示词重试

-```bash
-$NODE $CLI download --session
-```
+**确认下载成功**:下载后必须检查 `output/originals/` 目录,确认文件名和文件大小。

-**Add `--screenshot` for diagnostics**: When errors/timeouts are expected, add `--screenshot` to auto-capture a screenshot on failure.
+### 4. 发送图片

-### 4. Deliver to User
+用 `message` 工具的 `media` 参数发送最终图片。不要用截图预览——CLI 下载的是原图。

-Use `message` tool with `media` parameter to send the final image.
+## 关键规则

-## Secondary Workflows
-
-### Continue from Chat URL
-
-```bash
-$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "new instruction"
-```
-
-### List Active Sessions
-
-```bash
-$NODE $CLI sessions
-```
-
-### Find Lost Session
-
-```bash
-$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open
-```
-
-### Close a Session
-
-```bash
-$NODE $CLI close --session
-```
-
-## Error Handling
-
-| Status | Action |
-|------|------|
-| CLI exits `success` | Move image from `output/originals/` |
-| CLI exits `timeout` | Run `status --session --wait` to check if still generating |
-| Status returns `done` | Run `download --session ` |
-| Status returns `error` | Report error, suggest retry with modified prompt |
-| Status returns `generating` | Continue waiting with `--wait` |
-| Session lost | Use `find_session --chatUrl --open` to recover |
-| Browser not started | Run `browser action=start` first |
-| Downloaded image not found | Check `output/originals/` directory, verify filename and size |
-
-## Critical Rules
-
-1. **Browser management**: Use `browser` tool for lifecycle; CLI only handles Gemini interaction
-2. **Always use absolute paths** for node and CLI paths
-3. **Default to `--mode single`**: Auto-close after download unless in multi-turn conversation
-4. **Multi-turn continuation**: Do NOT add `--mode` when continuing; close with `--mode single` on last turn or `close` command
-5. **Reference image path handling**:
-   - ❌ Never use absolute paths with spaces — shell splits them
-   - ✅ `cd` to reference image directory first, then use relative paths
-   - ✅ Use `ls | grep` to dynamically get filenames
-6. **Avoid numbered reference images** (e.g., `01-xxx.jpg`) — Gemini may rate-limit repeated use
-7. **Stop after 3 consecutive failures** — investigate root cause instead of retrying
-8. **Always move downloaded images** to `~/media/generated/` and clean up originals
-9. **Verify downloads**: Check filename and file size in originals before moving
-10. **Use `--screenshot`** for diagnostic capture on errors
-11. **No `--json` needed**: Read CLI text output directly
-12. **Don't use screenshot previews**: CLI downloads original images directly
+1. **浏览器管理**用 `browser` 工具,CLI 只负责 Gemini 交互
+2. **所有路径用绝对路径**:node 路径、CLI 路径都用明确的值
+3. **默认单次生图**:所有生图请求默认加 `--mode single`,生成后自动关闭标签页
+4. **多轮续次不加 mode**:续次不加 `--mode`。关闭:末轮加 `--mode single` 自动关,或 `close --session ` 手动关
+5. **不需要 `--json`**:直接读 CLI 文本输出即可
+6. **参考图路径处理**(⚠️ 重要教训):
+   - ❌ 不要直接用带空格的绝对路径 — shell 会把空格当参数分隔符
+   - ✅ 先 `cd` 到参考图目录,用相对路径
+   - ✅ 用 `ls | grep` 动态获取文件名,避免手动写带空格的路径
+   ```bash
+   cd /home/dazhi/.openclaw/workspace/media/参考图/
+   FILE1=$(ls | grep -v "^[0-9]" | head -1)
+   FILE2=$(ls | grep -v "^[0-9]" | head -2 | tail -1)
+   $NODE $CLI generate --prompt "..." --images "$FILE1,$FILE2" --mode single
+   ```
+7. **不要用带序号的参考图**(如 `01-xxx.jpg`)— Gemini 会因重复使用而限流
+8. **失败 3 次后停下来检查** — 不要盲目重试,先排查路径/Gemini 状态/限流原因
+9. **下载后立即移动**到 `media/generated/`,并清理 originals 中的副本
+10. **生成完成后初步检查图片内容**:确认内容符合预期再发送给用户
+11. **参考图限流**:同一批参考图重复使用多次后 Gemini 可能只回文字,换新参考图或间隔一段时间再试
+12. **善用 `--screenshot`**:出错或超时自动截图,快速定位问题
diff --git a/skills/gemini-web-generate/scripts/.gitignore b/skills/gemini-web-generate/scripts/.gitignore
index ea1472e..1d8e95c 100644
--- a/skills/gemini-web-generate/scripts/.gitignore
+++ b/skills/gemini-web-generate/scripts/.gitignore
@@ -1 +1,2 @@
 output/
+sessions.json