refine: gemini-web-generate SKILL 改为中文,补全管道/文件提示词用法
This commit is contained in:
+138
-105
@@ -1,166 +1,199 @@
|
||||
---
|
||||
name: gemini-web-generate
|
||||
description: "Generate images via Gemini web interface using a headless browser automation CLI. Supports text-to-image, image-to-image, multi-image reference, multi-turn conversations, and session management. Use when: (1) User wants to generate images with Gemini, (2) User says 'generate with Gemini' or 'Gemini 生图', (3) Image-to-image or style transfer tasks, (4) Continuing an existing Gemini image conversation."
|
||||
description: "通过 Gemini 网页版生图。支持文生图、图生图(单张/多张参考图)、多轮对话生图、会话管理。底层使用 Puppeteer 驱动的 CLI 脚本自动化全流程。使用场景:(1) 用户要求用 Gemini 生成图片,(2) 用户说「Gemini 生图」,(3) 图生图 / 风格转换,(4) 继续已有 Gemini 生图对话。"
|
||||
---
|
||||
|
||||
# Gemini Web Image Generation
|
||||
# Gemini 网页版生图
|
||||
|
||||
Generate images through Gemini's web interface using a bundled Puppeteer-based CLI that automates the full workflow: navigation, image upload, prompt submission, download, and cleanup.
|
||||
通过 Puppeteer 驱动的 CLI 脚本自动化 Gemini 网页生图全流程:打开标签页 → 导航 → 粘贴参考图 → 输入提示词 → 发送 → 等待生成 → 下载原图 → 清理。
|
||||
|
||||
## Architecture
|
||||
## 组件
|
||||
|
||||
| Component | Path |
|
||||
|------|------|
|
||||
| CLI entry | `scripts/cli.js` |
|
||||
| Node binary | `/home/dazhi/.nvm/versions/node/v22.22.0/bin/node` |
|
||||
| Browser CDP | `http://127.0.0.1:9223` (managed by `browser` tool) |
|
||||
| 组件 | 说明 |
|
||||
|:---|:---|
|
||||
| CLI 入口 | `scripts/cli.js` |
|
||||
| Node 路径 | `/home/dazhi/.nvm/versions/node/v22.22.0/bin/node` |
|
||||
| 浏览器 CDP | `http://127.0.0.1:9223`(由 `browser` 工具管理) |
|
||||
|
||||
## Quick Start
|
||||
## 部署
|
||||
|
||||
All commands use absolute paths for reliability:
|
||||
首次使用前需安装依赖:
|
||||
|
||||
```bash
|
||||
cd <skill 安装目录>/scripts && npm install
|
||||
```
|
||||
|
||||
## 命令速查
|
||||
|
||||
以下 `<skill>` 指本 skill 的安装根目录。
|
||||
|
||||
```bash
|
||||
NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
|
||||
CLI="<skill-dir>/scripts/cli.js"
|
||||
CLI="<skill>/scripts/cli.js"
|
||||
```
|
||||
|
||||
### Text-to-Image (default: single mode)
|
||||
### 文生图
|
||||
|
||||
```bash
|
||||
$NODE $CLI generate --prompt "a cute cat" --mode single
|
||||
$NODE $CLI generate --prompt "一只可爱的猫,水彩风格" --mode single
|
||||
```
|
||||
|
||||
### Image-to-Image
|
||||
### 图生图
|
||||
|
||||
```bash
|
||||
$NODE $CLI generate --prompt "convert to watercolor style" --image /path/to/ref.png --mode single
|
||||
# 单张参考图
|
||||
$NODE $CLI generate --prompt "换成水墨风格" --image /path/to/ref.png --mode single
|
||||
|
||||
# 多张参考图(最多 10 张,逗号分隔)
|
||||
$NODE $CLI generate --prompt "融合这些图片的风格" --images "/path/a.png,/path/b.png" --mode single
|
||||
```
|
||||
|
||||
### Multi-image Reference
|
||||
### 从文件或管道读取提示词
|
||||
|
||||
```bash
|
||||
$NODE $CLI generate --prompt "blend these images" --images "/path/a.png,/path/b.png" --mode single
|
||||
# 从文件读取(支持换行)
|
||||
$NODE $CLI generate --prompt-file /path/to/prompt.txt --mode single
|
||||
|
||||
# 从管道读取
|
||||
echo "复杂的多行提示词..." | $NODE $CLI generate --prompt stdin --mode single
|
||||
```
|
||||
|
||||
### Multi-turn Conversation
|
||||
### 多轮对话生图
|
||||
|
||||
```bash
|
||||
# First turn (session stays open)
|
||||
$NODE $CLI generate --prompt "draw a sunset"
|
||||
# Continue
|
||||
$NODE $CLI generate --session <id> --prompt "add a boat"
|
||||
# Last turn (auto-close)
|
||||
$NODE $CLI generate --session <id> --prompt "make it night" --mode single
|
||||
# 首轮(不加 --mode,标签页保持打开)
|
||||
$NODE $CLI generate --prompt "画一幅日落风景"
|
||||
|
||||
# 续次(不加 --mode,可传入新的参考图)
|
||||
$NODE $CLI generate --session <id> --prompt "在画面中加入一艘小船"
|
||||
$NODE $CLI generate --session <id> --prompt "换成夜晚风格" --image /path/to/ref.png
|
||||
|
||||
# 末轮(加 --mode single,自动关闭标签页)
|
||||
$NODE $CLI generate --session <id> --prompt "最终调整" --mode single
|
||||
```
|
||||
|
||||
## Core Workflow
|
||||
⚠️ 续次不要加 `--mode`,保持标签页打开。关闭方式二选一:末轮加 `--mode single` 自动关,或用 `close` 命令手动关。
|
||||
|
||||
### 1. Ensure Browser is Running
|
||||
### 通过对话链接继续
|
||||
|
||||
```bash
|
||||
$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "换成水彩风格"
|
||||
```
|
||||
|
||||
### 状态与诊断
|
||||
|
||||
```bash
|
||||
# 立即检查状态
|
||||
$NODE $CLI status --session <id>
|
||||
|
||||
# 持续轮询直到完成(适用于 CLI 超时后确认是否后台还在生成)
|
||||
$NODE $CLI status --session <id> --wait
|
||||
|
||||
# 带截图诊断
|
||||
$NODE $CLI status --session <id> --wait --screenshot
|
||||
```
|
||||
|
||||
状态返回值:`idle`(空闲)、`generating`(生成中)、`done`(完成)、`error`(异常)、`page_error`(页面崩溃)、`not_logged_in`(未登录)。
|
||||
|
||||
### 下载
|
||||
|
||||
```bash
|
||||
# 下载新生成的所有图片
|
||||
$NODE $CLI download --session <id>
|
||||
|
||||
# 下载指定索引(从 0 开始)
|
||||
$NODE $CLI download --session <id> --index 2
|
||||
```
|
||||
|
||||
### 会话管理
|
||||
|
||||
```bash
|
||||
# 列出所有活跃会话
|
||||
$NODE $CLI sessions
|
||||
|
||||
# 通过对话链接找回丢失的 session
|
||||
$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open
|
||||
|
||||
# 关闭指定会话
|
||||
$NODE $CLI close --session <id>
|
||||
```
|
||||
|
||||
## 标准生图流程
|
||||
|
||||
### 1. 确保浏览器已启动
|
||||
|
||||
```
|
||||
browser action=start
|
||||
```
|
||||
|
||||
### 2. Execute Generation
|
||||
### 2. 执行生图
|
||||
|
||||
Default to `--mode single` (auto-close after download). Only omit `--mode` for multi-turn sessions.
|
||||
默认使用 `--mode single`(生成后自动关闭标签页)。仅多轮对话时不加 `--mode`。
|
||||
|
||||
```bash
|
||||
$NODE $CLI generate --prompt "..." --mode single
|
||||
NODE="/home/dazhi/.nvm/versions/node/v22.22.0/bin/node"
|
||||
CLI="<skill>/scripts/cli.js"
|
||||
|
||||
$NODE $CLI generate --prompt "提示词" --mode single
|
||||
```
|
||||
|
||||
CLI handles: open tab → navigate to Gemini → paste reference images → type prompt → submit → wait for generation → download to `output/originals/` → close tab (single mode).
|
||||
CLI 自动完成:创建标签页 → 导航 Gemini → 粘贴参考图 → 输入提示词 → 发送 → 等待生成 → 下载到 `output/originals/` → 关闭标签页(single 模式)。
|
||||
|
||||
Default timeouts: generation 300s (5min), download 120s.
|
||||
超时默认值:生成 300 秒(5 分钟),下载 120 秒。
|
||||
|
||||
### 3. Handle Results
|
||||
推荐加 `--screenshot`:出错或超时时自动截图保存,便于诊断。
|
||||
|
||||
**On success**: Move downloaded image and clean up:
|
||||
### 3. 处理结果
|
||||
|
||||
**成功时**:移动图片到最终目录并清理:
|
||||
|
||||
```bash
|
||||
# Move latest generated image
|
||||
LATEST=$(ls -t <skill-dir>/scripts/output/originals/ | head -1)
|
||||
mv "<skill-dir>/scripts/output/originals/$LATEST" ~/.openclaw/workspace/media/generated/
|
||||
# 移动最新生成的图片
|
||||
LATEST=$(ls -t <skill>/scripts/output/originals/ | head -1)
|
||||
mv "<skill>/scripts/output/originals/$LATEST" ~/.openclaw/workspace/media/generated/
|
||||
|
||||
# Clean up moved files
|
||||
# 清理原目录中已移动的文件
|
||||
for f in Gemini_Generated_Image_*.png generated-*.png; do
|
||||
if [ -f "/home/dazhi/.openclaw/workspace/media/generated/$f" ]; then
|
||||
rm "<skill-dir>/scripts/output/originals/$f" 2>/dev/null
|
||||
fi
|
||||
[ -f "~/.openclaw/workspace/media/generated/$f" ] && rm "<skill>/scripts/output/originals/$f" 2>/dev/null
|
||||
done
|
||||
```
|
||||
|
||||
**On timeout or error**: Check status:
|
||||
**超时或失败时**:用 status 命令检查:
|
||||
|
||||
```bash
|
||||
$NODE $CLI status --session <id> --wait
|
||||
```
|
||||
|
||||
If status returns `done`, run download:
|
||||
- 返回 `done` → 执行 `$NODE $CLI download --session <id>` 手动下载
|
||||
- 返回 `generating` → 继续等待(`--wait` 会自动轮询)
|
||||
- 返回 `error` → 报告错误,建议修改提示词重试
|
||||
|
||||
```bash
|
||||
$NODE $CLI download --session <id>
|
||||
```
|
||||
**确认下载成功**:下载后必须检查 `output/originals/` 目录,确认文件名和文件大小。
|
||||
|
||||
**Add `--screenshot` for diagnostics**: When errors/timeouts are expected, add `--screenshot` to auto-capture a screenshot on failure.
|
||||
### 4. 发送图片
|
||||
|
||||
### 4. Deliver to User
|
||||
用 `message` 工具的 `media` 参数发送最终图片。不要用截图预览——CLI 下载的是原图。
|
||||
|
||||
Use `message` tool with `media` parameter to send the final image.
|
||||
## 关键规则
|
||||
|
||||
## Secondary Workflows
|
||||
|
||||
### Continue from Chat URL
|
||||
|
||||
```bash
|
||||
$NODE $CLI generate --chatUrl "https://gemini.google.com/app/xxxx" --prompt "new instruction"
|
||||
```
|
||||
|
||||
### List Active Sessions
|
||||
|
||||
```bash
|
||||
$NODE $CLI sessions
|
||||
```
|
||||
|
||||
### Find Lost Session
|
||||
|
||||
```bash
|
||||
$NODE $CLI find_session --chatUrl "https://gemini.google.com/app/xxxx" --open
|
||||
```
|
||||
|
||||
### Close a Session
|
||||
|
||||
```bash
|
||||
$NODE $CLI close --session <id>
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
| Status | Action |
|
||||
|------|------|
|
||||
| CLI exits `success` | Move image from `output/originals/` |
|
||||
| CLI exits `timeout` | Run `status --session <id> --wait` to check if still generating |
|
||||
| Status returns `done` | Run `download --session <id>` |
|
||||
| Status returns `error` | Report error, suggest retry with modified prompt |
|
||||
| Status returns `generating` | Continue waiting with `--wait` |
|
||||
| Session lost | Use `find_session --chatUrl <url> --open` to recover |
|
||||
| Browser not started | Run `browser action=start` first |
|
||||
| Downloaded image not found | Check `output/originals/` directory, verify filename and size |
|
||||
|
||||
## Critical Rules
|
||||
|
||||
1. **Browser management**: Use `browser` tool for lifecycle; CLI only handles Gemini interaction
|
||||
2. **Always use absolute paths** for node and CLI paths
|
||||
3. **Default to `--mode single`**: Auto-close after download unless in multi-turn conversation
|
||||
4. **Multi-turn continuation**: Do NOT add `--mode` when continuing; close with `--mode single` on last turn or `close` command
|
||||
5. **Reference image path handling**:
|
||||
- ❌ Never use absolute paths with spaces — shell splits them
|
||||
- ✅ `cd` to reference image directory first, then use relative paths
|
||||
- ✅ Use `ls | grep` to dynamically get filenames
|
||||
6. **Avoid numbered reference images** (e.g., `01-xxx.jpg`) — Gemini may rate-limit repeated use
|
||||
7. **Stop after 3 consecutive failures** — investigate root cause instead of retrying
|
||||
8. **Always move downloaded images** to `~/media/generated/` and clean up originals
|
||||
9. **Verify downloads**: Check filename and file size in originals before moving
|
||||
10. **Use `--screenshot`** for diagnostic capture on errors
|
||||
11. **No `--json` needed**: Read CLI text output directly
|
||||
12. **Don't use screenshot previews**: CLI downloads original images directly
|
||||
1. **浏览器管理**用 `browser` 工具,CLI 只负责 Gemini 交互
|
||||
2. **所有路径用绝对路径**:node 路径、CLI 路径都用明确的值
|
||||
3. **默认单次生图**:所有生图请求默认加 `--mode single`,生成后自动关闭标签页
|
||||
4. **多轮续次不加 mode**:续次不加 `--mode`。关闭:末轮加 `--mode single` 自动关,或 `close --session <id>` 手动关
|
||||
5. **不需要 `--json`**:直接读 CLI 文本输出即可
|
||||
6. **参考图路径处理**(⚠️ 重要教训):
|
||||
- ❌ 不要直接用带空格的绝对路径 — shell 会把空格当参数分隔符
|
||||
- ✅ 先 `cd` 到参考图目录,用相对路径
|
||||
- ✅ 用 `ls | grep` 动态获取文件名,避免手动写带空格的路径
|
||||
```bash
|
||||
cd /home/dazhi/.openclaw/workspace/media/参考图/
|
||||
FILE1=$(ls | grep -v "^[0-9]" | head -1)
|
||||
FILE2=$(ls | grep -v "^[0-9]" | head -2 | tail -1)
|
||||
$NODE $CLI generate --prompt "..." --images "$FILE1,$FILE2" --mode single
|
||||
```
|
||||
7. **不要用带序号的参考图**(如 `01-xxx.jpg`)— Gemini 会因重复使用而限流
|
||||
8. **失败 3 次后停下来检查** — 不要盲目重试,先排查路径/Gemini 状态/限流原因
|
||||
9. **下载后立即移动**到 `media/generated/`,并清理 originals 中的副本
|
||||
10. **生成完成后初步检查图片内容**:确认内容符合预期再发送给用户
|
||||
11. **参考图限流**:同一批参考图重复使用多次后 Gemini 可能只回文字,换新参考图或间隔一段时间再试
|
||||
12. **善用 `--screenshot`**:出错或超时自动截图,快速定位问题
|
||||
|
||||
Reference in New Issue
Block a user