20 Commits

- `6e2fd17734` refactor: consolidate web scraping into web-automation (Stefano Fiorini, 2026-03-10 19:24:17 -05:00)
- `4b505e4421` feat: add safe Playwright scraper skill (Stefano Fiorini, 2026-03-10 19:07:30 -05:00)
- `60363f9f0c` chore: ignore local worktrees (Stefano Fiorini, 2026-03-10 18:59:11 -05:00)
- `7322d53fa6` Add Google Maps route avoidance flags (Stefano Fiorini, 2026-03-10 00:51:35 -05:00)
- `d3a2b9faae` feat: prefer workspace key path for google maps (Stefano Fiorini, 2026-03-10 00:35:25 -05:00)
- `7361b31c7c` Add Google integrations and documentation (Stefano Fiorini, 2026-03-08 21:29:37 -05:00)
- `976888f002` Add elevenlabs-stt skill and documentation (Stefano Fiorini, 2026-03-08 21:11:09 -05:00)
- `6c12b74cca` Document native build step for web-automation setup (Stefano Fiorini, 2026-03-08 21:06:57 -05:00)
- `3e611621d6` Make portainer skill use portable config paths (Stefano Fiorini, 2026-03-08 20:56:17 -05:00)
- `f38b13c563` Trim searxng skill to minimal custom-skill layout (Stefano Fiorini, 2026-03-08 20:35:48 -05:00)
- `ae90cda182` Add searxng skill with portable configuration (Stefano Fiorini, 2026-03-08 20:34:29 -05:00)
- `0324ef0810` Improve gitea-api config discovery and repo pagination (Stefano Fiorini, 2026-03-08 19:26:46 -05:00)
- `f435487eb0` Add portainer skill for stack management via API (Luke, 2026-02-12 05:25:44 +00:00)
  - 11 scripts for stack lifecycle (start/stop/restart, update, prune)
  - Detailed documentation with usage examples and workflow
  - Updated README.md files with portainer skill info
- `47314f50c9` Harden flow.ts click handling for login flows (Luke, 2026-02-11 22:12:42 +00:00)
  - Improve text click matching with exact/partial patterns
  - Add fallback login navigation when click target is JS/non-clickable
  - Add Corriere-specific fallback to /account/login
  - Keep flow resilient when click does not trigger navigation
- `a6dffe0091` Add general flow runner and document natural-language usage (Luke, 2026-02-11 22:01:38 +00:00)
  - Add flow.ts for go/click/type/press/wait/screenshot flows
  - Update web-automation docs with natural-language examples
  - Update SKILL.md quick reference for flow.ts
  - Remove temp script files
- `2d0614e020` Restructure docs and simplify README (Luke, 2026-02-11 20:08:26 +00:00)
  - Rewrite main README as project intro + info pointers
  - Keep skills table focused on current 2 skills
  - Add docs/README.md index with links to skill docs
  - Add detailed docs for gitea-api and web-automation
- `658562ae35` Add web-automation skill (Luke, 2026-02-11 18:46:59 +00:00)
  - Browse and scrape web pages using Playwright with Camoufox anti-detection browser
  - Supports automated web workflows, authenticated sessions, and bot protection bypass
  - Includes scripts for browse, scrape, auth, and local app scanning
  - Updated README with skill documentation and system library requirements
- `88a3644959` Update README.md (2026-02-09 02:50:07 +00:00)
- `e834d94f12` Update README.md (2026-02-09 02:46:25 +00:00)
- `5d2b8869ea` Update README.md (2026-02-09 02:42:00 +00:00)
48 changed files with 7818 additions and 23 deletions

.gitignore (vendored, new file)

@@ -0,0 +1 @@
.worktrees/

README.md

@@ -1,31 +1,37 @@
# stef-openclaw-skills

A curated collection of practical OpenClaw skills by Stefano.

This repository contains practical OpenClaw skills and companion integrations. Install the repo (or a single path), then use each skill through OpenClaw and each integration as a local helper CLI.

## Where to get information

- Skill docs index: [`docs/README.md`](docs/README.md)
- Skill implementation files: `skills/<skill-name>/`
- Per-skill runtime instructions: `skills/<skill-name>/SKILL.md`
- Integration implementation files: `integrations/<integration-name>/`
- Integration docs: `docs/*.md`

## Skills

| Skill | What it does | Path |
|---|---|---|
| `elevenlabs-stt` | Transcribe local audio files with ElevenLabs Speech-to-Text, with diarization, language hints, event tags, and JSON output. | `skills/elevenlabs-stt` |
| `gitea-api` | Interact with Gitea via REST API (repos, issues, PRs, releases, branches, user info). | `skills/gitea-api` |
| `portainer` | Manage Portainer stacks via API (list, start/stop/restart, update, prune images). | `skills/portainer` |
| `searxng` | Search through a local or self-hosted SearXNG instance for web, news, images, and more. | `skills/searxng` |
| `web-automation` | One-shot extraction plus broader browsing/scraping with Playwright + Camoufox (auth flows, extraction, bot-protected sites). | `skills/web-automation` |

## Integrations

| Integration | What it does | Path |
|---|---|---|
| `google-maps` | Traffic-aware ETA and leave-by calculations using Google Maps APIs. | `integrations/google-maps` |
| `google-workspace` | Gmail and Google Calendar helper CLI for profile, mail, calendar search, and event creation. | `integrations/google-workspace` |

## Install ideas

- Install the whole repo as a skill source.
- Install a single skill by path (example: `skills/gitea-api`).

(Exact install command can vary by OpenClaw/ClawHub version.)

docs/README.md (new file)

@@ -0,0 +1,16 @@
# Skill Documentation Index
This folder contains detailed docs for each skill in this repository.
## Skills
- [`elevenlabs-stt`](elevenlabs-stt.md) — Local audio transcription through ElevenLabs Speech-to-Text
- [`gitea-api`](gitea-api.md) — REST-based Gitea automation (no `tea` CLI required)
- [`portainer`](portainer.md) — Portainer stack management (list, lifecycle, updates, image pruning)
- [`searxng`](searxng.md) — Privacy-respecting metasearch via a local or self-hosted SearXNG instance
- [`web-automation`](web-automation.md) — One-shot extraction plus Playwright + Camoufox browser automation and scraping
## Integrations
- [`google-maps`](google-maps.md) — Traffic-aware ETA and leave-by calculations via Google Maps APIs
- [`google-workspace`](google-workspace.md) — Gmail and Google Calendar helper CLI

docs/elevenlabs-stt.md (new file)

@@ -0,0 +1,41 @@
# elevenlabs-stt
Transcribe local audio files with ElevenLabs Speech-to-Text.
## What this skill is for
- Local audio transcription
- Voice note transcription
- Optional speaker diarization
- Language hints and event tagging
- JSON output for programmatic use
## Requirements
Required binaries:
- `curl`
- `jq`
- `python3`
Preferred auth:
- `ELEVENLABS_API_KEY` in the environment
Fallback auth:
- local OpenClaw config lookup from `~/.openclaw/openclaw.json` or `~/.openclaw/secrets.json`
## Wrapper
Use the bundled script directly:
```bash
bash skills/elevenlabs-stt/scripts/transcribe.sh /path/to/audio.mp3
bash skills/elevenlabs-stt/scripts/transcribe.sh /path/to/audio.mp3 --diarize --lang en
bash skills/elevenlabs-stt/scripts/transcribe.sh /path/to/audio.mp3 --json
bash skills/elevenlabs-stt/scripts/transcribe.sh /path/to/audio.mp3 --events
```
## Notes
- Uses ElevenLabs STT model `scribe_v2`.
- Uploads a local file directly to ElevenLabs.
- If `ELEVENLABS_API_KEY` is not exported, the script tries local OpenClaw config/secrets automatically.

docs/gitea-api.md (new file)

@@ -0,0 +1,36 @@
# gitea-api
Use Gitea via REST API without relying on the `tea` CLI.
## What this skill is for
- Create/list repositories
- Create/list/update issues
- Work with pull requests and releases
- Manage branches and user/repo metadata
## Setup
Create:
`~/.clawdbot/credentials/gitea/config.json`
```json
{
"url": "https://git.fiorinis.com",
"token": "your-personal-access-token"
}
```
## Wrapper
You can use the helper script:
```bash
bash skills/gitea-api/scripts/gitea.sh <command>
```
## Notes
- Works against any Gitea instance with a valid token.
- This skill is API-first and does not require `tea`.

docs/google-maps.md (new file)

@@ -0,0 +1,46 @@
# google-maps integration
Google Maps traffic/ETA helper CLI using Geocoding API and Routes API.
## What this integration is for
- Drive ETA between two places
- Leave-by time estimation for a target arrival
- Quick traffic-aware route summaries
## Runtime
- `node`
- no package dependencies; uses only built-in Node APIs (global `fetch` requires Node 18+)
## Auth
Preferred:
- `GOOGLE_MAPS_API_KEY` environment variable
Preferred key file:
- `~/.openclaw/workspace/.clawdbot/credentials/google-maps/apikey.txt`
Legacy fallback key file:
- `~/.openclaw/credentials/google-maps/apikey.txt`
Required Google APIs for the key:
- Geocoding API
- Routes API
## Commands
```bash
node integrations/google-maps/traffic.js eta --from "DFW Airport" --to "Love Field" --departAt now
node integrations/google-maps/traffic.js eta --from "DFW Airport" --to "Love Field" --avoidTolls
node integrations/google-maps/traffic.js leave-by --from "Home" --to "DFW Airport" --arriveBy 2026-03-17T08:30:00-05:00
node integrations/google-maps/traffic.js leave-by --from "Home" --to "DFW Airport" --arriveBy 2026-03-17T08:30:00-05:00 --avoidTolls
```
## Optional route modifiers
- `--avoidTolls`
- `--avoidHighways`
- `--avoidFerries`
These flags are passed through to the Google Routes API `routeModifiers` field and work with both `eta` and `leave-by`.
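The flag-to-field mapping can be sketched as follows (the helper name is illustrative, but the `routeModifiers` field names match the public Google Routes API):

```javascript
// How the --avoid* flags translate into the Routes API request body.
// Field names follow the Routes API; the helper itself is a sketch.
function routeModifiersFromFlags(flags) {
  const modifiers = {};
  if (flags.avoidTolls) modifiers.avoidTolls = true;
  if (flags.avoidHighways) modifiers.avoidHighways = true;
  if (flags.avoidFerries) modifiers.avoidFerries = true;
  // Omit the field entirely when no flag is set
  return Object.keys(modifiers).length ? modifiers : undefined;
}

const body = {
  travelMode: 'DRIVE',
  routingPreference: 'TRAFFIC_AWARE',
  routeModifiers: routeModifiersFromFlags({ avoidTolls: true }),
};
console.log(JSON.stringify(body.routeModifiers)); // {"avoidTolls":true}
```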

docs/google-workspace.md (new file)

@@ -0,0 +1,47 @@
# google-workspace integration
Google Workspace helper CLI for Gmail and Google Calendar.
## What this integration is for
- Show mailbox identity/profile
- Send email
- Search mail
- Search calendar events
- Create calendar events
## Runtime
- `node`
- dependency: `googleapis`
Install inside the integration folder if needed:
```bash
cd integrations/google-workspace
npm install
```
## Auth and defaults
Default impersonation:
- `stefano@fiorinis.com`
Key lookup order:
1. `GW_KEY_PATH`
2. `~/.openclaw/workspace/.clawdbot/credentials/google-workspace/service-account.json`
3. `~/.clawdbot/credentials/google-workspace/service-account.json`
Optional env:
- `GW_IMPERSONATE`
- `GW_KEY_PATH`
## Commands
```bash
node integrations/google-workspace/gw.js whoami
node integrations/google-workspace/gw.js send --to "user@example.com" --subject "Hello" --body "Hi there"
node integrations/google-workspace/gw.js search-mail --query "from:someone@example.com newer_than:7d" --max 10
node integrations/google-workspace/gw.js search-calendar --timeMin 2026-03-17T00:00:00-05:00 --timeMax 2026-03-18T00:00:00-05:00 --max 20
node integrations/google-workspace/gw.js create-event --summary "Meeting" --start 2026-03-20T09:00:00-05:00 --end 2026-03-20T10:00:00-05:00
```

docs/plans/2026-03-10-google-maps-workspace-key-repo-sync-design.md (new file)

@@ -0,0 +1,33 @@
# Google Maps Workspace Key Repo Sync Design
**Problem:** The live local `google-maps` integration was updated to prefer the workspace credential path for the API key, but the repository source copy still points at the legacy path. The docs were partially updated, and the repo needs to be the source of truth again before committing and pushing.
## Current state
- Live local integration [traffic.js](/Users/stefano/.openclaw/workspace/integrations/google-maps/traffic.js) prefers:
- `~/.openclaw/workspace/.clawdbot/credentials/google-maps/apikey.txt`
- with legacy fallback `~/.openclaw/credentials/google-maps/apikey.txt`
- Repo copy [traffic.js](/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/integrations/google-maps/traffic.js) still points only at the legacy path.
- Repo doc [google-maps.md](/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/docs/google-maps.md) now documents the new canonical path and legacy fallback.
## Approaches considered
1. Recommended: sync the repo source copy to match the live integration exactly and keep the docs aligned.
- Minimal change.
- Restores source/runtime consistency.
- Safe to commit and push.
2. Revert the live integration to match the repo.
- Wrong direction because the live behavior is already verified and desired.
3. Update only docs and leave source drift.
- Not acceptable because the repository would remain misleading.
## Selected design
- Update the repo copy of `integrations/google-maps/traffic.js` to match the live integration behavior:
- prefer `~/.openclaw/workspace/.clawdbot/credentials/google-maps/apikey.txt`
- keep `~/.openclaw/credentials/google-maps/apikey.txt` as legacy fallback
- Keep the repo docs aligned with that behavior.
- Verify the repo copy via code inspection and a live ETA command using the runtime integration.
- Commit only the repo changes and push to `origin/main`.

docs/plans/2026-03-10-google-maps-workspace-key-repo-sync.md (new file)

@@ -0,0 +1,92 @@
# Google Maps Workspace Key Repo Sync Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Sync the `stef-openclaw-skills` repository source and docs with the verified Google Maps workspace-key behavior, then commit and push the change.
**Architecture:** Apply the same path-resolution change from the live integration to the repo copy, keep the docs aligned, and verify behavior with inspection plus a live ETA command. Commit only the repo files that reflect this feature.
**Tech Stack:** Node.js CLI, local file path resolution, git
---
### Task 1: Sync repo source copy
**Files:**
- Modify: `/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/integrations/google-maps/traffic.js`
**Step 1: Write the failing check**
Run:
```bash
sed -n '1,50p' /Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/integrations/google-maps/traffic.js
```
Expected: only the legacy path is present and there is no workspace credential fallback array.
**Step 2: Write minimal implementation**
Apply the same path lookup logic as the live local integration:
- prefer `~/.openclaw/workspace/.clawdbot/credentials/google-maps/apikey.txt`
- keep `~/.openclaw/credentials/google-maps/apikey.txt` as fallback
- keep `GOOGLE_MAPS_API_KEY` support unchanged
**Step 3: Verify source sync**
Run:
```bash
sed -n '1,60p' /Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/integrations/google-maps/traffic.js
```
Expected: workspace path is primary and legacy path remains fallback.
### Task 2: Verify docs alignment
**Files:**
- Modify or confirm: `/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/docs/google-maps.md`
**Step 1: Verify doc text**
Run:
```bash
sed -n '1,80p' /Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/docs/google-maps.md
```
Expected: docs show workspace key path as preferred and legacy path as fallback.
### Task 3: Verify behavior and publish
**Files:**
- Commit: `/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/integrations/google-maps/traffic.js`
- Commit: `/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/docs/google-maps.md`
- Commit: `/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/docs/plans/2026-03-10-google-maps-workspace-key-repo-sync-design.md`
- Commit: `/Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills/docs/plans/2026-03-10-google-maps-workspace-key-repo-sync.md`
**Step 1: Run live verification**
Run:
```bash
node /Users/stefano/.openclaw/workspace/integrations/google-maps/traffic.js eta --from "DFW Airport" --to "Downtown Dallas" --departAt now
```
Expected: JSON ETA output succeeds with the workspace key file.
**Step 2: Inspect git diff**
Run:
```bash
git -C /Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills diff -- integrations/google-maps/traffic.js docs/google-maps.md docs/plans/2026-03-10-google-maps-workspace-key-repo-sync-design.md docs/plans/2026-03-10-google-maps-workspace-key-repo-sync.md
```
Expected: only the intended repo sync changes appear.
**Step 3: Commit**
Run:
```bash
git -C /Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills add integrations/google-maps/traffic.js docs/google-maps.md docs/plans/2026-03-10-google-maps-workspace-key-repo-sync-design.md docs/plans/2026-03-10-google-maps-workspace-key-repo-sync.md && git -C /Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills commit -m "docs: sync google maps workspace key path"
```
Expected: commit succeeds.
**Step 4: Push**
Run:
```bash
git -C /Users/stefano/.openclaw/workspace/projects/stef-openclaw-skills push origin main
```
Expected: push succeeds.

docs/portainer.md (new file)

@@ -0,0 +1,185 @@
# Portainer Skill
Interact with Portainer stacks via API key authentication. Manage stacks, resolve identifiers, update deployments, and clean up old images.
## Overview
This skill provides a comprehensive set of commands for managing Portainer Docker stacks through the API. All stack commands accept stack names and automatically resolve IDs internally.
## Prerequisites
### Auth Configuration
Create a config file at one of these locations:
```
workspace/.clawdbot/credentials/portainer/config.json
~/.clawdbot/credentials/portainer/config.json
```
With the following content:
```json
{
"base_url": "https://your-portainer-instance.com",
"api_key": "YOUR_PORTAINER_API_KEY"
}
```
To generate an API key:
1. Log into Portainer
2. Go to User Settings → Access tokens
3. Create a new token with appropriate permissions
## Commands
### Stack Identification
#### Get Stack ID
```bash
bash scripts/get-stack-id.sh "<stack-name>"
```
Resolves a stack name to its numeric ID. Prints only the ID on success.
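The resolution step amounts to matching the name against the `GET /api/stacks` response, whose entries expose `Id` and `Name`. A minimal sketch (assumed implementation, not the actual script):

```javascript
// Resolve a stack name to its numeric ID from a /api/stacks payload.
function resolveStackId(stacks, name) {
  const match = stacks.find((s) => s.Name === name);
  if (!match) throw new Error(`Stack not found: ${name}`);
  return match.Id;
}

// Example against a miniature /api/stacks payload
const sampleStacks = [{ Id: 3, Name: 'my-stack' }, { Id: 7, Name: 'other' }];
console.log(resolveStackId(sampleStacks, 'my-stack')); // 3
```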
#### Get Endpoint ID
```bash
bash scripts/get-endpoint-id.sh "<endpoint-name>"
```
Resolves an endpoint (environment) name to its ID.
### Stack Inventory
#### List All Stacks
```bash
bash scripts/list-stacks.sh
```
Lists all stacks with their ID, Name, and Status.
#### Get Stack Status
```bash
bash scripts/get-stack-status.sh "<stack-name>"
```
Returns JSON with stack details: Id, Name, Status, Type, EndpointId, CreationDate, UpdatedDate.
### Stack Lifecycle
#### Stop Stack
```bash
bash scripts/stop-stack.sh "<stack-name>"
```
#### Start Stack
```bash
bash scripts/start-stack.sh "<stack-name>"
```
#### Restart Stack
```bash
bash scripts/restart-stack.sh "<stack-name>"
```
### Stack Configuration
#### Get Environment Variables
```bash
bash scripts/get-stack-env.sh "<stack-name>"
```
Returns JSON array of `{name, value}` objects.
#### Get Compose File
```bash
bash scripts/get-stack-compose.sh "<stack-name>"
```
Returns the raw docker-compose.yml content.
### Stack Updates
#### Update Stack
```bash
bash scripts/update-stack.sh "<stack-name>" "<compose-file>" [options]
```
Options:
- `--pull` — Force pull images and redeploy (like `docker compose down/pull/up`)
- `--env-file <file>` — Path to a file with env vars (format: NAME=value per line)
Notes:
- Without `--env-file`, existing environment variables are preserved
- The `--pull` flag may return HTTP 504 for large images, but the operation completes in the background
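The `NAME=value` env file ultimately has to become the array-of-objects shape that Portainer's stack update payload carries. A sketch of that conversion (illustrative helper; the actual script may differ):

```javascript
// Turn an --env-file ("NAME=value" per line) into Portainer's
// [{ name, value }] env array. Blank lines and #-comments are skipped.
function parseEnvFile(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith('#'))
    .map((line) => {
      const eq = line.indexOf('=');
      // Split on the first "=" only, so values may themselves contain "="
      return { name: line.slice(0, eq), value: line.slice(eq + 1) };
    });
}

console.log(JSON.stringify(parseEnvFile('FOO=bar\nDB_URL=postgres://u:p@host/db\n')));
// [{"name":"FOO","value":"bar"},{"name":"DB_URL","value":"postgres://u:p@host/db"}]
```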
#### Prune Stack Images
```bash
bash scripts/prune-stack-images.sh "<stack-name>"
```
Removes dangling images on the endpoint. Run this after `update-stack --pull` completes.
## Typical Workflow
### Updating a Stack with a New Image Version
```bash
# 1. Get the current compose file
bash scripts/get-stack-compose.sh "my-stack" > /tmp/my-stack-compose.yml
# 2. Update the stack (pull new images)
bash scripts/update-stack.sh "my-stack" "/tmp/my-stack-compose.yml" --pull
# 3. Wait for update to complete (even if you see a 504 timeout)
# 4. Clean up old images
bash scripts/prune-stack-images.sh "my-stack"
```
### Modifying a Stack's Compose File
```bash
# 1. Get the current compose file
bash scripts/get-stack-compose.sh "my-stack" > /tmp/my-stack-compose.yml
# 2. Edit the compose file
nano /tmp/my-stack-compose.yml
# 3. Update the stack with the modified file
bash scripts/update-stack.sh "my-stack" "/tmp/my-stack-compose.yml"
# 4. Optionally prune old images if you changed image tags
bash scripts/prune-stack-images.sh "my-stack"
```
## Error Handling
All scripts:
- Exit with code 0 on success
- Exit with non-zero code on failure
- Print error messages to stderr
- Print results to stdout
Common errors:
- **Missing config file**: Create the auth configuration file in the workspace `.clawdbot/credentials/portainer/` path or in `~/.clawdbot/credentials/portainer/`
- **Invalid API key**: Generate a new API key in Portainer
- **Stack not found**: Check the stack name with `list-stacks.sh`
- **504 Gateway Timeout**: The operation is still running in the background; wait and then run the prune command
## API Reference
This skill uses the following Portainer API endpoints:
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/stacks` | GET | List all stacks |
| `/api/stacks/{id}` | GET | Get stack details |
| `/api/stacks/{id}` | PUT | Update stack |
| `/api/stacks/{id}/file` | GET | Get compose file |
| `/api/stacks/{id}/start` | POST | Start stack |
| `/api/stacks/{id}/stop` | POST | Stop stack |
| `/api/stacks/{id}/restart` | POST | Restart stack |
| `/api/endpoints` | GET | List endpoints |
| `/api/endpoints/{id}/docker/containers/json` | GET | List containers |
| `/api/endpoints/{id}/docker/images/json` | GET | List images |
| `/api/endpoints/{id}/docker/images/{id}` | DELETE | Remove image |
## Notes
- All `*-stack.sh` commands resolve the stack ID internally from the name
- Endpoint ID is fetched automatically from stack info for lifecycle and update operations
- The `--pull` flag triggers Portainer to pull new images and recreate containers
- Large image pulls may cause HTTP 504 timeouts, but operations complete server-side
- Use `prune-stack-images.sh` after updates to clean up old dangling images

docs/searxng.md (new file)

@@ -0,0 +1,50 @@
# searxng
Search the web through a local or self-hosted SearXNG instance.
## What this skill is for
- General web search
- News, image, and video search
- Privacy-respecting search without external API keys
- Programmatic search output via JSON
## Runtime requirements
- `python3`
- Python packages: `httpx`, `rich`
## Configuration
Preferred:
- `SEARXNG_URL` environment variable
Optional config file:
- workspace `.clawdbot/credentials/searxng/config.json`
- or `~/.clawdbot/credentials/searxng/config.json`
Example:
```json
{
"url": "https://search.fiorinis.com"
}
```
## Wrapper
Use the bundled script directly:
```bash
python3 skills/searxng/scripts/searxng.py search "OpenClaw" -n 5
python3 skills/searxng/scripts/searxng.py search "latest AI news" --category news
python3 skills/searxng/scripts/searxng.py search "OpenClaw" --format json
```
## Notes
- Falls back to `http://localhost:8080` if no URL is configured.
- Prints the URL it attempted when connection fails.
- Uses the SearXNG JSON API endpoint.
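The request the wrapper makes boils down to SearXNG's regular `/search` endpoint with `format=json` (and an optional `categories` parameter). A sketch of the URL construction, with the localhost fallback noted above:

```javascript
// Build a SearXNG JSON API search URL (illustrative helper).
function searxngSearchUrl(baseUrl, query, opts = {}) {
  // Fall back to the documented local default when no URL is configured
  const u = new URL('/search', baseUrl || 'http://localhost:8080');
  u.searchParams.set('q', query);
  u.searchParams.set('format', 'json');
  if (opts.category) u.searchParams.set('categories', opts.category);
  return u.toString();
}

console.log(searxngSearchUrl('https://search.example.com', 'OpenClaw'));
// https://search.example.com/search?q=OpenClaw&format=json
```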

docs/web-automation.md (new file)

@@ -0,0 +1,125 @@
# web-automation
Automated web browsing and scraping using Playwright, with one-shot extraction and broader Camoufox-based automation under a single skill.
## What this skill is for
- One-shot extraction from one URL with JSON output
- Automating web workflows
- Authenticated session flows (logins/cookies)
- Extracting page content to markdown
- Working with bot-protected or dynamic pages
## Command selection
- Use `node skills/web-automation/scripts/extract.js "<URL>"` for one-shot extraction from a single URL
- Use `npx tsx scrape.ts ...` for markdown scraping modes
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` for interactive or authenticated flows
## Requirements
- Node.js 20+
- `pnpm`
- Network access to download browser binaries
## First-time setup
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm install
npx playwright install chromium
npx camoufox-js fetch
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## System libraries (for OpenClaw Docker builds)
```bash
export OPENCLAW_DOCKER_APT_PACKAGES="ffmpeg jq curl libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2"
```
## Native module note
If `pnpm install` warns that build scripts were ignored for native modules such as `better-sqlite3` or `esbuild`, run:
```bash
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
Without this, `browse.ts` and `scrape.ts` may fail before launch because the native bindings are missing.
## Common commands
```bash
# One-shot JSON extraction
node skills/web-automation/scripts/extract.js "https://example.com"
# Browse a page
npx tsx browse.ts --url "https://example.com"
# Scrape markdown
npx tsx scrape.ts --url "https://example.com" --mode main --output page.md
# Authenticate flow
npx tsx auth.ts --url "https://example.com/login"
# General natural-language browser flow
npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
```
## One-shot extraction (`extract.js`)
Use `extract.js` when the task is just: open one URL, render it, and return structured content.
### Features
- JavaScript rendering
- lightweight stealth and bounded anti-bot shaping
- JSON-only output
- optional screenshot and saved HTML
- browser sandbox left enabled
### Options
```bash
WAIT_TIME=5000 node skills/web-automation/scripts/extract.js "https://example.com"
SCREENSHOT_PATH=/tmp/page.png node skills/web-automation/scripts/extract.js "https://example.com"
SAVE_HTML=true node skills/web-automation/scripts/extract.js "https://example.com"
HEADLESS=false node skills/web-automation/scripts/extract.js "https://example.com"
USER_AGENT="Mozilla/5.0 ..." node skills/web-automation/scripts/extract.js "https://example.com"
```
### Output fields
- `requestedUrl`
- `finalUrl`
- `title`
- `content`
- `metaDescription`
- `status`
- `elapsedSeconds`
- `challengeDetected`
- optional `screenshot`
- optional `htmlFile`
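When chaining `extract.js` into a pipeline, the JSON output can be gated on these fields. A sketch (field names from the list above; the thresholds are illustrative):

```javascript
// Parse extract.js JSON output and decide whether the extraction is usable.
function assessExtraction(json) {
  const result = JSON.parse(json);
  if (result.challengeDetected) return { ok: false, reason: 'bot challenge' };
  if (result.status && result.status >= 400) return { ok: false, reason: `HTTP ${result.status}` };
  return { ok: true, title: result.title, content: result.content };
}

const sample = JSON.stringify({ status: 200, title: 'Example', content: 'Hello', challengeDetected: false });
console.log(assessExtraction(sample).ok); // true
```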
## Natural-language flow runner (`flow.ts`)
Use `flow.ts` when you want a general command style like:
- "go to this site"
- "find this button and click it"
- "type this and press enter"
### Example
```bash
npx tsx flow.ts --instruction 'go to https://example.com then click on "Sign in" then type "stef@example.com" in #email then press enter'
```
You can also use JSON steps for deterministic runs:
```bash
npx tsx flow.ts --steps '[{"action":"goto","url":"https://example.com"},{"action":"click","text":"Sign in"}]'
```

integrations/google-maps/traffic.js (new file)

@@ -0,0 +1,217 @@
#!/usr/bin/env node
const fs = require('fs');
const path = require('path');

const DEFAULT_KEY_PATH = path.join(process.env.HOME || '', '.openclaw/workspace/.clawdbot/credentials/google-maps/apikey.txt');
const FALLBACK_KEY_PATHS = [
  DEFAULT_KEY_PATH,
  path.join(process.env.HOME || '', '.openclaw/credentials/google-maps/apikey.txt'),
];

function parseArgs(argv) {
  const out = { _: [] };
  for (let i = 0; i < argv.length; i++) {
    const t = argv[i];
    if (t.startsWith('--')) {
      const k = t.slice(2);
      const n = argv[i + 1];
      if (!n || n.startsWith('--')) out[k] = true;
      else {
        out[k] = n;
        i++;
      }
    } else out._.push(t);
  }
  return out;
}

function usage() {
  console.log(`Google Maps Traffic CLI
Commands:
  eta --from "Origin" --to "Destination" [--departAt now|ISO]
  leave-by --from "Origin" --to "Destination" --arriveBy ISO
Optional flags:
  --keyPath <path>    API key file path (default: ${DEFAULT_KEY_PATH})
  --timeZone <IANA>   Display timezone (default: America/Chicago)
  --avoidTolls        Avoid toll roads when possible
  --avoidHighways     Avoid highways when possible
  --avoidFerries      Avoid ferries when possible
Notes:
  - Requires Google Maps APIs (Routes API + Geocoding API) enabled for the key.
`);
}

function must(opts, keys) {
  const miss = keys.filter((k) => !opts[k]);
  if (miss.length) throw new Error(`Missing: ${miss.map((k) => '--' + k).join(', ')}`);
}

function readApiKey(opts) {
  if (process.env.GOOGLE_MAPS_API_KEY) return process.env.GOOGLE_MAPS_API_KEY.trim();
  const candidates = opts.keyPath ? [opts.keyPath] : FALLBACK_KEY_PATHS;
  for (const p of candidates) {
    if (!fs.existsSync(p)) continue;
    const key = fs.readFileSync(p, 'utf8').trim();
    if (!key) throw new Error(`API key file is empty: ${p}`);
    return key;
  }
  throw new Error(`API key file not found. Checked: ${candidates.join(', ')}`);
}

async function geocode(address, key) {
  const u = new URL('https://maps.googleapis.com/maps/api/geocode/json');
  u.searchParams.set('address', address);
  u.searchParams.set('key', key);
  const r = await fetch(u);
  const j = await r.json();
  if (j.status !== 'OK' || !j.results?.length) {
    throw new Error(`Geocoding failed for "${address}": ${j.status}${j.error_message ? ` (${j.error_message})` : ''}`);
  }
  const loc = j.results[0].geometry.location;
  return { lat: loc.lat, lng: loc.lng, formatted: j.results[0].formatted_address };
}

function parseGoogleDuration(s) {
  // e.g. "1534s"
  const m = /^(-?\d+)s$/.exec(String(s || ''));
  if (!m) return null;
  return Number(m[1]);
}

function fmtMinutes(sec) {
  return `${Math.round(sec / 60)} min`;
}

function routeModifiersFromOpts(opts = {}) {
  const modifiers = {};
  if (opts.avoidTolls) modifiers.avoidTolls = true;
  if (opts.avoidHighways) modifiers.avoidHighways = true;
  if (opts.avoidFerries) modifiers.avoidFerries = true;
  return Object.keys(modifiers).length ? modifiers : undefined;
}

async function computeRoute({ from, to, departAt, key, modifiers }) {
  const [o, d] = await Promise.all([geocode(from, key), geocode(to, key)]);
  const body = {
    origin: { location: { latLng: { latitude: o.lat, longitude: o.lng } } },
    destination: { location: { latLng: { latitude: d.lat, longitude: d.lng } } },
    travelMode: 'DRIVE',
    routingPreference: 'TRAFFIC_AWARE',
    computeAlternativeRoutes: false,
    languageCode: 'en-US',
    units: 'IMPERIAL',
  };
  if (modifiers) body.routeModifiers = modifiers;
  if (departAt && departAt !== 'now') body.departureTime = new Date(departAt).toISOString();
  const res = await fetch('https://routes.googleapis.com/directions/v2:computeRoutes', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Goog-Api-Key': key,
      'X-Goog-FieldMask': 'routes.duration,routes.distanceMeters,routes.staticDuration',
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    const txt = await res.text();
    throw new Error(`Routes API error ${res.status}: ${txt}`);
  }
  const data = await res.json();
  if (!data.routes?.length) throw new Error('No route found');
  const r = data.routes[0];
  const durationSec = parseGoogleDuration(r.duration);
  const staticSec = parseGoogleDuration(r.staticDuration);
  return {
    origin: o.formatted,
    destination: d.formatted,
    durationSec,
    staticSec,
    distanceMiles: r.distanceMeters ? r.distanceMeters / 1609.344 : null,
  };
}

function localIso(ts, tz = 'America/Chicago') {
  return new Date(ts).toLocaleString('en-US', { timeZone: tz, hour12: true });
}

async function cmdEta(opts) {
  must(opts, ['from', 'to']);
  const key = readApiKey(opts);
  const tz = opts.timeZone || 'America/Chicago';
  const departTs = opts.departAt && opts.departAt !== 'now' ? new Date(opts.departAt).getTime() : Date.now();
  const modifiers = routeModifiersFromOpts(opts);
  const route = await computeRoute({ from: opts.from, to: opts.to, departAt: opts.departAt || 'now', key, modifiers });
  const arriveTs = departTs + (route.durationSec || 0) * 1000;
  console.log(JSON.stringify({
    from: route.origin,
    to: route.destination,
    departureLocal: localIso(departTs, tz),
    arrivalLocal: localIso(arriveTs, tz),
    eta: fmtMinutes(route.durationSec),
    trafficDelay: route.staticSec && route.durationSec ? fmtMinutes(Math.max(0, route.durationSec - route.staticSec)) : null,
    distanceMiles: route.distanceMiles ? Number(route.distanceMiles.toFixed(1)) : null,
    avoidTolls: !!opts.avoidTolls,
    avoidHighways: !!opts.avoidHighways,
    avoidFerries: !!opts.avoidFerries,
    timeZone: tz,
  }, null, 2));
}

async function cmdLeaveBy(opts) {
  must(opts, ['from', 'to', 'arriveBy']);
  const key = readApiKey(opts);
  const tz = opts.timeZone || 'America/Chicago';
  const arriveTs = new Date(opts.arriveBy).getTime();
  if (!Number.isFinite(arriveTs)) throw new Error('Invalid --arriveBy ISO datetime');
  const modifiers = routeModifiersFromOpts(opts);
  // two-pass estimate
  let departGuess = arriveTs - 45 * 60 * 1000;
  for (let i = 0; i < 2; i++) {
    const route = await computeRoute({ from: opts.from, to: opts.to, departAt: new Date(departGuess).toISOString(), key, modifiers });
    departGuess = arriveTs - (route.durationSec || 0) * 1000;
  }
  const finalRoute = await computeRoute({ from: opts.from, to: opts.to, departAt: new Date(departGuess).toISOString(), key, modifiers });
  console.log(JSON.stringify({
    from: finalRoute.origin,
    to: finalRoute.destination,
    leaveByLocal: localIso(departGuess, tz),
    targetArrivalLocal: localIso(arriveTs, tz),
    eta: fmtMinutes(finalRoute.durationSec),
    trafficDelay: finalRoute.staticSec && finalRoute.durationSec ? fmtMinutes(Math.max(0, finalRoute.durationSec - finalRoute.staticSec)) : null,
    distanceMiles: finalRoute.distanceMiles ? Number(finalRoute.distanceMiles.toFixed(1)) : null,
    avoidTolls: !!opts.avoidTolls,
    avoidHighways: !!opts.avoidHighways,
    avoidFerries: !!opts.avoidFerries,
    timeZone: tz,
  }, null, 2));
}

(async function main() {
  try {
    const args = parseArgs(process.argv.slice(2));
    const cmd = args._[0];
    if (!cmd || cmd === 'help' || cmd === '--help' || cmd === '-h') return usage();
    if (cmd === 'eta') return await cmdEta(args);
    if (cmd === 'leave-by') return await cmdLeaveBy(args);
    throw new Error(`Unknown command: ${cmd}`);
  } catch (e) {
    console.error(`ERROR: ${e.message}`);
    process.exit(1);
  }
})();


@@ -0,0 +1,248 @@
#!/usr/bin/env node
/*
Google Workspace helper CLI
Commands:
whoami
send --to <email> --subject <text> --body <text> [--html]
search-mail --query <gmail query> [--max 10]
search-calendar --query <text> [--max 10] [--timeMin ISO] [--timeMax ISO] [--calendar primary]
create-event --summary <text> --start <ISO> --end <ISO> [--timeZone America/Chicago] [--description <text>] [--location <text>] [--calendar primary]
*/
const fs = require('fs');
const path = require('path');
const { google } = require('googleapis');
const DEFAULT_SUBJECT = process.env.GW_IMPERSONATE || 'stefano@fiorinis.com';
const DEFAULT_KEY_CANDIDATES = [
process.env.GW_KEY_PATH,
path.join(process.env.HOME || '', '.openclaw/workspace/.clawdbot/credentials/google-workspace/service-account.json'),
path.join(process.env.HOME || '', '.clawdbot/credentials/google-workspace/service-account.json'),
].filter(Boolean);
const SCOPES = [
'https://www.googleapis.com/auth/gmail.send',
'https://www.googleapis.com/auth/gmail.compose',
'https://www.googleapis.com/auth/gmail.readonly',
'https://www.googleapis.com/auth/calendar',
'https://www.googleapis.com/auth/calendar.events',
];
function resolveKeyPath() {
for (const p of DEFAULT_KEY_CANDIDATES) {
if (p && fs.existsSync(p)) return p;
}
return null;
}
function parseArgs(argv) {
const out = { _: [] };
for (let i = 0; i < argv.length; i++) {
const token = argv[i];
if (token.startsWith('--')) {
const key = token.slice(2);
const next = argv[i + 1];
if (!next || next.startsWith('--')) {
out[key] = true;
} else {
out[key] = next;
i++;
}
} else {
out._.push(token);
}
}
return out;
}
function usage() {
console.log(`Google Workspace CLI\n
Env (optional):
GW_IMPERSONATE user to impersonate (default: ${DEFAULT_SUBJECT})
GW_KEY_PATH service-account key path
Commands:
whoami
send --to <email> --subject <text> --body <text> [--html]
search-mail --query <gmail query> [--max 10]
search-calendar --query <text> [--max 10] [--timeMin ISO] [--timeMax ISO] [--calendar primary]
create-event --summary <text> --start <ISO> --end <ISO> [--timeZone America/Chicago] [--description <text>] [--location <text>] [--calendar primary]
`);
}
function assertRequired(opts, required) {
const missing = required.filter((k) => !opts[k]);
if (missing.length) {
throw new Error(`Missing required options: ${missing.map((m) => `--${m}`).join(', ')}`);
}
}
function makeRawEmail({ from, to, subject, body, isHtml = false }) {
const contentType = isHtml ? 'text/html; charset="UTF-8"' : 'text/plain; charset="UTF-8"';
const msg = [
`From: ${from}`,
`To: ${to}`,
`Subject: ${subject}`,
'MIME-Version: 1.0',
`Content-Type: ${contentType}`,
'',
body,
].join('\r\n');
return Buffer.from(msg)
.toString('base64')
.replace(/\+/g, '-')
.replace(/\//g, '_')
.replace(/=+$/g, '');
}
async function getClients() {
const keyPath = resolveKeyPath();
if (!keyPath) {
throw new Error('Service account key not found. Set GW_KEY_PATH or place the file in ~/.openclaw/workspace/.clawdbot/credentials/google-workspace/service-account.json');
}
const auth = new google.auth.GoogleAuth({
keyFile: keyPath,
scopes: SCOPES,
clientOptions: { subject: DEFAULT_SUBJECT },
});
const authClient = await auth.getClient();
const gmail = google.gmail({ version: 'v1', auth: authClient });
const calendar = google.calendar({ version: 'v3', auth: authClient });
return { gmail, calendar, keyPath };
}
async function cmdWhoami(clients) {
const profile = await clients.gmail.users.getProfile({ userId: 'me' });
console.log(JSON.stringify({
impersonating: DEFAULT_SUBJECT,
keyPath: clients.keyPath,
profile: profile.data,
}, null, 2));
}
async function cmdSend(clients, opts) {
assertRequired(opts, ['to', 'subject', 'body']);
const raw = makeRawEmail({
from: DEFAULT_SUBJECT,
to: opts.to,
subject: opts.subject,
body: opts.body,
isHtml: !!opts.html,
});
const res = await clients.gmail.users.messages.send({
userId: 'me',
requestBody: { raw },
});
console.log(JSON.stringify({ ok: true, id: res.data.id, threadId: res.data.threadId }, null, 2));
}
async function cmdSearchMail(clients, opts) {
assertRequired(opts, ['query']);
const maxResults = Math.max(1, Math.min(50, Number(opts.max || 10)));
const list = await clients.gmail.users.messages.list({
userId: 'me',
q: opts.query,
maxResults,
});
const ids = (list.data.messages || []).map((m) => m.id).filter(Boolean);
const out = [];
for (const id of ids) {
const msg = await clients.gmail.users.messages.get({
userId: 'me',
id,
format: 'metadata',
metadataHeaders: ['From', 'To', 'Subject', 'Date'],
});
const headers = Object.fromEntries((msg.data.payload?.headers || []).map((h) => [h.name, h.value]));
out.push({
id,
threadId: msg.data.threadId,
snippet: msg.data.snippet,
from: headers.From,
to: headers.To,
subject: headers.Subject,
date: headers.Date,
});
}
console.log(JSON.stringify({ count: out.length, messages: out }, null, 2));
}
async function cmdSearchCalendar(clients, opts) {
const maxResults = Math.max(1, Math.min(50, Number(opts.max || 10)));
const res = await clients.calendar.events.list({
calendarId: opts.calendar || 'primary',
q: opts.query || undefined,
timeMin: opts.timeMin,
timeMax: opts.timeMax,
singleEvents: true,
orderBy: 'startTime',
maxResults,
});
const events = (res.data.items || []).map((e) => ({
id: e.id,
summary: e.summary,
status: e.status,
start: e.start,
end: e.end,
location: e.location,
hangoutLink: e.hangoutLink,
}));
console.log(JSON.stringify({ count: events.length, events }, null, 2));
}
async function cmdCreateEvent(clients, opts) {
assertRequired(opts, ['summary', 'start', 'end']);
const tz = opts.timeZone || 'America/Chicago';
const event = {
summary: opts.summary,
description: opts.description,
location: opts.location,
start: { dateTime: opts.start, timeZone: tz },
end: { dateTime: opts.end, timeZone: tz },
};
const res = await clients.calendar.events.insert({
calendarId: opts.calendar || 'primary',
requestBody: event,
});
console.log(JSON.stringify({ ok: true, id: res.data.id, htmlLink: res.data.htmlLink }, null, 2));
}
(async function main() {
try {
const args = parseArgs(process.argv.slice(2));
const cmd = args._[0];
if (!cmd || cmd === 'help' || cmd === '--help' || cmd === '-h') {
usage();
process.exit(0);
}
const clients = await getClients();
if (cmd === 'whoami') return await cmdWhoami(clients);
if (cmd === 'send') return await cmdSend(clients, args);
if (cmd === 'search-mail') return await cmdSearchMail(clients, args);
if (cmd === 'search-calendar') return await cmdSearchCalendar(clients, args);
if (cmd === 'create-event') return await cmdCreateEvent(clients, args);
throw new Error(`Unknown command: ${cmd}`);
} catch (err) {
console.error(`ERROR: ${err.message}`);
process.exit(1);
}
})();

File diff suppressed because it is too large.


@@ -0,0 +1,16 @@
{
"name": "google-workspace",
"version": "1.0.0",
"description": "",
"main": "gw.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"keywords": [],
"author": "",
"license": "ISC",
"type": "commonjs",
"dependencies": {
"googleapis": "^171.4.0"
}
}


@@ -0,0 +1,46 @@
---
name: elevenlabs-stt
description: Transcribe audio files with ElevenLabs Speech-to-Text (Scribe v2) from the local CLI. Use when you need local audio transcription with optional speaker diarization, language hints, event tagging, or JSON output via scripts/transcribe.sh.
---
# ElevenLabs Speech-to-Text
Use `scripts/transcribe.sh` to transcribe a local audio file with ElevenLabs STT.
## Requirements
Preferred: set `ELEVENLABS_API_KEY` in the environment before running the script.
Fallback: if the environment variable is not set, the script will try to read the key from local OpenClaw config files in `~/.openclaw/`.
Required binaries:
- `curl`
- `jq`
- `python3`
## Usage
Run from the skill directory or call the script by full path.
Examples:
```bash
scripts/transcribe.sh /path/to/audio.mp3
scripts/transcribe.sh /path/to/audio.mp3 --diarize --lang en
scripts/transcribe.sh /path/to/audio.mp3 --json
scripts/transcribe.sh /path/to/audio.mp3 --events
```
## Options
- `--diarize` — enable speaker diarization
- `--lang CODE` — pass an ISO language code hint such as `en`, `es`, or `fr`
- `--json` — print the full JSON response instead of only transcript text
- `--events` — include audio event tagging when supported
## Notes
- The script uploads a local file directly to ElevenLabs.
- The model is fixed to `scribe_v2` in the current script.
- The script returns plain transcript text by default, or pretty-printed JSON with `--json`.
- If the API returns an error payload, the script prints the error and exits non-zero.
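The `--json` output is plain JSON, so it pipes cleanly into `jq` for post-processing. A minimal offline sketch against a mocked payload — only the `text` field is guaranteed by the script above; the `words`/`speaker_id` shape shown for diarized responses is an assumption:

```shell
# Mocked response; in real use: scripts/transcribe.sh meeting.mp3 --json > out.json
RESPONSE='{"text":"hello world","words":[{"text":"hello","speaker_id":"speaker_0"},{"text":"world","speaker_id":"speaker_1"}]}'

# Plain transcript (the same field the script prints by default)
echo "$RESPONSE" | jq -r '.text'

# Distinct speakers in a diarized response (field names are assumptions)
echo "$RESPONSE" | jq -r '[.words[].speaker_id] | unique | join(" ")'
```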


@@ -0,0 +1,143 @@
#!/usr/bin/env bash
set -euo pipefail
# ElevenLabs Speech-to-Text transcription script
# Usage: transcribe.sh <audio_file> [options]
show_help() {
cat << EOF
Usage: $(basename "$0") <audio_file> [options]
Options:
--diarize Enable speaker diarization
--lang CODE ISO language code (e.g., en, pt, es, fr)
--json Output full JSON response
--events Tag audio events (laughter, music, etc.)
-h, --help Show this help
Environment:
ELEVENLABS_API_KEY Required API key
Examples:
$(basename "$0") voice_note.ogg
$(basename "$0") meeting.mp3 --diarize --lang en
$(basename "$0") podcast.mp3 --json > transcript.json
EOF
exit 0
}
# Defaults
DIARIZE="false"
LANG_CODE=""
JSON_OUTPUT="false"
TAG_EVENTS="false"
FILE=""
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
-h|--help) show_help ;;
--diarize) DIARIZE="true"; shift ;;
--lang) LANG_CODE="$2"; shift 2 ;;
--json) JSON_OUTPUT="true"; shift ;;
--events) TAG_EVENTS="true"; shift ;;
-*) echo "Unknown option: $1" >&2; exit 1 ;;
*) FILE="$1"; shift ;;
esac
done
# Validate
if [[ -z "$FILE" ]]; then
echo "Error: No audio file specified" >&2
show_help
fi
if [[ ! -f "$FILE" ]]; then
echo "Error: File not found: $FILE" >&2
exit 1
fi
# API key (check env, then fall back to local OpenClaw config/secrets)
API_KEY="${ELEVENLABS_API_KEY:-}"
if [[ -z "$API_KEY" ]]; then
OPENCLAW_DIR="${HOME}/.openclaw"
for CANDIDATE in "$OPENCLAW_DIR/secrets.json" "$OPENCLAW_DIR/openclaw.json"; do
if [[ -f "$CANDIDATE" ]]; then
API_KEY=$(python3 - "$CANDIDATE" <<'PY'
import json, sys
path = sys.argv[1]
try:
with open(path) as f:
data = json.load(f)
except Exception:
print("")
raise SystemExit(0)
candidates = [
("elevenlabs", "apiKey"),
("messages", "tts", "elevenlabs", "apiKey"),
]
for cand in candidates:
cur = data
ok = True
for key in cand:
if isinstance(cur, dict) and key in cur:
cur = cur[key]
else:
ok = False
break
if ok and isinstance(cur, str) and cur:
print(cur)
raise SystemExit(0)
print("")
PY
)
if [[ -n "$API_KEY" ]]; then
break
fi
fi
done
fi
if [[ -z "$API_KEY" ]]; then
echo "Error: ELEVENLABS_API_KEY not set and no local OpenClaw ElevenLabs key was found" >&2
exit 1
fi
# Build curl command
CURL_ARGS=(
-s
-X POST
"https://api.elevenlabs.io/v1/speech-to-text"
-H "xi-api-key: $API_KEY"
-F "file=@$FILE"
-F "model_id=scribe_v2"
-F "diarize=$DIARIZE"
-F "tag_audio_events=$TAG_EVENTS"
)
if [[ -n "$LANG_CODE" ]]; then
CURL_ARGS+=(-F "language_code=$LANG_CODE")
fi
# Make request
RESPONSE=$(curl "${CURL_ARGS[@]}")
# Check for errors
if echo "$RESPONSE" | grep -q '"detail"'; then
echo "Error from API:" >&2
echo "$RESPONSE" | jq -r '.detail.message // .detail' >&2
exit 1
fi
# Output
if [[ "$JSON_OUTPUT" == "true" ]]; then
echo "$RESPONSE" | jq .
else
# Extract just the text
TEXT=$(echo "$RESPONSE" | jq -r '.text // empty')
if [[ -n "$TEXT" ]]; then
echo "$TEXT"
else
echo "$RESPONSE"
fi
fi


@@ -9,10 +9,30 @@ import sys
import urllib.error
import urllib.request

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))


def candidate_config_paths():
    paths = []
    current = SCRIPT_DIR
    while True:
        paths.append(os.path.join(current, ".clawdbot", "credentials", "gitea", "config.json"))
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent
    paths.extend([
        "/home/node/.openclaw/workspace/.clawdbot/credentials/gitea/config.json",
        os.path.expanduser("~/.clawdbot/credentials/gitea/config.json"),
    ])
    # Deduplicate while preserving order
    return list(dict.fromkeys(paths))


CONFIG_PATHS = candidate_config_paths()


def get_config():
@@ -77,12 +97,33 @@ def api_request(endpoint, method="GET", payload=None):
        return None, None, str(e)


def api_get_all_pages(endpoint, page_size=100):
    page = 1
    items = []
    while True:
        sep = "&" if "?" in endpoint else "?"
        paged_endpoint = f"{endpoint}{sep}limit={page_size}&page={page}"
        data, status, err = api_request(paged_endpoint)
        if data is None:
            return None, status, err
        if not isinstance(data, list):
            return data, status, err
        items.extend(data)
        if len(data) < page_size:
            break
        page += 1
    return items, 200, None


def print_api_error(action, status, err):
    print(f"❌ Failed to {action}. status={status} error={err}")


def cmd_repos(_):
    repos, status, err = api_get_all_pages("/user/repos")
    if repos is None:
        print_api_error("list repos", status, err)
        return 1

skills/portainer/SKILL.md Normal file

@@ -0,0 +1,111 @@
---
name: portainer
description: Interact with Portainer stacks via API key authentication. Use for any Portainer stack operations including: listing stacks, resolving stack/endpoint IDs, getting stack status, starting/stopping/restarting stacks, retrieving env vars and compose files, and updating stacks with new compose content. All stack commands accept names and resolve IDs automatically.
---
# Portainer Skill
Manage Portainer stacks via API. All stack commands accept names and resolve IDs automatically.
## Required auth config
Workspace `.clawdbot/credentials/portainer/config.json` (preferred) or `~/.clawdbot/credentials/portainer/config.json`
```json
{
"base_url": "https://portainer.example.com",
"api_key": "YOUR_PORTAINER_API_KEY"
}
```
## Commands
### Resolve stack ID
```bash
bash scripts/get-stack-id.sh "<stack-name>"
```
Prints only the stack ID. Exits non-zero if not found.
### Resolve endpoint ID
```bash
bash scripts/get-endpoint-id.sh "<endpoint-name>"
```
Prints only the endpoint (environment) ID.
### List all stacks
```bash
bash scripts/list-stacks.sh
```
Outputs: `ID Name Status` (tab-aligned).
### Get stack status
```bash
bash scripts/get-stack-status.sh "<stack-name>"
```
Returns JSON with: Id, Name, Status, Type, EndpointId, CreationDate, UpdatedDate.
### Restart stack
```bash
bash scripts/restart-stack.sh "<stack-name>"
```
### Stop stack
```bash
bash scripts/stop-stack.sh "<stack-name>"
```
### Start stack
```bash
bash scripts/start-stack.sh "<stack-name>"
```
### Get stack env vars
```bash
bash scripts/get-stack-env.sh "<stack-name>"
```
Returns JSON array of `{name, value}` objects.
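The `{name, value}` array renders cleanly as dotenv-style lines with `jq`, which is handy for inspecting a stack's configuration at a glance. An offline sketch against a mocked response (the shape comes from the script's documented output; `my-stack` is a placeholder):

```shell
# Mocked output of: bash scripts/get-stack-env.sh "my-stack"
ENV_JSON='[{"name":"PORT","value":"8080"},{"name":"TZ","value":"UTC"}]'
# One NAME=value line per entry
echo "$ENV_JSON" | jq -r '.[] | "\(.name)=\(.value)"'
```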
### Get stack compose file
```bash
bash scripts/get-stack-compose.sh "<stack-name>"
```
Returns the raw docker-compose.yml content.
### Update stack
```bash
bash scripts/update-stack.sh "<stack-name>" "<compose-file>" [--env-file "<env-file>"] [--pull] [--prune-old]
```
Updates a stack with a new compose file. Preserves existing env vars unless `--env-file` is provided.
Options:
- `--pull` — Force pull images and redeploy (like `docker compose down/pull/up`). Note: may return 504 timeout for large images, but operation completes in background.
### Prune stack images
```bash
bash scripts/prune-stack-images.sh "<stack-name>"
```
Removes dangling images on the endpoint. Run this after `update-stack --pull` completes to clean up old image versions.
**Typical workflow:**
```bash
bash scripts/update-stack.sh "stack-name" "compose.yml" --pull
# wait for update to complete (even if 504 timeout)
bash scripts/prune-stack-images.sh "stack-name"
```
## Notes
- All `*-stack.sh` commands resolve the stack ID internally from the name.
- Endpoint ID is fetched automatically from stack info for lifecycle and update operations.
- `update-stack.sh` is the primary command for deploying new versions — it will trigger Portainer to pull new images if the compose file references updated image tags.
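Because `update-stack.sh --pull` may return a 504 while the redeploy finishes in the background, follow-up automation can branch on the `Status` field from `get-stack-status.sh` before pruning. An offline sketch against a mocked response — Portainer reports `Status` 1 for running and 2 for stopped, but treat that mapping as an assumption here, and `my-stack` is a placeholder:

```shell
# Mocked response; in real use: BODY=$(bash scripts/get-stack-status.sh "my-stack")
BODY='{"Id":12,"Name":"my-stack","Status":1,"Type":2,"EndpointId":2}'
STATUS=$(echo "$BODY" | jq -r '.Status')
if [ "$STATUS" = "1" ]; then
  echo "running"      # safe to run prune-stack-images.sh now
else
  echo "not running"  # keep waiting before pruning
fi
```

A real polling loop would wrap this check in `sleep`/retry around the live script call.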


@@ -0,0 +1,78 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: get-endpoint-id.sh "<endpoint-name>"
Looks up a Portainer endpoint (environment) by exact name and prints its ID.
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
ENDPOINT_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
response="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/endpoints")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Portainer API request failed (HTTP $http_code): $msg"
fi
endpoint_id="$(printf '%s' "$body" | jq -r --arg name "$ENDPOINT_NAME" '.[] | select(.Name == $name) | .Id' | head -n1)"
[[ -n "$endpoint_id" ]] || err "No endpoint found with name: $ENDPOINT_NAME"
printf '%s\n' "$endpoint_id"


@@ -0,0 +1,81 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: get-stack-compose.sh "<stack-name>"
Gets the docker-compose.yml content for a Portainer stack by name.
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get stack file content
response="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID/file")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack file (HTTP $http_code): $msg"
fi
# Extract StackFileContent
printf '%s' "$body" | jq -r '.StackFileContent // empty'


@@ -0,0 +1,82 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: get-stack-env.sh "<stack-name>"
Gets environment variables for a Portainer stack by name (resolves ID automatically).
Outputs JSON array of {name, value} objects.
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get stack details (includes Env)
response="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack info (HTTP $http_code): $msg"
fi
# Extract and output env array
printf '%s' "$body" | jq '.Env // []'


@@ -0,0 +1,78 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: get-stack-id.sh "<stack-name>"
Looks up a Portainer stack by exact name and prints its ID.
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
response="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Portainer API request failed (HTTP $http_code): $msg"
fi
stack_id="$(printf '%s' "$body" | jq -r --arg name "$STACK_NAME" '.[] | select(.Name == $name) | .Id' | head -n1)"
[[ -n "$stack_id" ]] || err "No stack found with name: $STACK_NAME"
printf '%s\n' "$stack_id"


@@ -0,0 +1,90 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: get-stack-status.sh "<stack-name>"
Gets detailed status for a Portainer stack by name (resolves ID automatically).
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get stack details
response="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack status (HTTP $http_code): $msg"
fi
# Output key fields
printf '%s' "$body" | jq '{
Id,
Name,
Status,
Type,
EndpointId,
SwarmId,
CreationDate,
UpdatedDate
}'


@@ -0,0 +1,60 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
response="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Portainer API request failed (HTTP $http_code): $msg"
fi
printf '%s' "$body" | jq -r '.[] | "\(.Id)\t\(.Name)\t\(.Status)"'


@@ -0,0 +1,159 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: prune-stack-images.sh "<stack-name>"
Removes unused images that were previously associated with a stack.
Run this after an update-stack --pull completes to clean up old image versions.
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get stack info for EndpointId
STACK_INFO="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID")"
http_code="$(printf '%s' "$STACK_INFO" | tail -n1)"
body="$(printf '%s' "$STACK_INFO" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack info (HTTP $http_code): $msg"
fi
ENDPOINT_ID="$(printf '%s' "$body" | jq -r '.EndpointId')"
[[ -n "$ENDPOINT_ID" && "$ENDPOINT_ID" != "null" ]] || err "Could not determine endpoint ID for stack"
# Get images currently in use by this stack's containers
FILTERS="$(jq -n --arg project "$STACK_NAME" '{"label": ["com.docker.compose.project=\($project)"]}')"
CONTAINERS_RESPONSE="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
-G \
--data-urlencode "all=1" \
--data-urlencode "filters=$FILTERS" \
"$BASE_URL/api/endpoints/$ENDPOINT_ID/docker/containers/json")"
containers_http_code="$(printf '%s' "$CONTAINERS_RESPONSE" | tail -n1)"
containers_body="$(printf '%s' "$CONTAINERS_RESPONSE" | sed '$d')"
if [[ "$containers_http_code" -lt 200 || "$containers_http_code" -ge 300 ]]; then
err "Failed to fetch containers (HTTP $containers_http_code)"
fi
# Extract image names/repoTags used by current containers
CURRENT_IMAGES="$(printf '%s' "$containers_body" | jq -r '.[].Image' | sort -u)"
if [[ -z "$CURRENT_IMAGES" ]]; then
echo "No containers found for stack '$STACK_NAME'"
exit 0
fi
echo "Current images in use by stack '$STACK_NAME':"
echo "$CURRENT_IMAGES" | while read -r img; do echo " - $img"; done
# Get all images on the endpoint
IMAGES_RESPONSE="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/endpoints/$ENDPOINT_ID/docker/images/json?all=false")"
images_http_code="$(printf '%s' "$IMAGES_RESPONSE" | tail -n1)"
images_body="$(printf '%s' "$IMAGES_RESPONSE" | sed '$d')"
if [[ "$images_http_code" -lt 200 || "$images_http_code" -ge 300 ]]; then
err "Failed to fetch images (HTTP $images_http_code)"
fi
# Find dangling images (no RepoTags or only <none>)
DANGLING="$(printf '%s' "$images_body" | jq -r '.[] | select(.RepoTags == null or (.RepoTags | length == 0) or (.RepoTags[0] | startswith("<none>"))) | .Id')"
if [[ -z "$DANGLING" ]]; then
echo "No dangling images found"
exit 0
fi
echo ""
echo "Dangling images found:"
echo "$DANGLING" | while read -r img; do echo " - $img"; done
# Remove dangling images
echo ""
echo "Removing dangling images..."
for IMAGE_ID in $DANGLING; do
DELETE_RESPONSE="$(curl -sS -w $'\n%{http_code}' -X DELETE \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/endpoints/$ENDPOINT_ID/docker/images/$IMAGE_ID?force=false")"
delete_http_code="$(printf '%s' "$DELETE_RESPONSE" | tail -n1)"
delete_body="$(printf '%s' "$DELETE_RESPONSE" | sed '$d')"
if [[ "$delete_http_code" -ge 200 && "$delete_http_code" -lt 300 ]]; then
echo "Removed: $IMAGE_ID"
else
msg="$(printf '%s' "$delete_body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$delete_body"
echo "Warning: Could not remove $IMAGE_ID: $msg"
fi
done
echo ""
echo "Pruning complete"

View File

@@ -0,0 +1,97 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: restart-stack.sh "<stack-name>"
Restarts a Portainer stack by name (resolves ID automatically).
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get endpoint ID for the stack (required for restart API)
STACK_INFO="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID")"
http_code="$(printf '%s' "$STACK_INFO" | tail -n1)"
body="$(printf '%s' "$STACK_INFO" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack info (HTTP $http_code): $msg"
fi
ENDPOINT_ID="$(printf '%s' "$body" | jq -r '.EndpointId')"
[[ -n "$ENDPOINT_ID" && "$ENDPOINT_ID" != "null" ]] || err "Could not determine endpoint ID for stack"
# Restart the stack
response="$(curl -sS -w $'\n%{http_code}' -X POST \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID/restart?endpointId=$ENDPOINT_ID")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to restart stack (HTTP $http_code): $msg"
fi
echo "Stack '$STACK_NAME' (ID: $STACK_ID) restarted successfully"

View File

@@ -0,0 +1,97 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: start-stack.sh "<stack-name>"
Starts a Portainer stack by name (resolves ID automatically).
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get endpoint ID for the stack
STACK_INFO="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID")"
http_code="$(printf '%s' "$STACK_INFO" | tail -n1)"
body="$(printf '%s' "$STACK_INFO" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack info (HTTP $http_code): $msg"
fi
ENDPOINT_ID="$(printf '%s' "$body" | jq -r '.EndpointId')"
[[ -n "$ENDPOINT_ID" && "$ENDPOINT_ID" != "null" ]] || err "Could not determine endpoint ID for stack"
# Start the stack
response="$(curl -sS -w $'\n%{http_code}' -X POST \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID/start?endpointId=$ENDPOINT_ID")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to start stack (HTTP $http_code): $msg"
fi
echo "Stack '$STACK_NAME' (ID: $STACK_ID) started successfully"

View File

@@ -0,0 +1,97 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: stop-stack.sh "<stack-name>"
Stops a Portainer stack by name (resolves ID automatically).
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
[[ $# -eq 1 ]] || usage
STACK_NAME="$1"
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get endpoint ID for the stack
STACK_INFO="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID")"
http_code="$(printf '%s' "$STACK_INFO" | tail -n1)"
body="$(printf '%s' "$STACK_INFO" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack info (HTTP $http_code): $msg"
fi
ENDPOINT_ID="$(printf '%s' "$body" | jq -r '.EndpointId')"
[[ -n "$ENDPOINT_ID" && "$ENDPOINT_ID" != "null" ]] || err "Could not determine endpoint ID for stack"
# Stop the stack
response="$(curl -sS -w $'\n%{http_code}' -X POST \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID/stop?endpointId=$ENDPOINT_ID")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to stop stack (HTTP $http_code): $msg"
fi
echo "Stack '$STACK_NAME' (ID: $STACK_ID) stopped successfully"

View File

@@ -0,0 +1,250 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SEARCH_DIR="$SCRIPT_DIR"
CONFIG_PATH=""
while true; do
CANDIDATE="$SEARCH_DIR/.clawdbot/credentials/portainer/config.json"
if [[ -f "$CANDIDATE" ]]; then
CONFIG_PATH="$CANDIDATE"
break
fi
PARENT="$(dirname "$SEARCH_DIR")"
if [[ "$PARENT" == "$SEARCH_DIR" ]]; then
break
fi
SEARCH_DIR="$PARENT"
done
if [[ -z "$CONFIG_PATH" && -f "$HOME/.clawdbot/credentials/portainer/config.json" ]]; then
CONFIG_PATH="$HOME/.clawdbot/credentials/portainer/config.json"
fi
err() {
echo "Error: $*" >&2
exit 1
}
require_cmd() {
command -v "$1" >/dev/null 2>&1 || err "Required command not found: $1"
}
usage() {
cat >&2 <<'EOF'
Usage: update-stack.sh "<stack-name>" "<compose-file>" [options]
Updates a Portainer stack by name with a new docker-compose file.
Preserves existing env vars unless --env-file is provided.
Arguments:
stack-name Name of the stack to update
compose-file Path to the new docker-compose.yml file
Options:
--env-file <file> Path to a file with env vars (format: NAME=value per line)
If not provided, existing env vars are preserved.
--pull Force pull images and redeploy (like docker compose down/pull/up).
--prune-old After update, remove images that were used by this stack
but are no longer in use (old versions left dangling).
Requires config at:
workspace .clawdbot/credentials/portainer/config.json or ~/.clawdbot/credentials/portainer/config.json
EOF
exit 2
}
# Parse arguments
[[ $# -ge 2 ]] || usage
STACK_NAME="$1"
COMPOSE_FILE="$2"
shift 2
ENV_FILE=""
PRUNE_OLD=false
PULL_IMAGE=false
while [[ $# -gt 0 ]]; do
case "$1" in
--env-file)
[[ $# -ge 2 ]] || err "--env-file requires a value"
ENV_FILE="$2"
shift 2
;;
--pull)
PULL_IMAGE=true
shift
;;
--prune-old)
PRUNE_OLD=true
shift
;;
*)
err "Unknown option: $1"
;;
esac
done
require_cmd curl
require_cmd jq
[[ -f "$CONFIG_PATH" ]] || err "Missing config file: $CONFIG_PATH"
[[ -f "$COMPOSE_FILE" ]] || err "Compose file not found: $COMPOSE_FILE"
BASE_URL="$(jq -r '.base_url // empty' "$CONFIG_PATH")"
API_KEY="$(jq -r '.api_key // empty' "$CONFIG_PATH")"
[[ -n "$BASE_URL" ]] || err "config.base_url is missing"
[[ -n "$API_KEY" ]] || err "config.api_key is missing"
BASE_URL="${BASE_URL%/}"
# Resolve stack ID from name
STACK_ID="$(bash "$SCRIPT_DIR/get-stack-id.sh" "$STACK_NAME")"
[[ -n "$STACK_ID" ]] || err "Failed to resolve stack ID for: $STACK_NAME"
# Get current stack info for EndpointId and existing env
STACK_INFO="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/stacks/$STACK_ID")"
http_code="$(printf '%s' "$STACK_INFO" | tail -n1)"
body="$(printf '%s' "$STACK_INFO" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to fetch stack info (HTTP $http_code): $msg"
fi
ENDPOINT_ID="$(printf '%s' "$body" | jq -r '.EndpointId')"
[[ -n "$ENDPOINT_ID" && "$ENDPOINT_ID" != "null" ]] || err "Could not determine endpoint ID for stack"
# Capture old image IDs before update (if --prune-old)
OLD_IMAGE_IDS=""
if [[ "$PRUNE_OLD" == true ]]; then
# Get containers for this stack using the compose project label
# The label format is: com.docker.compose.project=<stack-name>
FILTERS="$(jq -n --arg project "$STACK_NAME" '{"label": ["com.docker.compose.project=\($project)"]}')"
CONTAINERS_RESPONSE="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
-G \
--data-urlencode "all=1" \
--data-urlencode "filters=$FILTERS" \
"$BASE_URL/api/endpoints/$ENDPOINT_ID/docker/containers/json")"
containers_http_code="$(printf '%s' "$CONTAINERS_RESPONSE" | tail -n1)"
containers_body="$(printf '%s' "$CONTAINERS_RESPONSE" | sed '$d')"
if [[ "$containers_http_code" -ge 200 && "$containers_http_code" -lt 300 ]]; then
# Extract image IDs from containers
OLD_IMAGE_IDS="$(printf '%s' "$containers_body" | jq -r '.[].ImageID // empty' | sort -u | tr '\n' ' ' | sed 's/ $//')"
if [[ -n "$OLD_IMAGE_IDS" ]]; then
echo "Captured old image IDs for pruning: $OLD_IMAGE_IDS"
else
echo "No existing container images found to track for pruning"
PRUNE_OLD=false
fi
else
echo "Warning: Could not fetch containers for prune tracking, skipping prune"
PRUNE_OLD=false
fi
fi
# Determine env vars to use
if [[ -n "$ENV_FILE" ]]; then
# Read env file and convert to Portainer format; jq handles JSON quoting and '=' inside values
[[ -f "$ENV_FILE" ]] || err "Env file not found: $ENV_FILE"
ENV_PAYLOAD="$(grep -v '^#' "$ENV_FILE" | grep -v '^$' | jq -Rn '[inputs | select(contains("=")) | capture("(?<name>[^=]+)=(?<value>.*)") | {name, value}]')"
else
# Preserve existing env vars
ENV_PAYLOAD="$(printf '%s' "$body" | jq '.Env // []')"
fi
# Read new compose file content
COMPOSE_CONTENT="$(cat "$COMPOSE_FILE")"
# Build JSON payload (include pullImage if --pull flag is set)
if [[ "$PULL_IMAGE" == true ]]; then
PAYLOAD="$(jq -n \
--argjson env "$ENV_PAYLOAD" \
--arg compose "$COMPOSE_CONTENT" \
'{Env: $env, StackFileContent: $compose, pullImage: true}')"
else
PAYLOAD="$(jq -n \
--argjson env "$ENV_PAYLOAD" \
--arg compose "$COMPOSE_CONTENT" \
'{Env: $env, StackFileContent: $compose}')"
fi
# Build URL
UPDATE_URL="$BASE_URL/api/stacks/$STACK_ID?endpointId=$ENDPOINT_ID"
# Update the stack
response="$(curl -sS -w $'\n%{http_code}' -X PUT \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d "$PAYLOAD" \
"$UPDATE_URL")"
http_code="$(printf '%s' "$response" | tail -n1)"
body="$(printf '%s' "$response" | sed '$d')"
if [[ "$http_code" -lt 200 || "$http_code" -ge 300 ]]; then
msg="$(printf '%s' "$body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$body"
err "Failed to update stack (HTTP $http_code): $msg"
fi
echo "Stack '$STACK_NAME' (ID: $STACK_ID) updated successfully"
# Prune old images if requested and we captured image IDs
if [[ "$PRUNE_OLD" == true && -n "$OLD_IMAGE_IDS" ]]; then
echo "Pruning old images no longer in use..."
for IMAGE_ID in $OLD_IMAGE_IDS; do
# Check if image is still in use by any container
IMAGE_FILTERS="$(jq -n --arg imageId "$IMAGE_ID" '{"ancestor": [$imageId]}')"
USAGE_RESPONSE="$(curl -sS -w $'\n%{http_code}' \
-H "X-API-Key: $API_KEY" \
-G \
--data-urlencode "all=1" \
--data-urlencode "filters=$IMAGE_FILTERS" \
"$BASE_URL/api/endpoints/$ENDPOINT_ID/docker/containers/json")"
usage_http_code="$(printf '%s' "$USAGE_RESPONSE" | tail -n1)"
usage_body="$(printf '%s' "$USAGE_RESPONSE" | sed '$d')"
if [[ "$usage_http_code" -ge 200 && "$usage_http_code" -lt 300 ]]; then
container_count="$(printf '%s' "$usage_body" | jq 'length')"
if [[ "$container_count" -eq 0 ]]; then
# Image not in use, safe to remove
DELETE_RESPONSE="$(curl -sS -w $'\n%{http_code}' -X DELETE \
-H "X-API-Key: $API_KEY" \
"$BASE_URL/api/endpoints/$ENDPOINT_ID/docker/images/$IMAGE_ID?force=false")"
delete_http_code="$(printf '%s' "$DELETE_RESPONSE" | tail -n1)"
delete_body="$(printf '%s' "$DELETE_RESPONSE" | sed '$d')"
if [[ "$delete_http_code" -ge 200 && "$delete_http_code" -lt 300 ]]; then
echo "Removed unused image: $IMAGE_ID"
else
msg="$(printf '%s' "$delete_body" | jq -r '.message // empty' 2>/dev/null || true)"
[[ -n "$msg" ]] || msg="$delete_body"
echo "Warning: Could not remove image $IMAGE_ID: $msg"
fi
else
echo "Image $IMAGE_ID still in use by $container_count container(s), skipping"
fi
fi
done
echo "Pruning complete"
fi

46
skills/searxng/SKILL.md Normal file
View File

@@ -0,0 +1,46 @@
---
name: searxng
description: Search the web through a local or self-hosted SearXNG instance. Use when you want privacy-respecting web, image, video, or news search without external search API keys, via scripts/searxng.py.
---
# SearXNG Search
Use `scripts/searxng.py` to run searches against a SearXNG instance.
## Requirements
Required runtime:
- `python3`
- Python packages: `httpx`, `rich`
Configuration lookup order:
1. `SEARXNG_URL` environment variable
2. workspace `.clawdbot/credentials/searxng/config.json` found by walking upward from the script location
3. `~/.clawdbot/credentials/searxng/config.json`
4. fallback: `http://localhost:8080`
Config file shape if used:
```json
{
"url": "https://search.example.com"
}
```
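As a minimal sketch, the lookup order can be expressed in shell (illustrative only; the actual resolution happens inside `scripts/searxng.py`, which walks up from the script location rather than from the working directory as shown here):

```shell
# Illustrative sketch of the URL resolution order (not the script's actual code)
resolve_searxng_url() {
  # 1. Environment variable wins
  if [ -n "${SEARXNG_URL:-}" ]; then
    printf '%s\n' "$SEARXNG_URL"; return
  fi
  # 2. Walk upward looking for a workspace config
  dir="$PWD"
  while :; do
    cfg="$dir/.clawdbot/credentials/searxng/config.json"
    if [ -f "$cfg" ]; then jq -r '.url' "$cfg"; return; fi
    parent="$(dirname "$dir")"
    [ "$parent" = "$dir" ] && break
    dir="$parent"
  done
  # 3. Home config
  cfg="$HOME/.clawdbot/credentials/searxng/config.json"
  if [ -f "$cfg" ]; then jq -r '.url' "$cfg"; return; fi
  # 4. Fallback
  printf 'http://localhost:8080\n'
}
```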
## Usage
Examples:
```bash
python3 scripts/searxng.py search "OpenClaw" -n 5
python3 scripts/searxng.py search "latest AI news" --category news -n 10
python3 scripts/searxng.py search "cute cats" --category images --format json
python3 scripts/searxng.py search "rust tutorial" --language en --time-range month
```
## Notes
- Uses the SearXNG JSON API endpoint at `<SEARXNG_URL>/search`.
- HTTPS certificate verification is disabled in the current script for compatibility with local/self-signed instances.
- If connection fails, the script prints the URL it attempted.
- `--format json` is best for programmatic use; table output is best for humans.
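For programmatic use, the `--format json` output composes well with `jq`. A small sketch using a sample payload shaped like a SearXNG JSON response (the data here is made up for illustration):

```shell
# Sample payload shaped like SearXNG's JSON API response (illustrative data)
sample='{"results":[{"title":"A","url":"https://a.example"},{"title":"B","url":"https://b.example"}]}'
# Pull just the result URLs
printf '%s' "$sample" | jq -r '.results[].url'
```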

View File

@@ -0,0 +1,253 @@
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.11"
# dependencies = ["httpx", "rich"]
# ///
"""SearXNG CLI - Privacy-respecting metasearch via your local instance."""
import argparse
import os
import sys
import json
import warnings
from pathlib import Path

import httpx
from rich.console import Console
from rich.table import Table
from rich import print as rprint

# Suppress SSL warnings for local self-signed certificates
warnings.filterwarnings('ignore', message='Unverified HTTPS request')

console = Console()


def candidate_searxng_urls():
    urls = []
    env_url = os.getenv("SEARXNG_URL")
    if env_url:
        urls.append(env_url)
    current = Path(__file__).resolve().parent
    for base in [current, *current.parents]:
        cfg = base / ".clawdbot" / "credentials" / "searxng" / "config.json"
        if cfg.is_file():
            try:
                data = json.loads(cfg.read_text())
                url = data.get("url") or data.get("SEARXNG_URL")
                if isinstance(url, str) and url:
                    urls.append(url)
            except Exception:
                pass
    urls.extend([
        os.path.expanduser("~/.clawdbot/credentials/searxng/config.json"),
    ])
    # Entries ending in config.json are file paths: read the URL out of them
    expanded = []
    for item in urls:
        if isinstance(item, str) and item.endswith("config.json") and os.path.isfile(item):
            try:
                data = json.loads(Path(item).read_text())
                url = data.get("url") or data.get("SEARXNG_URL")
                if isinstance(url, str) and url:
                    expanded.append(url)
            except Exception:
                pass
        else:
            expanded.append(item)
    expanded.append("http://localhost:8080")
    return list(dict.fromkeys([u for u in expanded if u]))


SEARXNG_URL = candidate_searxng_urls()[0]


def search_searxng(
    query: str,
    limit: int = 10,
    category: str = "general",
    language: str = "auto",
    time_range: str | None = None,
) -> dict:
    """
    Search using SearXNG instance.

    Args:
        query: Search query string
        limit: Number of results to return
        category: Search category (general, images, news, videos, etc.)
        language: Language code (auto, en, de, fr, etc.)
        time_range: Time range filter (day, week, month, year)

    Returns:
        Dict with search results
    """
    params = {
        "q": query,
        "format": "json",
        "categories": category,
    }
    if language != "auto":
        params["language"] = language
    if time_range:
        params["time_range"] = time_range
    try:
        # Disable SSL verification for local self-signed certs
        response = httpx.get(
            f"{SEARXNG_URL}/search",
            params=params,
            timeout=30,
            verify=False,  # For local self-signed certs
        )
        response.raise_for_status()
        data = response.json()
        # Limit results
        if "results" in data:
            data["results"] = data["results"][:limit]
        return data
    except httpx.HTTPError as e:
        console.print(f"[red]Error connecting to SearXNG at {SEARXNG_URL}:[/red] {e}")
        return {"error": str(e), "results": []}
    except Exception as e:
        console.print(f"[red]Unexpected error while using {SEARXNG_URL}:[/red] {e}")
        return {"error": str(e), "results": []}


def display_results_table(data: dict, query: str):
    """Display search results in a rich table."""
    results = data.get("results", [])
    if not results:
        rprint(f"[yellow]No results found for:[/yellow] {query}")
        return
    table = Table(title=f"SearXNG Search: {query}", show_lines=False)
    table.add_column("#", style="dim", width=3)
    table.add_column("Title", style="bold")
    table.add_column("URL", style="blue", width=50)
    table.add_column("Engines", style="green", width=20)
    for i, result in enumerate(results, 1):
        title = result.get("title", "No title")[:70]
        url = result.get("url", "")
        if len(url) > 45:
            url = url[:45] + "..."
        engines = ", ".join(result.get("engines", []))[:18]
        table.add_row(str(i), title, url, engines)
    console.print(table)
    # Show additional info
    if data.get("number_of_results"):
        rprint(f"\n[dim]Total results available: {data['number_of_results']}[/dim]")
    # Show content snippets for top 3
    rprint("\n[bold]Top results:[/bold]")
    for i, result in enumerate(results[:3], 1):
        title = result.get("title", "No title")
        url = result.get("url", "")
        content = result.get("content", "")
        if len(content) > 200:
            content = content[:200] + "..."
        rprint(f"\n[bold cyan]{i}. {title}[/bold cyan]")
        rprint(f"   [blue]{url}[/blue]")
        if content:
            rprint(f"   [dim]{content}[/dim]")


def display_results_json(data: dict):
    """Display results in JSON format for programmatic use."""
    print(json.dumps(data, indent=2))


def main():
    parser = argparse.ArgumentParser(
        description="SearXNG CLI - Search the web via your local SearXNG instance",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=f"""
Examples:
  %(prog)s search "python asyncio"
  %(prog)s search "climate change" -n 20
  %(prog)s search "cute cats" --category images
  %(prog)s search "breaking news" --category news --time-range day
  %(prog)s search "rust tutorial" --format json

Environment:
  SEARXNG_URL: SearXNG instance URL (default: {SEARXNG_URL})
""",
    )
    subparsers = parser.add_subparsers(dest="command", help="Commands")
    # Search command
    search_parser = subparsers.add_parser("search", help="Search the web")
    search_parser.add_argument("query", nargs="+", help="Search query")
    search_parser.add_argument(
        "-n", "--limit",
        type=int,
        default=10,
        help="Number of results (default: 10)",
    )
    search_parser.add_argument(
        "-c", "--category",
        default="general",
        choices=["general", "images", "videos", "news", "map", "music", "files", "it", "science"],
        help="Search category (default: general)",
    )
    search_parser.add_argument(
        "-l", "--language",
        default="auto",
        help="Language code (auto, en, de, fr, etc.)",
    )
    search_parser.add_argument(
        "-t", "--time-range",
        choices=["day", "week", "month", "year"],
        help="Time range filter",
    )
    search_parser.add_argument(
        "-f", "--format",
        choices=["table", "json"],
        default="table",
        help="Output format (default: table)",
    )
    args = parser.parse_args()
    if not args.command:
        parser.print_help()
        return
    if args.command == "search":
        query = " ".join(args.query)
        data = search_searxng(
            query=query,
            limit=args.limit,
            category=args.category,
            language=args.language,
            time_range=args.time_range,
        )
        if args.format == "json":
            display_results_json(data)
        else:
            display_results_table(data, query)


if __name__ == "__main__":
    main()

3
skills/web-automation/.gitignore vendored Normal file
View File

@@ -0,0 +1,3 @@
node_modules/
*.log
.DS_Store

View File

@@ -0,0 +1,104 @@
---
name: web-automation
description: Browse and scrape web pages using Playwright with Camoufox anti-detection browser. Use when automating web workflows, extracting rendered page content, handling authenticated sessions, or scraping websites with bot protection.
---
# Web Automation with Camoufox (Codex)
Automated web browsing and scraping using Playwright with two execution paths under one skill:
- one-shot extraction via `extract.js`
- broader stateful automation via Camoufox and the existing `auth.ts`, `browse.ts`, `flow.ts`, and `scrape.ts`
## When To Use Which Command
- Use `node scripts/extract.js "<URL>"` for one-shot extraction from a single URL when you need rendered content, bounded stealth behavior, and JSON output.
- Use `npx tsx scrape.ts ...` when you need markdown output, Readability extraction, full-page cleanup, or selector-based scraping.
- Use `npx tsx browse.ts ...`, `auth.ts`, or `flow.ts` when the task needs interactive navigation, persistent sessions, login handling, click/type actions, or multi-step workflows.
## Requirements
- Node.js 20+
- pnpm
- Network access to download browser binaries
## First-Time Setup
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm install
npx playwright install chromium
npx camoufox-js fetch
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## Prerequisite Check (MANDATORY)
Before running any automation, verify Playwright + Camoufox dependencies are installed and scripts are configured to use Camoufox.
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
node -e "require.resolve('playwright/package.json');require.resolve('playwright-core/package.json');require.resolve('camoufox-js/package.json');console.log('OK: playwright + playwright-core + camoufox-js installed')"
node -e "const fs=require('fs');const t=fs.readFileSync('browse.ts','utf8');if(!/camoufox-js/.test(t)){throw new Error('browse.ts is not configured for Camoufox')}console.log('OK: Camoufox integration detected in browse.ts')"
```
If any check fails, stop and return:
"Missing dependency/config: web-automation requires `playwright`, `playwright-core`, and `camoufox-js` with Camoufox-based scripts. Run setup in this skill, then retry."
If runtime fails with missing native bindings for `better-sqlite3` or `esbuild`, run:
```bash
cd ~/.openclaw/workspace/skills/web-automation/scripts
pnpm approve-builds
pnpm rebuild better-sqlite3 esbuild
```
## Quick Reference
- One-shot JSON extract: `node scripts/extract.js "https://example.com"`
- Browse page: `npx tsx browse.ts --url "https://example.com"`
- Scrape markdown: `npx tsx scrape.ts --url "https://example.com" --mode main --output page.md`
- Authenticate: `npx tsx auth.ts --url "https://example.com/login"`
- Natural-language flow: `npx tsx flow.ts --instruction 'go to https://example.com then click on "Login" then type "user@example.com" in #email then press enter'`
## One-shot extraction
Use `extract.js` when you need a single page fetch with JavaScript rendering and lightweight anti-bot shaping, but not a full automation session.
```bash
node scripts/extract.js "https://example.com"
WAIT_TIME=5000 node scripts/extract.js "https://example.com"
SCREENSHOT_PATH=/tmp/page.png SAVE_HTML=true node scripts/extract.js "https://example.com"
```
Output is JSON only and includes fields such as:
- `requestedUrl`
- `finalUrl`
- `title`
- `content`
- `metaDescription`
- `status`
- `elapsedSeconds`
- `challengeDetected`
- optional `screenshot`
- optional `htmlFile`
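Because the output is JSON only, it pipes cleanly into `jq`. A sketch using a sample payload (the field values below are made up; real output comes from `extract.js`):

```shell
# Sample extract.js-style payload (illustrative values, not real output)
payload='{"requestedUrl":"https://example.com","finalUrl":"https://example.com/","title":"Example","status":200,"challengeDetected":false}'
# Grab the title, and confirm no bot challenge was detected before trusting the content
printf '%s' "$payload" | jq -r '.title'
printf '%s' "$payload" | jq -e '.challengeDetected | not' >/dev/null && echo "no challenge"
```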
## General flow runner
Use `flow.ts` for multi-step commands in plain language (go/click/type/press/wait/screenshot).
Example:
```bash
npx tsx flow.ts --instruction 'go to https://search.fiorinis.com then type "pippo" then press enter then wait 2s'
```
## Notes
- Sessions persist in Camoufox profile storage.
- Use `--wait` for dynamic pages.
- Use `--mode selector --selector "..."` for targeted extraction.
- `extract.js` keeps stealth and bounded anti-bot shaping while keeping the Chromium sandbox enabled.

View File

@@ -0,0 +1,575 @@
#!/usr/bin/env npx tsx
/**
* Authentication handler for web automation
* Supports generic form login and Microsoft SSO (MSAL)
*
* Usage:
* npx tsx auth.ts --url "https://example.com/login" --type form
* npx tsx auth.ts --url "https://example.com" --type msal
* npx tsx auth.ts --url "https://example.com" --type auto
*/
import { getPage, launchBrowser } from './browse.js';
import parseArgs from 'minimist';
import type { Page, BrowserContext } from 'playwright-core';
import { createInterface } from 'readline';
// Types
type AuthType = 'auto' | 'form' | 'msal';
interface AuthOptions {
url: string;
authType: AuthType;
credentials?: {
username: string;
password: string;
};
headless?: boolean;
timeout?: number;
}
interface AuthResult {
success: boolean;
finalUrl: string;
authType: AuthType;
message: string;
}
// Get credentials from environment or options
function getCredentials(options?: {
username?: string;
password?: string;
}): { username: string; password: string } | null {
const username = options?.username || process.env.CAMOUFOX_USERNAME;
const password = options?.password || process.env.CAMOUFOX_PASSWORD;
if (!username || !password) {
return null;
}
return { username, password };
}
// Prompt user for input (for MFA or credentials)
async function promptUser(question: string, hidden = false): Promise<string> {
const rl = createInterface({
input: process.stdin,
output: process.stdout,
});
return new Promise((resolve) => {
if (hidden) {
process.stdout.write(question);
// Note: This is a simple implementation. For production, use a proper hidden input library
}
rl.question(question, (answer) => {
rl.close();
resolve(answer);
});
});
}
// Detect authentication type from page
async function detectAuthType(page: Page): Promise<AuthType> {
const url = page.url();
// Check for Microsoft login
if (
url.includes('login.microsoftonline.com') ||
url.includes('login.live.com') ||
url.includes('login.windows.net')
) {
return 'msal';
}
// Check for common form login patterns
const hasLoginForm = await page.evaluate(() => {
const passwordField = document.querySelector(
'input[type="password"], input[name*="password"], input[id*="password"]'
);
const usernameField = document.querySelector(
'input[type="email"], input[type="text"][name*="user"], input[type="text"][name*="email"], input[id*="user"], input[id*="email"]'
);
return !!(passwordField && usernameField);
});
if (hasLoginForm) {
return 'form';
}
return 'auto';
}
// Handle generic form login
async function handleFormLogin(
page: Page,
credentials: { username: string; password: string },
timeout: number
): Promise<boolean> {
console.log('Attempting form login...');
// Find and fill username/email field
const usernameSelectors = [
'input[type="email"]',
'input[name*="user" i]',
'input[name*="email" i]',
'input[id*="user" i]',
'input[id*="email" i]',
'input[autocomplete="username"]',
'input[type="text"]:first-of-type',
];
let usernameField = null;
for (const selector of usernameSelectors) {
usernameField = await page.$(selector);
if (usernameField) break;
}
if (!usernameField) {
console.error('Could not find username/email field');
return false;
}
await usernameField.fill(credentials.username);
console.log('Filled username field');
// Find and fill password field
const passwordSelectors = [
'input[type="password"]',
'input[name*="password" i]',
'input[id*="password" i]',
'input[autocomplete="current-password"]',
];
let passwordField = null;
for (const selector of passwordSelectors) {
passwordField = await page.$(selector);
if (passwordField) break;
}
if (!passwordField) {
console.error('Could not find password field');
return false;
}
await passwordField.fill(credentials.password);
console.log('Filled password field');
// Check for "Remember me" checkbox and check it
const rememberCheckbox = await page.$(
'input[type="checkbox"][name*="remember" i], input[type="checkbox"][id*="remember" i]'
);
if (rememberCheckbox) {
await rememberCheckbox.check();
console.log('Checked "Remember me" checkbox');
}
// Find and click submit button
const submitSelectors = [
'button[type="submit"]',
'input[type="submit"]',
'button:has-text("Sign in")',
'button:has-text("Log in")',
'button:has-text("Login")',
'button:has-text("Submit")',
'[role="button"]:has-text("Sign in")',
];
let submitButton = null;
for (const selector of submitSelectors) {
submitButton = await page.$(selector);
if (submitButton) break;
}
if (!submitButton) {
// Try pressing Enter as fallback
await passwordField.press('Enter');
} else {
await submitButton.click();
}
console.log('Submitted login form');
// Wait for navigation or error
try {
await page.waitForNavigation({ timeout, waitUntil: 'domcontentloaded' });
return true;
} catch {
// Check if we're still on login page with error
const errorMessages = await page.$$eval(
'.error, .alert-danger, [role="alert"], .login-error',
(els) => els.map((el) => el.textContent?.trim()).filter(Boolean)
);
if (errorMessages.length > 0) {
console.error('Login error:', errorMessages.join(', '));
return false;
}
return true; // Might have succeeded without navigation
}
}
// Handle Microsoft SSO login
async function handleMsalLogin(
page: Page,
credentials: { username: string; password: string },
timeout: number
): Promise<boolean> {
console.log('Attempting Microsoft SSO login...');
const currentUrl = page.url();
// If not already on Microsoft login, wait for redirect
if (!currentUrl.includes('login.microsoftonline.com')) {
try {
await page.waitForURL('**/login.microsoftonline.com/**', { timeout: 10000 });
} catch {
console.log('Not redirected to Microsoft login');
return false;
}
}
// Wait for email input
const emailInput = await page.waitForSelector(
'input[type="email"], input[name="loginfmt"]',
{ timeout }
);
if (!emailInput) {
console.error('Could not find email input on Microsoft login');
return false;
}
// Fill email and submit
await emailInput.fill(credentials.username);
console.log('Filled email field');
const nextButton = await page.$('input[type="submit"], button[type="submit"]');
if (nextButton) {
await nextButton.click();
} else {
await emailInput.press('Enter');
}
// Wait for password page
try {
await page.waitForSelector(
'input[type="password"], input[name="passwd"]',
{ timeout }
);
} catch {
// Might be using passwordless auth or different flow
console.log('Password field not found - might be using different auth flow');
return false;
}
// Fill password
const passwordInput = await page.$('input[type="password"], input[name="passwd"]');
if (!passwordInput) {
console.error('Could not find password input');
return false;
}
await passwordInput.fill(credentials.password);
console.log('Filled password field');
// Submit
const signInButton = await page.$('input[type="submit"], button[type="submit"]');
if (signInButton) {
await signInButton.click();
} else {
await passwordInput.press('Enter');
}
// Handle "Stay signed in?" prompt
try {
const staySignedInButton = await page.waitForSelector(
'input[value="Yes"], button:has-text("Yes")',
{ timeout: 5000 }
);
if (staySignedInButton) {
await staySignedInButton.click();
console.log('Clicked "Stay signed in" button');
}
} catch {
// Prompt might not appear
}
// Check for Conditional Access Policy error
const caError = await page.$('text=Conditional Access policy');
if (caError) {
console.error('Blocked by Conditional Access Policy');
// Take screenshot for debugging
await page.screenshot({ path: 'ca-policy-error.png' });
console.log('Screenshot saved: ca-policy-error.png');
return false;
}
// Wait for redirect away from Microsoft login
try {
await page.waitForURL(
(url) => !url.href.includes('login.microsoftonline.com'),
{ timeout }
);
return true;
} catch {
return false;
}
}
// Check if user is already authenticated
async function isAuthenticated(page: Page, targetUrl: string): Promise<boolean> {
const currentUrl = page.url();
// If we're on the target URL (not a login page), we're likely authenticated
if (currentUrl.startsWith(targetUrl)) {
// Check for common login page indicators
const isLoginPage = await page.evaluate(() => {
const loginIndicators = [
'input[type="password"]',
'form[action*="login"]',
'form[action*="signin"]',
'.login-form',
'#login',
];
return loginIndicators.some((sel) => document.querySelector(sel) !== null);
});
return !isLoginPage;
}
return false;
}
// Main authentication function
export async function authenticate(options: AuthOptions): Promise<AuthResult> {
const browser = await launchBrowser({ headless: options.headless ?? true });
const page = await browser.newPage();
const timeout = options.timeout ?? 30000;
try {
// Navigate to URL
console.log(`Navigating to: ${options.url}`);
await page.goto(options.url, { timeout: 60000, waitUntil: 'domcontentloaded' });
// Check if already authenticated
if (await isAuthenticated(page, options.url)) {
return {
success: true,
finalUrl: page.url(),
authType: 'auto',
message: 'Already authenticated (session persisted from profile)',
};
}
// Get credentials
const credentials = options.credentials
? options.credentials
: getCredentials();
if (!credentials) {
// No credentials - open interactive browser
console.log('\nNo credentials provided. Opening browser for manual login...');
console.log('Please complete the login process manually.');
console.log('The session will be saved to your profile.');
// Switch to headed mode for manual login
await browser.close();
const interactiveBrowser = await launchBrowser({ headless: false });
const interactivePage = await interactiveBrowser.newPage();
await interactivePage.goto(options.url);
await promptUser('\nPress Enter when you have completed login...');
const finalUrl = interactivePage.url();
await interactiveBrowser.close();
return {
success: true,
finalUrl,
authType: 'auto',
message: 'Manual login completed - session saved to profile',
};
}
// Detect auth type if auto
let authType = options.authType;
if (authType === 'auto') {
authType = await detectAuthType(page);
console.log(`Detected auth type: ${authType}`);
}
// Handle authentication based on type
let success = false;
switch (authType) {
case 'msal':
success = await handleMsalLogin(page, credentials, timeout);
break;
case 'form':
default:
success = await handleFormLogin(page, credentials, timeout);
break;
}
const finalUrl = page.url();
return {
success,
finalUrl,
authType,
message: success
? `Authentication successful - session saved to profile`
: 'Authentication failed',
};
} finally {
await browser.close();
}
}
// Navigate to authenticated page (handles auth if needed)
export async function navigateAuthenticated(
url: string,
options?: {
credentials?: { username: string; password: string };
headless?: boolean;
}
): Promise<{ page: Page; browser: BrowserContext }> {
const { page, browser } = await getPage({ headless: options?.headless ?? true });
await page.goto(url, { timeout: 60000, waitUntil: 'domcontentloaded' });
// Check if we need to authenticate
if (!(await isAuthenticated(page, url))) {
console.log('Session expired or not authenticated. Attempting login...');
// Get credentials
const credentials = options?.credentials ?? getCredentials();
if (!credentials) {
throw new Error(
'Authentication required but no credentials provided. ' +
'Set CAMOUFOX_USERNAME and CAMOUFOX_PASSWORD environment variables.'
);
}
// Detect and handle auth
const authType = await detectAuthType(page);
let success = false;
if (authType === 'msal') {
success = await handleMsalLogin(page, credentials, 30000);
} else {
success = await handleFormLogin(page, credentials, 30000);
}
if (!success) {
await browser.close();
throw new Error('Authentication failed');
}
// Navigate back to original URL if we were redirected
if (!page.url().startsWith(url)) {
await page.goto(url, { timeout: 60000, waitUntil: 'domcontentloaded' });
}
}
return { page, browser };
}
// CLI entry point
async function main() {
const args = parseArgs(process.argv.slice(2), {
string: ['url', 'type', 'username', 'password'],
boolean: ['headless', 'help'],
default: {
type: 'auto',
headless: false, // Default to headed for auth so user can see/interact
},
alias: {
u: 'url',
t: 'type',
h: 'help',
},
});
if (args.help || !args.url) {
console.log(`
Web Authentication Handler
Usage:
npx tsx auth.ts --url <url> [options]
Options:
-u, --url <url> URL to authenticate (required)
-t, --type <type> Auth type: auto, form, or msal (default: auto)
--username <user> Username/email (or set CAMOUFOX_USERNAME env var)
--password <pass> Password (or set CAMOUFOX_PASSWORD env var)
--headless <bool> Run in headless mode (default: false for auth)
-h, --help Show this help message
Auth Types:
auto Auto-detect authentication type
form Generic username/password form
msal Microsoft SSO (login.microsoftonline.com)
Environment Variables:
CAMOUFOX_USERNAME Default username/email for authentication
CAMOUFOX_PASSWORD Default password for authentication
Examples:
# Interactive login (no credentials, opens browser)
npx tsx auth.ts --url "https://example.com/login"
# Form login with credentials
npx tsx auth.ts --url "https://example.com/login" --type form \\
--username "user@example.com" --password "secret"
# Microsoft SSO login
CAMOUFOX_USERNAME=user@company.com CAMOUFOX_PASSWORD=secret \\
npx tsx auth.ts --url "https://internal.company.com" --type msal
Notes:
- Session is saved to ~/.camoufox-profile/ for persistence
- After successful auth, subsequent browses will be authenticated
- Use --headless false if you need to handle MFA manually
`);
process.exit(args.help ? 0 : 1);
}
const authType = args.type as AuthType;
if (!['auto', 'form', 'msal'].includes(authType)) {
console.error(`Invalid auth type: ${authType}. Must be auto, form, or msal.`);
process.exit(1);
}
try {
const result = await authenticate({
url: args.url,
authType,
credentials:
args.username && args.password
? { username: args.username, password: args.password }
: undefined,
headless: args.headless,
});
console.log(`\nAuthentication result:`);
console.log(` Success: ${result.success}`);
console.log(` Auth type: ${result.authType}`);
console.log(` Final URL: ${result.finalUrl}`);
console.log(` Message: ${result.message}`);
process.exit(result.success ? 0 : 1);
} catch (error) {
console.error('Error:', error instanceof Error ? error.message : error);
process.exit(1);
}
}
// Run if executed directly
const isMainModule = process.argv[1]?.includes('auth.ts');
if (isMainModule) {
main();
}
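
The credential lookup in `auth.ts` follows a simple precedence: explicit options win, then the `CAMOUFOX_USERNAME`/`CAMOUFOX_PASSWORD` environment variables, and a partial pair yields `null` so callers can fall back to interactive login. A standalone sketch of that rule (the `resolveCredentials` name is illustrative, not part of auth.ts):

```typescript
interface Creds { username: string; password: string }

// Explicit options win; otherwise fall back to environment variables.
// Return null when either half is missing so callers can go interactive.
function resolveCredentials(
  opts: { username?: string; password?: string } | undefined,
  env: Record<string, string | undefined>
): Creds | null {
  const username = opts?.username || env.CAMOUFOX_USERNAME;
  const password = opts?.password || env.CAMOUFOX_PASSWORD;
  return username && password ? { username, password } : null;
}
```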


@@ -0,0 +1,195 @@
#!/usr/bin/env npx tsx
/**
* Browser launcher using Camoufox with persistent profile
*
* Usage:
* npx tsx browse.ts --url "https://example.com"
* npx tsx browse.ts --url "https://example.com" --screenshot --output page.png
* npx tsx browse.ts --url "https://example.com" --headless false --wait 5000
*/
import { Camoufox } from 'camoufox-js';
import { homedir } from 'os';
import { join } from 'path';
import { existsSync, mkdirSync } from 'fs';
import parseArgs from 'minimist';
import type { Page, BrowserContext } from 'playwright-core';
// Types
interface BrowseOptions {
url: string;
headless?: boolean;
screenshot?: boolean;
output?: string;
wait?: number;
timeout?: number;
interactive?: boolean;
}
interface BrowseResult {
title: string;
url: string;
screenshotPath?: string;
}
// Get profile directory
const getProfilePath = (): string => {
const customPath = process.env.CAMOUFOX_PROFILE_PATH;
if (customPath) return customPath;
const profileDir = join(homedir(), '.camoufox-profile');
if (!existsSync(profileDir)) {
mkdirSync(profileDir, { recursive: true });
}
return profileDir;
};
// Launch browser with persistent profile
export async function launchBrowser(options: {
headless?: boolean;
}): Promise<BrowserContext> {
const profilePath = getProfilePath();
const headless =
options.headless ??
(process.env.CAMOUFOX_HEADLESS ? process.env.CAMOUFOX_HEADLESS === 'true' : true);
console.log(`Using profile: ${profilePath}`);
console.log(`Headless mode: ${headless}`);
const browser = await Camoufox({
user_data_dir: profilePath,
headless,
});
return browser;
}
// Browse to URL and optionally take screenshot
export async function browse(options: BrowseOptions): Promise<BrowseResult> {
const browser = await launchBrowser({ headless: options.headless });
const page = await browser.newPage();
try {
// Navigate to URL
console.log(`Navigating to: ${options.url}`);
await page.goto(options.url, {
timeout: options.timeout ?? 60000,
waitUntil: 'domcontentloaded',
});
// Wait if specified
if (options.wait) {
console.log(`Waiting ${options.wait}ms...`);
await page.waitForTimeout(options.wait);
}
const result: BrowseResult = {
title: await page.title(),
url: page.url(),
};
console.log(`Page title: ${result.title}`);
console.log(`Final URL: ${result.url}`);
// Take screenshot if requested
if (options.screenshot) {
const outputPath = options.output ?? 'screenshot.png';
await page.screenshot({ path: outputPath, fullPage: true });
result.screenshotPath = outputPath;
console.log(`Screenshot saved: ${outputPath}`);
}
// If interactive mode, keep browser open
if (options.interactive) {
console.log('\nInteractive mode - browser will stay open.');
console.log('Press Ctrl+C to close.');
await new Promise(() => {}); // Keep running
}
return result;
} finally {
if (!options.interactive) {
await browser.close();
}
}
}
// Export page for use in other scripts
export async function getPage(options?: {
headless?: boolean;
}): Promise<{ page: Page; browser: BrowserContext }> {
const browser = await launchBrowser({ headless: options?.headless });
const page = await browser.newPage();
return { page, browser };
}
// CLI entry point
async function main() {
const args = parseArgs(process.argv.slice(2), {
string: ['url', 'output', 'wait', 'timeout'],
boolean: ['screenshot', 'headless', 'interactive', 'help'],
default: {
headless: true,
screenshot: false,
interactive: false,
},
alias: {
u: 'url',
o: 'output',
s: 'screenshot',
h: 'help',
i: 'interactive',
},
});
if (args.help || !args.url) {
console.log(`
Web Browser with Camoufox
Usage:
npx tsx browse.ts --url <url> [options]
Options:
-u, --url <url> URL to navigate to (required)
-s, --screenshot Take a screenshot of the page
-o, --output <path> Output path for screenshot (default: screenshot.png)
--headless <bool> Run in headless mode (default: true)
--wait <ms> Wait time after page load in milliseconds
--timeout <ms> Navigation timeout (default: 60000)
-i, --interactive Keep browser open for manual interaction
-h, --help Show this help message
Examples:
npx tsx browse.ts --url "https://example.com"
npx tsx browse.ts --url "https://example.com" --screenshot --output page.png
npx tsx browse.ts --url "https://example.com" --headless false --interactive
Environment Variables:
CAMOUFOX_PROFILE_PATH Custom profile directory (default: ~/.camoufox-profile/)
CAMOUFOX_HEADLESS Default headless mode (true/false)
`);
process.exit(args.help ? 0 : 1);
}
try {
await browse({
url: args.url,
headless: args.headless,
screenshot: args.screenshot,
output: args.output,
wait: args.wait ? parseInt(args.wait, 10) : undefined,
timeout: args.timeout ? parseInt(args.timeout, 10) : undefined,
interactive: args.interactive,
});
} catch (error) {
console.error('Error:', error instanceof Error ? error.message : error);
process.exit(1);
}
}
// Run if executed directly
const isMainModule = process.argv[1]?.includes('browse.ts');
if (isMainModule) {
main();
}
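
The profile lookup in `browse.ts` prefers `CAMOUFOX_PROFILE_PATH` and otherwise falls back to `~/.camoufox-profile`. A pure restatement of that resolution (directory creation omitted; `profilePathFor` is an illustrative name):

```typescript
import { join } from 'path';
import { homedir } from 'os';

// CAMOUFOX_PROFILE_PATH wins; otherwise use ~/.camoufox-profile.
function profilePathFor(env: Record<string, string | undefined>): string {
  return env.CAMOUFOX_PROFILE_PATH || join(homedir(), '.camoufox-profile');
}
```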


@@ -0,0 +1,208 @@
#!/usr/bin/env node
import fs from "node:fs";
import path from "node:path";
import { fileURLToPath } from "node:url";
const DEFAULT_WAIT_MS = 5000;
const MAX_WAIT_MS = 20000;
const NAV_TIMEOUT_MS = 30000;
const EXTRA_CHALLENGE_WAIT_MS = 8000;
const CONTENT_LIMIT = 12000;
const DEFAULT_USER_AGENT =
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36";
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
function fail(message, details) {
const payload = { error: message };
if (details) payload.details = details;
process.stderr.write(`${JSON.stringify(payload)}\n`);
process.exit(1);
}
function parseWaitTime(raw) {
const value = Number.parseInt(raw || `${DEFAULT_WAIT_MS}`, 10);
if (!Number.isFinite(value) || value < 0) return DEFAULT_WAIT_MS;
return Math.min(value, MAX_WAIT_MS);
}
function parseTarget(rawUrl) {
if (!rawUrl) {
fail("Missing URL. Usage: node skills/web-automation/scripts/extract.js <URL>");
}
let parsed;
try {
parsed = new URL(rawUrl);
} catch (error) {
fail("Invalid URL.", error.message);
}
if (!["http:", "https:"].includes(parsed.protocol)) {
fail("Only http and https URLs are allowed.");
}
return parsed.toString();
}
function ensureParentDir(filePath) {
if (!filePath) return;
fs.mkdirSync(path.dirname(filePath), { recursive: true });
}
async function detectChallenge(page) {
try {
return await page.evaluate(() => {
const text = (document.body?.innerText || "").toLowerCase();
return (
text.includes("checking your browser") ||
text.includes("just a moment") ||
text.includes("verify you are human") ||
text.includes("press and hold") ||
document.querySelector('iframe[src*="challenge"]') !== null ||
document.querySelector('iframe[src*="cloudflare"]') !== null
);
});
} catch {
return false;
}
}
async function loadPlaywright() {
try {
return await import("playwright");
} catch (error) {
fail(
"Playwright is not installed for this skill. Run pnpm install and npx playwright install chromium in skills/web-automation/scripts first.",
error.message
);
}
}
async function main() {
const requestedUrl = parseTarget(process.argv[2]);
const waitTime = parseWaitTime(process.env.WAIT_TIME);
const screenshotPath = process.env.SCREENSHOT_PATH || "";
const saveHtml = process.env.SAVE_HTML === "true";
const headless = process.env.HEADLESS !== "false";
const userAgent = process.env.USER_AGENT || DEFAULT_USER_AGENT;
const startedAt = Date.now();
const { chromium } = await loadPlaywright();
let browser;
try {
browser = await chromium.launch({
headless,
ignoreDefaultArgs: ["--enable-automation"],
args: [
"--disable-blink-features=AutomationControlled",
"--disable-features=IsolateOrigins,site-per-process"
]
});
const context = await browser.newContext({
userAgent,
locale: "en-US",
viewport: { width: 1440, height: 900 },
extraHTTPHeaders: {
"Accept-Language": "en-US,en;q=0.9",
Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
}
});
await context.addInitScript(() => {
Object.defineProperty(navigator, "webdriver", {
get: () => false
});
Object.defineProperty(navigator, "languages", {
get: () => ["en-US", "en"]
});
Object.defineProperty(navigator, "plugins", {
get: () => [1, 2, 3, 4, 5]
});
window.chrome = window.chrome || { runtime: {} };
const originalQuery = window.navigator.permissions?.query?.bind(window.navigator.permissions);
if (originalQuery) {
window.navigator.permissions.query = (parameters) => {
if (parameters?.name === "notifications") {
return Promise.resolve({ state: Notification.permission });
}
return originalQuery(parameters);
};
}
});
const page = await context.newPage();
const response = await page.goto(requestedUrl, {
waitUntil: "domcontentloaded",
timeout: NAV_TIMEOUT_MS
});
await page.waitForTimeout(waitTime);
let challengeDetected = await detectChallenge(page);
if (challengeDetected) {
await page.waitForTimeout(EXTRA_CHALLENGE_WAIT_MS);
challengeDetected = await detectChallenge(page);
}
const extracted = await page.evaluate((contentLimit) => {
const bodyText = document.body?.innerText || "";
return {
finalUrl: window.location.href,
title: document.title || "",
content: bodyText.slice(0, contentLimit),
metaDescription:
document.querySelector('meta[name="description"]')?.content ||
document.querySelector('meta[property="og:description"]')?.content ||
""
};
}, CONTENT_LIMIT);
const result = {
requestedUrl,
finalUrl: extracted.finalUrl,
title: extracted.title,
content: extracted.content,
metaDescription: extracted.metaDescription,
status: response ? response.status() : null,
challengeDetected,
elapsedSeconds: ((Date.now() - startedAt) / 1000).toFixed(2)
};
if (screenshotPath) {
ensureParentDir(screenshotPath);
await page.screenshot({ path: screenshotPath, fullPage: false, timeout: 10000 });
result.screenshot = screenshotPath;
}
if (saveHtml) {
const htmlTarget = screenshotPath
? screenshotPath.replace(/\.[^.]+$/, ".html")
: path.resolve(__dirname, `page-${Date.now()}.html`);
ensureParentDir(htmlTarget);
fs.writeFileSync(htmlTarget, await page.content());
result.htmlFile = htmlTarget;
}
process.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
await browser.close();
} catch (error) {
if (browser) {
try {
await browser.close();
} catch {
// Ignore close errors after the primary failure.
}
}
fail("Scrape failed.", error.message);
}
}
main();
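
The challenge detection above is a text heuristic: the page counts as an anti-bot interstitial when its visible text contains any of a fixed set of phrases (plus the iframe checks). A pure-function restatement of the text half, easy to unit-test (`looksLikeChallenge` is an illustrative name):

```typescript
// Phrases extract.js treats as evidence of an anti-bot interstitial.
const CHALLENGE_PHRASES = [
  'checking your browser',
  'just a moment',
  'verify you are human',
  'press and hold',
];

function looksLikeChallenge(bodyText: string): boolean {
  const text = bodyText.toLowerCase();
  return CHALLENGE_PHRASES.some((p) => text.includes(p));
}
```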


@@ -0,0 +1,292 @@
#!/usr/bin/env npx tsx
import parseArgs from 'minimist';
import type { Page } from 'playwright-core';
import { launchBrowser } from './browse.js';
type Step =
| { action: 'goto'; url: string }
| { action: 'click'; selector?: string; text?: string }
| { action: 'type'; selector?: string; text: string }
| { action: 'press'; key: string; selector?: string }
| { action: 'wait'; ms: number }
| { action: 'screenshot'; path: string }
| { action: 'extract'; selector: string; count?: number };
function normalizeKey(k: string): string {
if (!k) return 'Enter';
const lower = k.toLowerCase();
if (lower === 'enter' || lower === 'return') return 'Enter';
if (lower === 'tab') return 'Tab';
if (lower === 'escape' || lower === 'esc') return 'Escape';
return k;
}
function splitInstructions(instruction: string): string[] {
return instruction
.split(/\bthen\b|;/gi)
.map((s) => s.trim())
.filter(Boolean);
}
function parseInstruction(instruction: string): Step[] {
const parts = splitInstructions(instruction);
const steps: Step[] = [];
for (const p of parts) {
// go to https://...
const goto = p.match(/^(?:go to|open|navigate to)\s+(https?:\/\/\S+)/i);
if (goto) {
steps.push({ action: 'goto', url: goto[1] });
continue;
}
// click on "text" or click #selector
const clickText = p.match(/^click(?: on)?\s+"([^"]+)"/i);
if (clickText) {
steps.push({ action: 'click', text: clickText[1] });
continue;
}
const clickSelector = p.match(/^click(?: on)?\s+(#[\w-]+|\.[\w-]+|[a-z]+\[[^\]]+\])/i);
if (clickSelector) {
steps.push({ action: 'click', selector: clickSelector[1] });
continue;
}
// type "text" [in selector]
const typeInto = p.match(/^type\s+"([^"]+)"\s+in\s+(.+)$/i);
if (typeInto) {
steps.push({ action: 'type', text: typeInto[1], selector: typeInto[2].trim() });
continue;
}
const typeOnly = p.match(/^type\s+"([^"]+)"$/i);
if (typeOnly) {
steps.push({ action: 'type', text: typeOnly[1] });
continue;
}
// press enter [in selector]
const pressIn = p.match(/^press\s+(\w+)\s+in\s+(.+)$/i);
if (pressIn) {
steps.push({ action: 'press', key: normalizeKey(pressIn[1]), selector: pressIn[2].trim() });
continue;
}
const pressOnly = p.match(/^press\s+(\w+)$/i);
if (pressOnly) {
steps.push({ action: 'press', key: normalizeKey(pressOnly[1]) });
continue;
}
// wait 2s / wait 500ms
const waitS = p.match(/^wait\s+(\d+)\s*s(?:ec(?:onds?)?)?$/i);
if (waitS) {
steps.push({ action: 'wait', ms: parseInt(waitS[1], 10) * 1000 });
continue;
}
const waitMs = p.match(/^wait\s+(\d+)\s*ms$/i);
if (waitMs) {
steps.push({ action: 'wait', ms: parseInt(waitMs[1], 10) });
continue;
}
// screenshot path
const shot = p.match(/^screenshot(?: to)?\s+(.+)$/i);
if (shot) {
steps.push({ action: 'screenshot', path: shot[1].trim() });
continue;
}
throw new Error(`Could not parse step: "${p}"`);
}
return steps;
}
function escapeRegExp(value: string): string {
return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
function isLikelyLoginText(text: string): boolean {
return /(login|accedi|sign\s*in|entra)/i.test(text);
}
async function clickByText(page: Page, text: string): Promise<boolean> {
const patterns = [new RegExp(`^${escapeRegExp(text)}$`, 'i'), new RegExp(escapeRegExp(text), 'i')];
for (const pattern of patterns) {
const targets = [
page.getByRole('button', { name: pattern }).first(),
page.getByRole('link', { name: pattern }).first(),
page.getByText(pattern).first(),
];
for (const target of targets) {
if (await target.count()) {
try {
await target.click({ timeout: 8000 });
return true;
} catch {
// keep trying next candidate
}
}
}
}
return false;
}
async function fallbackLoginNavigation(page: Page, requestedText: string): Promise<boolean> {
if (!isLikelyLoginText(requestedText)) return false;
const current = new URL(page.url());
const candidateLinks = await page.evaluate(() => {
const loginTerms = ['login', 'accedi', 'sign in', 'entra'];
const anchors = Array.from(document.querySelectorAll('a[href], a[onclick], button[onclick]')) as Array<HTMLAnchorElement | HTMLButtonElement>;
return anchors
.map((el) => {
const text = (el.textContent || '').trim().toLowerCase();
const href = (el as HTMLAnchorElement).getAttribute('href') || '';
return { text, href };
})
.filter((x) => x.text && loginTerms.some((t) => x.text.includes(t)))
.map((x) => x.href)
.filter(Boolean);
});
// Prefer real URLs (not javascript:)
const realCandidate = candidateLinks.find((h) => /login|account\/login/i.test(h) && !h.startsWith('javascript:'));
if (realCandidate) {
const target = new URL(realCandidate, page.url()).toString();
await page.goto(target, { waitUntil: 'domcontentloaded', timeout: 60000 });
return true;
}
// Site-specific fallback for Corriere
if (current.hostname === 'corriere.it' || current.hostname.endsWith('.corriere.it')) {
await page.goto('https://www.corriere.it/account/login', {
waitUntil: 'domcontentloaded',
timeout: 60000,
});
return true;
}
return false;
}
async function typeInBestTarget(page: Page, text: string, selector?: string) {
if (selector) {
await page.locator(selector).first().click({ timeout: 10000 });
await page.locator(selector).first().fill(text);
return;
}
const loc = page.locator('input[name="q"], input[type="search"], input[type="text"], textarea').first();
await loc.click({ timeout: 10000 });
await loc.fill(text);
}
async function pressOnTarget(page: Page, key: string, selector?: string) {
if (selector) {
await page.locator(selector).first().press(key);
return;
}
await page.keyboard.press(key);
}
async function runSteps(page: Page, steps: Step[]) {
for (const step of steps) {
switch (step.action) {
case 'goto':
await page.goto(step.url, { waitUntil: 'domcontentloaded', timeout: 60000 });
break;
case 'click':
if (step.selector) {
await page.locator(step.selector).first().click({ timeout: 15000 });
} else if (step.text) {
const clicked = await clickByText(page, step.text);
if (!clicked) {
const recovered = await fallbackLoginNavigation(page, step.text);
if (!recovered) {
throw new Error(`Could not click target text: ${step.text}`);
}
}
} else {
throw new Error('click step missing selector/text');
}
try {
await page.waitForLoadState('domcontentloaded', { timeout: 10000 });
} catch {
// no navigation is fine
}
break;
case 'type':
await typeInBestTarget(page, step.text, step.selector);
break;
case 'press':
await pressOnTarget(page, step.key, step.selector);
break;
case 'wait':
await page.waitForTimeout(step.ms);
break;
case 'screenshot':
await page.screenshot({ path: step.path, fullPage: true });
break;
case 'extract': {
const items = await page.locator(step.selector).allTextContents();
const out = items.slice(0, step.count ?? items.length).map((t) => t.trim()).filter(Boolean);
console.log(JSON.stringify(out, null, 2));
break;
}
default:
throw new Error('Unknown step');
}
}
}
async function main() {
const args = parseArgs(process.argv.slice(2), {
string: ['instruction', 'steps'],
boolean: ['headless', 'help'],
default: { headless: true },
alias: { i: 'instruction', s: 'steps', h: 'help' },
});
if (args.help || (!args.instruction && !args.steps)) {
console.log(`
General Web Flow Runner (Camoufox)
Usage:
npx tsx flow.ts --instruction "go to https://example.com then type \"hello\" then press enter"
npx tsx flow.ts --steps '[{"action":"goto","url":"https://example.com"}]'
Supported natural steps:
- go to/open/navigate to <url>
- click on "Text"
- click <css-selector>
- type "text"
- type "text" in <css-selector>
- press <key>
- press <key> in <css-selector>
- wait <N>s | wait <N>ms
- screenshot <path>
`);
process.exit(args.help ? 0 : 1);
}
const browser = await launchBrowser({ headless: args.headless });
const page = await browser.newPage();
try {
const steps: Step[] = args.steps ? JSON.parse(args.steps) : parseInstruction(args.instruction);
await runSteps(page, steps);
console.log('Flow complete. Final URL:', page.url());
} finally {
await browser.close();
}
}
main().catch((e) => {
console.error('Error:', e instanceof Error ? e.message : e);
process.exit(1);
});
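
Two of the parsing rules above are worth checking in isolation: wait durations accept either `Ns` (with optional `sec`/`seconds`) or `Nms`, and key names are normalized to Playwright's canonical forms. A self-contained restatement (`parseWaitMs` is an illustrative name; `normalizeKey` mirrors the flow.ts function):

```typescript
// "wait 2s" / "wait 2 seconds" → milliseconds; "wait 500ms" → as-is; else null.
function parseWaitMs(step: string): number | null {
  const s = step.match(/^wait\s+(\d+)\s*s(?:ec(?:onds?)?)?$/i);
  if (s) return parseInt(s[1], 10) * 1000;
  const ms = step.match(/^wait\s+(\d+)\s*ms$/i);
  return ms ? parseInt(ms[1], 10) : null;
}

// Map common key aliases to Playwright key names; pass others through.
function normalizeKey(k: string): string {
  const lower = k.toLowerCase();
  if (lower === 'enter' || lower === 'return') return 'Enter';
  if (lower === 'escape' || lower === 'esc') return 'Escape';
  if (lower === 'tab') return 'Tab';
  return k;
}
```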


@@ -0,0 +1,32 @@
{
"name": "web-automation-scripts",
"version": "1.0.0",
"description": "Web browsing and scraping scripts using Camoufox",
"type": "module",
"scripts": {
"extract": "node extract.js",
"browse": "tsx browse.ts",
"scrape": "tsx scrape.ts",
"fetch-browser": "npx camoufox-js fetch"
},
"dependencies": {
"@mozilla/readability": "^0.5.0",
"better-sqlite3": "^12.6.2",
"camoufox-js": "^0.8.5",
"jsdom": "^24.0.0",
"minimist": "^1.2.8",
"playwright": "^1.58.2",
"playwright-core": "^1.40.0",
"turndown": "^7.1.2",
"turndown-plugin-gfm": "^1.0.2"
},
"devDependencies": {
"@types/jsdom": "^21.1.6",
"@types/minimist": "^1.2.5",
"@types/turndown": "^5.0.4",
"esbuild": "0.27.0",
"tsx": "^4.7.0",
"typescript": "^5.3.0"
},
"packageManager": "pnpm@10.18.1+sha512.77a884a165cbba2d8d1c19e3b4880eee6d2fcabd0d879121e282196b80042351d5eb3ca0935fa599da1dc51265cc68816ad2bddd2a2de5ea9fdf92adbec7cd34"
}

File diff suppressed because it is too large


@@ -0,0 +1,212 @@
import { writeFileSync } from 'fs';
import { getPage } from './browse.js';
const baseUrl = 'http://localhost:3000';
const username = 'analyst@fhb.local';
const password = process.env.CAMOUFOX_PASSWORD ?? '';
const reportPath = '/Users/stefano.fiorini/Documents/projects/fhb-loan-spreading-pilot-a/docs/plans/2026-01-24-financials-analysis-redesign/web-automation-scan.md';
type NavResult = {
requestedUrl: string;
url: string;
status: number | null;
title: string;
error?: string;
};
async function gotoWithStatus(page: any, url: string): Promise<NavResult> {
const resp = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 }).catch((e: unknown) => ({ error: e }));
if (resp?.error) {
return {
requestedUrl: url,
url: page.url(),
status: null,
title: await page.title().catch(() => ''),
error: String(resp.error),
};
}
return {
requestedUrl: url,
url: page.url(),
status: resp ? resp.status() : null,
title: await page.title().catch(() => ''),
};
}
async function textOrNull(page: any, selector: string): Promise<string | null> {
const loc = page.locator(selector).first();
try {
if ((await loc.count()) === 0) return null;
const txt = await loc.textContent();
return txt ? txt.trim().replace(/\s+/g, ' ') : null;
} catch {
return null;
}
}
async function main() {
const { page, browser } = await getPage({ headless: true });
const lines: string[] = [];
lines.push('# Web Automation Scan (local)');
lines.push('');
lines.push(`- Base URL: ${baseUrl}`);
lines.push(`- Timestamp: ${new Date().toISOString()}`);
lines.push('');
try {
lines.push('## Login');
await gotoWithStatus(page, `${baseUrl}/login`);
await page.locator('input[name="email"]').fill(username);
await page.locator('input[name="password"]').fill(password);
await page.locator('button[type="submit"]').click();
await page.waitForTimeout(2500);
const cookies = await page.context().cookies();
const sessionCookie = cookies.find((c: any) => c.name === 'fhb_session');
lines.push(`- After submit URL: ${page.url()}`);
lines.push(`- Has session cookie (fhb_session): ${Boolean(sessionCookie)}`);
lines.push('');
lines.push('## Demo Case');
const casesNav = await gotoWithStatus(page, `${baseUrl}/cases`);
lines.push(`- GET /cases → status ${casesNav.status ?? 'ERR'}, final ${casesNav.url}`);
const envCaseId = process.env.SCAN_CASE_ID?.trim() || null;
let selectedCaseId: string | null = envCaseId;
if (!selectedCaseId) {
const caseLinks = await page.$$eval('a[href^="/cases/"]', (as) =>
as
.map((a) => ({
href: (a as HTMLAnchorElement).getAttribute('href') || '',
text: (a.textContent || '').trim(),
}))
.filter((x) => x.href.includes('/cases/'))
);
const preferredTitles = ['Demo - Strong Borrower', 'Demo - Weak Borrower', 'Demo - Incomplete'];
for (const title of preferredTitles) {
const match = caseLinks.find((l) => l.text.includes(title) && l.href.includes('/cases/'));
const href = match?.href ?? '';
const m = href.match(/\/cases\/([0-9a-f-]{36})/i);
if (m) {
selectedCaseId = m[1];
break;
}
}
if (!selectedCaseId) {
const firstHref =
caseLinks.map((l) => l.href).find((h) => /\/cases\/[0-9a-f-]{36}/i.test(h)) ?? null;
const m = firstHref?.match(/\/cases\/([0-9a-f-]{36})/i) ?? null;
selectedCaseId = m?.[1] ?? null;
}
}
lines.push(`- Selected caseId: ${selectedCaseId ?? '(none found)'}`);
if (!selectedCaseId) {
lines.push('');
lines.push('⚠️ Could not find a demo case link on /cases.');
writeFileSync(reportPath, lines.join('\n') + '\n', 'utf-8');
return;
}
const caseBase = `${baseUrl}/cases/${selectedCaseId}/journey`;
lines.push('');
lines.push('## Route Checks');
const routesToCheck = [
`${caseBase}`,
`${caseBase}/financials`,
`${caseBase}/financials/income`,
`${caseBase}/analysis`,
`${caseBase}/analysis/configure`,
`${caseBase}/analysis/ai`,
`${caseBase}/analysis/ai/detail`,
`${caseBase}/spreads`,
];
for (const url of routesToCheck) {
const r = await gotoWithStatus(page, url);
const h1 = await textOrNull(page, 'h1');
const finalPath = r.url.startsWith(baseUrl) ? r.url.slice(baseUrl.length) : r.url;
lines.push(`- ${url.slice(baseUrl.length)} → status ${r.status ?? 'ERR'} (final ${finalPath})${h1 ? `, h1="${h1}"` : ''}`);
}
lines.push('');
lines.push('## Spreadsheet Analysis (UI)');
await gotoWithStatus(page, `${caseBase}/analysis/configure`);
const runButton = page.locator('button:has-text("Run Analysis")').first();
const disabled = await runButton.isDisabled().catch(() => true);
lines.push(`- Run button disabled: ${disabled}`);
if (!disabled) {
await runButton.click();
const resultsWait = page
.waitForURL('**/journey/analysis/results**', { timeout: 180000 })
.then(() => 'results' as const);
const errorWait = page
.locator('[role="alert"]')
.filter({ hasText: 'Error' })
.first()
.waitFor({ timeout: 180000 })
.then(() => 'error' as const);
const outcome = await Promise.race([resultsWait, errorWait]).catch(() => 'timeout' as const);
if (outcome === 'results') {
await page.waitForTimeout(1500);
lines.push(`- Results URL: ${page.url().replace(baseUrl, '')}`);
const downloadHref = await page
.locator('a[href*="/journey/analysis/download"]')
.first()
.getAttribute('href')
.catch(() => null);
if (downloadHref) {
const dlUrl = downloadHref.startsWith('http') ? downloadHref : `${baseUrl}${downloadHref}`;
const dlResp = await page.goto(dlUrl, { waitUntil: 'commit', timeout: 60000 }).catch(() => null);
lines.push(
`- Download route status: ${dlResp?.status() ?? 'ERR'} (Content-Type: ${dlResp?.headers()?.['content-type'] ?? 'n/a'})`
);
} else {
lines.push('- Download link not found on results page');
}
} else if (outcome === 'error') {
const errorText = await page
.locator('[role="alert"]')
.first()
.textContent()
.then((t: string | null) => (t ? t.trim().replace(/\s+/g, ' ') : null))
.catch(() => null);
lines.push(`- Stayed on configure page; saw error callout: ${errorText ?? '(unable to read)'}`);
lines.push('- Skipping download check because analysis did not complete.');
} else {
lines.push('- Timed out waiting for results or error after clicking Run Analysis.');
}
} else {
lines.push('- Skipped running analysis because Run button was disabled.');
}
lines.push('');
lines.push('## Notes');
lines.push('- This scan avoids scraping financial values; it records route availability and basic headings.');
writeFileSync(reportPath, lines.join('\n') + '\n', 'utf-8');
} finally {
await browser.close();
}
}
main().catch((err) => {
console.error(err);
process.exitCode = 1;
});
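`gotoWithStatus` above avoids a try/catch around `page.goto` by converting rejections into a value with `.catch((e) => ({ error: e }))`. The same error-as-value pattern, isolated into a generic helper (name hypothetical):

```typescript
// Error-as-value wrapper: resolve to { value } on success or { error } on
// rejection, so callers can branch on the shape instead of using try/catch —
// the same idea gotoWithStatus applies to page.goto.
async function settle<T>(p: Promise<T>): Promise<{ value: T } | { error: string }> {
  try {
    return { value: await p };
  } catch (e) {
    return { error: e instanceof Error ? e.message : String(e) };
  }
}
```

This keeps the scan loop linear: every navigation yields a result object that can be logged into the markdown report whether it succeeded or not.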


@@ -0,0 +1,351 @@
#!/usr/bin/env -S npx tsx
/**
* Web scraper that extracts content to markdown
*
* Usage:
* npx tsx scrape.ts --url "https://example.com" --mode main
* npx tsx scrape.ts --url "https://example.com" --mode full --output page.md
* npx tsx scrape.ts --url "https://example.com" --mode selector --selector ".content"
*/
import TurndownService from 'turndown';
import * as turndownPluginGfm from 'turndown-plugin-gfm';
import { Readability } from '@mozilla/readability';
import { JSDOM } from 'jsdom';
import { writeFileSync } from 'fs';
import parseArgs from 'minimist';
import { getPage } from './browse.js';
// Types
type ScrapeMode = 'main' | 'full' | 'selector';
interface ScrapeOptions {
url: string;
mode: ScrapeMode;
selector?: string;
output?: string;
includeLinks?: boolean;
includeTables?: boolean;
includeImages?: boolean;
headless?: boolean;
wait?: number;
}
interface ScrapeResult {
title: string;
url: string;
markdown: string;
byline?: string;
excerpt?: string;
}
// Configure Turndown for markdown conversion
function createTurndownService(options: {
includeLinks?: boolean;
includeTables?: boolean;
includeImages?: boolean;
}): TurndownService {
const turndown = new TurndownService({
headingStyle: 'atx',
hr: '---',
bulletListMarker: '-',
codeBlockStyle: 'fenced',
fence: '```',
emDelimiter: '*',
strongDelimiter: '**',
linkStyle: 'inlined',
});
// Add GFM support (tables, strikethrough, task lists)
turndown.use(turndownPluginGfm.gfm);
// Custom rule for code blocks with language detection
turndown.addRule('codeBlockWithLanguage', {
filter: (node) => {
return (
node.nodeName === 'PRE' &&
node.firstChild?.nodeName === 'CODE'
);
},
replacement: (_content, node) => {
const codeNode = node.firstChild as HTMLElement;
const className = codeNode.getAttribute('class') || '';
const langMatch = className.match(/language-(\w+)/);
const lang = langMatch ? langMatch[1] : '';
const code = codeNode.textContent || '';
return `\n\n\`\`\`${lang}\n${code}\n\`\`\`\n\n`;
},
});
// Remove images if not included
if (!options.includeImages) {
turndown.addRule('removeImages', {
filter: 'img',
replacement: () => '',
});
}
// Remove links but keep text if not included
if (!options.includeLinks) {
turndown.addRule('removeLinks', {
filter: 'a',
replacement: (content) => content,
});
}
// Remove script, style, nav, footer, aside elements
turndown.remove(['script', 'style', 'nav', 'footer', 'aside', 'noscript']);
return turndown;
}
// Extract main content using Readability
function extractMainContent(html: string, url: string): {
content: string;
title: string;
byline?: string;
excerpt?: string;
} {
const dom = new JSDOM(html, { url });
const reader = new Readability(dom.window.document);
const article = reader.parse();
if (!article) {
throw new Error('Could not extract main content from page');
}
return {
content: article.content,
title: article.title,
byline: article.byline || undefined,
excerpt: article.excerpt || undefined,
};
}
// Scrape a URL and return markdown
export async function scrape(options: ScrapeOptions): Promise<ScrapeResult> {
const { page, browser } = await getPage({ headless: options.headless ?? true });
try {
// Navigate to URL
console.log(`Navigating to: ${options.url}`);
await page.goto(options.url, {
timeout: 60000,
waitUntil: 'domcontentloaded',
});
// Wait if specified
if (options.wait) {
console.log(`Waiting ${options.wait}ms for dynamic content...`);
await page.waitForTimeout(options.wait);
}
const pageTitle = await page.title();
const pageUrl = page.url();
let html: string;
let title = pageTitle;
let byline: string | undefined;
let excerpt: string | undefined;
// Get HTML based on mode
switch (options.mode) {
case 'main': {
// Get full page HTML and extract with Readability
const fullHtml = await page.content();
const extracted = extractMainContent(fullHtml, pageUrl);
html = extracted.content;
title = extracted.title || pageTitle;
byline = extracted.byline;
excerpt = extracted.excerpt;
break;
}
case 'selector': {
if (!options.selector) {
throw new Error('Selector mode requires --selector option');
}
const element = await page.$(options.selector);
if (!element) {
throw new Error(`Selector not found: ${options.selector}`);
}
html = await element.innerHTML();
break;
}
case 'full':
default: {
// Get body content, excluding common non-content elements
html = await page.evaluate(() => {
// Remove common non-content elements
const selectorsToRemove = [
'script', 'style', 'noscript', 'iframe',
'nav', 'header', 'footer', '.cookie-banner',
'.advertisement', '.ads', '#ads', '.social-share',
'.comments', '#comments', '.sidebar'
];
selectorsToRemove.forEach(selector => {
document.querySelectorAll(selector).forEach(el => el.remove());
});
return document.body.innerHTML;
});
break;
}
}
// Convert to markdown
const turndown = createTurndownService({
includeLinks: options.includeLinks ?? true,
includeTables: options.includeTables ?? true,
includeImages: options.includeImages ?? false,
});
let markdown = turndown.turndown(html);
// Add title as H1 if not already present
if (!markdown.startsWith('# ')) {
markdown = `# ${title}\n\n${markdown}`;
}
// Add metadata header
const metadataLines = [
`<!-- Scraped from: ${pageUrl} -->`,
byline ? `<!-- Author: ${byline} -->` : null,
excerpt ? `<!-- Excerpt: ${excerpt} -->` : null,
`<!-- Scraped at: ${new Date().toISOString()} -->`,
'',
].filter(Boolean);
markdown = metadataLines.join('\n') + '\n' + markdown;
// Clean up excessive whitespace
markdown = markdown
.replace(/\n{4,}/g, '\n\n\n')
.replace(/[ \t]+$/gm, '')
.trim();
const result: ScrapeResult = {
title,
url: pageUrl,
markdown,
byline,
excerpt,
};
// Save to file if output specified
if (options.output) {
writeFileSync(options.output, markdown, 'utf-8');
console.log(`Markdown saved to: ${options.output}`);
}
return result;
} finally {
await browser.close();
}
}
// CLI entry point
async function main() {
const args = parseArgs(process.argv.slice(2), {
string: ['url', 'mode', 'selector', 'output'],
boolean: ['headless', 'links', 'tables', 'images', 'help'],
default: {
mode: 'main',
headless: true,
links: true,
tables: true,
images: false,
},
alias: {
u: 'url',
m: 'mode',
s: 'selector',
o: 'output',
h: 'help',
},
});
if (args.help || !args.url) {
console.log(`
Web Scraper - Extract content to Markdown
Usage:
npx tsx scrape.ts --url <url> [options]
Options:
-u, --url <url> URL to scrape (required)
-m, --mode <mode> Scrape mode: main, full, or selector (default: main)
-s, --selector <sel> CSS selector for selector mode
-o, --output <path> Output file path for markdown
--no-headless Run with the browser window visible (headless by default)
--wait <ms> Wait time for dynamic content
--links Include links in output (default: true)
--tables Include tables in output (default: true)
--images Include images in output (default: false)
-h, --help Show this help message
Scrape Modes:
main Extract main article content using Readability (best for articles)
full Full page content with common elements removed
selector Extract specific element by CSS selector
Examples:
npx tsx scrape.ts --url "https://docs.example.com/guide" --mode main
npx tsx scrape.ts --url "https://example.com" --mode full --output page.md
npx tsx scrape.ts --url "https://example.com" --mode selector --selector ".api-docs"
npx tsx scrape.ts --url "https://example.com" --mode main --no-links --output clean.md
Output Format:
- GitHub Flavored Markdown (tables, strikethrough, task lists)
- Proper heading hierarchy
- Code blocks with language detection
- Metadata comments at top (source URL, date)
`);
process.exit(args.help ? 0 : 1);
}
const mode = args.mode as ScrapeMode;
if (!['main', 'full', 'selector'].includes(mode)) {
console.error(`Invalid mode: ${mode}. Must be main, full, or selector.`);
process.exit(1);
}
try {
const result = await scrape({
url: args.url,
mode,
selector: args.selector,
output: args.output,
includeLinks: args.links,
includeTables: args.tables,
includeImages: args.images,
headless: args.headless,
wait: args.wait ? parseInt(args.wait, 10) : undefined,
});
// Print result summary
console.log(`\nScrape complete:`);
console.log(` Title: ${result.title}`);
console.log(` URL: ${result.url}`);
if (result.byline) console.log(` Author: ${result.byline}`);
console.log(` Markdown length: ${result.markdown.length} chars`);
// Print markdown if not saved to file
if (!args.output) {
console.log('\n--- Markdown Output ---\n');
console.log(result.markdown);
}
} catch (error) {
console.error('Error:', error instanceof Error ? error.message : error);
process.exit(1);
}
}
// Run if executed directly
const isMainModule = process.argv[1]?.includes('scrape.ts');
if (isMainModule) {
main();
}
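The metadata header and whitespace cleanup that `scrape()` applies before returning are pure string transforms; factored into a standalone sketch (function name hypothetical, timestamp line omitted here for determinism):

```typescript
// Prepend HTML-comment metadata and normalize whitespace, mirroring the
// post-processing scrape() performs on the converted markdown.
function finalizeMarkdown(
  markdown: string,
  meta: { url: string; byline?: string; excerpt?: string }
): string {
  const header = [
    `<!-- Scraped from: ${meta.url} -->`,
    meta.byline ? `<!-- Author: ${meta.byline} -->` : null,
    meta.excerpt ? `<!-- Excerpt: ${meta.excerpt} -->` : null,
    '',
  ].filter((l): l is string => l !== null);
  return (header.join('\n') + '\n' + markdown)
    .replace(/\n{4,}/g, '\n\n\n') // cap runs of blank lines
    .replace(/[ \t]+$/gm, '')     // strip trailing whitespace per line
    .trim();
}
```

Keeping this logic pure makes it trivial to unit-test without launching a browser, which is otherwise the expensive part of exercising scrape.ts.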


@@ -0,0 +1,39 @@
import { Camoufox } from 'camoufox-js';
import { homedir } from 'os';
import { join } from 'path';
import { mkdirSync, existsSync } from 'fs';
async function test() {
const profilePath = join(homedir(), '.camoufox-profile');
if (!existsSync(profilePath)) {
mkdirSync(profilePath, { recursive: true });
}
console.log('Profile path:', profilePath);
console.log('Launching with full options...');
const browser = await Camoufox({
headless: true,
user_data_dir: profilePath,
// humanize: 1.5, // Test without this first
// geoip: true, // Test without this first
// enable_cache: true,
// block_webrtc: false,
});
console.log('Browser launched');
const page = await browser.newPage();
console.log('Page created');
await page.goto('https://github.com', { timeout: 30000 });
console.log('Navigated to:', page.url());
console.log('Title:', await page.title());
await page.screenshot({ path: '/tmp/github-test.png' });
console.log('Screenshot saved');
await browser.close();
console.log('Done');
}
test().catch(console.error);


@@ -0,0 +1,22 @@
import { Camoufox } from 'camoufox-js';
async function test() {
console.log('Launching Camoufox with minimal config...');
const browser = await Camoufox({
headless: true,
});
console.log('Browser launched');
const page = await browser.newPage();
console.log('Page created');
await page.goto('https://example.com', { timeout: 30000 });
console.log('Navigated to:', page.url());
console.log('Title:', await page.title());
await browser.close();
console.log('Done');
}
test().catch(console.error);


@@ -0,0 +1,32 @@
import { Camoufox } from 'camoufox-js';
import { homedir } from 'os';
import { join } from 'path';
import { mkdirSync, existsSync } from 'fs';
async function test() {
const profilePath = join(homedir(), '.camoufox-profile');
if (!existsSync(profilePath)) {
mkdirSync(profilePath, { recursive: true });
}
console.log('Profile path:', profilePath);
console.log('Launching with user_data_dir...');
const browser = await Camoufox({
headless: true,
user_data_dir: profilePath,
});
console.log('Browser launched');
const page = await browser.newPage();
console.log('Page created');
await page.goto('https://example.com', { timeout: 30000 });
console.log('Navigated to:', page.url());
console.log('Title:', await page.title());
await browser.close();
console.log('Done');
}
test().catch(console.error);


@@ -0,0 +1,16 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"esModuleInterop": true,
"allowSyntheticDefaultImports": true,
"strict": true,
"skipLibCheck": true,
"resolveJsonModule": true,
"outDir": "./dist",
"rootDir": "."
},
"include": ["*.ts"],
"exclude": ["node_modules", "dist"]
}


@@ -0,0 +1,8 @@
declare module 'turndown-plugin-gfm' {
import TurndownService from 'turndown';
export function gfm(turndownService: TurndownService): void;
export function strikethrough(turndownService: TurndownService): void;
export function tables(turndownService: TurndownService): void;
export function taskListItems(turndownService: TurndownService): void;
}