Stefano Fiorini 4919edcec1 Fix ACP startup guidance for managed acpx path

2026-03-30 08:05:53 -05:00

OpenClaw ACP Orchestration

This document describes the local OpenClaw ACP setup used to orchestrate Codex and Claude Code from an OpenClaw agent on the gateway machine.

Scope

The target workflow is:

OpenClaw remains the orchestration brain
natural-language requests like "use codex for this" or "run this in claude code" are routed to ACP
the coding harness runs on the same gateway machine where the local codex and claude clients are installed
session lifecycle is handled through OpenClaw ACP rather than sub-agents or shell relay hacks

Local Baseline Before ACP Enablement

Captured on 2026-03-29:

OpenClaw: 2026.3.28 (f9b1079)
bundled acpx plugin present locally but disabled and not in the plugin allowlist
local codex: /opt/homebrew/bin/codex 0.117.0
local claude: /opt/homebrew/bin/claude 2.1.87
gateway host: 8 CPU cores, 8 GB RAM
default OpenClaw agent workspace: ~/.openclaw/workspace

Architectural Decision

Primary architecture:

OpenClaw ACP with acpx

Fallback architecture only if parity is not acceptable:

openclaw mcp serve with Codex or Claude Code connected as external MCP clients to existing OpenClaw channel conversations

Why ACP is primary:

this is the official OpenClaw architecture for "run this in Codex" / "start Claude Code in a thread"
it provides durable ACP sessions, resume, bindings, and programmatic spawning through sessions_spawn with runtime: "acp"

Important Runtime Caveat

The bundled acpx runtime supports Codex and Claude, but the stock aliases are adapter commands, not necessarily the bare local terminal binaries:

codex -> npx -y @zed-industries/codex-acp@0.9.5
claude -> npx -y @zed-industries/claude-agent-acp@0.21.0

That means "same as terminal" behavior has to be validated explicitly. It is not guaranteed just because ACP works.

Baseline Configuration Applied

The current host-local OpenClaw config keeps the native main orchestrator and adds ACP-backed agents alongside it:

agents.list[0] = main with runtime.type = "embedded"
agents.list[1] = codex with runtime.type = "acp"
agents.list[2] = claude with runtime.type = "acp"
acp.enabled = true
acp.dispatch.enabled = true
acp.backend = "acpx"
acp.defaultAgent = "codex"
acp.allowedAgents = ["claude", "codex"]
acp.maxConcurrentSessions = 2
plugins.allow += acpx
plugins.entries.acpx.enabled = true
ACP-specific cwd values are absolute paths, not ~-prefixed shortcuts

The main entry is intentional. Once agents.list is populated, OpenClaw treats that list as the agent inventory. If main is omitted, ACP targets can displace the native orchestrator and break the intended architecture.
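The dotted keys above can be pictured as one nested structure. What follows is a minimal sketch, assuming the dotted paths map directly onto nested objects and that each agents.list entry names its agent under an id key. Both are assumptions, since only the dotted form appears above. inventory_problems is a hypothetical helper for illustration, not an OpenClaw API.

```python
from typing import Any, Dict, List

# Assumed nesting of the dotted keys listed above. The "id" field name is a guess.
CONFIG: Dict[str, Any] = {
    "agents": {
        "list": [
            {"id": "main", "runtime": {"type": "embedded"}},
            {"id": "codex", "runtime": {"type": "acp"}},
            {"id": "claude", "runtime": {"type": "acp"}},
        ]
    },
    "acp": {
        "enabled": True,
        "dispatch": {"enabled": True},
        "backend": "acpx",
        "defaultAgent": "codex",
        "allowedAgents": ["claude", "codex"],
        "maxConcurrentSessions": 2,
    },
    "plugins": {"allow": ["acpx"], "entries": {"acpx": {"enabled": True}}},
}


def inventory_problems(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with the agent inventory (empty if none)."""
    problems: List[str] = []
    agents = cfg.get("agents", {}).get("list", [])
    runtime_by_id = {a.get("id"): a.get("runtime", {}).get("type") for a in agents}

    # Once agents.list is populated, the native orchestrator must be listed too.
    if agents and runtime_by_id.get("main") != "embedded":
        problems.append("agents.list is populated but has no embedded main entry")

    acp = cfg.get("acp", {})
    allowed = acp.get("allowedAgents", [])
    for agent_id in allowed:
        if runtime_by_id.get(agent_id) != "acp":
            problems.append(f"allowed agent {agent_id!r} is not an ACP entry in agents.list")

    default_agent = acp.get("defaultAgent")
    if default_agent is not None and default_agent not in allowed:
        problems.append(f"default agent {default_agent!r} is not in acp.allowedAgents")
    return problems
```

Running inventory_problems(CONFIG) on the structure above returns an empty list. Dropping the main entry produces one problem, which mirrors the displacement risk described above.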

ACP Health Equivalents

The docs mention /acp doctor, but the operator-friendly local equivalents on this host are:

openclaw config validate
openclaw plugins inspect acpx --json
openclaw gateway status --json
openclaw status --deep
cd /opt/homebrew/lib/node_modules/openclaw/dist/extensions/acpx && ./node_modules/.bin/acpx config show

Healthy baseline on this machine means:

config validates
acpx plugin status is loaded
gateway RPC is healthy
openclaw status --deep shows Agents 3 with default main
acpx config show works without bootstrap errors
plugins.installs does not need an acpx record because acpx is bundled with OpenClaw, not separately installed
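The command-based checks above can be run in one pass. This is a minimal sketch, assuming each command exits with status 0 when its check passes; that exit-code convention is an assumption and has not been verified against OpenClaw. run_checks and run_command are hypothetical helpers. The acpx config show check is left out because it depends on the working directory.

```python
import shlex
import subprocess
from typing import Callable, Dict, List

# Commands taken verbatim from the list above.
CHECKS = [
    "openclaw config validate",
    "openclaw plugins inspect acpx --json",
    "openclaw gateway status --json",
    "openclaw status --deep",
]


def run_command(argv: List[str]) -> int:
    """Run one command and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(argv, capture_output=True, text=True).returncode
    except FileNotFoundError:
        return 127


def run_checks(runner: Callable[[List[str]], int] = run_command) -> Dict[str, bool]:
    """Map each check command to True when the runner reports exit code 0."""
    return {cmd: runner(shlex.split(cmd)) == 0 for cmd in CHECKS}
```

The runner is injectable so the logic can be exercised without a gateway. A passing exit code is only a first filter: expectations such as Agents 3 with default main still require reading the command output.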

Important health nuance:

openclaw plugins inspect acpx --json only tells you the plugin is loaded, not that the ACP backend is healthy enough for sessions_spawn runtime:"acp"
the actual readiness signal is the gateway log line acpx runtime backend ready
during rollout, the backend stayed unavailable until ACP-specific cwd values were changed from ~/.openclaw/workspace to absolute paths
a later startup bug showed that pinning a custom command path disables the plugin-local managed install path and can leave ACP unavailable after a restart if the local acpx artifact is absent at boot
the current host fix is to leave plugins.entries.acpx.config.command unset so the bundled plugin can manage its own plugin-local acpx binary

Maintenance note:

the current host intentionally uses the managed plugin-local default command path rather than a custom override
after any OpenClaw upgrade, re-run:
- openclaw config validate
- openclaw plugins inspect acpx --json
- openclaw logs --limit 80 --plain --timeout 10000 | rg 'acpx runtime backend (registered|ready|probe failed)'
- ls -l /opt/homebrew/lib/node_modules/openclaw/dist/extensions/acpx/node_modules/.bin/acpx
if ACP comes up unavailable at startup, check whether a custom plugins.entries.acpx.config.command override was reintroduced before debugging deeper
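The rg filter in the list above can also be applied programmatically, for example to report the most recent backend state in a captured log. This is a minimal sketch that reuses the same pattern; last_backend_state is a hypothetical helper and it assumes the log is available as plain text lines.

```python
import re
from typing import Iterable, Optional

# Same alternation as the rg pattern used in the maintenance note.
BACKEND_LINE = re.compile(r"acpx runtime backend (registered|ready|probe failed)")


def last_backend_state(lines: Iterable[str]) -> Optional[str]:
    """Return the last backend state mentioned in the log, or None if absent."""
    state: Optional[str] = None
    for line in lines:
        match = BACKEND_LINE.search(line)
        if match:
            state = match.group(1)  # later lines override earlier ones
    return state
```

A result of "ready" corresponds to the readiness signal described above. A result of "registered" or "probe failed", or None, means the backend should not yet be treated as available.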

Security Review

Why this needs a review

ACP coding sessions are headless and non-interactive. If they are allowed to write files and run shell commands, the permission mode determines what they can do without any human approval.

Leading rollout candidate

plugins.entries.acpx.config.permissionMode = "approve-all"
plugins.entries.acpx.config.nonInteractivePermissions = "deny"

Why deny instead of fail:

on this host, graceful degradation is better than crashing an otherwise useful ACP session at the first blocked headless permission prompt
the live acpx plugin schema for OpenClaw 2026.3.28 validates deny, so this is an intentional runtime choice rather than a placeholder

What `approve-all` means here

On this gateway host, an ACP coding harness may:

write files in the configured working tree
execute shell commands without an interactive prompt
access network resources that are already reachable from the host
read local home-directory configuration that the launched harness itself can reach

Risk boundaries

This host already runs OpenClaw with:

tools.exec.host = "gateway"
tools.exec.security = "full"
tools.exec.ask = "off"

So ACP approve-all does not create the first fully trusted execution path on this machine. It extends that trust to ACP-backed Codex/Claude sessions. That is still a meaningful trust expansion and should stay limited to trusted operators and trusted channels.

First-wave rollout stance

Recommended first wave:

enable ACP only for trusted direct operators
prefer explicit agentId routing and minimal bindings
defer broad persistent group bindings until parity and lifecycle behavior are proven
keep the plugin-tools bridge off unless there is a proven need for ACP harnesses to call OpenClaw plugin tools from inside the session

Observability And Recovery

Minimum required operational checks:

openclaw config validate
openclaw plugins inspect acpx --json
openclaw gateway status --json
openclaw status --deep
openclaw logs --follow
the dated gateway log file at /tmp/openclaw/openclaw-YYYY-MM-DD.log

Operational questions this setup must answer:

did an ACP session start
which harness was used
which session key is active
where a stall or permission denial first occurred
whether the gateway restart preserved resumable state

Current host signals:

plugin status: openclaw plugins inspect acpx --json
gateway/runtime health: openclaw gateway status --json
agent inventory and active session count: openclaw status --deep
ACP adapter defaults and override file discovery: acpx config show
first runtime failure point: gateway log under /tmp/openclaw/

Claude adapter noise:

the Claude ACP adapter currently emits session/update validation noise for usage_update after otherwise successful turns
when filtering logs during Claude ACP troubleshooting, separate that known noise from startup failures by focusing first on:
- acpx runtime backend ready
- ACP runtime backend is currently unavailable
- probe failed
- actual session spawn/close lines
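The filtering order above can be sketched as a small triage pass over log lines. The marker strings are the ones quoted in this section. The exact text of session spawn/close lines is not given here, so those are not matched and fall into the "other" bucket. triage_lines is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Startup and availability signals quoted in this document.
STARTUP_MARKERS = (
    "acpx runtime backend ready",
    "ACP runtime backend is currently unavailable",
    "probe failed",
)
# Known Claude adapter noise after otherwise successful turns.
NOISE_MARKERS = ("usage_update",)


def triage_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Split log lines into startup signals, known noise, and everything else."""
    buckets: Dict[str, List[str]] = {"startup": [], "noise": [], "other": []}
    for line in lines:
        if any(marker in line for marker in STARTUP_MARKERS):
            buckets["startup"].append(line)  # startup signals take priority
        elif any(marker in line for marker in NOISE_MARKERS):
            buckets["noise"].append(line)
        else:
            buckets["other"].append(line)
    return buckets
```

Reading the "startup" bucket first keeps the usage_update noise from masking a backend that never became ready.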

Concurrency Stance

This machine has 8 CPU cores and 8 GB RAM. A conservative initial ACP concurrency cap is better than the plan's generic placeholder of 8.

Recommended initial cap:

acp.maxConcurrentSessions = 2

Reason:

enough for one Codex and one Claude session at the same time
low enough to reduce memory pressure and noisy contention on the same laptop-class host
if operators start using longer-lived persistent ACP sessions heavily, revisit this only after checking real memory pressure and swap behavior on the gateway host
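To make the effect of the cap concrete, here is a small sketch of an admission gate. It does not show how OpenClaw enforces acp.maxConcurrentSessions. The SessionCap class and its reject-when-full behavior are assumptions for illustration, and the real runtime may queue or reject differently.

```python
from typing import Set


class SessionCap:
    """Toy admission gate: at most `limit` sessions active at once."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.active: Set[str] = set()

    def try_start(self, session_key: str) -> bool:
        """Admit the session if it is already active or a slot is free."""
        if session_key in self.active:
            return True
        if len(self.active) >= self.limit:
            return False  # assumed behavior: reject rather than queue
        self.active.add(session_key)
        return True

    def finish(self, session_key: str) -> None:
        """Release the slot held by a session (no-op if it is not active)."""
        self.active.discard(session_key)
```

With a limit of 2, one Codex session and one Claude session are both admitted, and a third request is refused until one of them finishes.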

Plugin Tools Bridge

The planning material discussed plugins.entries.acpx.config.pluginToolsMcpBridge, but the local 2026.3.28 bundled acpx schema does not currently expose that key in the output of openclaw plugins inspect acpx --json.

Current stance:

treat plugin-tools bridge as unsupported unless the live runtime proves otherwise
do not add that key blindly to openclaw.json

Default Workspace Root

The default ACP workspace root for this install is:

~/.openclaw/workspace

Per-session or per-binding cwd values can narrow from there when a specific repository or skill workspace is known.

For ACP plugin/runtime config, use absolute paths instead of ~-prefixed paths.
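Converting a ~-prefixed path into the absolute form before writing it to config can be sketched as below. to_absolute_cwd is a hypothetical helper. The home directory is passed in explicitly so that the result does not depend on the environment of whoever runs it.

```python
from pathlib import PurePosixPath


def to_absolute_cwd(value: str, home: str) -> str:
    """Expand a leading ~ against `home` and reject other non-absolute paths."""
    if value == "~":
        return home
    if value.startswith("~/"):
        return str(PurePosixPath(home) / value[2:])
    if not value.startswith("/"):
        raise ValueError(f"cwd must be absolute or ~-prefixed: {value!r}")
    return value
```

For this host, to_absolute_cwd("~/.openclaw/workspace", "/Users/stefano") yields /Users/stefano/.openclaw/workspace, which is the form the ACP config needs.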

Parity Results

Codex ACP parity

Validated directly with acpx codex against a real project worktree.

Observed:

correct cwd
HOME=/Users/stefano
access to ~/.codex
access to ~/.openclaw/workspace
access to installed Codex skills under ~/.codex/skills
persistent named sessions retained state across turns
persistent named sessions retained state across an OpenClaw gateway restart

Assessment:

Codex ACP is close enough to local terminal behavior for rollout

Claude Code ACP parity

Validated directly with acpx claude against the same project worktree.

Observed:

correct cwd
HOME=/Users/stefano
access to ~/.claude
access to ~/.codex when explicitly tested with shell commands
persistent named sessions retained state across turns
persistent named sessions retained state across an OpenClaw gateway restart

Known defect:

the Claude ACP adapter emits an extra session/update validation error after otherwise successful turns:
- Invalid params
- sessionUpdate: 'usage_update'

Assessment:

Claude ACP is usable, but noisier than Codex
this is an adapter/protocol mismatch to monitor, not a rollout blocker for trusted operators

ACPX Override Decision

Decision:

do not add ~/.acpx/config.json agent overrides for Codex or Claude right now

Why:

Codex parity already passes with the stock alias path
swapping the Claude alias from the deprecated @zed-industries/claude-agent-acp package name to @agentclientprotocol/claude-agent-acp@0.24.2 did not remove the session/update validation noise
raw local codex and claude CLIs are not drop-in ACP servers, so an override would add maintenance cost without delivering materially better parity

Natural-Language Routing Policy

The main agent is instructed to:

stay native as the orchestrator
use sessions_spawn with runtime: "acp" when the user explicitly asks for Codex or Claude Code
choose agentId: "codex" or agentId: "claude" accordingly
use one-shot ACP runs for single tasks
use persistent ACP sessions only when the user clearly wants continued context
avoid silent fallback to ordinary local exec when ACP was explicitly requested

The live messaging tool surface had to be extended to expose:

sessions_spawn
sessions_yield

without widening the whole profile beyond what was needed.
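The routing policy can be sketched as a small decision function. In practice the main agent makes this decision from natural language, so the keyword matching here is only a stand-in. The runtime and agentId fields come from this document. The tool and mode field names are hypothetical and do not reflect a confirmed sessions_spawn schema.

```python
from typing import Dict, Optional

# Keyword stand-in for "the user explicitly asks for Codex or Claude Code".
HARNESS_KEYWORDS = {
    "codex": ("codex",),
    "claude": ("claude code", "claude"),
}


def route_request(text: str, wants_continued_context: bool = False) -> Optional[Dict[str, str]]:
    """Return spawn arguments for an explicit harness request, else None."""
    lowered = text.lower()
    for agent_id, keywords in HARNESS_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return {
                "tool": "sessions_spawn",  # hypothetical field name
                "runtime": "acp",
                "agentId": agent_id,
                # hypothetical field: persistent only when context must continue
                "mode": "persistent" if wants_continued_context else "one-shot",
            }
    return None  # no explicit request: main stays native
```

Returning None rather than a default harness reflects the policy that main stays native unless a harness is requested. The sketch also never falls back to local exec, in line with the no-silent-fallback rule above.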

Binding Policy

First-wave binding policy is intentionally conservative:

no broad top-level persistent bindings[]
no automatic permanent channel/topic binds
prefer on-demand ACP spawn from the current conversation
only introduce persistent binds later if there is a clear operator need

Channel-specific note:

WhatsApp does not support ACP thread-bound spawn in the tested path
use current-conversation or one-shot ACP behavior there, not thread-bound ACP assumptions

Smoke-Test Findings

What worked:

direct acpx codex runs
direct acpx claude runs
mixed Codex + Claude ACPX runs in parallel
persistent ACPX named sessions
named-session recall after a gateway restart

What failed and why:

channel-less CLI-driven openclaw agent tests can fail ACP spawn with:
- Channel is required when multiple channels are configured: telegram, whatsapp, bluebubbles
this is a context issue, not a backend-registration issue
synthetic CLI sessions are not a perfect substitute for a real inbound channel conversation when testing current-conversation ACP spawn
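The failure above follows from a simple rule: with several channels configured, a turn that carries no channel context cannot be resolved. This sketch models that rule as described here. resolve_channel is a hypothetical helper, not OpenClaw code, and only the multi-channel error message is taken from the observed output.

```python
from typing import Optional, Sequence


def resolve_channel(configured: Sequence[str], requested: Optional[str] = None) -> str:
    """Pick the channel for a turn, failing when it is ambiguous."""
    if not configured:
        raise ValueError("No channels configured")
    if requested is not None:
        if requested not in configured:
            raise ValueError(f"Unknown channel: {requested}")
        return requested
    if len(configured) == 1:
        return configured[0]  # unambiguous without explicit context
    raise ValueError(
        "Channel is required when multiple channels are configured: "
        + ", ".join(configured)
    )
```

A real inbound conversation always supplies the channel, which is why the error appears only in channel-less CLI tests and says nothing about backend registration.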

Operational interpretation:

ACP backend + harness parity are good enough for rollout
final operator confidence should still come from a real inbound Telegram or WhatsApp conversation, not only a synthetic CLI turn

Fallback Decision