ai-cli-dispatch Architecture
This document describes the internal design of ai-cli-dispatch, the module breakdown, data flow, key design decisions, and how to extend the tool.
Module Breakdown
src/
├── cli.ts — Entry point: argument parsing, command routing, I/O formatting
├── cli-helpers.ts — Shared formatting, sync/async run handlers, error reporters
├── types.ts — Shared types and error classes
├── constants.ts — Client name registry and platform helpers
├── config.ts — Layered configuration resolution (flags → env → file → PATH)
├── detect.ts — Client discovery: binary lookup and version extraction
├── dispatch.ts — Prompt-to-client resolution (explicit flag → keywords → default)
├── execute.ts — Synchronous subprocess spawning, stdout/stderr capture, timeout handling
└── jobs.ts — Async job lifecycle: detached spawn, disk-backed state, polling API
Responsibilities
| Module | Responsibility |
|---|---|
| `cli.ts` | Parses argv with `minimist`, routes to all commands, prints JSON or text output, and controls the process exit code. |
| `cli-helpers.ts` | Shared helpers for `reportError`, `reportCliError`, `handleSyncRun`, and `handleAsyncRun` to keep `cli.ts` focused on routing. |
| `types.ts` | Defines `ClientName`, `ClientInfo`, `ExecResult`, `ToolConfig`, `Job`, `JobRecord`, `JobStatus`, and the error hierarchy (`ClientNotFoundError`, `ExecError`, `JobNotFoundError`, `JobResultUnavailableError`). |
| `constants.ts` | Holds the canonical `CLIENT_NAMES` array and `isWindows()` helper used by discovery and config. |
| `config.ts` | Resolves per-client binary paths and the optional `defaultClient` from four layered sources. |
| `detect.ts` | Locates each client binary on `PATH`, falls back to a manual directory scan, and invokes `--version` to extract a semver string. |
| `dispatch.ts` | Chooses the target client from a prompt string using ordered keyword matching, with overrides for explicit `--client` and `defaultClient`. |
| `execute.ts` | Spawns the chosen client with its native argument shape, buffers stdout/stderr, enforces a timeout, and returns an `ExecResult` or throws a typed error. |
| `jobs.ts` | Manages background jobs: writes job records to disk, spawns detached child processes, tracks running children in memory, and provides status, results, cancel, list, and cleanup operations. |
Data Flow
Synchronous dispatch (run --sync, dispatch --sync)
A sync invocation flows through four stages:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ detect │ ──► │ config │ ──► │ dispatch │ ──► │ execute │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
which/where flags/env/file keyword scan spawn child
PATH walk defaultClient --client override capture output
--version fallback default timeout / exitCode
Asynchronous dispatch (run, dispatch, start)
An async invocation adds the jobs.ts stage. The caller receives a job ID immediately; the child process continues in the background.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ detect │ ──► │ config │ ──► │ dispatch │ ──► │ execute │ ──► │ jobs │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │ │
which/where flags/env/file keyword scan spawn child write job file
PATH walk defaultClient --client override capture output detached + unref
--version fallback default timeout / exitCode update on close
Later, lifecycle commands read from or modify the job store:
status <jobId> ──► readJobFile ──► return Job (sans stdout/stderr)
results <jobId> ──► readJobFile ──► return ExecResult (completed only)
cancel <jobId> ──► readJobFile ──► kill child or PID ──► write cancelled status
list-jobs ──► readdir jobDir ──► read each file ──► sort + filter
cleanup-jobs ──► readdir jobDir ──► stat mtime ──► unlink old files
1. Detect
detectClients() iterates over CLIENT_NAMES and attempts to locate each binary:
- Invoke `which <name>` (or `where <name>` on Windows).
- If that fails, walk `PATH` segments manually and test `existsSync()`.
- If a binary is found, run `<binary> --version` and parse the first semver-like match.
Result: an array of ClientInfo objects with name, found, path, and version.
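The lookup order above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual code: `locateBinary`, `parseVersion`, the `DetectDeps` bag, and the exact regular expression are hypothetical, and the PATH walk joins segments with `/` for brevity.

```typescript
// Hypothetical dependency bag; the real detect.ts may name or shape these differently.
interface DetectDeps {
  whichSync: (binary: string) => string | null; // wraps `which` / `where`
  existsSync: (path: string) => boolean;
}

// Step 1: ask which/where. Step 2: fall back to walking PATH by hand.
function locateBinary(
  binary: string,
  pathVar: string,
  deps: DetectDeps,
  delimiter: string = ":" // would be ";" on Windows
): string | null {
  const viaWhich = deps.whichSync(binary);
  if (viaWhich !== null) return viaWhich;
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue; // skip empty PATH segments
    const candidate = `${dir}/${binary}`;
    if (deps.existsSync(candidate)) return candidate;
  }
  return null;
}

// Step 3: take the first semver-like token from the `--version` output.
// The pattern is an assumption; the real parser may be stricter or looser.
const SEMVER_LIKE = /\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?/;

function parseVersion(versionOutput: string): string | null {
  const match = SEMVER_LIKE.exec(versionOutput);
  return match ? match[0] : null;
}
```

Because both lookups go through the injected `deps`, a test can exercise the fallback path without a real `PATH` or filesystem.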
2. Config
resolveConfig() builds a ResolvedConfig by layering sources (highest to lowest precedence):
- CLI flags: `--codex-path`, `--claude-path`, `--opencode-path`, `--default-client`, `--timeout`
- Environment variables: `AI_CLI_CODEX_PATH`, `AI_CLI_CLAUDE_PATH`, `AI_CLI_OPENCODE_PATH`, `AI_CLI_DEFAULT_CLIENT`
- Config file: `~/.openclaw/ai-cli-dispatch.json` (`paths`, `defaultClient`, `timeout` keys)
- PATH discovery: `which`/`where` fallback via `defaultWhichSync()`
Only values for the three known ClientName entries are accepted; unknown defaultClient values are ignored.
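A minimal sketch of the layering, assuming each source has already been normalized into the same shape. `ConfigLayer`, `ResolvedSketch`, and `resolveLayers` are hypothetical names, and the choice to let a lower layer supply `defaultClient` when a higher layer holds an unknown value is an assumption rather than documented behavior.

```typescript
type ClientName = "codex" | "claude" | "opencode";
const CLIENT_NAMES: readonly ClientName[] = ["codex", "claude", "opencode"];

// One configuration source; every field is optional.
interface ConfigLayer {
  paths?: Partial<Record<ClientName, string>>;
  defaultClient?: string; // unvalidated until resolution
}

interface ResolvedSketch {
  paths: Partial<Record<ClientName, string>>;
  defaultClient: ClientName | null;
}

function isClientName(value: unknown): value is ClientName {
  return typeof value === "string" && (CLIENT_NAMES as readonly string[]).indexOf(value) !== -1;
}

// `layers` must be ordered from highest to lowest precedence:
// flags, environment, config file, PATH discovery.
function resolveLayers(layers: ConfigLayer[]): ResolvedSketch {
  const resolved: ResolvedSketch = { paths: {}, defaultClient: null };
  for (const layer of layers) {
    for (const client of CLIENT_NAMES) {
      const candidate = layer.paths?.[client];
      // The first layer to supply a path wins; later layers cannot override it.
      if (resolved.paths[client] === undefined && candidate) {
        resolved.paths[client] = candidate;
      }
    }
    // Unknown client names are skipped (assumption: a lower layer may still supply one).
    if (resolved.defaultClient === null && isClientName(layer.defaultClient)) {
      resolved.defaultClient = layer.defaultClient;
    }
  }
  return resolved;
}
```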
3. Dispatch
resolveClient(prompt, config) decides which client to use:
- If `config.client` is a valid `ClientName`, return it immediately.
- Lower-case the prompt and scan for substrings in order:
  1. `"open code"` → `opencode`
  2. `"claude"` → `claude`
  3. `"codex"` → `codex`
  4. `"opencode"` → `opencode`
- If no keyword matches, return `config.defaultClient` or `null`.
This ordering intentionally prioritizes "open code" before "opencode" so the spaced natural-language variant wins.
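The resolution rules can be sketched as below. The keyword order is taken from the list above; `DispatchConfig` and the `KEYWORDS` table are hypothetical shapes chosen for the illustration, not the module's confirmed internals.

```typescript
type ClientName = "codex" | "claude" | "opencode";

interface DispatchConfig {
  client?: string;            // value of --client, unvalidated
  defaultClient?: ClientName; // assumed already validated by config resolution
}

// Checked top to bottom; the first substring hit wins.
const KEYWORDS: ReadonlyArray<readonly [string, ClientName]> = [
  ["open code", "opencode"],
  ["claude", "claude"],
  ["codex", "codex"],
  ["opencode", "opencode"],
];

function isClientName(value: string | undefined): value is ClientName {
  return value === "codex" || value === "claude" || value === "opencode";
}

function resolveClient(prompt: string, config: DispatchConfig = {}): ClientName | null {
  // An explicit, valid --client short-circuits keyword matching.
  if (isClientName(config.client)) return config.client;
  const haystack = prompt.toLowerCase();
  for (const [keyword, client] of KEYWORDS) {
    if (haystack.indexOf(keyword) !== -1) return client;
  }
  return config.defaultClient ?? null;
}
```

With this ordering, a prompt that names several clients resolves to whichever keyword appears earliest in the table, not earliest in the prompt.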
4. Execute
executePrompt(client, prompt, options) runs the selected client synchronously:
- Reject empty or whitespace-only prompts with `ExecError`.
- Validate that an explicit `clientPath` exists on disk (if provided).
- Map the client to its native argument array via `CLIENT_ARGS`:
  - `codex` → `["exec", "--yolo", prompt]`
  - `claude` → `["-p", prompt, "--dangerously-skip-permissions"]`
  - `opencode` → `["run", "--dangerously-skip-permissions", prompt]`
- `spawn()` the process with `shell: false`.
- Buffer `stdout` and `stderr` via `"data"` listeners.
- Start a `setTimeout`; if it fires, call `child.kill()`.
- On `close`, resolve with `{ stdout, stderr, exitCode, client, durationMs }`.
- On `error`, reject with `ClientNotFoundError` for `ENOENT` or `ExecError` for anything else.
- On timeout, reject with `ExecError` containing the buffered output so far.
- If `debug` is enabled, emit a `DebugInfo` object via `onDebug`.
The default timeout is 10 minutes (600_000 ms).
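The argument mapping and prompt validation might look like the sketch below. The argument arrays come from the list above; modelling `CLIENT_ARGS` as a record of builder functions, the `buildArgs` helper, and the plain `Error` standing in for `ExecError` are assumptions made for the example.

```typescript
type ClientName = "codex" | "claude" | "opencode";

// Each entry builds the argv for one client from the prompt.
const CLIENT_ARGS: Record<ClientName, (prompt: string) => string[]> = {
  codex: (prompt) => ["exec", "--yolo", prompt],
  claude: (prompt) => ["-p", prompt, "--dangerously-skip-permissions"],
  opencode: (prompt) => ["run", "--dangerously-skip-permissions", prompt],
};

const DEFAULT_TIMEOUT_MS = 10 * 60 * 1000; // 10 minutes

function buildArgs(client: ClientName, prompt: string): string[] {
  if (prompt.trim() === "") {
    // Stand-in for the ExecError the real module throws.
    throw new Error("prompt must not be empty");
  }
  // The prompt stays a single argv element; with shell: false nothing re-parses it,
  // so quotes, semicolons, and dollar signs reach the client unchanged.
  return CLIENT_ARGS[client](prompt);
}
```

Typing the table as `Record<ClientName, ...>` means the compiler flags a missing entry as soon as a new name joins the union.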
5. Jobs
startJob(client, prompt, options) launches a background job:
- Generate a UUID for the job ID.
- Build the client argument array via `CLIENT_ARGS`.
- `spawn()` the process with `detached: true` and `stdio: ["ignore", "pipe", "pipe"]`.
- Write an initial `JobRecord` to `~/.openclaw/ai-cli-dispatch/jobs/<jobId>.json` with status `running`.
- Update the record with the child `pid` once available.
- Register the child in an in-memory `runningChildren` Map for cancellation and timeout tracking.
- Buffer `stdout`/`stderr` via `"data"` listeners.
- On `close`, finalize the record: write status (`completed`, `failed`, `timed_out`, or `cancelled`), capture stdout/stderr, and record `durationMs`.
- Call `child.unref()` so the dispatcher process can exit without waiting for the child.
getJob(jobId) reads the job file and returns a Job (omitting the full stdout/stderr buffers).
getJobResult(jobId) returns the ExecResult for a completed job.
cancelJob(jobId) looks up the running child in memory, sends SIGTERM, and writes a cancelled status. If the child is no longer in memory, it attempts process.kill(pid, "SIGTERM") as a fallback.
listJobs({ filter }) reads all .json files in the job directory, parses them, sorts by startedAt descending, and optionally filters by status.
cleanupJobs({ maxAgeMs }) deletes job files whose age, measured from mtime, exceeds maxAgeMs. The default max age is 24 hours.
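The listing and cleanup rules reduce to two small pure functions, sketched here. `JobSummary`, `sortAndFilterJobs`, and `isExpired` are hypothetical names, and treating `startedAt` as an ISO 8601 string is an assumption about the record format.

```typescript
type JobStatus = "running" | "completed" | "failed" | "timed_out" | "cancelled";

interface JobSummary {
  id: string;
  status: JobStatus;
  startedAt: string; // assumed to be an ISO 8601 timestamp
}

// Newest first, optionally restricted to one status. The input array is not mutated.
function sortAndFilterJobs(jobs: JobSummary[], filter?: JobStatus): JobSummary[] {
  return jobs
    .filter((job) => filter === undefined || job.status === filter)
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt));
}

const DEFAULT_MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

// A job file is eligible for cleanup once its age, measured from mtime, exceeds the threshold.
function isExpired(mtimeMs: number, nowMs: number, maxAgeMs: number = DEFAULT_MAX_AGE_MS): boolean {
  return nowMs - mtimeMs > maxAgeMs;
}
```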
Design Decisions
Async-First Architecture
The default execution mode is async (background job). Synchronous execution requires an explicit --sync flag.
Rationale:
- Primary use case alignment: Most AI CLI tasks (refactoring, test generation, migration) run for multiple minutes. Blocking the caller for that long is often undesirable in automation and orchestration contexts.
- Resilience: A detached background job survives an unexpected dispatcher exit. The caller can reconnect later via `status` and `results`.
- Batching: Multiple jobs can be started in parallel without blocking the dispatcher process.
- Backward compatibility path: `--sync` preserves the original one-shot behavior for callers that need it, without changing the default.
Disk-Backed Job Store
Job state is persisted as JSON files on disk rather than kept solely in memory.
Rationale:
- Durability across restarts: If the dispatcher process crashes or the host reboots, job files remain. A caller can still query `status` or `results` after recovery.
- No memory leaks: Long-running or forgotten jobs do not accumulate in heap. Cleanup is explicit via `cleanup-jobs`.
- External observability: Operators can inspect `~/.openclaw/ai-cli-dispatch/jobs/` directly without calling the CLI.
- Simplicity: A file-per-job model avoids the need for an embedded database or external service. It maps cleanly to the Node.js `fs` API and is trivial to mock in tests.
Trade-off: High-frequency job creation could strain the filesystem, but the expected volume is low (tens to hundreds of jobs, not thousands per second).
Detached-Process Approach
Async jobs use detached: true with child.unref().
Rationale:
- Parent independence: The dispatcher can start a job and exit immediately. This is essential for CLI usage where the user or orchestrator should not hold a shell open for the duration of the task.
- Signal isolation: A detached process group means the child does not receive `SIGINT` or `SIGHUP` sent to the dispatcher terminal session.
- PID tracking: Even though the child is detached, the `pid` is captured and written to the job file. This enables `cancelJob` to send signals even if the dispatcher has restarted and lost its in-memory `runningChildren` map.
Trade-off: The child is truly independent. If the host reboots, the child is lost (same as any other process). The job file will eventually reflect timed_out or remain running until cancel or cleanup is run.
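The PID fallback that this design enables can be sketched as follows. `signalJob`, `KillableChild`, and `CancelRoute` are hypothetical names; `killPid` stands in for `process.kill`, which throws when the target process no longer exists, and swallowing that error is an assumption about the desired behavior.

```typescript
// Minimal stand-in for the part of a child process handle this sketch needs.
interface KillableChild {
  kill(signal: string): boolean;
}

type CancelRoute = "in-memory child" | "pid fallback" | "nothing to signal";

function signalJob(
  jobId: string,
  recordedPid: number | undefined,
  runningChildren: Map<string, KillableChild>,
  killPid: (pid: number, signal: string) => void // e.g. process.kill in production
): CancelRoute {
  // Preferred route: the dispatcher that started the job is still alive.
  const child = runningChildren.get(jobId);
  if (child !== undefined) {
    child.kill("SIGTERM");
    return "in-memory child";
  }
  // Fallback route: only the pid from the job file is left.
  if (recordedPid !== undefined) {
    try {
      killPid(recordedPid, "SIGTERM");
      return "pid fallback";
    } catch {
      // The process is already gone; there is nothing left to signal.
      return "nothing to signal";
    }
  }
  return "nothing to signal";
}
```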
Coexistence with ACP
ai-cli-dispatch is intentionally not an ACP agent. It is a thin, local subprocess wrapper with no session state, no thread binding, and no orchestrator protocol.
- Use `ai-cli-dispatch` when you need a quick, one-shot CLI execution or a background job on the gateway host.
- Use ACP (`docs/openclaw-acp-orchestration.md`) when you need session-bound coding harnesses, multi-turn review, or orchestrator-managed verification gates.
This separation keeps the dispatcher small and avoids duplicating ACP’s scheduling, context persistence, and review-loop responsibilities.
Keyword Dispatch vs NLP
Client resolution uses deterministic substring matching instead of natural-language parsing or an LLM call.
Rationale:
- Speed: No network round-trip or model load; resolution is synchronous and sub-millisecond.
- Predictability: The same prompt always resolves to the same client. There is no temperature, context window, or model-version drift.
- Debuggability: A user can read the ordered keyword list and know exactly why a given prompt resolved to a given client.
- Scope fit: The dispatcher only needs to distinguish three clients. A full NLP pipeline would be overkill.
The trade-off is that prompts like "compare codex and claude" resolve to claude because "claude" is checked before "codex", regardless of word order in the prompt. Users can always override with --client.
Error Taxonomy
All runtime failures are represented as typed errors so callers and tests can branch precisely:
| Error | When thrown | Data carried |
|---|---|---|
| `ClientNotFoundError` | Binary not on PATH, explicit `clientPath` missing, or `ENOENT` from spawn | message with client name |
| `ExecError` | Empty prompt, unknown client, timeout, non-`ENOENT` spawn error, or child exit | message + full `ExecResult` (`stdout`, `stderr`, `exitCode`, `client`, `durationMs`) |
| `JobNotFoundError` | Job ID not found in the job store | message with job ID |
| `JobResultUnavailableError` | `results` called on a non-completed job | message with job ID and current status |
ExecError carries the ExecResult so that timeout handlers still return partial output. This avoids losing buffered stdout/stderr when a long-running task is killed.
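A sketch of how a caller might branch on these types. The class names and `ExecResult` fields come from the table above; the constructor signatures, the `result` property name, the nullable `exitCode`, and the `describeFailure` helper are assumptions made for the example.

```typescript
interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number | null; // assumed null when the child was killed
  client: string;
  durationMs: number;
}

class ClientNotFoundError extends Error {
  constructor(client: string) {
    super(`client not found: ${client}`);
    this.name = "ClientNotFoundError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ExecError extends Error {
  readonly result: ExecResult;
  constructor(message: string, result: ExecResult) {
    super(message);
    this.name = "ExecError";
    this.result = result;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Callers branch on the error type instead of parsing message strings.
function describeFailure(err: unknown): string {
  if (err instanceof ClientNotFoundError) {
    return "install or configure the client";
  }
  if (err instanceof ExecError) {
    const partial = err.result.stdout.length;
    return `failed after ${err.result.durationMs} ms with ${partial} characters of partial stdout`;
  }
  return "unexpected error";
}
```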
Injection-Friendly Module Boundaries
Every non-trivial module accepts an options bag with injectable dependencies (spawnSync, spawn, existsSync, whichSync, readFileSync, etc.).
Rationale:
- Unit tests can run without touching the real filesystem, `PATH`, or subprocess layer.
- The CLI itself injects its real dependencies through default parameters, so production behavior is unchanged.
- There is no global mocking required; each test provides its own narrow fakes.
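As an illustration of the pattern, the sketch below reads a config file through an injected reader. `ConfigFileDeps` and `readConfigFile` are hypothetical; the real modules are described as defaulting such dependencies to Node.js built-ins, which is left out here so the example stays free of I/O. Returning an empty object for a missing file is an assumption.

```typescript
// Hypothetical dependency bag; production code would default this to fs.readFileSync.
interface ConfigFileDeps {
  readFileSync: (path: string) => string;
}

// Returns the parsed object, or {} when the file is missing, malformed, or not an object.
function readConfigFile(path: string, deps: ConfigFileDeps): Record<string, unknown> {
  try {
    const parsed: unknown = JSON.parse(deps.readFileSync(path));
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return {};
  } catch {
    return {};
  }
}

// A test supplies a narrow fake instead of touching the real filesystem.
const fakeFiles: Record<string, string | undefined> = {
  "/home/u/.openclaw/ai-cli-dispatch.json": '{"defaultClient":"claude"}',
  "/broken.json": "{not json",
};

const fakeDeps: ConfigFileDeps = {
  readFileSync: (path) => {
    const contents = fakeFiles[path];
    if (contents === undefined) throw new Error(`ENOENT: ${path}`);
    return contents;
  },
};
```

Each test builds only the fake it needs, so no global mocking framework is involved.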
Minimal Dependency Surface
The runtime dependency graph contains exactly one external package: minimist (argument parsing). Everything else uses Node.js built-ins (child_process, fs, os, path, crypto).
Rationale:
- Reduces supply-chain risk and install time.
- Avoids version-lock issues across Node.js 20+ environments.
- Keeps the compiled/bundled footprint negligible for a tool that is often installed as a sidecar.
Extension Points
Adding a New Client
To support a fourth (or fifth) AI CLI client, change four files in src/ and the corresponding tests:
- `src/types.ts`: Add the new name to the `ClientName` union type.
- `src/constants.ts`: Append the new name to `CLIENT_NAMES`.
- `src/execute.ts`: Add an entry to `CLIENT_ARGS` with the client's native argument shape.
- `src/config.ts`: No change required; the existing loop over `CLIENT_NAMES` automatically picks up the new env/flag/file keys.
- `src/dispatch.ts`: Add a keyword check for the new client in `resolveClient`. Decide its precedence relative to existing keywords.
- `src/jobs.ts`: No change required; `CLIENT_ARGS` is already shared.
- Tests: Add matching test cases in `tests/dispatch.test.ts`, `tests/execute.test.ts`, `tests/detect.test.ts`, and `tests/jobs.test.ts`.
No changes are needed in cli.ts because it iterates over CLIENT_NAMES for validation.
Streaming Support
If a future use case requires real-time output (e.g., long-running codegen with progressive feedback), the cleanest extension is to add an optional onData callback to ExecuteOptions:
export interface ExecuteOptions {
  clientPath?: string;
  timeoutMs?: number;
  spawn?: ...;
  existsSync?: ...;
  onData?: (chunk: string, stream: "stdout" | "stderr") => void;
}
When onData is provided, executePrompt would:
- Continue buffering internally for the final `ExecResult`.
- Also emit each chunk through `onData` so the caller can stream to a UI or logger.
- Reject/resolve with the same error taxonomy.
This preserves backward compatibility: existing callers that omit onData receive the exact same buffered ExecResult they get today.
For async jobs, jobs.ts could store a partial stdout/stderr in the job file on each chunk (or at a throttled interval) so status callers can see progress without waiting for completion.
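One way the buffer-and-forward behavior could look is sketched below. `OutputCollector` is a hypothetical helper, not part of the current code; in `executePrompt` its `push` method would be called from the existing `"data"` listeners.

```typescript
type StreamName = "stdout" | "stderr";
type OnData = (chunk: string, stream: StreamName) => void;

class OutputCollector {
  private stdout = "";
  private stderr = "";
  private readonly onData: OnData | undefined;

  constructor(onData?: OnData) {
    this.onData = onData;
  }

  // Called once per chunk: always buffer, and forward when a callback was supplied.
  push(chunk: string, stream: StreamName): void {
    if (stream === "stdout") {
      this.stdout += chunk;
    } else {
      this.stderr += chunk;
    }
    if (this.onData) this.onData(chunk, stream);
  }

  // The buffered output that would feed the final ExecResult (or a partial job record).
  snapshot(): { stdout: string; stderr: string } {
    return { stdout: this.stdout, stderr: this.stderr };
  }
}
```

A caller that omits the callback gets only the buffered snapshot, which matches today's behavior; a caller that passes one sees each chunk as it arrives and still receives the same final buffers.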
Platform Backends
The current Windows support is limited to discovery (where instead of which, .exe extension assumptions). If future clients require platform-specific spawn options (e.g., PowerShell quoting rules), the extension point is CLIENT_ARGS or a new CLIENT_SPAWN_OPTIONS record keyed by ClientName.
Testing Strategy
The test suite in tests/ mirrors the src/ structure:
| Test file | Coverage |
|---|---|
| `cli.test.ts` | Argument parsing, command routing, JSON/text output modes, exit codes, error formatting, sync vs async branches, all job lifecycle commands |
| `cli-helpers.test.ts` | `reportError`, `reportCliError`, `handleSyncRun`, `handleAsyncRun` with JSON and text modes |
| `config.test.ts` | Layered precedence of flags, env, file, and `which` fallback; malformed JSON tolerance |
| `detect.test.ts` | `which` success/failure, PATH directory fallback, version parsing, missing binary handling |
| `dispatch.test.ts` | Keyword matching, case insensitivity, `--client` precedence, `defaultClient` fallback, invalid flag handling |
| `execute.test.ts` | Successful execution, stderr capture, non-zero exit codes, `ENOENT` → `ClientNotFoundError`, timeout, empty prompt rejection, special-character preservation, debug info emission |
| `jobs.test.ts` | Job start, status query, result retrieval, cancellation, listing, cleanup, timeout handling, unknown client fallback, detached process behavior, in-memory vs on-disk consistency |
All tests use injected mocks; no test spawns real client binaries or reads the real filesystem.