ai-cli-dispatch Architecture

This document describes the internal design of ai-cli-dispatch: its module breakdown, data flow, key design decisions, and extension points.

Module Breakdown

src/
├── cli.ts          — Entry point: argument parsing, command routing, I/O formatting
├── cli-helpers.ts  — Shared formatting, sync/async run handlers, error reporters
├── types.ts        — Shared types and error classes
├── constants.ts    — Client name registry and platform helpers
├── config.ts       — Layered configuration resolution (flags → env → file → PATH)
├── detect.ts       — Client discovery: binary lookup and version extraction
├── dispatch.ts     — Prompt-to-client resolution (explicit flag → keywords → default)
├── execute.ts      — Synchronous subprocess spawning, stdout/stderr capture, timeout handling
└── jobs.ts         — Async job lifecycle: detached spawn, disk-backed state, polling API

Responsibilities

| Module | Responsibility |
| --- | --- |
| cli.ts | Parses argv with minimist, routes to all commands, prints JSON or text output, and controls the process exit code. |
| cli-helpers.ts | Shared helpers for reportError, reportCliError, handleSyncRun, and handleAsyncRun to keep cli.ts focused on routing. |
| types.ts | Defines ClientName, ClientInfo, ExecResult, ToolConfig, Job, JobRecord, JobStatus, and the error hierarchy (ClientNotFoundError, ExecError, JobNotFoundError, JobResultUnavailableError). |
| constants.ts | Holds the canonical CLIENT_NAMES array and isWindows() helper used by discovery and config. |
| config.ts | Resolves per-client binary paths and the optional defaultClient from four layered sources. |
| detect.ts | Locates each client binary on PATH, falls back to a manual directory scan, and invokes --version to extract a semver string. |
| dispatch.ts | Chooses the target client from a prompt string using ordered keyword matching, with overrides for explicit --client and defaultClient. |
| execute.ts | Spawns the chosen client with its native argument shape, buffers stdout/stderr, enforces a timeout, and returns an ExecResult or throws a typed error. |
| jobs.ts | Manages background jobs: writes job records to disk, spawns detached child processes, tracks running children in memory, and provides status, results, cancel, list, and cleanup operations. |

Data Flow

Synchronous dispatch (run --sync, dispatch --sync)

A sync invocation flows through four stages:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   detect    │ ──► │   config    │ ──► │  dispatch   │ ──► │   execute   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
     │                    │                    │                    │
  which/where        flags/env/file       keyword scan         spawn child
  PATH walk          defaultClient        --client override    capture output
  --version                               fallback default     timeout / exitCode

Asynchronous dispatch (run, dispatch, start)

An async invocation adds the jobs.ts stage. The caller receives a job ID immediately; the child process continues in the background.

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   detect    │ ──► │   config    │ ──► │  dispatch   │ ──► │   execute   │ ──► │    jobs     │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
     │                    │                    │                    │                    │
  which/where        flags/env/file       keyword scan         spawn child       write job file
  PATH walk          defaultClient        --client override    capture output    detached + unref
  --version                               fallback default     timeout / exitCode  update on close

Later, lifecycle commands read from or modify the job store:

  status <jobId>  ──►  readJobFile  ──►  return Job (sans stdout/stderr)
  results <jobId> ──►  readJobFile  ──►  return ExecResult (completed only)
  cancel <jobId>  ──►  readJobFile  ──►  kill child or PID  ──►  write cancelled status
  list-jobs       ──►  readdir jobDir ──►  read each file  ──►  sort + filter
  cleanup-jobs    ──►  readdir jobDir ──►  stat mtime       ──►  unlink old files

1. Detect

detectClients() iterates over CLIENT_NAMES and attempts to locate each binary:

  1. Invoke which <name> (or where <name> on Windows).
  2. If that fails, walk PATH segments manually and test existsSync().
  3. If a binary is found, run <binary> --version and parse the first semver-like match.

Result: an array of ClientInfo objects with name, found, path, and version.
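Steps 2 and 3 above can be sketched as two small pure functions. This is a minimal illustration, not the actual detect.ts code: the names findOnPath and parseVersion, the injected existsSync parameter, and the exact semver pattern are assumptions made for the example.

```typescript
// Hypothetical helpers; only the overall strategy comes from the steps above.
type ExistsSync = (path: string) => boolean;

/** Step 2: walk PATH segments manually and return the first existing candidate. */
function findOnPath(
  binary: string,
  pathVar: string,
  existsSync: ExistsSync,
  isWindows = false,
): string | null {
  const delimiter = isWindows ? ";" : ":";
  const separator = isWindows ? "\\" : "/";
  // Assumption: on Windows the binary may carry an .exe suffix.
  const candidates = isWindows ? [`${binary}.exe`, binary] : [binary];
  for (const dir of pathVar.split(delimiter).filter(Boolean)) {
    for (const candidate of candidates) {
      const full = dir.endsWith(separator) ? dir + candidate : dir + separator + candidate;
      if (existsSync(full)) return full;
    }
  }
  return null;
}

/** Step 3: extract the first semver-like token from `--version` output. */
function parseVersion(output: string): string | null {
  const match = output.match(/\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?/);
  return match ? match[0] : null;
}
```

Because existsSync is passed in, the PATH walk can be exercised against a fake filesystem, which matches the injection-friendly style described later in this document.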

2. Config

resolveConfig() builds a ResolvedConfig by layering sources (highest to lowest precedence):

  1. CLI flags: --codex-path, --claude-path, --opencode-path, --default-client, --timeout
  2. Environment variables: AI_CLI_CODEX_PATH, AI_CLI_CLAUDE_PATH, AI_CLI_OPENCODE_PATH, AI_CLI_DEFAULT_CLIENT
  3. Config file: ~/.openclaw/ai-cli-dispatch.json (paths, defaultClient, timeout keys)
  4. PATH discovery: which/where fallback via defaultWhichSync()

Only values for the three known ClientName entries are accepted; unknown defaultClient values are ignored.
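The precedence order can be pictured as a chain of fallbacks. The sketch below is an assumption-laden illustration rather than the real resolveConfig(): the ConfigSources shape, the helper names, and the choice to let an invalid defaultClient fall through to the next layer are all hypothetical.

```typescript
type ClientName = "codex" | "claude" | "opencode";
const CLIENT_NAMES: readonly ClientName[] = ["codex", "claude", "opencode"];

// Hypothetical container for the four layers described above.
interface ConfigSources {
  flags: Record<string, string | undefined>; // e.g. { "codex-path": "/opt/codex" }
  env: Record<string, string | undefined>;   // e.g. process.env
  file: { paths?: Partial<Record<ClientName, string>>; defaultClient?: string };
  which: (client: ClientName) => string | null; // PATH discovery fallback
}

function isClientName(value: unknown): value is ClientName {
  return typeof value === "string" && (CLIENT_NAMES as readonly string[]).indexOf(value) !== -1;
}

/** Flags win over env, env over the config file, and the file over PATH discovery. */
function resolvePath(client: ClientName, src: ConfigSources): string | null {
  return (
    src.flags[`${client}-path`] ??
    src.env[`AI_CLI_${client.toUpperCase()}_PATH`] ??
    src.file.paths?.[client] ??
    src.which(client)
  );
}

/** Assumption: an unknown value is skipped and the next layer is consulted. */
function resolveDefaultClient(src: ConfigSources): ClientName | null {
  const layers = [src.flags["default-client"], src.env["AI_CLI_DEFAULT_CLIENT"], src.file.defaultClient];
  for (const candidate of layers) {
    if (isClientName(candidate)) return candidate;
  }
  return null;
}
```

Deriving the flag and environment keys from the client name is what would let a loop over CLIENT_NAMES pick up a new client without further config changes.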

3. Dispatch

resolveClient(prompt, config) decides which client to use:

  1. If config.client is a valid ClientName, return it immediately.
  2. Lower-case the prompt and scan for substrings in order:
    • "open code" → opencode
    • "claude" → claude
    • "codex" → codex
    • "opencode" → opencode
  3. If no keyword matches, return config.defaultClient or null.

This ordering intentionally prioritizes "open code" before "opencode" so the spaced natural-language variant wins.
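A minimal sketch of the resolution order described above follows. The DispatchConfig shape and the KEYWORDS table are assumptions for illustration; the real resolveClient may be structured differently.

```typescript
type ClientName = "codex" | "claude" | "opencode";

// Hypothetical config shape: `client` holds the raw --client value, if any.
interface DispatchConfig {
  client?: string;
  defaultClient?: ClientName;
}

// Ordered keyword table; earlier entries win.
const KEYWORDS: ReadonlyArray<readonly [string, ClientName]> = [
  ["open code", "opencode"],
  ["claude", "claude"],
  ["codex", "codex"],
  ["opencode", "opencode"],
];

function resolveClient(prompt: string, config: DispatchConfig): ClientName | null {
  // 1. An explicit, valid client always wins.
  const explicit = config.client;
  if (explicit === "codex" || explicit === "claude" || explicit === "opencode") {
    return explicit;
  }
  // 2. Case-insensitive substring scan in table order.
  const lowered = prompt.toLowerCase();
  for (const [keyword, client] of KEYWORDS) {
    if (lowered.includes(keyword)) return client;
  }
  // 3. Fall back to the configured default, or null when there is none.
  return config.defaultClient ?? null;
}
```

Keeping the keywords in a single ordered table makes the precedence visible in one place, which supports the debuggability argument made under Design Decisions.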

4. Execute

executePrompt(client, prompt, options) runs the selected client synchronously:

  1. Reject empty or whitespace-only prompts with ExecError.
  2. Validate that an explicit clientPath exists on disk (if provided).
  3. Map the client to its native argument array via CLIENT_ARGS:
    • codex → ["exec", "--yolo", prompt]
    • claude → ["-p", prompt, "--dangerously-skip-permissions"]
    • opencode → ["run", "--dangerously-skip-permissions", prompt]
  4. spawn() the process with shell: false.
  5. Buffer stdout and stderr via "data" listeners.
  6. Start a setTimeout timer; if it fires, call child.kill().
  7. On close, resolve with { stdout, stderr, exitCode, client, durationMs }.
  8. On error, reject with ClientNotFoundError for ENOENT or ExecError for anything else.
  9. On timeout, reject with ExecError containing the buffered output so far.
  10. If debug is enabled, emit a DebugInfo object via onDebug.

The default timeout is 10 minutes (600_000 ms).
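The buffering and timeout steps can be sketched as follows. This is a simplified illustration under several assumptions: spawn is injected through a structural SpawnLike type (a stand-in for the child_process API, not its real typings), the client name doubles as the command, and the ENOENT mapping, clientPath validation, and debug hook are left out.

```typescript
type ClientName = "codex" | "claude" | "opencode";

// Argument shapes copied from the list above.
const CLIENT_ARGS: Record<ClientName, (prompt: string) => string[]> = {
  codex: (p) => ["exec", "--yolo", p],
  claude: (p) => ["-p", p, "--dangerously-skip-permissions"],
  opencode: (p) => ["run", "--dangerously-skip-permissions", p],
};

interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number | null;
  client: ClientName;
  durationMs: number;
}

class ExecError extends Error {
  constructor(message: string, readonly result: ExecResult) {
    super(message);
    this.name = "ExecError";
  }
}

// Simplified structural stand-ins for a child process (assumption).
interface EmitterLike {
  on(event: string, listener: (...args: any[]) => void): unknown;
}
interface ChildLike extends EmitterLike {
  stdout: EmitterLike;
  stderr: EmitterLike;
  kill(): unknown;
}
type SpawnLike = (command: string, args: string[], options: { shell: false }) => ChildLike;

function executePrompt(
  client: ClientName,
  prompt: string,
  spawn: SpawnLike,
  timeoutMs = 600_000,
): Promise<ExecResult> {
  if (prompt.trim() === "") {
    const empty: ExecResult = { stdout: "", stderr: "", exitCode: null, client, durationMs: 0 };
    return Promise.reject(new ExecError("Prompt must not be empty", empty));
  }
  return new Promise<ExecResult>((resolve, reject) => {
    const startedAt = Date.now();
    let stdout = "";
    let stderr = "";
    let timedOut = false;
    const child = spawn(client, CLIENT_ARGS[client](prompt), { shell: false });
    child.stdout.on("data", (chunk) => { stdout += String(chunk); });
    child.stderr.on("data", (chunk) => { stderr += String(chunk); });
    const timer = setTimeout(() => { timedOut = true; child.kill(); }, timeoutMs);
    child.on("error", (err) => { clearTimeout(timer); reject(err); });
    child.on("close", (exitCode: number | null) => {
      clearTimeout(timer);
      const result: ExecResult = { stdout, stderr, exitCode, client, durationMs: Date.now() - startedAt };
      // On timeout, reject but keep whatever output was buffered so far.
      if (timedOut) reject(new ExecError(`${client} timed out after ${timeoutMs} ms`, result));
      else resolve(result);
    });
  });
}
```

Rejecting from the close handler rather than from the timer callback means the partial output attached to ExecError includes everything the child wrote before it was killed.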

5. Jobs

startJob(client, prompt, options) launches a background job:

  1. Generate a UUID for the job ID.
  2. Build the client argument array via CLIENT_ARGS.
  3. spawn() the process with detached: true and stdio: ["ignore", "pipe", "pipe"].
  4. Write an initial JobRecord to ~/.openclaw/ai-cli-dispatch/jobs/<jobId>.json with status running.
  5. Update the record with the child pid once available.
  6. Register the child in an in-memory runningChildren Map for cancellation and timeout tracking.
  7. Buffer stdout/stderr via "data" listeners.
  8. On close, finalize the record: write status (completed, failed, timed_out, or cancelled), capture stdout/stderr, and record durationMs.
  9. Call child.unref() so the dispatcher process can exit without waiting for the child.

getJob(jobId) reads the job file and returns a Job (omitting the full stdout/stderr buffers).

getJobResult(jobId) returns the ExecResult for a completed job.

cancelJob(jobId) looks up the running child in memory, sends SIGTERM, and writes a cancelled status. If the child is no longer in memory, it attempts process.kill(pid, "SIGTERM") as a fallback.
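The two-tier cancellation path might look roughly like the sketch below. The CancelDeps bag, the CancelRoute return value, and the decision to write the cancelled status even when nothing could be signalled are assumptions for illustration; in production, killPid would typically be process.kill.

```typescript
interface KillableChild {
  kill(signal: "SIGTERM"): unknown;
}

// Hypothetical dependency bag so the logic can run without real processes.
interface CancelDeps {
  runningChildren: Map<string, KillableChild>;
  readPid: (jobId: string) => number | undefined;       // reads pid from the job file
  killPid: (pid: number, signal: "SIGTERM") => unknown; // e.g. process.kill
  markCancelled: (jobId: string) => void;               // rewrites the job file
}

type CancelRoute = "in-memory" | "pid" | "none";

function cancelJob(jobId: string, deps: CancelDeps): CancelRoute {
  let route: CancelRoute = "none";
  const child = deps.runningChildren.get(jobId);
  if (child) {
    // Preferred path: the dispatcher still holds the child handle.
    child.kill("SIGTERM");
    route = "in-memory";
  } else {
    // Fallback: signal by the pid persisted in the job record.
    const pid = deps.readPid(jobId);
    if (pid !== undefined) {
      try {
        deps.killPid(pid, "SIGTERM");
        route = "pid";
      } catch {
        // The process may already have exited; nothing left to signal.
      }
    }
  }
  deps.markCancelled(jobId);
  return route;
}
```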

listJobs({ filter }) reads all .json files in the job directory, parses them, sorts by startedAt descending, and optionally filters by status.

cleanupJobs({ maxAgeMs }) deletes job files whose age, measured from their mtime, exceeds maxAgeMs. The default max age is 24 hours.
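The selection logic behind listing and cleanup can be expressed as pure functions that sit between the directory read and the output. In this sketch the JobSummary shape, the ISO-8601 startedAt format, and the helper names are assumptions; the file I/O itself is left out.

```typescript
type JobStatus = "running" | "completed" | "failed" | "timed_out" | "cancelled";

// Hypothetical minimal view of a parsed job file.
interface JobSummary {
  id: string;
  status: JobStatus;
  startedAt: string; // assumption: ISO-8601 timestamp
}

/** Newest first, optionally restricted to one status. Does not mutate the input. */
function sortAndFilterJobs(jobs: JobSummary[], filter?: JobStatus): JobSummary[] {
  return jobs
    .filter((job) => filter === undefined || job.status === filter)
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt));
}

const DEFAULT_MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

/** A job file is eligible for cleanup once its mtime is older than maxAgeMs. */
function isStale(mtimeMs: number, nowMs: number, maxAgeMs = DEFAULT_MAX_AGE_MS): boolean {
  return nowMs - mtimeMs > maxAgeMs;
}
```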

Design Decisions

Async-First Architecture

The default execution mode is async (background job). Synchronous execution requires an explicit --sync flag.

Rationale:

  • Primary use case alignment: Most AI CLI tasks (refactoring, test generation, migration) run for multiple minutes. Blocking the caller for that long is often undesirable in automation and orchestration contexts.
  • Resilience: A detached background job survives an unexpected dispatcher exit. The caller can reconnect later via status and results.
  • Batching: Multiple jobs can be started in parallel without blocking the dispatcher process.
  • Backward compatibility path: --sync preserves the original one-shot behavior for callers that need it, without changing the default.

Disk-Backed Job Store

Job state is persisted as JSON files on disk rather than kept solely in memory.

Rationale:

  • Durability across restarts: If the dispatcher process crashes or the host reboots, job files remain. A caller can still query status or results after recovery.
  • No memory leaks: Long-running or forgotten jobs do not accumulate in heap. Cleanup is explicit via cleanup-jobs.
  • External observability: Operators can inspect ~/.openclaw/ai-cli-dispatch/jobs/ directly without calling the CLI.
  • Simplicity: A file-per-job model avoids the need for an embedded database or external service. It maps cleanly to the Node.js fs API and is trivial to mock in tests.

Trade-off: High-frequency job creation could strain the filesystem, but the expected volume is low (tens to hundreds of jobs, not thousands per second).

Detached-Process Approach

Async jobs use detached: true with child.unref().

Rationale:

  • Parent independence: The dispatcher can start a job and exit immediately. This is essential for CLI usage where the user or orchestrator should not hold a shell open for the duration of the task.
  • Signal isolation: A detached process group means the child does not receive SIGINT or SIGHUP sent to the dispatcher terminal session.
  • PID tracking: Even though the child is detached, the pid is captured and written to the job file. This enables cancelJob to send signals even if the dispatcher has restarted and lost its in-memory runningChildren map.

Trade-off: The child is truly independent. If the host reboots, the child is lost (same as any other process). The job file will eventually reflect timed_out or remain running until cancel or cleanup is run.

Coexistence with ACP

ai-cli-dispatch is intentionally not an ACP agent. It is a thin, local subprocess wrapper with no session state, no thread binding, and no orchestrator protocol.

  • Use ai-cli-dispatch when you need a quick, one-shot CLI execution or a background job on the gateway host.
  • Use ACP (docs/openclaw-acp-orchestration.md) when you need session-bound coding harnesses, multi-turn review, or orchestrator-managed verification gates.

This separation keeps the dispatcher small and avoids duplicating ACP's scheduling, context persistence, and review-loop responsibilities.

Keyword Dispatch vs NLP

Client resolution uses deterministic substring matching instead of natural-language parsing or an LLM call.

Rationale:

  • Speed: No network round-trip or model load; resolution is synchronous and sub-millisecond.
  • Predictability: The same prompt always resolves to the same client. There is no temperature, context window, or model-version drift.
  • Debuggability: A user can read the ordered keyword list and know exactly why a given prompt resolved to a given client.
  • Scope fit: The dispatcher only needs to distinguish three clients. A full NLP pipeline would be overkill.

The trade-off is that prompts like "compare codex and claude" resolve to claude, because "claude" is checked before "codex" regardless of where each word appears in the prompt. Users can always override with --client.

Error Taxonomy

All runtime failures are represented as typed errors so callers and tests can branch precisely:

| Error | When thrown | Data carried |
| --- | --- | --- |
| ClientNotFoundError | Binary not on PATH, explicit clientPath missing, or ENOENT from spawn | message with client name |
| ExecError | Empty prompt, unknown client, timeout, non-ENOENT spawn error, or child exit | message + full ExecResult (stdout, stderr, exitCode, client, durationMs) |
| JobNotFoundError | Job ID not found in the job store | message with job ID |
| JobResultUnavailableError | results called on a non-completed job | message with job ID and current status |

ExecError carries the ExecResult so that timeout handlers still return partial output. This avoids losing buffered stdout/stderr when a long-running task is killed.
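To illustrate how callers can branch on typed errors, here is a sketch of the two job-related classes and a lookup that throws them. The constructor signatures, the in-memory Map standing in for the disk store, and describeFailure are assumptions, not the definitions in types.ts.

```typescript
type JobStatus = "running" | "completed" | "failed" | "timed_out" | "cancelled";

interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number | null;
  client: string;
  durationMs: number;
}

interface JobRecord {
  id: string;
  status: JobStatus;
  result?: ExecResult;
}

class JobNotFoundError extends Error {
  constructor(readonly jobId: string) {
    super(`Job not found: ${jobId}`);
    this.name = "JobNotFoundError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class JobResultUnavailableError extends Error {
  constructor(readonly jobId: string, readonly status: JobStatus) {
    super(`Job ${jobId} has no result yet (status: ${status})`);
    this.name = "JobResultUnavailableError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

/** Hypothetical lookup against an in-memory stand-in for the job directory. */
function getJobResult(jobId: string, store: Map<string, JobRecord>): ExecResult {
  const record = store.get(jobId);
  if (!record) throw new JobNotFoundError(jobId);
  if (record.status !== "completed" || !record.result) {
    throw new JobResultUnavailableError(jobId, record.status);
  }
  return record.result;
}

/** Callers can branch on the class instead of parsing messages. */
function describeFailure(err: unknown): string {
  if (err instanceof JobNotFoundError) return `unknown job ${err.jobId}`;
  if (err instanceof JobResultUnavailableError) return `job ${err.jobId} is ${err.status}`;
  return "unexpected error";
}
```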

Injection-Friendly Module Boundaries

Every non-trivial module accepts an options bag with injectable dependencies (spawnSync, spawn, existsSync, whichSync, readFileSync, etc.).

Rationale:

  • Unit tests can run without touching the real filesystem, PATH, or subprocess layer.
  • The CLI itself injects its real dependencies through default parameters, so production behavior is unchanged.
  • There is no global mocking required; each test provides its own narrow fakes.
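The options-bag pattern can be sketched with a config-file loader. The function name loadConfigFile, the LoadOptions shape, and the choice to treat a missing or malformed file as an empty config are assumptions made for this example.

```typescript
import { readFileSync as nodeReadFileSync } from "node:fs";

// Hypothetical options bag: every dependency is optional and has a real default.
interface LoadOptions {
  readFileSync?: (path: string, encoding: "utf8") => string;
}

const realReadFileSync = (path: string, encoding: "utf8"): string =>
  nodeReadFileSync(path, encoding);

function loadConfigFile(
  path: string,
  { readFileSync = realReadFileSync }: LoadOptions = {},
): Record<string, unknown> {
  try {
    const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
    const isPlainObject = typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
    return isPlainObject ? (parsed as Record<string, unknown>) : {};
  } catch {
    // Assumption: a missing or malformed file is treated as an empty config.
    return {};
  }
}
```

A test passes a one-line fake, for example loadConfigFile("any.json", { readFileSync: () => "{}" }), while production callers omit the second argument and get the real filesystem.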

Minimal Dependency Surface

The runtime dependency graph contains exactly one external package: minimist (argument parsing). Everything else uses Node.js built-ins (child_process, fs, os, path, crypto).

Rationale:

  • Reduces supply-chain risk and install time.
  • Avoids version-lock issues across Node.js 20+ environments.
  • Keeps the compiled/bundled footprint negligible for a tool that is often installed as a sidecar.

Extension Points

Adding a New Client

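To support a fourth (or fifth) AI CLI client, edit four files in src/ (types.ts, constants.ts, execute.ts, and dispatch.ts) and add the corresponding tests; config.ts and jobs.ts are listed only to confirm they need no change: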

  1. src/types.ts — Add the new name to the ClientName union type.
  2. src/constants.ts — Append the new name to CLIENT_NAMES.
  3. src/execute.ts — Add an entry to CLIENT_ARGS with the client's native argument shape.
  4. src/config.ts — No change required; the existing loop over CLIENT_NAMES automatically picks up the new env/flag/file keys.
  5. src/dispatch.ts — Add a keyword check for the new client in resolveClient. Decide its precedence relative to existing keywords.
  6. src/jobs.ts — No change required; CLIENT_ARGS is already shared.
  7. Tests — Add colocated test cases in tests/dispatch.test.ts, tests/execute.test.ts, tests/detect.test.ts, and tests/jobs.test.ts.

No changes are needed in cli.ts because it iterates over CLIENT_NAMES for validation.

Streaming Support

If a future use case requires real-time output (e.g., long-running codegen with progressive feedback), the cleanest extension is to add an optional onData callback to ExecuteOptions:

export interface ExecuteOptions {
  clientPath?: string;
  timeoutMs?: number;
  spawn?: ...;
  existsSync?: ...;
  onData?: (chunk: string, stream: "stdout" | "stderr") => void;
}

When onData is provided, executePrompt would:

  • Continue buffering internally for the final ExecResult.
  • Also emit each chunk through onData so the caller can stream to a UI or logger.
  • Reject/resolve with the same error taxonomy.

This preserves backward compatibility: existing callers that omit onData receive the exact same buffered ExecResult they get today.

For async jobs, jobs.ts could store a partial stdout/stderr in the job file on each chunk (or at a throttled interval) so status callers can see progress without waiting for completion.
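One way to buffer and forward at the same time is a small collector that the "data" listeners write into. This is a sketch of the proposal only: createOutputCollector and its return shape are hypothetical and not part of the current code.

```typescript
type StreamName = "stdout" | "stderr";
type OnData = (chunk: string, stream: StreamName) => void;

/** Buffers every chunk for the final ExecResult and optionally forwards it live. */
function createOutputCollector(onData?: OnData) {
  const buffers: Record<StreamName, string> = { stdout: "", stderr: "" };
  return {
    push(chunk: unknown, stream: StreamName): void {
      const text = String(chunk);
      buffers[stream] += text; // always buffer, so existing callers see no change
      onData?.(text, stream);  // forward only when a callback was supplied
    },
    snapshot(): Record<StreamName, string> {
      return { ...buffers };
    },
  };
}
```

In executePrompt, the stdout listener would call push(chunk, "stdout") and the close handler would read snapshot(); callers that omit onData get the same buffered output as before.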

Platform Backends

The current Windows support is limited to discovery (where instead of which, .exe extension assumptions). If future clients require platform-specific spawn options (e.g., PowerShell quoting rules), the extension point is CLIENT_ARGS or a new CLIENT_SPAWN_OPTIONS record keyed by ClientName.

Testing Strategy

The test suite in tests/ mirrors the src/ structure:

| Test file | Coverage |
| --- | --- |
| cli.test.ts | Argument parsing, command routing, JSON/text output modes, exit codes, error formatting, sync vs async branches, all job lifecycle commands |
| cli-helpers.test.ts | reportError, reportCliError, handleSyncRun, handleAsyncRun with JSON and text modes |
| config.test.ts | Layered precedence of flags, env, file, and which fallback; malformed JSON tolerance |
| detect.test.ts | which success/failure, PATH directory fallback, version parsing, missing binary handling |
| dispatch.test.ts | Keyword matching, case insensitivity, --client precedence, defaultClient fallback, invalid flag handling |
| execute.test.ts | Successful execution, stderr capture, non-zero exit codes, ENOENT → ClientNotFoundError, timeout, empty prompt rejection, special-character preservation, debug info emission |
| jobs.test.ts | Job start, status query, result retrieval, cancellation, listing, cleanup, timeout handling, unknown client fallback, detached process behavior, in-memory vs on-disk consistency |

All tests use injected mocks; no test spawns real client binaries or reads the real filesystem.