
# us-cpa

`us-cpa` is a Python CLI plus an OpenClaw skill wrapper for U.S. federal individual tax work.

## Standalone package usage

From `skills/us-cpa/`:

```bash
python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -e '.[dev]'
us-cpa --help
```

From the repo root, the repo-local wrapper works without installing the package:

```bash
skills/us-cpa/scripts/us-cpa --help
```

## OpenClaw installation

To install the skill for OpenClaw itself, copy the repo skill into the workspace skill directory and install its Python dependencies there.

1. Sync the repo copy into the workspace:

```bash
rsync -a --delete \
  ~/.openclaw/workspace/projects/stef-openclaw-skills/skills/us-cpa/ \
  ~/.openclaw/workspace/skills/us-cpa/
```

2. Create a workspace-local virtualenv and install the package:

```bash
cd ~/.openclaw/workspace/skills/us-cpa
python3 -m venv .venv
. .venv/bin/activate
python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -e '.[dev]'
```

3. Verify the installed workspace wrapper:

```bash
~/.openclaw/workspace/skills/us-cpa/scripts/us-cpa --help
```

The wrapper prefers `.venv/bin/python` inside the skill directory when present, so OpenClaw can run the workspace copy without relying on global Python packages.
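The interpreter preference can be expressed as a small selection rule. The Python below is a hypothetical sketch of that rule only. The real `scripts/us-cpa` wrapper may be a shell script and may differ in detail, `pick_interpreter` is an invented name, and falling back to `python3` on `PATH` is an assumption.

```python
import os
from pathlib import Path


def pick_interpreter(skill_dir: Path) -> str:
    """Choose the interpreter for the skill (hypothetical sketch of the wrapper's rule).

    Prefer the skill-local virtualenv interpreter when it exists and is
    executable; otherwise assume `python3` on PATH.
    """
    venv_python = skill_dir / ".venv" / "bin" / "python"
    # is_file() and os.access() both follow the usual venv symlink.
    if venv_python.is_file() and os.access(venv_python, os.X_OK):
        return str(venv_python)
    return "python3"


if __name__ == "__main__":
    workspace_skill = Path.home() / ".openclaw" / "workspace" / "skills" / "us-cpa"
    print(pick_interpreter(workspace_skill))
```

Checking for an executable file, not just a directory, avoids selecting a half-created `.venv`.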

## Current Milestone

The current implementation includes:

- deterministic cache layout under `~/.cache/us-cpa` by default
- `fetch-year` download flow for the bootstrap IRS corpus
- source manifest with URL, hash, authority rank, and local path traceability
- primary-law URL building for IRC and Treasury regulation escalation
- case-folder intake, document registration, and machine-usable fact extraction from JSON, text, and PDF inputs
- question workflow with conversation and memo output
- prepare workflow for the currently supported multi-form 1040 package
- review workflow with findings-first output
- fillable-PDF-first rendering with overlay fallback
- e-file-ready draft export payload generation
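The source-manifest traceability described above boils down to recording, for each cached file, where it came from and a hash of its bytes. The sketch below assumes SHA-256 and invents the `SourceEntry` field names; the real manifest schema and hash algorithm may differ.

```python
import hashlib
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceEntry:
    # Hypothetical field names; the real manifest schema may differ.
    url: str
    sha256: str
    authority_rank: int
    local_path: str


def make_entry(url: str, local_path: Path, authority_rank: int) -> SourceEntry:
    """Record where a cached file came from plus a content hash of its bytes."""
    digest = hashlib.sha256(local_path.read_bytes()).hexdigest()
    return SourceEntry(url, digest, authority_rank, str(local_path))


def verify_entry(entry: SourceEntry) -> bool:
    """Re-hash the cached file to detect a changed or partially downloaded copy."""
    current = hashlib.sha256(Path(entry.local_path).read_bytes()).hexdigest()
    return current == entry.sha256
```

`asdict(entry)` yields a plain dict that can be serialized straight into a JSON manifest.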

## CLI Surface

```bash
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025 --style memo --format markdown
skills/us-cpa/scripts/us-cpa prepare --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa review --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
skills/us-cpa/scripts/us-cpa extract-docs --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe --create-case --case-label "Jane Doe" --facts-json ./facts.json
skills/us-cpa/scripts/us-cpa render-forms --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa export-efile-ready --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
```

## Tax-Year Cache

Default cache root:

```text
~/.cache/us-cpa
```

Override for isolated runs:

```bash
US_CPA_CACHE_DIR=/tmp/us-cpa-cache skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
```
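The override rule can be sketched as a lookup that prefers `US_CPA_CACHE_DIR` and otherwise uses the default root. Treating an empty variable as unset, and the per-year subdirectory in `year_dir`, are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return US_CPA_CACHE_DIR when set and non-empty, else ~/.cache/us-cpa."""
    env = os.environ if env is None else env
    override = env.get("US_CPA_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "us-cpa"


def year_dir(tax_year: int, env: Optional[Mapping[str, str]] = None) -> Path:
    # Hypothetical per-year subdirectory name; the real layout may differ.
    return resolve_cache_root(env) / str(tax_year)
```

Passing `env` explicitly keeps the function testable without mutating the real process environment.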

For tax year `2025`, the current `fetch-year` bootstrap corpus is verified against the live IRS `irs-prior` PDFs for:

- Form 1040
- Schedules 1, 2, 3, A, B, C, D, E, SE, and 8812
- Forms 8949, 4562, 4797, 6251, 8606, 8863, 8889, 8959, 8960, 8995, 8995-A, 5329, 5695, and 1116
- General Form 1040 instructions and selected schedule/form instructions

Tax years that currently have bundled computation data:

- 2024
- 2025

Other tax years fetch and source correctly, but deterministic return calculations for them stop with an explicit unsupported-year error until their rate tables are added.

Adding a new supported year is a deliberate data-table change in `tax_years.py`, not an automatic runtime discovery step. This is intentional, to protect tax-engine correctness.
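An explicit registry with a loud failure is one way to get this behavior. In the hypothetical sketch below, `SUPPORTED_TAX_YEARS`, `year_data`, and `UnsupportedTaxYearError` are invented names, and the placeholder values stand in for the real rate tables in `tax_years.py`; no actual tax figures are shown.

```python
class UnsupportedTaxYearError(ValueError):
    """Raised when no bundled computation data exists for a tax year."""


# Hypothetical registry with placeholder values; real tables live in tax_years.py.
SUPPORTED_TAX_YEARS: dict = {
    2024: {"rate_tables": "bundled"},
    2025: {"rate_tables": "bundled"},
}


def year_data(tax_year: int) -> dict:
    """Look up bundled computation data, failing loudly instead of guessing."""
    try:
        return SUPPORTED_TAX_YEARS[tax_year]
    except KeyError:
        supported = ", ".join(str(y) for y in sorted(SUPPORTED_TAX_YEARS))
        raise UnsupportedTaxYearError(
            f"tax year {tax_year} has no bundled computation data (supported: {supported})"
        ) from None
```

Because the registry is a literal in source control, supporting a new year shows up as a reviewable diff rather than a silent runtime change.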

## Interaction Model

- `question`
  - stateless by default
  - optional case context
- `prepare`
  - requires a case directory
  - if none exists, OpenClaw should ask whether to create one and where
- `review`
  - requires a case directory
  - can operate on an existing or a newly created review case

## Planned Case Layout

```text
<case-dir>/
  input/
  extracted/
  return/
  output/
  reports/
  issues/
  sources/
```

Current implementation writes:

- `case-manifest.json`
- `extracted/facts.json`
- `issues/open-issues.json`
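Creating this layout is an idempotent directory-and-file setup. The sketch below is hypothetical: `create_case` is an invented name, and the starter file contents (including the `caseLabel` and `documents` keys) are assumptions, not the real schemas.

```python
import json
from pathlib import Path

CASE_SUBDIRS = ("input", "extracted", "return", "output", "reports", "issues", "sources")


def create_case(case_dir: Path, label: str) -> Path:
    """Create the case layout plus starter files; never overwrite existing ones."""
    for name in CASE_SUBDIRS:
        (case_dir / name).mkdir(parents=True, exist_ok=True)
    starters = {
        # Hypothetical starter contents; the real schemas may differ.
        "case-manifest.json": {"caseLabel": label, "documents": []},
        "extracted/facts.json": {},
        "issues/open-issues.json": [],
    }
    for rel_path, payload in starters.items():
        target = case_dir / rel_path
        if not target.exists():
            target.write_text(json.dumps(payload, indent=2) + "\n")
    return case_dir / "case-manifest.json"
```

Skipping files that already exist means re-running creation on an existing case cannot clobber stored facts or issues.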

## Intake Flow

Current `extract-docs` supports:

- `--create-case`
- `--case-label`
- `--facts-json <path>`
- repeated `--input-file <path>`

Behavior:

- creates the full case directory layout when `--create-case` is used
- copies input documents into `input/`
- stores normalized facts with source metadata in `extracted/facts.json`
- extracts machine-usable facts from JSON/text/PDF documents where supported
- appends document registry entries to `case-manifest.json`
- stops with a structured issue and a non-zero exit status if a new fact conflicts with an already stored fact
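The conflict rule amounts to a merge that refuses to overwrite a differing value. This hypothetical sketch compares plain values only (the real facts carry source metadata), prints the issue as JSON to stderr, and invents the issue keys; the real tool may persist issues elsewhere, for example in `issues/open-issues.json`.

```python
import json
import sys


class FactConflict(Exception):
    """A new fact disagrees with an already stored fact."""

    def __init__(self, key, stored, incoming):
        super().__init__(f"conflicting values for {key!r}: stored={stored!r}, new={incoming!r}")
        self.key, self.stored, self.incoming = key, stored, incoming


def merge_facts(stored: dict, incoming: dict) -> dict:
    """Merge new facts into stored facts; identical repeats are fine, changes are not."""
    merged = dict(stored)
    for key, value in incoming.items():
        if key in merged and merged[key] != value:
            raise FactConflict(key, merged[key], value)
        merged[key] = value
    return merged


def intake_exit_code(stored: dict, incoming: dict) -> int:
    """Return 0 on a clean merge, or 1 after emitting a structured issue."""
    try:
        merge_facts(stored, incoming)
    except FactConflict as conflict:
        issue = {"kind": "fact-conflict", "fact": conflict.key,
                 "stored": conflict.stored, "incoming": conflict.incoming}
        print(json.dumps(issue), file=sys.stderr)
        return 1
    return 0
```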

## Output Contract

- JSON by default
- markdown available with `--format markdown`
- `question` supports:
  - `--style conversation`
  - `--style memo`
- `question` emits an analysis that answers the question
- `prepare` emits a prepared return package summary
- `export-efile-ready` emits a draft e-file-ready payload
- `review` emits a findings-first review result
- `fetch-year` emits the location of the downloaded source manifest and the number of sources fetched
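The format switch can be pictured as one result object with two serializers. The sketch is hypothetical: `render` is an invented name, and the markdown layout (a title heading followed by key/value bullets) is an assumption, not the tool's actual markdown output.

```python
import json


def render(result: dict, fmt: str = "json") -> str:
    """Render one result object as JSON (the default) or simple markdown."""
    if fmt == "json":
        return json.dumps(result, indent=2, sort_keys=True)
    if fmt == "markdown":
        lines = [f"# {result.get('title', 'Result')}", ""]
        for key, value in result.items():
            if key != "title":
                lines.append(f"- **{key}**: {value}")
        return "\n".join(lines)
    raise ValueError(f"unsupported format: {fmt}")
```

Keeping a single result object and choosing the serializer last means both formats always carry the same content.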

## Question Engine

Current `question` implementation:

- loads the cached tax-year corpus
- searches a small IRS-first topical rule set
- returns one canonical analysis object
- renders that analysis as:
  - conversational output
  - memo output
- marks questions outside the current topical rule set as requiring primary-law escalation

Current implemented topics:

- standard deduction
- Schedule C / sole proprietorship reporting trigger
- Schedule D / capital gains reporting trigger
- Schedule E / rental income reporting trigger
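The flow above (match a topical rule, otherwise flag for escalation, then render one analysis object in either style) can be sketched as follows. Keyword matching is a stand-in chosen for brevity; the real engine's matching logic, its IRS-sourced rule content, and names such as `TOPIC_KEYWORDS` and `analyze` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword rules standing in for the real IRS-first rule set.
TOPIC_KEYWORDS = {
    "standard_deduction": ("standard deduction",),
    "schedule_c": ("sole proprietor", "schedule c", "self-employ"),
    "schedule_d": ("capital gain", "schedule d"),
    "schedule_e": ("rental income", "schedule e"),
}


@dataclass(frozen=True)
class Analysis:
    question: str
    topic: Optional[str]
    needs_primary_law_escalation: bool


def analyze(question: str) -> Analysis:
    """Match the first topical rule; otherwise flag for IRC/regulation research."""
    text = question.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return Analysis(question, topic, False)
    return Analysis(question, None, True)


def render_analysis(analysis: Analysis, style: str = "conversation") -> str:
    """Render the same analysis object in either supported style."""
    if analysis.needs_primary_law_escalation:
        finding = "is outside the current rule set and needs primary-law escalation"
    else:
        finding = f"matches the {analysis.topic} rule"
    if style == "memo":
        return f"QUESTION: {analysis.question}\nCONCLUSION: The question {finding}."
    if style == "conversation":
        return f"Your question {finding}."
    raise ValueError(f"unknown style: {style}")
```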

## Form Rendering

Current rendering path:

- official IRS PDFs from the cached tax-year corpus
- deterministic field-fill when usable AcroForm fields are present
- fallback overlay rendering onto those official PDFs using `reportlab` and `pypdf`
- artifact manifest written to `output/artifacts.json`

Current rendered form support:

- field-fill for fillable forms with known field mappings
- overlay generation for the current required-form set resolved by the return model

Current review rule:

- field-filled artifacts are not automatically flagged for review
- overlay-rendered artifacts are marked `reviewRequired: true`

Overlay coordinates are currently a fallback heuristic and are not treated as line-perfect authoritative field maps. Overlay output must be visually reviewed before any filing/export handoff.
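The mode choice and review flag can be sketched as below. Only the `reviewRequired` key comes from this document; `renderMode`, `form`, `path`, and the function names are hypothetical. In practice the `has_acroform_fields` input could come from something like `pypdf`'s `PdfReader.get_fields()`, which returns `None` when a PDF has no form fields, but that wiring is an assumption.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Artifact:
    form: str
    path: str
    render_mode: str  # "field-fill" or "overlay"

    @property
    def review_required(self) -> bool:
        # Overlay output is always flagged; field-filled output is not auto-flagged.
        return self.render_mode == "overlay"

    def manifest_entry(self) -> dict:
        # "reviewRequired" matches the documented flag; other keys are hypothetical.
        return {"form": self.form, "path": self.path,
                "renderMode": self.render_mode, "reviewRequired": self.review_required}


def choose_mode(has_acroform_fields: bool, has_field_map: bool) -> str:
    """Prefer deterministic field-fill; fall back to overlay otherwise."""
    return "field-fill" if has_acroform_fields and has_field_map else "overlay"


def write_manifest(case_dir: Path, artifacts: list) -> Path:
    """Write the artifact list to output/artifacts.json inside the case directory."""
    target = case_dir / "output" / "artifacts.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps([a.manifest_entry() for a in artifacts], indent=2) + "\n")
    return target
```

Deriving `review_required` from the mode, instead of storing it separately, keeps the flag from drifting out of sync with how the artifact was produced.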

## Preparation Workflow

Current `prepare` implementation:

- loads case facts from `extracted/facts.json`
- normalizes them into the currently supported federal return model
- preserves source provenance for normalized values
- computes the currently supported 1040 package
- resolves required forms across the currently supported form subset
- writes:
  - `return/normalized-return.json`
  - `output/artifacts.json`
  - `reports/prepare-summary.json`

Currently supported calculation inputs:

- `filingStatus`
- `spouse.fullName`
- `dependents`
- `wages`
- `taxableInterest`
- `businessIncome`
- `capitalGainLoss`
- `rentalIncome`
- `federalWithholding`
- `itemizedDeductions`
- `hsaContribution`
- `educationCredit`
- `foreignTaxCredit`
- `qualifiedBusinessIncome`
- `traditionalIraBasis`
- `additionalMedicareTax`
- `netInvestmentIncomeTax`
- `alternativeMinimumTax`
- `additionalTaxPenalty`
- `energyCredit`
- `depreciationExpense`
- `section1231GainLoss`
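Because `spouse.fullName` is a dotted path, a pre-flight check can flatten nested facts before comparing them with this list. The sketch assumes facts arrive as a plain nested dict (the real `extracted/facts.json` also stores source metadata), and `split_supported` is a hypothetical helper for spotting keys the calculator would not use.

```python
SUPPORTED_INPUTS = frozenset({
    "filingStatus", "spouse.fullName", "dependents", "wages", "taxableInterest",
    "businessIncome", "capitalGainLoss", "rentalIncome", "federalWithholding",
    "itemizedDeductions", "hsaContribution", "educationCredit", "foreignTaxCredit",
    "qualifiedBusinessIncome", "traditionalIraBasis", "additionalMedicareTax",
    "netInvestmentIncomeTax", "alternativeMinimumTax", "additionalTaxPenalty",
    "energyCredit", "depreciationExpense", "section1231GainLoss",
})


def flatten(facts: dict, prefix: str = "") -> dict:
    """Turn nested dicts into dotted keys, stopping at keys that are inputs themselves."""
    flat = {}
    for key, value in facts.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict) and path not in SUPPORTED_INPUTS:
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def split_supported(facts: dict) -> tuple:
    """Return (facts the calculator reads, sorted keys it would ignore)."""
    flat = flatten(facts)
    supported = {k: v for k, v in flat.items() if k in SUPPORTED_INPUTS}
    ignored = sorted(k for k in flat if k not in SUPPORTED_INPUTS)
    return supported, ignored
```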

## E-file-ready Export

`export-efile-ready` writes:

- `output/efile-ready.json`

Current export behavior:

- draft-only
- includes required forms
- includes refund or balance due summary
- includes attachment manifest
- includes unresolved issues
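A simplified version of the refund-or-balance-due summary and the draft payload is sketched below. The payload keys are hypothetical, the summary ignores adjustments such as applying an overpayment to next year, and `Decimal` is used so currency amounts avoid binary floating-point rounding.

```python
from decimal import Decimal


def settlement_summary(total_tax: Decimal, total_payments: Decimal) -> dict:
    """Simplified: payments above total tax are a refund; a shortfall is a balance due."""
    difference = total_payments - total_tax
    if difference >= 0:
        return {"refund": difference, "balanceDue": Decimal("0")}
    return {"refund": Decimal("0"), "balanceDue": -difference}


def build_efile_ready(required_forms, total_tax, total_payments, attachments, issues) -> dict:
    """Assemble a draft-only payload mirroring the documented sections (hypothetical keys)."""
    return {
        "status": "draft",
        "requiredForms": sorted(required_forms),
        "summary": settlement_summary(total_tax, total_payments),
        "attachments": list(attachments),
        "unresolvedIssues": list(issues),
    }
```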

## Review Workflow

Current `review` implementation:

- recomputes the return from current case facts
- compares stored normalized return values to recomputed values
- flags source-fact mismatches for key income fields
- flags likely omitted income when document-extracted facts support an amount the stored return omits
- checks whether required rendered artifacts are present
- flags high-complexity forms for specialist follow-up
- flags overlay-rendered artifacts as requiring human review
- sorts findings by severity
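Two of the review steps, comparing stored values with recomputed ones and sorting findings by severity, can be sketched as follows. The severity labels, their order, and the finding keys are assumptions; Python's `sorted` is stable, so findings of equal severity keep their original order.

```python
# Hypothetical severity labels, most severe first.
SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}


def compare_returns(stored: dict, recomputed: dict) -> list:
    """Emit one finding per field whose stored value differs from the recomputed one."""
    findings = []
    for field in sorted(set(stored) | set(recomputed)):
        if stored.get(field) != recomputed.get(field):
            findings.append({
                "severity": "error",
                "title": f"{field} differs from recomputed value",
                "stored": stored.get(field),
                "recomputed": recomputed.get(field),
            })
    return findings


def sort_findings(findings: list) -> list:
    """Most severe first; unknown severities sink to the end."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)))
```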

Current render modes:

- `--style conversation`
- `--style memo`

## Scope Rules

- U.S. federal individual returns only in v1
- official IRS artifacts are the target output for compiled forms
- conflicting facts must stop the workflow for user resolution

## Authority Ranking

Current authority classes are ranked to preserve source hierarchy:

- IRS forms
- IRS instructions
- IRS publications
- IRS FAQs
- Internal Revenue Code
- Treasury regulations
- other primary authority

Later research and review flows should consume this ranking rather than inventing their own.
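One way to make the ranking shareable is a single ordered enumeration that every flow sorts by. The `Authority` enum and `order_sources` helper below are hypothetical; the member order mirrors the list above, with a lower value meaning earlier in the ranking.

```python
from enum import IntEnum


class Authority(IntEnum):
    # Same order as the documented list; lower value ranks earlier.
    IRS_FORM = 1
    IRS_INSTRUCTIONS = 2
    IRS_PUBLICATION = 3
    IRS_FAQ = 4
    INTERNAL_REVENUE_CODE = 5
    TREASURY_REGULATION = 6
    OTHER_PRIMARY_AUTHORITY = 7


def order_sources(sources: list) -> list:
    """Sort sources by authority rank, then URL for a deterministic tie-break."""
    return sorted(sources, key=lambda s: (s["authority"], s["url"]))
```

Sharing one enum gives research and review flows the same ordering without each redefining it.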