# us-cpa

`us-cpa` is a Python CLI with an OpenClaw skill wrapper for U.S. federal individual tax work.

## Standalone package usage

From `skills/us-cpa/`:

```bash
pip install -e '.[dev]'
us-cpa --help
```

Without installing, the repo-local wrapper works directly:

```bash
skills/us-cpa/scripts/us-cpa --help
```

## Current Milestone

The current implementation includes:

- deterministic cache layout under `~/.cache/us-cpa` by default
- `fetch-year` download flow for the bootstrap IRS corpus
- source manifest with URL, hash, authority rank, and local path traceability
- primary-law URL building for IRC and Treasury regulation escalation
- case-folder intake, document registration, and machine-usable fact extraction from JSON, text, and PDF inputs
- question workflow with conversation and memo output
- prepare workflow for the current supported multi-form 1040 package
- review workflow with findings-first output
- fillable-PDF-first rendering with overlay fallback
- e-file-ready draft export payload generation
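The source manifest item above lists URL, hash, authority rank, and local path as the traceability fields. A minimal sketch of what one entry could look like follows. The key names, the choice of SHA-256, and the `manifest_entry` helper are assumptions for illustration, not the shipped schema.

```python
import hashlib


def manifest_entry(url, content, authority_rank, local_path):
    """Build one hypothetical manifest record for a downloaded source.

    Key names and the SHA-256 digest are assumptions of this sketch.
    """
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "authorityRank": authority_rank,
        "localPath": str(local_path),
    }


# Placeholder URL and path, not real corpus locations.
entry = manifest_entry(
    "https://example.invalid/f1040.pdf",
    b"%PDF-1.7 placeholder bytes",
    1,
    "irs/f1040.pdf",
)
print(sorted(entry))
```

Recording a content hash next to the URL lets a later run detect that a cached file no longer matches what was originally downloaded.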

## CLI Surface

```bash
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025 --style memo --format markdown
skills/us-cpa/scripts/us-cpa prepare --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa review --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
skills/us-cpa/scripts/us-cpa extract-docs --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe --create-case --case-label "Jane Doe" --facts-json ./facts.json
skills/us-cpa/scripts/us-cpa render-forms --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa export-efile-ready --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
```

## Tax-Year Cache

Default cache root:

```text
~/.cache/us-cpa
```

Override for isolated runs:

```bash
US_CPA_CACHE_DIR=/tmp/us-cpa-cache skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
```
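As a rough illustration of the override, the cache root can be resolved from `US_CPA_CACHE_DIR` with a fallback to the default. The function name `resolve_cache_root` and the treatment of an empty value as unset are assumptions of this sketch, not necessarily how the package does it.

```python
import os
from pathlib import Path

DEFAULT_CACHE_ROOT = Path("~/.cache/us-cpa")  # default documented above


def resolve_cache_root(env=None):
    """Return the cache root, honoring US_CPA_CACHE_DIR when it is set."""
    env = os.environ if env is None else env
    override = env.get("US_CPA_CACHE_DIR")
    # Assumption: an empty override is treated the same as an unset one.
    root = Path(override) if override else DEFAULT_CACHE_ROOT
    return root.expanduser()


print(resolve_cache_root({"US_CPA_CACHE_DIR": "/tmp/us-cpa-cache"}))
```

Passing the environment as a parameter keeps the lookup easy to test without mutating `os.environ`.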

Current `fetch-year` bootstrap corpus for tax year `2025` is verified against live IRS `irs-prior` PDFs for:

- Form 1040
- Schedules 1, 2, 3, A, B, C, D, E, SE, and 8812
- Forms 8949, 4562, 4797, 6251, 8606, 8863, 8889, 8959, 8960, 8995, 8995-A, 5329, 5695, and 1116
- General Form 1040 instructions and selected schedule/form instructions

Current bundled tax-year computation data:

- 2024
- 2025

Other years can still be fetched and sourced, but deterministic return calculations stop with an explicit unsupported-year error until rate tables for those years are added.

Adding a new supported year is a deliberate data-table change in `tax_years.py`, not an automatic runtime discovery step. That is intentional for tax-engine correctness.
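The explicit-table approach can be pictured as a guard that refuses any year without bundled data. This is a minimal sketch: `SUPPORTED_TAX_YEARS`, `UnsupportedTaxYearError`, and `require_supported_year` are hypothetical names, and the real contents of `tax_years.py` may differ.

```python
SUPPORTED_TAX_YEARS = frozenset({2024, 2025})  # years listed above


class UnsupportedTaxYearError(ValueError):
    """Raised when no computation data is bundled for a tax year."""


def require_supported_year(tax_year):
    """Return tax_year unchanged, or raise if no rate tables exist for it."""
    if tax_year not in SUPPORTED_TAX_YEARS:
        supported = ", ".join(str(y) for y in sorted(SUPPORTED_TAX_YEARS))
        raise UnsupportedTaxYearError(
            f"tax year {tax_year} is not supported for calculations; "
            f"supported years: {supported}"
        )
    return tax_year


print(require_supported_year(2025))
```

Failing loudly here avoids silently computing a return with another year's rate tables.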

## Interaction Model

- `question`
  - stateless by default
  - optional case context
- `prepare`
  - requires a case directory
  - if none exists, OpenClaw should ask whether to create one and where
- `review`
  - requires a case directory
  - can operate on an existing or newly created review case

## Planned Case Layout

```text
<case-dir>/
  input/
  extracted/
  return/
  output/
  reports/
  issues/
  sources/
```

Current implementation writes:

- `case-manifest.json`
- `extracted/facts.json`
- `issues/open-issues.json`
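The directory layout above could be created with a few lines of `pathlib`. This is a sketch only: `CASE_SUBDIRS` and `create_case_layout` are hypothetical names, and the sketch does not write the manifest or JSON files.

```python
from pathlib import Path

# Subdirectories taken from the planned case layout above.
CASE_SUBDIRS = ("input", "extracted", "return", "output", "reports", "issues", "sources")


def create_case_layout(case_dir):
    """Create the case directory and its subdirectories if they are missing."""
    root = Path(case_dir).expanduser()
    for name in CASE_SUBDIRS:
        # exist_ok makes the call safe to repeat on an existing case.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

Calling `create_case_layout("~/tax-cases/2025-jane-doe")` would create the seven subdirectories under that path.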

## Intake Flow

Current `extract-docs` supports:

- `--create-case`
- `--case-label`
- `--facts-json <path>`
- repeated `--input-file <path>`

Behavior:

- creates the full case directory layout when `--create-case` is used
- copies input documents into `input/`
- stores normalized facts with source metadata in `extracted/facts.json`
- extracts machine-usable facts from JSON/text/PDF documents where supported
- appends document registry entries to `case-manifest.json`
- stops with a structured issue and non-zero exit if a new fact conflicts with an existing stored fact
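The conflict rule in the last item can be sketched as a merge that records an issue instead of overwriting a stored value. The `merge_facts` helper and the issue fields are assumptions for illustration; the real issue structure in `issues/open-issues.json` is not shown here.

```python
def merge_facts(stored, incoming):
    """Merge incoming facts into stored ones and collect conflicts.

    A conflict is a fact present in both mappings with different values.
    Conflicting facts keep their stored value and produce an issue record.
    """
    merged = dict(stored)
    issues = []
    for name, value in incoming.items():
        if name in stored and stored[name] != value:
            issues.append(
                {"type": "fact_conflict", "fact": name,
                 "stored": stored[name], "incoming": value}
            )
            continue
        merged[name] = value
    return merged, issues


merged, issues = merge_facts({"wages": 50000}, {"wages": 52000, "taxableInterest": 120})
print(len(issues))
```

A caller following the behavior described above would write the issues out and exit non-zero whenever the returned list is non-empty.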

## Output Contract

- JSON by default
- markdown available with `--format markdown`
- `question` supports:
  - `--style conversation`
  - `--style memo`
- `question` emits the answered analysis
- `prepare` emits a prepared return package summary
- `export-efile-ready` emits a draft e-file-ready payload
- `review` emits a findings-first review result
- `fetch-year` emits the downloaded manifest location and a source count

## Question Engine

Current `question` implementation:

- loads the cached tax-year corpus
- searches a small IRS-first topical rule set
- returns one canonical analysis object
- renders that analysis as:
  - conversational output
  - memo output
- marks questions outside the current topical rule set as requiring primary-law escalation

Current implemented topics:

- standard deduction
- Schedule C / sole proprietorship reporting trigger
- Schedule D / capital gains reporting trigger
- Schedule E / rental income reporting trigger
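To make the escalation behavior concrete, here is a deliberately naive sketch that maps a question onto the four topics above and flags everything else for primary-law escalation. The keyword lists, the substring matching, and the `requiresPrimaryLawEscalation` field name are all assumptions; the actual rule set is likely more careful.

```python
# Hypothetical keyword table for the four implemented topics.
TOPIC_KEYWORDS = {
    "standard_deduction": ("standard deduction",),
    "schedule_c": ("schedule c", "sole proprietor"),
    "schedule_d": ("schedule d", "capital gain"),
    "schedule_e": ("schedule e", "rental income"),
}


def classify_question(question):
    """Return the matched topic, or mark the question for escalation."""
    text = question.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        # Naive substring match; a real rule set would need stricter matching.
        if any(keyword in text for keyword in keywords):
            return {"topic": topic, "requiresPrimaryLawEscalation": False}
    return {"topic": None, "requiresPrimaryLawEscalation": True}


print(classify_question("What is the standard deduction?")["topic"])
```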

## Form Rendering

Current rendering path:

- official IRS PDFs from the cached tax-year corpus
- deterministic field-fill when usable AcroForm fields are present
- overlay rendering onto those official PDFs using `reportlab` + `pypdf` as fallback
- artifact manifest written to `output/artifacts.json`

Current rendered form support:

- field-fill support for known mapped fillable forms
- overlay generation for the current required-form set resolved by the return model

Current review rule:

- field-filled artifacts are not automatically flagged for review
- overlay-rendered artifacts are marked `reviewRequired: true`

Overlay coordinates are currently a fallback heuristic and are not treated as line-perfect authoritative field maps. Overlay output must be visually reviewed before any filing/export handoff.
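A pre-handoff check could list every artifact that still needs visual review. In this sketch only the `reviewRequired` flag comes from the text above; the top-level `artifacts` list and the `form` and `renderMode` keys are an assumed shape for `output/artifacts.json`.

```python
import json


def artifacts_requiring_review(manifest_text):
    """Return the forms whose artifacts are flagged reviewRequired."""
    manifest = json.loads(manifest_text)
    return [
        artifact["form"]
        for artifact in manifest.get("artifacts", [])
        if artifact.get("reviewRequired")
    ]


# Hypothetical manifest content, for illustration only.
sample = json.dumps({
    "artifacts": [
        {"form": "f1040", "renderMode": "field_fill", "reviewRequired": False},
        {"form": "f8949", "renderMode": "overlay", "reviewRequired": True},
    ]
})
print(artifacts_requiring_review(sample))
```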

## Preparation Workflow

Current `prepare` implementation:

- loads case facts from `extracted/facts.json`
- normalizes them into the current supported federal return model
- preserves source provenance for normalized values
- computes the current supported 1040 package
- resolves required forms across the current supported subset
- writes:
  - `return/normalized-return.json`
  - `output/artifacts.json`
  - `reports/prepare-summary.json`

Current supported calculation inputs:

- `filingStatus`
- `spouse.fullName`
- `dependents`
- `wages`
- `taxableInterest`
- `businessIncome`
- `capitalGainLoss`
- `rentalIncome`
- `federalWithholding`
- `itemizedDeductions`
- `hsaContribution`
- `educationCredit`
- `foreignTaxCredit`
- `qualifiedBusinessIncome`
- `traditionalIraBasis`
- `additionalMedicareTax`
- `netInvestmentIncomeTax`
- `alternativeMinimumTax`
- `additionalTaxPenalty`
- `energyCredit`
- `depreciationExpense`
- `section1231GainLoss`

## E-file-ready Export

`export-efile-ready` writes:

- `output/efile-ready.json`

Current export behavior:

- draft-only
- includes required forms
- includes refund or balance due summary
- includes attachment manifest
- includes unresolved issues

## Review Workflow

Current `review` implementation:

- recomputes the return from current case facts
- compares stored normalized return values to recomputed values
- flags source-fact mismatches for key income fields
- flags likely omitted income when document-extracted facts support an amount the stored return omits
- checks whether required rendered artifacts are present
- flags high-complexity forms for specialist follow-up
- flags overlay-rendered artifacts as requiring human review
- sorts findings by severity
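The findings-first ordering can be sketched as a stable sort on a severity table. The severity labels (`high`, `medium`, `low`) and the finding fields are assumptions; the actual review output may use different names.

```python
# Hypothetical severity labels, most severe first.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def sort_findings(findings):
    """Order findings most severe first; unknown severities sort last."""
    return sorted(
        findings,
        key=lambda finding: SEVERITY_ORDER.get(finding["severity"], len(SEVERITY_ORDER)),
    )


findings = [
    {"severity": "low", "title": "overlay artifact needs visual review"},
    {"severity": "high", "title": "wages mismatch against source document"},
    {"severity": "medium", "title": "rendered artifact missing"},
]
print([f["severity"] for f in sort_findings(findings)])
```

Because `sorted` is stable, findings with the same severity keep the order in which they were produced.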

Current render modes:

- `--style conversation`
- `--style memo`

## Scope Rules

- U.S. federal individual returns only in v1
- official IRS artifacts are the target output for compiled forms
- conflicting facts must stop the workflow for user resolution

## Authority Ranking

Current authority classes are ranked to preserve source hierarchy:

- IRS forms
- IRS instructions
- IRS publications
- IRS FAQs
- Internal Revenue Code
- Treasury regulations
- other primary authority

Later research and review flows should consume this ranking rather than inventing their own.
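One way for downstream flows to share the ranking is a single enum that encodes the listed order. This is a sketch under assumptions: the `AuthorityRank` name, the member names, and the numeric values are illustrative, and the numbers only mirror the order of the list above rather than asserting which class prevails in a conflict.

```python
from enum import IntEnum


class AuthorityRank(IntEnum):
    """Authority classes numbered in the order they are listed above."""
    IRS_FORM = 1
    IRS_INSTRUCTIONS = 2
    IRS_PUBLICATION = 3
    IRS_FAQ = 4
    INTERNAL_REVENUE_CODE = 5
    TREASURY_REGULATION = 6
    OTHER_PRIMARY_AUTHORITY = 7


def order_sources(sources):
    """Sort source records by listed rank, keeping input order within a class."""
    return sorted(sources, key=lambda source: source["rank"])


sources = [
    {"title": "IRC section", "rank": AuthorityRank.INTERNAL_REVENUE_CODE},
    {"title": "Form 1040", "rank": AuthorityRank.IRS_FORM},
    {"title": "Form 1040 instructions", "rank": AuthorityRank.IRS_INSTRUCTIONS},
]
print([s["title"] for s in order_sources(sources)])
```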