# us-cpa

`us-cpa` is a Python CLI plus OpenClaw skill wrapper for U.S. federal individual tax work.

## Standalone package usage

From `skills/us-cpa/`:

```bash
pip install -e .[dev]
us-cpa --help
```

Without installing, the repo-local wrapper works directly:

```bash
skills/us-cpa/scripts/us-cpa --help
```

## Current Milestone

Current implementation now includes:

- deterministic cache layout under `~/.cache/us-cpa` by default
- `fetch-year` download flow for the bootstrap IRS corpus
- source manifest with URL, hash, authority rank, and local path traceability
- primary-law URL building for IRC and Treasury regulation escalation
- case-folder intake, document registration, and machine-usable fact extraction from JSON, text, and PDF inputs
- question workflow with conversation and memo output
- prepare workflow for the current supported multi-form 1040 package
- review workflow with findings-first output
- fillable-PDF first rendering with overlay fallback
- e-file-ready draft export payload generation

## CLI Surface

```bash
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" \
  --tax-year 2025 --style memo --format markdown
skills/us-cpa/scripts/us-cpa prepare --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa review --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
skills/us-cpa/scripts/us-cpa extract-docs --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe --create-case --case-label "Jane Doe" --facts-json ./facts.json
skills/us-cpa/scripts/us-cpa render-forms --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa export-efile-ready --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
```

## Tax-Year Cache

Default cache root:

```text
~/.cache/us-cpa
```

Override for isolated runs:

```bash
US_CPA_CACHE_DIR=/tmp/us-cpa-cache skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
```

Current `fetch-year` bootstrap corpus for tax year `2025` is verified against live IRS `irs-prior` PDFs for:

- Form 1040
- Schedules 1, 2, 3, A, B, C, D, E, SE, and 8812
- Forms 8949, 4562, 4797, 6251, 8606, 8863, 8889, 8959, 8960, 8995, 8995-A, 5329, 5695, and 1116
- General Form 1040 instructions and selected schedule/form instructions

Current bundled tax-year computation data:

- 2024
- 2025

Other years fetch/source correctly, but deterministic return calculations currently stop with an explicit unsupported-year error until rate tables are added.

Adding a new supported year is a deliberate data-table change in `tax_years.py`, not an automatic runtime discovery step. That is intentional for tax-engine correctness.
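
The cache override and the unsupported-year stop described in this section can be sketched together. This is a minimal sketch under stated assumptions: `resolve_cache_root`, `require_supported_year`, `SUPPORTED_TAX_YEARS`, and `UnsupportedTaxYearError` are hypothetical names chosen for illustration, and the real contents of `tax_years.py` may look quite different.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical stand-in for the year table; this README only states that
# computation data for 2024 and 2025 is bundled.
SUPPORTED_TAX_YEARS = frozenset({2024, 2025})


class UnsupportedTaxYearError(ValueError):
    """No bundled computation data exists for the requested tax year."""


def resolve_cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the US_CPA_CACHE_DIR override if set, else ~/.cache/us-cpa."""
    env = os.environ if env is None else env
    override = env.get("US_CPA_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "us-cpa"


def require_supported_year(tax_year: int) -> int:
    """Return tax_year unchanged, or stop with an explicit error."""
    if tax_year not in SUPPORTED_TAX_YEARS:
        supported = ", ".join(str(year) for year in sorted(SUPPORTED_TAX_YEARS))
        raise UnsupportedTaxYearError(
            f"tax year {tax_year} has no bundled computation data "
            f"(supported: {supported})"
        )
    return tax_year
```

Keeping the year set as a literal means a newly supported year shows up as an explicit diff, which fits the deliberate data-table change described above.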

## Interaction Model

- `question`
  - stateless by default
  - optional case context
- `prepare`
  - requires a case directory
  - if none exists, OpenClaw should ask whether to create one and where
- `review`
  - requires a case directory
  - can operate on an existing or newly created review case

## Planned Case Layout

```text
/
  input/
  extracted/
  return/
  output/
  reports/
  issues/
  sources/
```

Current implementation writes:

- `case-manifest.json`
- `extracted/facts.json`
- `issues/open-issues.json`

## Intake Flow

Current `extract-docs` supports:

- `--create-case`
- `--case-label`
- `--facts-json`
- repeated `--input-file`

Behavior:

- creates the full case directory layout when `--create-case` is used
- copies input documents into `input/`
- stores normalized facts with source metadata in `extracted/facts.json`
- extracts machine-usable facts from JSON/text/PDF documents where supported
- appends document registry entries to `case-manifest.json`
- stops with a structured issue and non-zero exit if a new fact conflicts with an existing stored fact

## Output Contract

- JSON by default
- markdown available with `--format markdown`
- `question` supports:
  - `--style conversation`
  - `--style memo`
- `question` emits answered analysis output
- `prepare` emits a prepared return package summary
- `export-efile-ready` emits a draft e-file-ready payload
- `review` emits a findings-first review result
- `fetch-year` emits a downloaded manifest location and source count

## Question Engine

Current `question` implementation:

- loads the cached tax-year corpus
- searches a small IRS-first topical rule set
- returns one canonical analysis object
- renders that analysis as:
  - conversational output
  - memo output
- marks questions outside the current topical rule set as requiring primary-law escalation

Current implemented topics:

- standard deduction
- Schedule C / sole proprietorship reporting trigger
- Schedule D / capital gains reporting trigger
- Schedule E / rental income reporting trigger
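
The question flow above (match against a small topical rule set, build one analysis object, render it in either style, escalate otherwise) could look roughly like the following. This is an illustrative sketch only: `TOPIC_KEYWORDS`, `Analysis`, `analyze`, and `render` are hypothetical names, the keyword matching is an assumed stand-in for whatever search the real engine performs, and corpus loading is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule set: topic name -> lowercase trigger phrases.
TOPIC_KEYWORDS = {
    "standard deduction": ("standard deduction",),
    "Schedule C": ("schedule c", "sole proprietor", "self-employ"),
    "Schedule D": ("schedule d", "capital gain"),
    "Schedule E": ("schedule e", "rental"),
}


@dataclass(frozen=True)
class Analysis:
    """One canonical analysis object shared by both render styles."""
    question: str
    tax_year: int
    topic: Optional[str]
    requires_primary_law_escalation: bool


def analyze(question: str, tax_year: int) -> Analysis:
    """Match the question to a topic, or mark it for escalation."""
    text = question.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return Analysis(question, tax_year, topic, False)
    return Analysis(question, tax_year, None, True)


def render(analysis: Analysis, style: str = "conversation") -> str:
    """Render the same analysis as conversational or memo text."""
    if style not in ("conversation", "memo"):
        raise ValueError(f"unknown style: {style}")
    if analysis.requires_primary_law_escalation:
        finding = "Outside the current topical rule set; escalate to primary law."
    else:
        finding = f"Matched topic: {analysis.topic}."
    if style == "memo":
        return (
            f"## Question\n\n{analysis.question}\n\n"
            f"## Analysis (tax year {analysis.tax_year})\n\n{finding}\n"
        )
    return f"For tax year {analysis.tax_year}: {finding}"
```

Building the analysis once and rendering it twice keeps the conversation and memo outputs from drifting apart.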

## Form Rendering

Current rendering path:

- official IRS PDFs from the cached tax-year corpus
- deterministic field-fill when usable AcroForm fields are present
- overlay rendering onto those official PDFs using `reportlab` + `pypdf` as fallback
- artifact manifest written to `output/artifacts.json`

Current rendered form support:

- field-fill support for known mapped fillable forms
- overlay generation for the current required-form set resolved by the return model

Current review rule:

- field-filled artifacts are not automatically flagged for review
- overlay-rendered artifacts are marked `reviewRequired: true`

Overlay coordinates are currently a fallback heuristic and are not treated as line-perfect authoritative field maps. Overlay output must be visually reviewed before any filing/export handoff.

## Preparation Workflow

Current `prepare` implementation:

- loads case facts from `extracted/facts.json`
- normalizes them into the current supported federal return model
- preserves source provenance for normalized values
- computes the current supported 1040 package
- resolves required forms across the current supported subset
- writes:
  - `return/normalized-return.json`
  - `output/artifacts.json`
  - `reports/prepare-summary.json`

Current supported calculation inputs:

- `filingStatus`
- `spouse.fullName`
- `dependents`
- `wages`
- `taxableInterest`
- `businessIncome`
- `capitalGainLoss`
- `rentalIncome`
- `federalWithholding`
- `itemizedDeductions`
- `hsaContribution`
- `educationCredit`
- `foreignTaxCredit`
- `qualifiedBusinessIncome`
- `traditionalIraBasis`
- `additionalMedicareTax`
- `netInvestmentIncomeTax`
- `alternativeMinimumTax`
- `additionalTaxPenalty`
- `energyCredit`
- `depreciationExpense`
- `section1231GainLoss`

## E-file-ready Export

`export-efile-ready` writes:

- `output/efile-ready.json`

Current export behavior:

- draft-only
- includes required forms
- includes refund or balance due summary
- includes attachment manifest
- includes unresolved issues

## Review Workflow

Current `review` implementation:

- recomputes the return from current case facts
- compares stored normalized return values to recomputed values
- flags source-fact mismatches for key income fields
- flags likely omitted income when document-extracted facts support an amount the stored return omits
- checks whether required rendered artifacts are present
- flags high-complexity forms for specialist follow-up
- flags overlay-rendered artifacts as requiring human review
- sorts findings by severity

Current render modes:

- `--style conversation`
- `--style memo`

## Scope Rules

- U.S. federal individual returns only in v1
- official IRS artifacts are the target output for compiled forms
- conflicting facts must stop the workflow for user resolution

## Authority Ranking

Current authority classes are ranked to preserve source hierarchy:

- IRS forms
- IRS instructions
- IRS publications
- IRS FAQs
- Internal Revenue Code
- Treasury regulations
- other primary authority

Later research and review flows should consume this ranking rather than inventing their own.
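
One way a later flow might consume the ranking above is through a single shared enum. The sketch below is hypothetical: `AuthorityRank`, `sort_by_authority`, and the `authorityRank` key are illustrative names, and it assumes the list order above maps to ascending integers, which this README does not confirm.

```python
from enum import IntEnum
from typing import Dict, Iterable, List


class AuthorityRank(IntEnum):
    """Authority classes in the order listed above (assumed: lower sorts first)."""
    IRS_FORM = 1
    IRS_INSTRUCTIONS = 2
    IRS_PUBLICATION = 3
    IRS_FAQ = 4
    INTERNAL_REVENUE_CODE = 5
    TREASURY_REGULATION = 6
    OTHER_PRIMARY_AUTHORITY = 7


def sort_by_authority(sources: Iterable[Dict]) -> List[Dict]:
    """Order source records by rank; ties keep their input order (stable sort)."""
    return sorted(sources, key=lambda source: AuthorityRank(source["authorityRank"]))


# Usage with made-up source records.
sources = [
    {"title": "Treasury regulation", "authorityRank": AuthorityRank.TREASURY_REGULATION},
    {"title": "Form 1040", "authorityRank": AuthorityRank.IRS_FORM},
    {"title": "Form 1040 instructions", "authorityRank": AuthorityRank.IRS_INSTRUCTIONS},
]
ordered_titles = [source["title"] for source in sort_by_authority(sources)]
```

Wrapping the stored value in `AuthorityRank(...)` makes an unknown rank fail with a `ValueError` instead of sorting silently.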