us-cpa
us-cpa is a Python CLI plus OpenClaw skill wrapper for U.S. federal individual tax work.
Standalone package usage
From skills/us-cpa/:
python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -e '.[dev]'
us-cpa --help
Without installing, the repo-local wrapper works directly:
skills/us-cpa/scripts/us-cpa --help
OpenClaw installation
To install the skill for OpenClaw itself, copy the repo skill into the workspace skill directory and install its Python dependencies there.
- Sync the repo copy into the workspace:
rsync -a --delete --exclude '.venv' \
~/.openclaw/workspace/projects/stef-openclaw-skills/skills/us-cpa/ \
~/.openclaw/workspace/skills/us-cpa/
- Create a workspace-local virtualenv and install the package:
cd ~/.openclaw/workspace/skills/us-cpa
python3 -m venv .venv
. .venv/bin/activate
python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -e '.[dev]'
- Verify the installed workspace wrapper:
~/.openclaw/workspace/skills/us-cpa/scripts/us-cpa --help
The wrapper prefers .venv/bin/python inside the skill directory when present, so OpenClaw can run the workspace copy without relying on global Python packages.
Keep the --exclude '.venv' flag on future syncs; otherwise rsync --delete will remove the workspace virtualenv.
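The wrapper's interpreter preference can be sketched in Python as follows. The function name `pick_interpreter` and the fallback to the currently running interpreter are assumptions for illustration; the actual scripts/us-cpa wrapper may be a shell script with its own fallback rules.

```python
import os
import sys
from pathlib import Path


def pick_interpreter(skill_dir: Path) -> str:
    """Return the interpreter a wrapper would use for the given skill directory.

    Prefers <skill_dir>/.venv/bin/python when it exists and is executable.
    Otherwise falls back to the interpreter running this code (an assumption;
    the real wrapper may fall back to a plain python3 on PATH).
    """
    venv_python = skill_dir / ".venv" / "bin" / "python"
    if venv_python.is_file() and os.access(venv_python, os.X_OK):
        return str(venv_python)
    return sys.executable or "python3"


if __name__ == "__main__":
    print(pick_interpreter(Path.home() / ".openclaw/workspace/skills/us-cpa"))
```

Checking for the skill-local virtualenv first is what lets the workspace copy run without depending on globally installed packages.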
Current Milestone
Current implementation now includes:
- deterministic cache layout under ~/.cache/us-cpa by default
- fetch-year download flow for the bootstrap IRS corpus
- source manifest with URL, hash, authority rank, and local path traceability
- primary-law URL building for IRC and Treasury regulation escalation
- case-folder intake, document registration, and machine-usable fact extraction from JSON, text, and PDF inputs
- question workflow with conversation and memo output
- prepare workflow for the current supported multi-form 1040 package
- review workflow with findings-first output
- fillable-PDF-first rendering with overlay fallback
- e-file-ready draft export payload generation
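As a rough illustration of source-manifest traceability, the sketch below hashes a downloaded file and records it next to its URL, authority rank, and local path. SHA-256 is an assumed choice (the README only says "hash"), and `SourceEntry`, `make_entry`, `write_manifest`, and the JSON key names are hypothetical rather than the real manifest schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class SourceEntry:
    # Hypothetical field names; the real manifest schema may differ.
    url: str
    sha256: str
    authority_rank: int
    local_path: str


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large PDFs never need to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_entry(url: str, path: Path, authority_rank: int) -> SourceEntry:
    return SourceEntry(url=url, sha256=sha256_file(path),
                       authority_rank=authority_rank, local_path=str(path))


def write_manifest(entries: List[SourceEntry], manifest_path: Path) -> None:
    # indent + sort_keys keeps the file byte-stable for identical inputs,
    # which suits a deterministic cache layout.
    payload = {"sources": [asdict(e) for e in entries]}
    manifest_path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n")
```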
CLI Surface
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025 --style memo --format markdown
skills/us-cpa/scripts/us-cpa prepare --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa review --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
skills/us-cpa/scripts/us-cpa extract-docs --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe --create-case --case-label "Jane Doe" --facts-json ./facts.json
skills/us-cpa/scripts/us-cpa render-forms --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa export-efile-ready --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
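When driving the CLI from another Python program, building argument vectors explicitly avoids shell-quoting problems with free-text questions. The helpers below use only flags shown in the commands above; `question_argv`, `case_argv`, and `run_json` are hypothetical names, and `run_json` assumes the default JSON output is written to stdout.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional

WRAPPER = "skills/us-cpa/scripts/us-cpa"  # repo-local wrapper path


def question_argv(question: str, tax_year: int, style: Optional[str] = None,
                  fmt: Optional[str] = None) -> List[str]:
    argv = [WRAPPER, "question", "--question", question, "--tax-year", str(tax_year)]
    if style is not None:
        argv += ["--style", style]   # "conversation" or "memo"
    if fmt is not None:
        argv += ["--format", fmt]    # e.g. "markdown"; JSON is the default
    return argv


def case_argv(command: str, tax_year: int, case_dir: str) -> List[str]:
    # prepare, review, render-forms, and export-efile-ready share this shape.
    return [WRAPPER, command, "--tax-year", str(tax_year), "--case-dir", case_dir]


def run_json(argv: List[str]) -> Dict[str, Any]:
    """Run the CLI, raising CalledProcessError on non-zero exit, and parse stdout as JSON."""
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)
```

Unlike a shell, subprocess does not expand ~, so pass case directories through os.path.expanduser first.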
Tax-Year Cache
Default cache root:
~/.cache/us-cpa
Override for isolated runs:
US_CPA_CACHE_DIR=/tmp/us-cpa-cache skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
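A minimal sketch of the cache-root resolution, assuming an empty US_CPA_CACHE_DIR is treated the same as an unset one. The per-year subdirectory in `year_dir` is a hypothetical layout; the real structure under the root is not specified here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """US_CPA_CACHE_DIR if set and non-empty, else ~/.cache/us-cpa."""
    env = os.environ if env is None else env
    override = env.get("US_CPA_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "us-cpa"


def year_dir(tax_year: int, env: Optional[Mapping[str, str]] = None) -> Path:
    # Hypothetical per-year subdirectory under the cache root.
    return cache_root(env) / str(tax_year)
```

Passing `env` explicitly keeps the function testable without mutating the process environment.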
The current fetch-year bootstrap corpus for tax year 2025 is verified against the live IRS irs-prior PDFs for:
- Form 1040
- Schedules 1, 2, 3, A, B, C, D, E, SE, and 8812
- Forms 8949, 4562, 4797, 6251, 8606, 8863, 8889, 8959, 8960, 8995, 8995-A, 5329, 5695, and 1116
- General Form 1040 instructions and selected schedule/form instructions
Current bundled tax-year computation data:
- 2024
- 2025
Other years can still be fetched and sourced, but deterministic return calculations for them stop with an explicit unsupported-year error until their rate tables are added.
Adding a new supported year is a deliberate data-table change in tax_years.py, not an automatic runtime discovery step. That is intentional for tax-engine correctness.
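A minimal sketch of an explicit year gate, assuming the bundled tables live in a mapping keyed by tax year. `require_tables` and `UnsupportedTaxYearError` are hypothetical names, not the contents of tax_years.py, and the table values used with it should be real rate tables, not the placeholders a test might use.

```python
from typing import Any, Mapping


class UnsupportedTaxYearError(ValueError):
    """Raised when no bundled rate tables exist for the requested year."""


def require_tables(tax_year: int, tables: Mapping[int, Any]) -> Any:
    """Return the tables for tax_year or stop with an explicit error.

    There is deliberately no fallback to the nearest supported year:
    reusing another year's brackets or deduction amounts would silently
    produce a wrong return.
    """
    try:
        return tables[tax_year]
    except KeyError:
        supported = ", ".join(str(y) for y in sorted(tables))
        raise UnsupportedTaxYearError(
            f"tax year {tax_year} is not supported (bundled: {supported})"
        ) from None
```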
Interaction Model
question:
- stateless by default
- optional case context
prepare:
- requires a case directory
- if none exists, OpenClaw should ask whether to create one and where
review:
- requires a case directory
- can operate on an existing or newly created review case
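The case-directory rules above reduce to a small check. `check_case_dir` and `CaseDirRequired` are hypothetical names; raising lets a caller such as OpenClaw catch the condition and ask the user whether, and where, to create a case.

```python
from pathlib import Path
from typing import Optional

CASE_REQUIRED = {"prepare", "review"}  # question may run without a case


class CaseDirRequired(Exception):
    """The command needs a case directory and none was supplied."""


def check_case_dir(command: str, case_dir: Optional[Path]) -> Optional[Path]:
    if command in CASE_REQUIRED and case_dir is None:
        raise CaseDirRequired(f"{command} requires a case directory")
    # question stays stateless when case_dir is None.
    return case_dir
```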
Planned Case Layout
<case-dir>/
  input/
  extracted/
  return/
  output/
  reports/
  issues/
  sources/
Current implementation writes:
- case-manifest.json
- extracted/facts.json
- issues/open-issues.json
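Creating the planned layout is straightforward with pathlib. In this sketch `create_case` is a hypothetical helper, and the initial manifest contents (`label`, `documents`) are an assumed shape rather than the real case-manifest.json schema.

```python
import json
from pathlib import Path

CASE_SUBDIRS = ("input", "extracted", "return", "output", "reports", "issues", "sources")


def create_case(case_dir: Path, label: str) -> Path:
    """Create the case layout and an initial manifest; safe to call twice."""
    for name in CASE_SUBDIRS:
        (case_dir / name).mkdir(parents=True, exist_ok=True)
    manifest = case_dir / "case-manifest.json"
    if not manifest.exists():
        # Hypothetical manifest shape; never overwrite an existing registry.
        manifest.write_text(json.dumps({"label": label, "documents": []}, indent=2) + "\n")
    return manifest
```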
Intake Flow
Current extract-docs supports:
- --create-case
- --case-label
- --facts-json <path>
- repeated --input-file <path>
Behavior:
- creates the full case directory layout when --create-case is used
- copies input documents into input/
- stores normalized facts with source metadata in extracted/facts.json
- extracts machine-usable facts from JSON, text, and PDF documents where supported
- appends document registry entries to case-manifest.json
- stops with a structured issue and a non-zero exit if a new fact conflicts with an existing stored fact
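The conflict rule can be sketched as a merge that records provenance and refuses to overwrite a differing value. The fact-store shape (`{"value": ..., "sources": [...]}`), `merge_fact`, `FactConflict`, and the issue keys are assumptions for illustration, not the real facts.json or open-issues.json format.

```python
from typing import Any, Dict


class FactConflict(Exception):
    """A new value disagrees with an already stored fact."""

    def __init__(self, name: str, existing: Any, incoming: Any, source: str) -> None:
        super().__init__(f"{name}: stored {existing!r} conflicts with {incoming!r} from {source}")
        self.name = name
        self.existing = existing
        self.incoming = incoming
        self.source = source


def merge_fact(store: Dict[str, Dict[str, Any]], name: str, value: Any, source: str) -> None:
    """Record a fact with provenance; never silently overwrite a different value."""
    current = store.get(name)
    if current is None:
        store[name] = {"value": value, "sources": [source]}
        return
    if current["value"] != value:
        raise FactConflict(name, current["value"], value, source)
    if source not in current["sources"]:
        current["sources"].append(source)  # same value from another document


def conflict_issue(exc: FactConflict) -> Dict[str, Any]:
    # Hypothetical shape for one entry in issues/open-issues.json.
    return {"type": "fact_conflict", "fact": exc.name, "stored": exc.existing,
            "incoming": exc.incoming, "source": exc.source}
```

A command entry point would catch FactConflict, persist and print conflict_issue(exc), and then exit with a non-zero status (for example via sys.exit(1)) so the user resolves the conflict first.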
Output Contract
- JSON by default
- markdown available with --format markdown
- question supports:
  - --style conversation
  - --style memo
- question emits answered analysis output
- prepare emits a prepared return package summary
- export-efile-ready emits a draft e-file-ready payload
- review emits a findings-first review result
- fetch-year emits a downloaded manifest location and source count
Question Engine
Current question implementation:
- loads the cached tax-year corpus
- searches the downloaded IRS corpus for relevant authorities and excerpts
- returns one canonical analysis object with:
  - authorities
  - excerpts
  - confidence / risk
  - primary-law escalation only when the IRS corpus is still insufficient
- renders that analysis as:
  - conversational output
  - memo output
In OpenClaw, the model should answer the user from the returned IRS excerpts when primaryLawRequired is false, rather than merely repeating the CLI summary.
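To make the "one analysis object, two renderings" idea concrete, here is a sketch that uses naive keyword overlap as a stand-in search; the CLI's actual retrieval and ranking are not described here. Only the name primaryLawRequired comes from the README; `Analysis`, `analyze`, `render`, the stop-word list, and `min_hits` are hypothetical, and confidence/risk is omitted for brevity.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

WORD = re.compile(r"[a-z0-9]+")
STOP = {"a", "an", "the", "is", "are", "what", "how", "of", "for", "to", "do", "does", "i"}


@dataclass
class Analysis:
    question: str
    authorities: List[str] = field(default_factory=list)
    excerpts: List[str] = field(default_factory=list)
    primaryLawRequired: bool = True  # camelCase to match the README's field name


def analyze(question: str, corpus: Dict[str, str], min_hits: int = 2) -> Analysis:
    """Keyword-overlap search over {authority title: text}; a stand-in, not the CLI's ranking."""
    terms = set(WORD.findall(question.lower())) - STOP
    result = Analysis(question=question)
    for title, text in corpus.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            hits = len(terms & set(WORD.findall(sentence.lower())))
            if hits >= min_hits:
                if title not in result.authorities:
                    result.authorities.append(title)
                result.excerpts.append(sentence)
    # Escalate to primary law only when the IRS corpus yielded nothing usable.
    result.primaryLawRequired = not result.excerpts
    return result


def render(analysis: Analysis, style: str = "conversation") -> str:
    if style == "conversation":
        if analysis.primaryLawRequired:
            return "The cached IRS corpus does not answer this; primary law is needed."
        return " ".join(analysis.excerpts)
    if style == "memo":
        body = "\n".join(f"- {e}" for e in analysis.excerpts) or "- (none)"
        return (f"Question: {analysis.question}\n"
                f"Authorities: {', '.join(analysis.authorities) or 'none'}\n"
                f"Excerpts:\n{body}\n"
                f"Primary law required: {analysis.primaryLawRequired}")
    raise ValueError(f"unsupported style: {style}")
```

Keeping a single analysis object and rendering it late means the conversational and memo outputs can never disagree about authorities or escalation.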
Form Rendering
Current rendering path:
- official IRS PDFs from the cached tax-year corpus
- deterministic field-fill when usable AcroForm fields are present
- overlay rendering onto those official PDFs using reportlab + pypdf as fallback
- artifact manifest written to output/artifacts.json
Current rendered form support:
- field-fill support for known mapped fillable forms
- overlay generation for the current required-form set resolved by the return model
Current review rule:
- field-filled artifacts are not automatically flagged for review
- overlay-rendered artifacts are marked reviewRequired: true
Overlay coordinates are currently a fallback heuristic and are not treated as line-perfect authoritative field maps. Overlay output must be visually reviewed before any filing/export handoff.
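The field-fill-first decision can be sketched as a pure function. The PDF field names would typically come from the keys of pypdf's PdfReader.get_fields(), which returns None for PDFs without a form; `plan_render`, the `field_map` mapping, and the artifact keys other than reviewRequired are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional, Set


def plan_render(form_id: str, pdf_fields: Optional[Set[str]], field_map: Mapping[str, str],
                values: Mapping[str, Any]) -> Dict[str, Any]:
    """Choose deterministic field-fill when every value maps to a real AcroForm field."""
    pdf_fields = pdf_fields or set()  # None means the PDF has no usable form
    targets = {key: field_map.get(key) for key in values}
    if all(name is not None and name in pdf_fields for name in targets.values()):
        return {"form": form_id, "mode": "field_fill",
                "fields": {targets[k]: v for k, v in values.items()},
                "reviewRequired": False}
    # Unmapped or missing fields: draw text over the official PDF instead,
    # and flag the artifact because overlay coordinates are heuristic.
    return {"form": form_id, "mode": "overlay", "reviewRequired": True}
```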
Preparation Workflow
Current prepare implementation:
- loads case facts from extracted/facts.json
- normalizes them into the current supported federal return model
- preserves source provenance for normalized values
- computes the current supported 1040 package
- resolves required forms across the current supported subset
- writes:
  - return/normalized-return.json
  - output/artifacts.json
  - reports/prepare-summary.json
Current supported calculation inputs:
- filingStatus
- spouse.fullName
- dependents
- wages
- taxableInterest
- businessIncome
- capitalGainLoss
- rentalIncome
- federalWithholding
- itemizedDeductions
- hsaContribution
- educationCredit
- foreignTaxCredit
- qualifiedBusinessIncome
- traditionalIraBasis
- additionalMedicareTax
- netInvestmentIncomeTax
- alternativeMinimumTax
- additionalTaxPenalty
- energyCredit
- depreciationExpense
- section1231GainLoss
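A sketch of normalization that keeps provenance alongside each supported input. The input names are the documented ones; the stored-fact shape (`{"value": ..., "sources": [...]}`), the flat treatment of spouse.fullName as a single key, and `normalize` itself are assumptions. Unsupported keys are returned rather than dropped silently, since the README does not say how they are handled.

```python
from typing import Any, Dict, List, Mapping, Tuple

SUPPORTED_INPUTS = frozenset({
    "filingStatus", "spouse.fullName", "dependents", "wages", "taxableInterest",
    "businessIncome", "capitalGainLoss", "rentalIncome", "federalWithholding",
    "itemizedDeductions", "hsaContribution", "educationCredit", "foreignTaxCredit",
    "qualifiedBusinessIncome", "traditionalIraBasis", "additionalMedicareTax",
    "netInvestmentIncomeTax", "alternativeMinimumTax", "additionalTaxPenalty",
    "energyCredit", "depreciationExpense", "section1231GainLoss",
})


def normalize(facts: Mapping[str, Mapping[str, Any]]) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Split stored facts into supported inputs (with provenance) and unsupported keys."""
    normalized: Dict[str, Dict[str, Any]] = {}
    unsupported: List[str] = []
    for name, fact in facts.items():
        if name in SUPPORTED_INPUTS:
            normalized[name] = {"value": fact["value"],
                                "provenance": list(fact.get("sources", []))}
        else:
            unsupported.append(name)
    return normalized, sorted(unsupported)
```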
E-file-ready Export
export-efile-ready writes:
output/efile-ready.json
Current export behavior:
- draft-only
- includes required forms
- includes refund or balance due summary
- includes attachment manifest
- includes unresolved issues
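The refund-or-balance-due summary is simple arithmetic: payments above total tax are a refund, and a shortfall is a balance due. The sketch below assumes whole-dollar integers, and `efile_ready_payload` and its key names are hypothetical; they only mirror the documented contents of output/efile-ready.json.

```python
from typing import Any, Dict, List


def refund_or_balance_due(total_tax: int, total_payments: int) -> Dict[str, int]:
    """Exactly one of refund / balanceDue is non-zero (or both are zero)."""
    diff = total_payments - total_tax
    return {"refund": max(diff, 0), "balanceDue": max(-diff, 0)}


def efile_ready_payload(tax_year: int, forms: List[str], total_tax: int, total_payments: int,
                        attachments: List[str], issues: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Hypothetical key names mirroring the documented export contents.
    return {
        "status": "draft",  # draft-only: never a transmittable submission
        "taxYear": tax_year,
        "requiredForms": sorted(forms),
        "summary": refund_or_balance_due(total_tax, total_payments),
        "attachments": attachments,
        "unresolvedIssues": issues,
    }
```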
Review Workflow
Current review implementation:
- recomputes the return from current case facts
- compares stored normalized return values to recomputed values
- flags source-fact mismatches for key income fields
- flags likely omitted income when document-extracted facts support an amount the stored return omits
- checks whether required rendered artifacts are present
- flags high-complexity forms for specialist follow-up
- flags overlay-rendered artifacts as requiring human review
- sorts findings by severity
Current render modes:
- --style conversation
- --style memo
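Several review checks can be sketched as functions that emit findings, followed by a severity sort. The severity labels, the tolerance, `Finding`, and the helper names are hypothetical; "omitted income" is approximated here as a recomputed value that the stored return lacks.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}  # hypothetical labels


@dataclass
class Finding:
    severity: str
    message: str


def compare_return(stored: Mapping[str, float], recomputed: Mapping[str, float],
                   tolerance: float = 0.5) -> List[Finding]:
    """Flag recomputed values that are missing from, or differ from, the stored return."""
    findings: List[Finding] = []
    for key, new_value in recomputed.items():
        old_value = stored.get(key)
        if old_value is None:
            if new_value:
                findings.append(Finding("high", f"{key}: {new_value} supported by facts but omitted"))
        elif abs(old_value - new_value) > tolerance:
            findings.append(Finding("medium", f"{key}: stored {old_value}, recomputed {new_value}"))
    return findings


def overlay_findings(artifacts: List[Mapping[str, Any]]) -> List[Finding]:
    return [Finding("medium", f"{a['form']}: overlay-rendered, needs human review")
            for a in artifacts if a.get("reviewRequired")]


def sort_findings(findings: List[Finding]) -> List[Finding]:
    # sorted() is stable: equal-severity findings keep discovery order;
    # unknown severities sort last.
    return sorted(findings, key=lambda f: SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)))
```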
Scope Rules
- U.S. federal individual returns only in v1
- official IRS artifacts are the target output for compiled forms
- conflicting facts must stop the workflow for user resolution
Authority Ranking
Current authority classes are ranked to preserve source hierarchy:
- IRS forms
- IRS instructions
- IRS publications
- IRS FAQs
- Internal Revenue Code
- Treasury regulations
- other primary authority
Later research and review flows should consume this ranking rather than inventing their own.
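One way to share the ranking across flows is a single ordered enum that every consumer sorts by. `AuthorityClass` and `rank_sources` are hypothetical names; lower values sort first in the order listed above, which is a consultation order and not a claim about relative legal weight.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class AuthorityClass(IntEnum):
    # Lower value sorts first, following the README's listed order.
    IRS_FORM = 1
    IRS_INSTRUCTIONS = 2
    IRS_PUBLICATION = 3
    IRS_FAQ = 4
    INTERNAL_REVENUE_CODE = 5
    TREASURY_REGULATION = 6
    OTHER_PRIMARY = 7


def rank_sources(sources: Iterable[Tuple[str, AuthorityClass]]) -> List[Tuple[str, AuthorityClass]]:
    """Order (name, class) pairs by authority class, keeping input order within a class."""
    return sorted(sources, key=lambda s: s[1])
```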