Stefano Fiorini b2bb07fa90 feat: make us-cpa questions retrieval-first

2026-03-15 04:40:57 -05:00

us-cpa

us-cpa is a Python CLI plus OpenClaw skill wrapper for U.S. federal individual tax work.

Standalone package usage

From skills/us-cpa/:

python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -e '.[dev]'
us-cpa --help

Without installing, the repo-local wrapper works directly:

skills/us-cpa/scripts/us-cpa --help

OpenClaw installation

To install the skill for OpenClaw itself, copy the repo skill into the workspace skill directory and install its Python dependencies there.

Sync the repo copy into the workspace:

rsync -a --delete --exclude '.venv' \
  ~/.openclaw/workspace/projects/stef-openclaw-skills/skills/us-cpa/ \
  ~/.openclaw/workspace/skills/us-cpa/

Create a workspace-local virtualenv and install the package:

cd ~/.openclaw/workspace/skills/us-cpa
python3 -m venv .venv
. .venv/bin/activate
python3 -m ensurepip --upgrade
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -e '.[dev]'

Verify the installed workspace wrapper:

~/.openclaw/workspace/skills/us-cpa/scripts/us-cpa --help

The wrapper prefers .venv/bin/python inside the skill directory when present, so OpenClaw can run the workspace copy without relying on global Python packages.
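
The interpreter preference described above can be sketched roughly as follows. This is an illustration only, not the wrapper's actual code: the function name resolve_python and the fallback to the running interpreter are assumptions.

```python
import sys
from pathlib import Path


def resolve_python(skill_dir: Path) -> str:
    """Return the skill-local venv interpreter if present, else the current one."""
    # The .venv location matches the install steps in this README.
    venv_python = skill_dir / ".venv" / "bin" / "python"
    if venv_python.is_file():
        return str(venv_python)
    # Assumed fallback: whichever interpreter is running this script.
    return sys.executable
```

Checking for the venv on every call means a freshly synced skill directory starts using its own packages as soon as .venv exists.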

Keep the --exclude '.venv' flag on future syncs; otherwise rsync --delete will remove the workspace virtualenv.

Current Milestone

The current implementation includes:

deterministic cache layout under ~/.cache/us-cpa by default
fetch-year download flow for the bootstrap IRS corpus
source manifest with URL, hash, authority rank, and local path traceability
primary-law URL building for IRC and Treasury regulation escalation
case-folder intake, document registration, and machine-usable fact extraction from JSON, text, and PDF inputs
question workflow with conversation and memo output
prepare workflow for the current supported multi-form 1040 package
review workflow with findings-first output
fillable-PDF-first rendering with overlay fallback
e-file-ready draft export payload generation
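
To make the source-manifest item above more concrete, here is a minimal sketch of one manifest entry. The README only says that each entry carries a URL, hash, authority rank, and local path; the class name, the field names, and the choice of SHA-256 are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceEntry:
    # Hypothetical field names; the real manifest schema is not shown here.
    url: str
    sha256: str
    authority_rank: int
    local_path: str


def make_entry(url: str, local_path: Path, authority_rank: int) -> SourceEntry:
    """Hash a downloaded file and tie it back to its source URL."""
    # Assumption: SHA-256 over the raw file bytes.
    digest = hashlib.sha256(local_path.read_bytes()).hexdigest()
    return SourceEntry(url, digest, authority_rank, str(local_path))
```

Storing the hash next to the URL and local path lets a later run detect whether a cached document still matches what was originally downloaded.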

CLI Surface

skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025
skills/us-cpa/scripts/us-cpa question --question "What is the standard deduction?" --tax-year 2025 --style memo --format markdown
skills/us-cpa/scripts/us-cpa prepare --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa review --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
skills/us-cpa/scripts/us-cpa extract-docs --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe --create-case --case-label "Jane Doe" --facts-json ./facts.json
skills/us-cpa/scripts/us-cpa render-forms --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
skills/us-cpa/scripts/us-cpa export-efile-ready --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe

Tax-Year Cache

Default cache root:

~/.cache/us-cpa

Override for isolated runs:

US_CPA_CACHE_DIR=/tmp/us-cpa-cache skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
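
A minimal sketch of how the cache root could be resolved from the default and the override shown above. The function name cache_root is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def cache_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the cache root, honoring US_CPA_CACHE_DIR when it is set."""
    env = os.environ if env is None else env
    override = env.get("US_CPA_CACHE_DIR")
    if override:  # assumption: an empty value falls back to the default
        return Path(override).expanduser()
    return Path("~/.cache/us-cpa").expanduser()
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.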

Current fetch-year bootstrap corpus for tax year 2025 is verified against live IRS irs-prior PDFs for:

Form 1040
Schedules 1, 2, 3, A, B, C, D, E, SE, and 8812
Forms 8949, 4562, 4797, 6251, 8606, 8863, 8889, 8959, 8960, 8995, 8995-A, 5329, 5695, and 1116
General Form 1040 instructions and selected schedule/form instructions

Current bundled tax-year computation data:

2024
2025

Other tax years can still be fetched and sourced, but deterministic return calculations stop with an explicit unsupported-year error until rate tables for that year are added.

Adding a new supported year is a deliberate data-table change in tax_years.py, not an automatic runtime discovery step. That is intentional for tax-engine correctness.
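
The explicit unsupported-year behavior could look something like the sketch below. The contents of tax_years.py are not shown in this README, so the names SUPPORTED_TAX_YEARS, UnsupportedTaxYearError, and require_supported_year are hypothetical; only the two supported years come from the list above.

```python
# Years with bundled computation data, per this README.
SUPPORTED_TAX_YEARS = frozenset({2024, 2025})


class UnsupportedTaxYearError(ValueError):
    """Raised when no bundled computation data exists for a tax year."""


def require_supported_year(tax_year: int) -> int:
    """Return tax_year unchanged, or fail loudly if it has no rate tables."""
    if tax_year not in SUPPORTED_TAX_YEARS:
        supported = ", ".join(str(y) for y in sorted(SUPPORTED_TAX_YEARS))
        raise UnsupportedTaxYearError(
            f"tax year {tax_year} has no bundled rate tables; supported: {supported}"
        )
    return tax_year
```

Failing at the lookup, rather than falling back to the nearest known year, keeps a calculation from silently using the wrong tables.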

Interaction Model

question
- stateless by default
- optional case context
prepare
- requires a case directory
- if none exists, OpenClaw should ask whether to create one and where
review
- requires a case directory
- can operate on an existing or newly-created review case

Planned Case Layout

<case-dir>/
  input/
  extracted/
  return/
  output/
  reports/
  issues/
  sources/

Current implementation writes:

case-manifest.json
extracted/facts.json
issues/open-issues.json
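
As an illustration, the planned layout above could be created with a few lines of Python. The subdirectory names come from the layout listing; the function name create_case_layout is hypothetical, and this is not the skill's own code.

```python
from pathlib import Path
from typing import List

# Subdirectories from the planned case layout in this README.
CASE_SUBDIRS = ("input", "extracted", "return", "output", "reports", "issues", "sources")


def create_case_layout(case_dir: Path) -> List[Path]:
    """Create every case subdirectory, tolerating ones that already exist."""
    created = []
    for name in CASE_SUBDIRS:
        sub = case_dir / name
        sub.mkdir(parents=True, exist_ok=True)
        created.append(sub)
    return created
```

Using exist_ok=True makes the call idempotent, so running it against an existing case directory leaves the files inside untouched.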

Intake Flow

Current extract-docs supports:

--create-case
--case-label
--facts-json <path>
repeated --input-file <path>

Behavior:

creates the full case directory layout when --create-case is used
copies input documents into input/
stores normalized facts with source metadata in extracted/facts.json
extracts machine-usable facts from JSON/text/PDF documents where supported
appends document registry entries to case-manifest.json
stops with a structured issue and non-zero exit if a new fact conflicts with an existing stored fact
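
The conflict rule in the last item can be sketched as below. This is a simplified model under stated assumptions: facts are treated as a flat dict, and the issue shape, the FactConflict class, and the exit code 1 are hypothetical rather than the tool's real contract.

```python
import json
import sys


class FactConflict(Exception):
    """Carries a structured issue describing two disagreeing values."""

    def __init__(self, field, stored, incoming):
        super().__init__(f"conflicting values for {field!r}")
        # Hypothetical issue shape.
        self.issue = {"type": "fact-conflict", "field": field,
                      "stored": stored, "incoming": incoming}


def merge_facts(stored: dict, incoming: dict) -> dict:
    """Merge new facts, refusing to overwrite a different stored value."""
    merged = dict(stored)
    for field, value in incoming.items():
        if field in merged and merged[field] != value:
            raise FactConflict(field, merged[field], value)
        merged[field] = value
    return merged


def exit_code_for(stored: dict, incoming: dict) -> int:
    """Return 0 on a clean merge, non-zero after reporting a conflict."""
    try:
        merge_facts(stored, incoming)
    except FactConflict as exc:
        print(json.dumps(exc.issue), file=sys.stderr)
        return 1
    return 0
```

Re-submitting an identical value is accepted here; only a differing value for an already stored field stops the run.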

Output Contract

JSON by default
markdown available with --format markdown
question supports:
- --style conversation
- --style memo
question emits the answered analysis
prepare emits a prepared return package summary
export-efile-ready emits a draft e-file-ready payload
review emits a findings-first review result
fetch-year emits the location of the downloaded manifest and a source count
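
Because output is JSON by default, a caller can drive the CLI from Python and parse the result. The sketch below uses only flags that appear in this README; the helper names are hypothetical, and it assumes the JSON document is written to stdout.

```python
import json
import subprocess
from typing import Any, List, Optional

WRAPPER = "skills/us-cpa/scripts/us-cpa"  # repo-local wrapper path from this README


def question_argv(question: str, tax_year: int,
                  style: Optional[str] = None,
                  fmt: Optional[str] = None) -> List[str]:
    """Build the argument list for the question subcommand."""
    argv = [WRAPPER, "question", "--question", question, "--tax-year", str(tax_year)]
    if style is not None:
        argv += ["--style", style]
    if fmt is not None:
        argv += ["--format", fmt]
    return argv


def run_question(question: str, tax_year: int) -> Any:
    """Run the CLI and parse its default JSON output."""
    # Assumption: with no --format flag, stdout holds one JSON document.
    result = subprocess.run(question_argv(question, tax_year),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Passing an argument list instead of a shell string avoids quoting problems when the question text contains spaces or quotes.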

Question Engine

Current question implementation:

loads the cached tax-year corpus
searches the downloaded IRS corpus for relevant authorities and excerpts
returns one canonical analysis object with:
- authorities
- excerpts
- confidence / risk
- primary-law escalation only when the IRS corpus is still insufficient
renders that analysis as:
- conversational output
- memo output

In OpenClaw, the model should answer the user from the returned IRS excerpts when primaryLawRequired is false, rather than merely repeating the CLI summary.
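
One way a consumer might act on the analysis object is sketched below. The primaryLawRequired field is named in this README; the excerpts key, the function name, and the three return labels are assumptions made for illustration.

```python
def answer_source(analysis: dict) -> str:
    """Decide where the final answer should come from."""
    # primaryLawRequired is named in this README; everything else is assumed.
    if analysis.get("primaryLawRequired", False):
        return "escalate-to-primary-law"
    if analysis.get("excerpts"):
        # Answer from the IRS excerpts themselves, not the CLI summary.
        return "answer-from-irs-excerpts"
    # Hypothetical third case: no escalation flagged and nothing to quote.
    return "insufficient-corpus"
```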

Form Rendering

Current rendering path:

official IRS PDFs from the cached tax-year corpus
deterministic field-fill when usable AcroForm fields are present
overlay rendering onto those official PDFs using reportlab + pypdf as fallback
artifact manifest written to output/artifacts.json

Current rendered form support:

field-fill support for known mapped fillable forms
overlay generation for the current required-form set resolved by the return model

Current review rule:

field-filled artifacts are not automatically flagged for review
overlay-rendered artifacts are marked reviewRequired: true

Overlay coordinates are currently a fallback heuristic and are not treated as line-perfect authoritative field maps. Overlay output must be visually reviewed before any filing/export handoff.
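
The field-fill-first decision and the review rule above can be summarized in a small sketch. It deliberately leaves out the actual reportlab and pypdf calls; the function name and the form and renderMode keys are hypothetical, while reviewRequired is the flag named in this README.

```python
def plan_artifact(form_id: str, has_fillable_fields: bool, has_field_map: bool) -> dict:
    """Choose a render mode and set the review flag accordingly."""
    # Field-fill needs both usable AcroForm fields and a known field mapping.
    if has_fillable_fields and has_field_map:
        return {"form": form_id, "renderMode": "field-fill", "reviewRequired": False}
    # Overlay is the fallback and always requires visual review.
    return {"form": form_id, "renderMode": "overlay", "reviewRequired": True}
```

Setting the review flag where the mode is chosen keeps an overlay artifact from reaching export without being marked.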

Preparation Workflow

Current prepare implementation:

loads case facts from extracted/facts.json
normalizes them into the current supported federal return model
preserves source provenance for normalized values
computes the current supported 1040 package
resolves required forms across the current supported subset
writes:
- return/normalized-return.json
- output/artifacts.json
- reports/prepare-summary.json

Current supported calculation inputs:

filingStatus
spouse.fullName
dependents
wages
taxableInterest
businessIncome
capitalGainLoss
rentalIncome
federalWithholding
itemizedDeductions
hsaContribution
educationCredit
foreignTaxCredit
qualifiedBusinessIncome
traditionalIraBasis
additionalMedicareTax
netInvestmentIncomeTax
alternativeMinimumTax
additionalTaxPenalty
energyCredit
depreciationExpense
section1231GainLoss

E-file-ready Export

export-efile-ready writes:

output/efile-ready.json

Current export behavior:

draft-only
includes required forms
includes refund or balance due summary
includes attachment manifest
includes unresolved issues

Review Workflow

Current review implementation:

recomputes the return from current case facts
compares stored normalized return values to recomputed values
flags source-fact mismatches for key income fields
flags likely omitted income when document-extracted facts support an amount the stored return omits
checks whether required rendered artifacts are present
flags high-complexity forms for specialist follow-up
flags overlay-rendered artifacts as requiring human review
sorts findings by severity
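
The compare-and-sort steps above might be sketched as follows. The severity labels, the finding shape, and both function names are hypothetical; the README does not specify them.

```python
from typing import Dict, List

# Hypothetical severity levels, most urgent first.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def compare_returns(stored: Dict[str, float], recomputed: Dict[str, float]) -> List[dict]:
    """Emit one finding per field whose stored value differs from the recomputed one."""
    findings = []
    for field in sorted(set(stored) | set(recomputed)):
        if stored.get(field) != recomputed.get(field):
            findings.append({"severity": "high", "field": field,
                             "stored": stored.get(field),
                             "recomputed": recomputed.get(field)})
    return findings


def sort_findings(findings: List[dict]) -> List[dict]:
    """Order findings by severity; unknown labels sort last."""
    return sorted(findings,
                  key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)))
```

Python's sorted is stable, so findings of equal severity keep the order in which they were detected.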

Current render modes:

--style conversation
--style memo

Scope Rules

U.S. federal individual returns only in v1
official IRS artifacts are the target output for compiled forms
conflicting facts must stop the workflow for user resolution

Authority Ranking

Current authority classes are ranked to preserve source hierarchy:

IRS forms
IRS instructions
IRS publications
IRS FAQs
Internal Revenue Code
Treasury regulations
other primary authority

Later research and review flows should consume this ranking rather than inventing their own.
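
A shared ranking that later flows could import might look like the sketch below. The class order follows the list above; the enum name, the member names, the numeric values, and the assumption that a lower number is consulted first are all illustrative.

```python
from enum import IntEnum
from typing import List, Tuple


class AuthorityRank(IntEnum):
    # Order follows this README; the numeric values are assumptions.
    IRS_FORM = 1
    IRS_INSTRUCTIONS = 2
    IRS_PUBLICATION = 3
    IRS_FAQ = 4
    INTERNAL_REVENUE_CODE = 5
    TREASURY_REGULATION = 6
    OTHER_PRIMARY_AUTHORITY = 7


def order_by_authority(sources: List[Tuple[AuthorityRank, str]]) -> List[Tuple[AuthorityRank, str]]:
    """Sort (rank, label) pairs so lower-numbered classes come first."""
    return sorted(sources, key=lambda pair: pair[0])
```

Keeping the ranking in one enum gives research and review code a single definition to consume instead of separate ad hoc orderings.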
