feat: add us-cpa tax-year source corpus

This commit is contained in:
Stefano Fiorini
2026-03-15 00:53:18 -05:00
parent 291b729894
commit 0c2e34f2f0
5 changed files with 328 additions and 3 deletions

View File

@@ -4,7 +4,14 @@
## Current Milestone
Milestone 1 provides the initial package, CLI surface, skill wrapper, and test harness. Tax logic, IRS corpus download, case workflows, rendering, and review logic are not implemented yet.
Milestone 2 now adds the first tax-year corpus layer:
- deterministic cache layout under `~/.cache/us-cpa` by default
- `fetch-year` download flow for the bootstrap IRS corpus
- source manifest with URL, hash, authority rank, and local path traceability
- authority ranking hooks for IRS materials and future primary-law escalation
Tax logic, case workflows, rendering, and review logic are still pending.
## CLI Surface
@@ -18,6 +25,27 @@ skills/us-cpa/scripts/us-cpa render-forms --tax-year 2025 --case-dir ~/tax-cases
skills/us-cpa/scripts/us-cpa export-efile-ready --tax-year 2025 --case-dir ~/tax-cases/2025-jane-doe
```
## Tax-Year Cache
Default cache root:
```text
~/.cache/us-cpa
```
Override for isolated runs:
```bash
US_CPA_CACHE_DIR=/tmp/us-cpa-cache skills/us-cpa/scripts/us-cpa fetch-year --tax-year 2025
```
Current `fetch-year` bootstrap corpus for tax year `2025` is verified against live IRS `irs-prior` PDFs for:
- Form 1040
- Schedules 1, 2, 3, A, B, C, D, SE, and 8812
- Form 8949
- General Form 1040 instructions and selected schedule/form instructions
## Interaction Model
- `question`
@@ -47,10 +75,25 @@ skills/us-cpa/scripts/us-cpa export-efile-ready --tax-year 2025 --case-dir ~/tax
- JSON by default
- markdown available with `--format markdown`
- current milestone responses are scaffold payloads with `status: "not_implemented"`
- `question`, `prepare`, `review`, `extract-docs`, `render-forms`, and `export-efile-ready` still emit scaffold payloads with `status: "not_implemented"`
- `fetch-year` emits a downloaded manifest location and source count
## Scope Rules
- U.S. federal individual returns only in v1
- official IRS artifacts are the target output for compiled forms
- conflicting facts must stop the workflow for user resolution
## Authority Ranking
Current authority classes are ranked to preserve source hierarchy:
- IRS forms
- IRS instructions
- IRS publications
- IRS FAQs
- Internal Revenue Code
- Treasury regulations
- other primary authority
Later research and review flows should consume this ranking rather than inventing their own.