fix: expand us-cpa extraction review and rendering
@@ -24,11 +24,12 @@ Current implementation now includes:
- deterministic cache layout under `~/.cache/us-cpa` by default
- `fetch-year` download flow for the bootstrap IRS corpus
- source manifest with URL, hash, authority rank, and local path traceability
- authority ranking hooks for IRS materials and future primary-law escalation
- case-folder intake and conflict-stop handling
- primary-law URL building for IRC and Treasury regulation escalation
- case-folder intake, document registration, and machine-usable fact extraction from JSON, text, and PDF inputs
- question workflow with conversation and memo output
- prepare workflow for the current supported 1040 subset
- prepare workflow for the current supported multi-form 1040 package
- review workflow with findings-first output
- fillable-PDF first rendering with overlay fallback
- e-file-ready draft export payload generation

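As a rough illustration of the source manifest item above, the sketch below builds one manifest entry carrying a URL, a content hash, an authority rank, and a local cache path. The field names, the `sources/` subdirectory, the rank value, and the example URL are assumptions made for illustration; they are not taken from the us-cpa code.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class ManifestEntry:
    """One cached source document (hypothetical field names)."""
    url: str
    sha256: str
    authority_rank: int
    local_path: str


def build_entry(url: str, content: bytes, authority_rank: int, cache_root: Path) -> ManifestEntry:
    """Derive a deterministic cache path and a content hash for downloaded bytes."""
    digest = hashlib.sha256(content).hexdigest()
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    # The "sources" subdirectory is an assumed layout, not the tool's real one.
    local_path = cache_root / "sources" / filename
    return ManifestEntry(url, digest, authority_rank, str(local_path))


entry = build_entry(
    "https://www.irs.gov/pub/irs-prior/f1040--2025.pdf",  # illustrative URL
    b"%PDF-1.7 example bytes",
    authority_rank=1,  # assumed: lower number means higher authority
    cache_root=Path("~/.cache/us-cpa").expanduser(),
)
print(json.dumps(asdict(entry), indent=2))
```

Hashing the downloaded bytes rather than the URL lets a later run detect that a cached file no longer matches what was originally fetched.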
## CLI Surface
@@ -61,10 +62,17 @@ US_CPA_CACHE_DIR=/tmp/us-cpa-cache skills/us-cpa/scripts/us-cpa fetch-year --tax
Current `fetch-year` bootstrap corpus for tax year `2025` is verified against live IRS `irs-prior` PDFs for:

- Form 1040
- Schedules 1, 2, 3, A, B, C, D, SE, and 8812
- Form 8949
- Schedules 1, 2, 3, A, B, C, D, E, SE, and 8812
- Forms 8949, 4562, 4797, 6251, 8606, 8863, 8889, 8959, 8960, 8995, 8995-A, 5329, 5695, and 1116
- General Form 1040 instructions and selected schedule/form instructions

Current bundled tax-year computation data:

- 2024
- 2025

Other years fetch/source correctly, but deterministic return calculations currently stop with an explicit unsupported-year error until rate tables are added.

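The unsupported-year behavior described above could look roughly like the following. This is a minimal sketch: the exception name, the `load_tax_year_data` helper, and the placeholder table contents are hypothetical, and no real rate data is reproduced.

```python
class UnsupportedTaxYearError(ValueError):
    """Raised when no bundled computation data exists for a tax year."""


# Placeholder payloads only; real rate tables are intentionally not shown.
BUNDLED_TAX_YEAR_DATA = {
    2024: {"label": "tax-year-2024"},
    2025: {"label": "tax-year-2025"},
}


def load_tax_year_data(tax_year: int) -> dict:
    """Return bundled data for a year, or stop with an explicit error."""
    try:
        return BUNDLED_TAX_YEAR_DATA[tax_year]
    except KeyError:
        supported = ", ".join(str(year) for year in sorted(BUNDLED_TAX_YEAR_DATA))
        raise UnsupportedTaxYearError(
            f"tax year {tax_year} has no bundled computation data "
            f"(supported: {supported})"
        ) from None


print(load_tax_year_data(2025)["label"])
try:
    load_tax_year_data(2023)
except UnsupportedTaxYearError as exc:
    print(f"error: {exc}")
```

Failing loudly here, instead of falling back to the nearest year's tables, keeps the calculation deterministic and makes the gap visible to the caller.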
## Interaction Model

- `question`
@@ -109,7 +117,8 @@ Behavior:
- creates the full case directory layout when `--create-case` is used
- copies input documents into `input/`
- stores normalized user-statement facts in `extracted/facts.json`
- stores normalized facts with source metadata in `extracted/facts.json`
- extracts machine-usable facts from JSON/text/PDF documents where supported
- appends document registry entries to `case-manifest.json`
- stops with a structured issue and non-zero exit if a new fact conflicts with an existing stored fact

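The conflict-stop rule in the list above can be sketched as a small merge function. The record shape (`value` plus `source`), the issue fields, and the `merge_fact` name are assumptions for illustration, not the tool's actual schema.

```python
import json
from typing import Any, Optional


def merge_fact(facts: dict, key: str, value: Any, source: str) -> Optional[dict]:
    """Store a new fact with its source, or return a structured issue on conflict."""
    existing = facts.get(key)
    if existing is None:
        facts[key] = {"value": value, "source": source}
        return None
    if existing["value"] != value:
        # Do not overwrite: the caller is expected to stop and surface the issue.
        return {
            "status": "conflict",
            "field": key,
            "existing": existing,
            "incoming": {"value": value, "source": source},
        }
    return None  # Same value already stored; nothing to change.


facts: dict = {}
merge_fact(facts, "wages", 85000, "user-statement")
issue = merge_fact(facts, "wages", 91000, "input/w2.pdf")
if issue is not None:
    print(json.dumps(issue, indent=2))
    # A CLI wrapper would exit non-zero at this point, e.g. sys.exit(1).
```

Leaving the stored fact untouched on conflict means the case folder stays in its last consistent state until someone resolves the disagreement.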
@@ -142,21 +151,26 @@ Current implemented topics:
- standard deduction
- Schedule C / sole proprietorship reporting trigger
- Schedule D / capital gains reporting trigger
- Schedule E / rental income reporting trigger

## Form Rendering

Current rendering path:

- official IRS PDFs from the cached tax-year corpus
- overlay rendering onto those official PDFs using `reportlab` + `pypdf`
- deterministic field-fill when usable AcroForm fields are present
- overlay rendering onto those official PDFs using `reportlab` + `pypdf` as fallback
- artifact manifest written to `output/artifacts.json`

Current rendered form support:

- Form 1040 overlay artifact generation
- field-fill support for known mapped fillable forms
- overlay generation for the current required-form set resolved by the return model

Current review rule:

- field-filled artifacts are not automatically flagged for review
- overlay-rendered artifacts are marked `reviewRequired: true`

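The field-fill-first decision and the review rule above might be combined along these lines. This sketch keeps the PDF library out of the picture: `acroform_fields` stands in for whatever the PDF reports (with `pypdf`, `PdfReader.get_fields()` returns a dict or `None`), and `plan_artifact`, `field_map`, and the `renderMode` key are hypothetical names. Only `reviewRequired` comes from the text.

```python
from typing import Optional


def plan_artifact(form: str, acroform_fields: Optional[dict], field_map: dict) -> dict:
    """Choose field-fill when every mapped field exists in the PDF, else overlay."""
    usable = bool(acroform_fields) and bool(field_map) and all(
        pdf_field in acroform_fields for pdf_field in field_map.values()
    )
    if usable:
        # Field-filled artifacts are not automatically flagged for review.
        return {"form": form, "renderMode": "field_fill", "reviewRequired": False}
    # Overlay is the fallback and always requires human review.
    return {"form": form, "renderMode": "overlay", "reviewRequired": True}


filled = plan_artifact("f1040", {"f1_01": {}}, {"wages": "f1_01"})
overlaid = plan_artifact("f8949", None, {})
print(filled)
print(overlaid)
```

Requiring every mapped field to be present is one possible reading of "usable"; a looser rule would need its own review policy.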
## Preparation Workflow
@@ -164,9 +178,10 @@ Current review rule:
Current `prepare` implementation:

- loads case facts from `extracted/facts.json`
- normalizes them into the current supported 2025 federal return model
- computes the current supported 1040 subset
- resolves required forms for the current supported subset
- normalizes them into the current supported federal return model
- preserves source provenance for normalized values
- computes the current supported 1040 package
- resolves required forms across the current supported subset
- writes:
  - `return/normalized-return.json`
  - `output/artifacts.json`
@@ -175,10 +190,27 @@ Current `prepare` implementation:
Current supported calculation inputs:

- `filingStatus`
- `spouse.fullName`
- `dependents`
- `wages`
- `taxableInterest`
- `businessIncome`
- `capitalGainLoss`
- `rentalIncome`
- `federalWithholding`
- `itemizedDeductions`
- `hsaContribution`
- `educationCredit`
- `foreignTaxCredit`
- `qualifiedBusinessIncome`
- `traditionalIraBasis`
- `additionalMedicareTax`
- `netInvestmentIncomeTax`
- `alternativeMinimumTax`
- `additionalTaxPenalty`
- `energyCredit`
- `depreciationExpense`
- `section1231GainLoss`

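One way to use the input list above is to check a facts payload for keys the calculation does not yet understand. The sketch below assumes facts are a nested dict whose dotted paths match the names in the list (so `spouse.fullName` is a `fullName` key under `spouse`); that shape and the `unsupported_inputs` helper are assumptions, not the documented `facts.json` schema.

```python
from typing import Iterator

SUPPORTED_INPUTS = frozenset({
    "filingStatus", "spouse.fullName", "dependents", "wages", "taxableInterest",
    "businessIncome", "capitalGainLoss", "rentalIncome", "federalWithholding",
    "itemizedDeductions", "hsaContribution", "educationCredit", "foreignTaxCredit",
    "qualifiedBusinessIncome", "traditionalIraBasis", "additionalMedicareTax",
    "netInvestmentIncomeTax", "alternativeMinimumTax", "additionalTaxPenalty",
    "energyCredit", "depreciationExpense", "section1231GainLoss",
})


def flatten(facts: dict, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths for every non-dict value in a nested facts dict."""
    for key, value in facts.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=f"{path}.")
        else:
            yield path


def unsupported_inputs(facts: dict) -> list:
    """Return the sorted fact paths that are not in the supported input list."""
    return sorted(path for path in flatten(facts) if path not in SUPPORTED_INPUTS)


example = {
    "filingStatus": "single",
    "spouse": {"fullName": "Example Spouse"},
    "wages": 85000,
    "cryptoIncome": 500,  # hypothetical key that is not in the list
}
print(unsupported_inputs(example))
```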
## E-file-ready Export
@@ -200,7 +232,10 @@ Current `review` implementation:
- recomputes the return from current case facts
- compares stored normalized return values to recomputed values
- flags source-fact mismatches for key income fields
- flags likely omitted income when document-extracted facts support an amount the stored return omits
- checks whether required rendered artifacts are present
- flags high-complexity forms for specialist follow-up
- flags overlay-rendered artifacts as requiring human review
- sorts findings by severity
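Several of the review steps above reduce to "collect findings, then sort by severity". The following is a minimal sketch of that shape; the finding fields, the severity labels, and the `review` function are hypothetical, and only two of the listed checks are shown.

```python
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}  # assumed labels


def review(stored: dict, recomputed: dict, artifacts: list) -> list:
    """Collect findings from two checks and return them most severe first."""
    findings = []
    # Overlay-rendered artifacts require human review.
    for artifact in artifacts:
        if artifact.get("reviewRequired"):
            findings.append({
                "severity": "medium",
                "code": "overlay_review",
                "form": artifact["form"],
            })
    # Stored values that disagree with the recomputed return.
    for field in sorted(recomputed):
        if stored.get(field) != recomputed[field]:
            findings.append({
                "severity": "high",
                "code": "value_mismatch",
                "field": field,
                "stored": stored.get(field),
                "recomputed": recomputed[field],
            })
    # sorted() is stable, so findings of equal severity keep insertion order.
    return sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)),
    )


findings = review(
    stored={"wages": 85000, "taxableInterest": 0},
    recomputed={"wages": 85000, "taxableInterest": 120},
    artifacts=[{"form": "f1040", "reviewRequired": True}],
)
for finding in findings:
    print(finding["severity"], finding["code"])
```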