This one is a docs-only PR and that's the whole point. Before I start wiring up OCR, Xero, the compliance engine, the dashboard, the search index, or the owner portal, I want the target shape of each of those things written down somewhere I can point at from the code. "Write the README first" at feature scale.
## What landed
A stack of eleven planning documents, CLAUDE.md plus ten files under docs/:
| File | Scope |
|---|---|
| CLAUDE.md | Top-level agent instructions: project map, commands, conventions |
| docs/plan.md | Phase breakdown (reorg → OCR → Xero → compliance → portal) |
| docs/todo.md | Rolling punch-list |
| docs/document-structure.md | The canonical folder layout the reorg engine targets |
| docs/feature-reorganize.md | YAML mapping rules, dedup strategy, audit logging |
| docs/feature-ocr.md | Per-category prompts, validation-based confidence, Decimal amounts |
| docs/feature-xero.md | OAuth 2.0 PKCE, vendor alias resolution, category routing |
| docs/feature-compliance.md | FL 718 compliance tracker: deadlines, actions, status events |
| docs/feature-dashboard.md | Portal landing page data shape |
| docs/feature-file-renaming.md | Canonical filename pattern per category |
| docs/feature-search-index.md | Full-text + field-level search over sidecar JSON |
Each feature doc has the same three sections: Why (what problem this solves), How (the design, not the implementation), and Open questions (things I need to decide before writing code).
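
Because every feature doc shares the same three sections, the structure itself can be checked mechanically. Here is a minimal sketch of such a check. It assumes the sections appear as Markdown headings titled exactly "Why", "How", and "Open questions"; the function name `missing_sections` is hypothetical and not part of this PR.

```python
import re

# Assumed section titles, taken from the three sections described above.
REQUIRED_SECTIONS = ("Why", "How", "Open questions")


def missing_sections(markdown_text: str) -> list[str]:
    """Return the required section names that have no matching heading."""
    # Collect heading titles: lines that start with one to six '#' characters.
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE
        )
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# Usage: a doc that is still missing its "Open questions" section.
draft = "# OCR\n\n## Why\nReceipts arrive as scans.\n\n## How\nPer-category prompts.\n"
print(missing_sections(draft))  # ['Open questions']
```

A check like this could run in CI or as a pre-commit hook, so a feature doc with a missing section is caught before any code is written against it.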
## Why docs-first for this project
I'm going to be handing a lot of the implementation to an AI agent. The more of the design decisions I bake into prose before that agent touches the code, the less I'm debating architecture in pull-request comments and the more I'm reviewing whether the code matches the written spec. Every feature doc is also a test the agent can grade itself against: if the doc says "Decimal amounts, never float," the code either does that or it doesn't.
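
To make the "Decimal amounts, never float" example concrete, here is a small sketch of what grading code against that rule might look like. The helper names `parse_amount` and `assert_no_floats` are hypothetical illustrations, not functions from the codebase.

```python
from decimal import Decimal


def parse_amount(raw: str) -> Decimal:
    """Parse a currency string such as '$1,234.50' into an exact Decimal."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned)


def assert_no_floats(record: dict) -> None:
    """Raise TypeError if any value in the record is a float."""
    for key, value in record.items():
        if isinstance(value, float):
            raise TypeError(f"{key} is a float; the spec requires Decimal")


# Binary floats carry rounding error; Decimal built from strings does not.
print(0.1 + 0.2 == 0.3)                                      # False
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```

The rule is binary, which is what makes it gradeable: a record either passes `assert_no_floats` or it raises.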
A secondary benefit: when I look at this in six months and ask "wait, why is OCR using validation-based confidence instead of asking Claude for a score?", the answer is in docs/feature-ocr.md under a heading called "Why validation-based confidence." Future-me doesn't have to reverse-engineer the reasoning from the diff.
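
For readers without docs/feature-ocr.md open, here is one possible reading of "validation-based confidence", sketched under assumptions: the score is the fraction of extracted fields that pass deterministic checks, rather than a number the model reports about itself. The field names, the checks, and the `confidence` function are all hypothetical; the real design lives in the feature doc.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def _is_amount(value: str) -> bool:
    """True if the string parses as a finite Decimal."""
    try:
        return Decimal(value).is_finite()
    except InvalidOperation:
        return False


def _is_iso_date(value: str) -> bool:
    """True if the string is an ISO date such as '2024-03-01'."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical per-field checks for an invoice-like document.
CHECKS = {
    "total": _is_amount,
    "invoice_date": _is_iso_date,
    "vendor": lambda v: bool(v.strip()),
}


def confidence(fields: dict[str, str]) -> float:
    """Fraction of checks passed by the extracted fields, from 0.0 to 1.0."""
    passed = sum(
        1 for name, check in CHECKS.items()
        if name in fields and check(fields[name])
    )
    return passed / len(CHECKS)


# An OCR slip ('O' instead of '0') fails the amount check and lowers the score.
print(confidence({"total": "12O.00", "invoice_date": "2024-03-01", "vendor": "Acme"}))
```

The appeal of this style of score is that it is reproducible: the same extracted fields always yield the same confidence, and a low score points at the specific field that failed.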
## What this unblocks
OCR, Xero, and compliance can all land as separate PRs now that they each have a target to aim at. PR #3 is the OCR pipeline, and it drops later today.