Writing the architecture docs before writing the code

2026-04-01

This one is a docs-only PR and that's the whole point. Before I start wiring up OCR, Xero, the compliance engine, the dashboard, the search index, or the owner portal, I want the target shape of each of those things written down somewhere I can point at from the code. "Write the README first" at feature scale.

What landed

A stack of eleven planning documents, ten of them under docs/:

| File | Scope |
| --- | --- |
| CLAUDE.md | Top-level agent instructions: project map, commands, conventions |
| docs/plan.md | Phase breakdown (reorg → OCR → Xero → compliance → portal) |
| docs/todo.md | Rolling punch-list |
| docs/document-structure.md | The canonical folder layout the reorg engine targets |
| docs/feature-reorganize.md | YAML mapping rules, dedup strategy, audit logging |
| docs/feature-ocr.md | Per-category prompts, validation-based confidence, Decimal amounts |
| docs/feature-xero.md | OAuth 2.0 PKCE, vendor alias resolution, category routing |
| docs/feature-compliance.md | FL 718 compliance tracker: deadlines, actions, status events |
| docs/feature-dashboard.md | Portal landing page data shape |
| docs/feature-file-renaming.md | Canonical filename pattern per category |
| docs/feature-search-index.md | Full-text + field-level search over sidecar JSON |

Each feature doc has the same three sections: Why (what problem this solves), How (the design, not the implementation), and Open questions (things I need to decide before writing code).

Why docs-first for this project

I'm going to be handing a lot of the implementation to an AI agent. The more of the design decisions I bake into prose before that agent touches the code, the less I'm debating architecture in pull-request comments and the more I'm reviewing whether the code matches the written spec. Every feature doc is also a test the agent can grade itself against: if the doc says "Decimal amounts, never float," the code either does that or it doesn't.
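To make the "Decimal amounts, never float" example concrete, here is a minimal sketch of the kind of check a spec line like that could turn into. The `parse_amount` helper and its cleanup rules are hypothetical and not taken from the repo; the point is only that the rule is mechanically testable.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(raw: str) -> Decimal:
    """Parse a money string into a Decimal (hypothetical helper, not repo code)."""
    # Assumed cleanup: strip a dollar sign, thousands separators, and whitespace.
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a valid amount: {raw!r}") from exc


# Why the rule exists: binary floats cannot represent most cent values exactly,
# so sums drift; Decimal keeps the digits that were actually on the document.
float_total = 0.1 + 0.2
decimal_total = parse_amount("0.10") + parse_amount("0.20")

# The self-grading part: these either hold or they don't.
assert float_total != 0.3
assert decimal_total == Decimal("0.30")
assert isinstance(parse_amount("$1,234.56"), Decimal)
```

A check like the last assertion is what lets a reviewer (or the agent) confirm the code matches the written rule without rereading the design discussion.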

A secondary benefit: when I look at this in six months and ask "wait, why is OCR using validation-based confidence instead of asking Claude for a score?", the answer is in docs/feature-ocr.md under a heading called "Why validation-based confidence." Future-me doesn't have to reverse-engineer the reasoning from the diff.
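For readers unfamiliar with the idea, one plausible reading of "validation-based confidence" is: score an extraction by how many of its fields pass deterministic checks, rather than trusting a number the model reports about itself. The sketch below assumes that reading. The field names, the checks, and the `validation_confidence` function are illustrative assumptions, not the contents of docs/feature-ocr.md.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def _is_decimal(value: str) -> bool:
    """True if the string parses as a Decimal."""
    try:
        Decimal(value)
        return True
    except InvalidOperation:
        return False


def _is_iso_date(value: str) -> bool:
    """True if the string is an ISO 8601 calendar date such as 2026-04-01."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical per-field validators; a real pipeline would define its own set.
CHECKS = {
    "vendor": lambda v: bool(v.strip()),
    "amount": _is_decimal,
    "invoice_date": _is_iso_date,
}


def validation_confidence(fields: dict[str, str]) -> float:
    """Fraction of expected fields that are present and pass their check."""
    passed = sum(
        1
        for name, check in CHECKS.items()
        if name in fields and check(fields[name])
    )
    return passed / len(CHECKS)


clean = {"vendor": "Acme", "amount": "12.50", "invoice_date": "2026-04-01"}
messy = {"vendor": "Acme", "amount": "twelve", "invoice_date": "04/01/2026"}
clean_score = validation_confidence(clean)
messy_score = validation_confidence(messy)
```

Under these assumptions, the score is reproducible: the same extracted fields always yield the same confidence, which a self-reported model score does not guarantee.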

What this unblocks

OCR, Xero, and compliance can all land as separate PRs now that they each have a target to aim at. PR #3 is the OCR pipeline, and it drops later today.


PR: https://github.com/StevieIsmagic/honest-cam/pull/2