Rules for using unreviewed AI-generated code
I've been using two rules for working with unreviewed AI-generated code. For personal projects, they strike a balance between undisciplined vibe coding and over-review: in effect, they treat such code like an untrusted closed-source dependency.
Rule 1: Only depend on verified behavior
It isn't feasible to reliably detect every unexpected side effect, so I expect undefined behavior to creep in from both ambiguous requirements and AI mistakes. I treat that behavior, helpful or harmful, as ephemeral.
One way to test this rule is to occasionally regenerate the code from the spec instead of continuing to build incrementally. If something breaks, I was depending on behavior I hadn't verified well enough.
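A toy illustration of why this catches unverified dependence, using a made-up function rather than anything from a real project: two plausible "generations" of the same underspecified spec agree on the verified behavior but differ on an unverified detail, here the ordering of results.

```python
# Spec: return the unique tags from a list. The spec says nothing
# about order, so ordering is unverified, undefined behavior.

def unique_tags_v1(tags):
    # First generation happens to preserve first-seen order.
    seen, out = set(), []
    for t in tags:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

def unique_tags_v2(tags):
    # Regenerated from the same spec: happens to sort instead.
    return sorted(set(tags))

# Verified behavior: same elements, no duplicates. Both pass.
for impl in (unique_tags_v1, unique_tags_v2):
    assert set(impl(["b", "a", "b"])) == {"a", "b"}
    assert len(impl(["b", "a", "b"])) == 2

# But code that depended on the unverified ordering would break
# the moment the implementation is regenerated:
assert unique_tags_v1(["b", "a"]) != unique_tags_v2(["b", "a"])
```

If my project only ever asserted the verified behavior, the regeneration is a non-event; if something breaks, I had silently started depending on v1's ordering.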
I'm okay with some ambiguity, and the leaky abstractions that come with it, when it lets me move on to more important things. I tighten the specification as usage reveals requirements.
For now, this only works for personal-use projects, where the stakes are low and the user can be relied on not to depend on every observable behavior.
Rule 2: Assume the code is unreadable
I don't know how to evaluate understandability without reading the code, so I expect unreviewed code to get harder to understand over time.
If I ever need to review part of it, I assume I'll be better off rewriting it from scratch. To prepare for that possibility, I keep a clear boundary between reviewed and unreviewed code by confining generated code to specific files and interfaces. That limits the blast radius when I need to eject a file.
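A minimal sketch of what that boundary can look like, with hypothetical names (nothing here is from a real project): reviewed code owns a small interface, the generated implementation sits behind it in its own file, and reviewed code only ever calls through the interface, so ejecting the generated file means rewriting one class.

```python
from typing import Protocol

# Reviewed code: a small, stable interface the generated module
# must satisfy. This is the boundary.
class NoteStore(Protocol):
    def add(self, text: str) -> int: ...
    def get(self, note_id: int) -> str: ...

# Unreviewed generated code would live in its own file behind the
# interface (inlined here for the sketch). Replacing or rewriting
# it touches only this class.
class GeneratedNoteStore:
    def __init__(self) -> None:
        self._notes: dict[int, str] = {}
        self._next_id = 0

    def add(self, text: str) -> int:
        self._next_id += 1
        self._notes[self._next_id] = text
        return self._next_id

    def get(self, note_id: int) -> str:
        return self._notes[note_id]

# Reviewed code depends only on the interface, never the class.
def save_and_echo(store: NoteStore, text: str) -> str:
    return store.get(store.add(text))
```

The blast radius of an eject is the one generated file; everything reviewed keeps compiling against `NoteStore`.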
I also lean more heavily on black-box testing: I've been writing far more browser tests and fewer file-scoped unit tests.
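Real browser tests need a full harness, so here is the same black-box idea sketched one level down, at the module boundary, with a hypothetical `TodoApp` standing in for an unreviewed generated module: the test drives only the public entry points and never inspects internals, so it survives a full rewrite of the implementation.

```python
# Stand-in for an unreviewed generated module; the test below must
# not care how it stores anything internally.
class TodoApp:
    def __init__(self) -> None:
        self._items = []

    def add(self, text: str) -> None:
        self._items.append({"text": text, "done": False})

    def complete(self, text: str) -> None:
        for item in self._items:
            if item["text"] == text:
                item["done"] = True

    def remaining(self) -> list[str]:
        return [i["text"] for i in self._items if not i["done"]]

def test_complete_removes_from_remaining():
    # Black-box: only public methods, no peeking at _items.
    app = TodoApp()
    app.add("water plants")
    app.add("file taxes")
    app.complete("water plants")
    assert app.remaining() == ["file taxes"]
```

A unit test that reached into `_items` would break on regeneration; this one keeps passing as long as the verified behavior holds.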
Taken together, these rules let me keep some of the speed of skipping review while still ending up with a project I can maintain fifty prompts later, or a month later.