Project-specific evidence can justify skipping review of AI-generated code

On my latest personal project I've been skipping code review on some AI-generated files.

Every feature has automated tests that I've reviewed myself; I don't review the implementation. Across 250 implementations, there was only one regression that review could have caught. Even the pessimistic end of a 95% confidence interval puts the error rate at 2.2%, or about 1 in 45. That's still low enough that fixing the occasional miss costs less than reviewing every change.

This confidence is specific to one project, one model, one person's spec-writing habits, and the project's typical changes; it breaks down when any of those vary. When I make atypical, untestable, or high-risk changes, I still review the code.

If you're deciding whether to skip code review, gather your own numbers.

Appendix: Related work

Software factories: StrongDM, hackernoon

Rules for using unreviewed AI-generated code