Project-specific evidence can justify skipping review of AI-generated code
On my latest personal project I've been skipping code review on some AI-generated files.
Every feature has automated tests that I've reviewed myself; I don't review the implementation. Across 250 implementations, there was only one regression that review could have caught. Even the pessimistic end of a 95% confidence interval puts the regression rate at 2.2%, or about 1 in 45. That's still low enough that fixing the occasional miss costs less than reviewing every change.
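For the curious, that pessimistic figure matches a Clopper-Pearson (exact binomial) upper bound for 1 regression in 250 tries. Here's a minimal Python sketch reproducing it; the choice of Clopper-Pearson is my assumption, since the post only says "95% confidence interval":

```python
# Back-of-the-envelope check (not project code): Clopper-Pearson upper bound
# on the regression rate, assuming 1 review-catchable regression out of
# 250 AI-generated implementations.
from scipy.stats import beta

def clopper_pearson_upper(failures: int, trials: int, confidence: float = 0.95) -> float:
    """Upper limit of the two-sided exact binomial confidence interval."""
    alpha = 1.0 - confidence
    # Upper limit is the (1 - alpha/2) quantile of Beta(failures + 1, trials - failures).
    return beta.ppf(1.0 - alpha / 2.0, failures + 1, trials - failures)

upper = clopper_pearson_upper(failures=1, trials=250)
print(f"pessimistic regression rate: {upper:.1%} (~1 in {round(1 / upper)})")
# -> pessimistic regression rate: 2.2% (~1 in 45)
```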
This confidence is specific to one project, one model, one person's spec-writing habits, and the project's typical changes; it breaks down when any of those change. When I make atypical, untestable, or high-risk changes, I still review the code.
If you're deciding whether to skip code review, gather your own numbers.
Appendix: Related work
Software factories: StrongDM, hackernoon