Using screenshots as a specification for AI code regeneration

2026-04-06

I've been exploring whether screenshot-based snapshot tests can serve as an adequate design spec for AI-generated UI code, and so far it seems viable. Since I treat unreviewed AI-generated code like an untrusted closed-source library, I want to be able to delete it and regenerate something comparable on demand.

I deleted the HTML and CSS for a to-do app, then asked a couple coding agents to regenerate them. I already have browser tests covering behavior, page content, and the background image; the screenshot is only there to specify visual properties like layout and color.

Original to-do app screenshot — Original app

Claude Code regenerated to-do app screenshot — Claude Code's regeneration

Codex regenerated to-do app screenshot — Codex's regeneration

The overall look is very close: the structure is the same, and so is the color palette. There are a few small errors, such as font changes and too much transparency. A few minutes of touch-up would probably bring either version up to par. As a proof of concept, I'd call this a success, though better tooling would still help.

I was concerned that the system would overfit to the design and produce overly verbose CSS, but it did well: in both cases, the regenerated CSS was shorter than the original.

Because verifying a change like this requires manual, subjective review across many attributes, I expect and accept some gradual drift from the original design. Every so often, it will need a touch-up, much like cleaning a kitchen or repainting a wall.