Using screenshots as a specification for AI code regeneration
I've been exploring whether screenshot-based snapshot tests can serve as an adequate design spec for AI-generated UI code, and so far it seems viable. Since I treat unreviewed AI-generated code like an untrusted closed-source library, I want to be able to delete it and regenerate something comparable on demand.
I deleted the HTML and CSS for a to-do app, then asked a couple coding agents to regenerate them. I already have browser tests covering behavior, page content, and the background image; the screenshot is only there to specify visual properties like layout and color.
The overall look is very close: the structure is the same, and so is the color palette. There are a few small errors, such as font changes and too much transparency. A few minutes of touch-up would probably bring either version up to par. As a proof of concept, I'd call this a success, though better tooling would still help.
I was concerned that the system would overfit to the design and produce overly verbose CSS, but it did well: in both cases, the regenerated CSS was shorter than the original.
Because verifying a change like this requires manual, subjective review across many attributes, I expect and accept some gradual drift from the original design. Every so often, it will need a touch-up, much like cleaning a kitchen or repainting a wall.