I don't really use it myself, except to fill in a few things now and then, like passwords. The LLM is the user. It just uses the primitives it has (these are my paraphrases): scroll_to, expand_viewport, screenshot, select_dom_element, fill_input. This way I can tell it to implement a feature and verify it, and it does so in a Google Chrome testing profile. Without that grounding, I've noticed that LLMs often produce "code that should work" where something else turns out to be missing. This way, by the time I see it, the feature works.
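
For anyone wondering what a primitive layer like that might look like, here is a rough sketch. It is not the actual tooling: only the five primitive names come from the paraphrased list above, while FakePage, dispatch, the argument names, and the return shapes are all hypothetical. The fake page stands in for a real browser so the example runs on its own.

```python
from dataclasses import dataclass, field

# The five primitive names are paraphrases from the comment; everything else
# here (FakePage, dispatch, argument names, return shapes) is an assumption.
PRIMITIVES = ("scroll_to", "expand_viewport", "screenshot",
              "select_dom_element", "fill_input")


@dataclass
class FakePage:
    """In-memory stand-in for a browser page (a real setup would drive Chrome)."""
    inputs: dict = field(default_factory=dict)
    scroll_y: int = 0
    viewport: tuple = (1280, 720)

    def scroll_to(self, y):
        self.scroll_y = y
        return {"scroll_y": self.scroll_y}

    def expand_viewport(self, width, height):
        self.viewport = (width, height)
        return {"viewport": list(self.viewport)}

    def screenshot(self):
        # A real implementation would return image data; this returns a summary.
        width, height = self.viewport
        return {"summary": f"{width}x{height} at y={self.scroll_y}",
                "inputs": dict(self.inputs)}

    def select_dom_element(self, selector):
        return {"selector": selector, "value": self.inputs.get(selector)}

    def fill_input(self, selector, value):
        self.inputs[selector] = value
        return {"selector": selector, "filled": True}


def dispatch(page, call):
    """Run one tool call of the form {"name": ..., "args": {...}} against the page."""
    name = call.get("name")
    if name not in PRIMITIVES:
        # Anything outside the allowed primitives is rejected, not executed.
        return {"error": f"unknown primitive: {name}"}
    return getattr(page, name)(**call.get("args", {}))


# Example: the model fills a field, then reads it back to verify the result.
page = FakePage()
calls = [
    {"name": "fill_input", "args": {"selector": "#email", "value": "a@example.com"}},
    {"name": "select_dom_element", "args": {"selector": "#email"}},
]
results = [dispatch(page, call) for call in calls]
```

The point of the sketch is the loop shape: the model only ever acts through a small fixed set of calls, and each call returns an observation it can check before claiming the feature works.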

I then have to go in and advise it on factoring and things like that, but the functionality itself is present and working.