A thing I really like about Claude Code is how well it uses the bash scripts you give it. I also have a browser-control MCP installed, and it's pretty good at using it to close the full loop on a change. It also has the credentials for a staging database, which it logs into and runs queries against. Put together, this means it iterates on its own and delivers good results for me.
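The database part can be as simple as a small wrapper script the model calls from bash. A minimal sketch, assuming a Postgres staging database accessed via psycopg 3, with credentials in a hypothetical STAGING_DATABASE_URL env var (not my exact script):

```python
#!/usr/bin/env python3
"""Run a query against staging. Usage: staging_query.py '<SQL>'."""
import os
import sys

import psycopg  # assumption: Postgres staging DB, psycopg 3


def main() -> None:
    query = sys.argv[1]
    # Credentials live in an env var (name is hypothetical), not in the repo.
    conn_str = os.environ["STAGING_DATABASE_URL"]
    # Open the session read-only so the agent can't mutate staging data.
    with psycopg.connect(
        conn_str, options="-c default_transaction_read_only=on"
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            if cur.description:  # only SELECT-like statements return rows
                for row in cur.fetchall():
                    print(row)


if __name__ == "__main__":
    main()
```

Forcing the session read-only is just one possible guardrail; the point is that the model gets a query primitive it can call in a loop while it verifies its own work.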
I'll try this, but the grounding seems crucial for these LLMs to deliver results in fewer shots than they otherwise would.
I don't use the browser myself, so to speak, except to occasionally fill in things like passwords. The LLM is the user. It just works with the primitives it has (these are my paraphrases): scroll_to, expand_viewport, screenshot, select_dom_element, fill_input. That way I can tell it to implement a feature and verify it, and it does so in a dedicated Google Chrome testing profile. Without this grounding, I've noticed that LLMs often produce "code that should work," only for something else to turn out to be missing. This way, by the time I see the feature, it works.
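For anyone curious what tools like that look like, here's a minimal sketch of such a server using the Python MCP SDK (FastMCP) and Playwright; the tool names mirror my paraphrases above, and the profile path is illustrative, not my actual setup (the MCP I use may well be built differently under the hood):

```python
from mcp.server.fastmcp import FastMCP, Image
from playwright.async_api import async_playwright, Page

mcp = FastMCP("browser-control")

_page: Page | None = None


async def get_page() -> Page:
    """Lazily launch Chrome in a dedicated testing profile on first use."""
    global _page
    if _page is None:
        pw = await async_playwright().start()
        ctx = await pw.chromium.launch_persistent_context(
            user_data_dir="/tmp/claude-testing-profile",  # illustrative path
            channel="chrome",
            headless=False,
        )
        _page = ctx.pages[0] if ctx.pages else await ctx.new_page()
    return _page


@mcp.tool()
async def screenshot() -> Image:
    """Capture the current viewport as a PNG the model can look at."""
    page = await get_page()
    return Image(data=await page.screenshot(), format="png")


@mcp.tool()
async def scroll_to(selector: str) -> str:
    """Scroll the element matching a CSS selector into view."""
    page = await get_page()
    await page.locator(selector).scroll_into_view_if_needed()
    return f"scrolled to {selector}"


@mcp.tool()
async def fill_input(selector: str, value: str) -> str:
    """Type a value into the input matching a CSS selector."""
    page = await get_page()
    await page.locator(selector).fill(value)
    return f"filled {selector}"


if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio for the agent to call
```

The key design point is that screenshot closes the loop: the model acts, looks at the result, and decides what to do next, instead of guessing from the code alone.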
I then have to go in and advise it on how the code is factored and things like that, but the functionality itself is present and working.