Looking at the prompts op has shared, I'd recommend managing/trimming the context more aggressively. In general, don't give the agent a new task without /clear-ing the context first. This keeps the agent focused on the new task and reduces its bias (e.g. when reviewing changes it made previously).
The overall approach I now have for a medium-sized task is roughly the following (a rough code sketch of the whole loop follows the list):
- Ask the agent to research a particular area of the codebase that is relevant to the task at hand, listing all relevant/important files and functions, and putting all of this in a "research.md" markdown file.
- Clear the context window
- Ask the agent to put together a project plan, informed by the previously generated markdown file. Store that project plan in a new "project.md" markdown file. Depending on complexity I'll generally do multiple revs of this.
- Clear the context window
- Ask the agent to create a step-by-step implementation plan, leveraging the previously generated research & project files, and put that in a plan.md file.
- Clear the context window
- While there are unfinished steps in plan.md:
-- While the current step needs more work
--- Ask the agent to work on the current step
--- Clear the context window
--- Ask the agent to review the changes
--- Clear the context window
-- Ask the agent to update the plan with their changes and make a commit
-- Clear the context window
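A minimal sketch of that outer loop, assuming the Claude Code CLI's headless print mode (`claude -p`, where every invocation starts with a fresh context, i.e. an implicit /clear) and a plan.md that tracks steps as markdown checkboxes. The "needs more work" inner loop is collapsed here into a single work-then-review pass:

```python
import re
import subprocess
from pathlib import Path

def run_fresh(prompt: str) -> str:
    # Each `claude -p` call is a separate headless session, so the context
    # is effectively cleared between every step of the workflow.
    return subprocess.run(["claude", "-p", prompt],
                          capture_output=True, text=True, check=True).stdout

def unfinished_steps() -> list[str]:
    # Assumes plan.md tracks steps as unchecked markdown checkboxes ("- [ ] ...").
    plan = Path("plan.md").read_text()
    return re.findall(r"^- \[ \] (.+)$", plan, flags=re.MULTILINE)

# Phases 1-3: research, project plan, implementation plan (fresh context each time).
run_fresh("Research the parts of the codebase relevant to the task; list the "
          "important files and functions in research.md.")
run_fresh("Using research.md, write a project plan to project.md.")
run_fresh("Using research.md and project.md, write a step-by-step implementation "
          "plan to plan.md, with each step as an unchecked markdown checkbox.")

# Phase 4: work through plan.md until every step is checked off.
while (steps := unfinished_steps()):
    step = steps[0]
    run_fresh(f"Work on this step from plan.md: {step}")
    run_fresh(f"Review the changes made for this step: {step}")
    run_fresh(f"If the step '{step}' is complete, mark it done in plan.md "
              f"and commit the changes.")
```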
I also recommend having specialized sub-agents for each of those phases (research, architecture, planning, implementation, review). Less in terms of telling the agent what to do, and more as a way to add guardrails and structure to the way they synthesize/serialize back to markdown.
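For reference, a Claude Code sub-agent is just a markdown file with a bit of YAML frontmatter dropped into .claude/agents/. Something like the sketch below; the field names reflect my reading of the docs, and the reviewer wording is purely illustrative:

```
---
name: reviewer
description: Reviews the changes made for the current plan.md step and writes findings to review.md.
tools: Read, Grep, Glob
---
You are a code reviewer. Read the current step in plan.md, inspect the diff,
and summarize problems, risks, and suggested fixes in review.md.
Do not edit source files yourself.
```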
I pretty much never clear my context window unless I'm switching to entirely different work; it seems to work fine, with Copilot summarizing the convo every once in a while. I'm probably at 95% code written by an LLM.
I actually think it works better that way: the agent doesn't have to spend as much time rereading code it had just read. I do have several "agents" like you mention, but I just use them one by one in the same chat so they share context. They all write to markdown in case I want to start fresh when things go the wrong direction, but that doesn't happen very often.
I wouldn't take it for granted that Claude isn't re-reading your entire context each time it runs.
When you run llama.cpp on your home computer, it holds onto the key-value cache from previous runs in memory. Presumably Claude does something analogous, though on a much larger scale. Maybe Claude holds onto that key-value cache indefinitely, but my naive expectation would be that it only holds onto it for however long it expects you to keep the context going. If you walk away from your computer and resume the context the next day, I'd expect Claude to re-read your entire context all over again.
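For what it's worth, the mechanism here is prefix caching: attention key/value states depend only on the tokens before them, so a server can skip recomputation for whatever prefix it still has cached and only process the new suffix. A toy illustration of that bookkeeping (not any real inference API):

```python
# Toy prefix cache: maps a token prefix (as a tuple) to its already-computed
# attention state, so only the new suffix needs to be processed.
kv_cache: dict[tuple[str, ...], object] = {}

def process(tokens: list[str]) -> None:
    # Find the longest cached prefix of this context.
    best = max((p for p in kv_cache if tokens[:len(p)] == list(p)),
               key=len, default=())
    suffix = tokens[len(best):]
    print(f"reusing {len(best)} cached tokens, computing {len(suffix)} new ones")
    # "Compute" the new suffix and remember the full prefix for next time.
    kv_cache[tuple(tokens)] = object()

process(["system", "you", "are", "helpful"])        # computes everything
process(["system", "you", "are", "helpful", "hi"])  # reuses the cached prefix
# If the cache is dropped (e.g. you come back the next day), the whole
# context has to be recomputed from scratch.
```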
At best, you're getting some performance benefit keeping this context going, but you are subjecting yourself to context rot.
Someone familiar with running Claude or industrial-strength SOTA models might have more insight.
CC absolutely does not read the context again during each run. For example, if you ask it to do something, then revert its changes, it will think the changes are still there leading to bad times.
It wouldn't re-read the context; it caches the tokens so far, which is like photographically remembering the context instead of re-reading it, until you see it "compress" the context, when it gives itself a prompt to recap the conversation so far.
You can tell it that you manually reverted the changes.
That said, the fact that we're all curating these random bits of "llm whisperer" lore is...concerning. The product is at the same time amazingly good and terribly bad.
Today I tested a mix of clearing the context often and keeping long contexts; Copilot with Claude ended up producing good visual results, but the generated CSS was extremely messy.
An even better approach, in my experience, is to ask CC to do research, then plan the work, then let it implement step 1, then double-escape back to the plan, tell it that step 1 is done, and continue with step 2.
This is a really interesting approach as well, and one I'll need to try! Previously, I've almost never moved back to the plan using double escape unless things go wrong. This is a clever way to better use that functionality. Thanks for sharing!
I’m not disparaging it, just actualizing it and sharing that thought. If you don’t understand that most modern “tools” and “services” are gamified, then yes I suppose I seem like a huge jerk.
The author literally talks about managing a team of multiple agents, and LLM services requiring the purchase of "tokens" are similar to popping a token into an arcade machine.
"Hacker culture never took root in the AI gold rush because the LLM 'coders' saw themselves not as hackers and explorers, but as temporarily understaffed middle-managers"
Also, hacking really doesn't have anything to do with generating poorly structured documents that compile into some sort of visual mess that needs fixing. Hacking is the analysis and circumvention of systems. Sometimes when hacking we piece together some shitty code to accomplish a circumvention task, but rarely is the code representative of the entire hack. LLMs just make steps of a hack quicker to complete. At a steep cost.
I have been reluctant to use AI as a coding assistant, though I have installed Claude Code and bought a bunch of credits. When I see comments like this, I genuinely ask: what's the point? Are you sure that going through all of these manipulations, instead of directly editing the source code, makes you more productive? In what way?
Years ago, I was joking with my colleagues that I'm living two weeks ahead, writing the present day code is a chore, thinking about the next problems is more important, so that when the time comes to implement them, I know how. I don't have much time to code these days, but I still have the ability to think. Instead of doing the chore myself, I now delegate it to Claude Code. I still do coding occasionally, usually when it's something hard that I know AI will mess up, but in those instances, I enjoy it.
> Looking at the prompts op has shared, I'd recommend managing/trimming the context more aggressively. In general, don't give the agent a new task without /clear-ing the context first. This keeps the agent focused on the new task and reduces its bias (e.g. when reviewing changes it made previously).
My workflow for any IDE, including Visual Studio 2022 w/ Copilot, JetBrains AI, and now Zed w/ Claude Code baked in, is to start a new convo altogether when I'm doing something different or changing up my initial instructions. It works way better. People are used to keeping a window open until the model loses its mind in apps like ChatGPT, but for code the context window gets packed a lot sooner (remember, the tools are sending some code over too), so you need to start over before it gets confused.
I've been meaning to try Zed but haven't gotten into it yet; it felt hard to justify switching IDEs when I just got into a working flow with VS Code + Claude Code CLI. How are you finding it? I'm assuming positive if that's your core IDE now but would love to hear more about the experience you've had so far.
If you are a Claude Code user, you will likely not enjoy the version integrated into Zed. Many things are missing, for example slash commands. I use Zed, but still run Claude Code in the terminal. As an editor, Zed is excellent, especially as a Vim replacement.
Oh interesting, that’s good to know. Thank you. I might try that combination - I like using the Claude Code CLI so hopefully less of a painful transition.
OP here, this is great advice. Thanks for sharing. Clearing context more often between tasks is something I've started to do more recently, although definitely still a WIP to remember to do so. I haven't had a lot of success with the .md files leading to better results yet, but have only experimented with them occasionally. Could be a prompting issue though, and I like the structure you suggested. Looking forward to trying!
I didn't mention it in the blog post but actually experimented a bit with using Claude Code to create specialized agents such as an expert-in-Figma-and-frontend "Design Engineer", but in general found the results worse than just using Claude Code as-is. This also could be a prompting issue though and it was my first attempt at creating my own agents, so likely a lot of room to learn and improve.
This is overkill. I know because I'm on the opposite end of the spectrum: each of my chat sessions goes on for days. The main reason I start over is because Cursor slows down and starts to stutter after a while, which gets annoying.
Claude auto-condenses context, which is both good and bad. Good in that it doesn't usually get super slow; bad in that sometimes it does this in the middle of a todo and then ends up (I suspect) producing something less on-task as a result.
Usually, managing a development team is more work than just writing the code oneself. However, managing a development team (even if that team consists of a single LLM and yourself) means that more work can be done in a shorter period of time. It also provides much better structure for ensuring that tests are written, and that documentation is written if that is important. And in my experience, though I understand it's not everybody's experience, it helps ensure a clean, useful git history.
It just seems like a lot of work when you could just write the code yourself. It's a lot less typing to go ahead and make the edits you want than to guide the autocorrect into eventually predicting what you want from guidelines you also have to generate to save time.
Like, I'm sorry, but when I see how much work the advocates are putting into their prompts, the METR paper comes to mind... you're doing more work than coding the "old fashioned way".
If there's adequate test coverage, and the tests emit informative failures, coding agents can be used as constraint-solvers to iterate and make changes, provided you stage your prompts properly, much like staging PRs.
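A minimal sketch of that loop, assuming the headless `claude -p` mode and pytest as the test runner (both are just stand-ins for whatever your stack uses):

```python
import subprocess

MAX_ITERATIONS = 5  # give up rather than letting the agent thrash forever

for attempt in range(MAX_ITERATIONS):
    # Run the test suite and capture informative failure output.
    tests = subprocess.run(["pytest", "-x", "--tb=short"],
                           capture_output=True, text=True)
    if tests.returncode == 0:
        print("tests green, done")
        break
    # Feed the failures back to the agent as the constraint to satisfy.
    prompt = ("The following tests are failing. Make the smallest change that "
              "fixes them without weakening the tests:\n\n" + tests.stdout[-8000:])
    subprocess.run(["claude", "-p", prompt], check=True)
else:
    print("still failing after", MAX_ITERATIONS, "attempts; review manually")
```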
The question isn't whether it makes a difference; the question is whether the model you're working with / the platform you're working with it on already does that. All of the major commercial models have their own system prompts that are quite detailed, and then the interfaces for using the models typically also have their own system prompts (Cursor, Claude Code, Codex, Warp, etc.).
It's highly likely that if you're working with one of the commercial models that has been tuned for code tasks, in one of the commercial platforms marketed to SWEs, instructions to the effect of "you're an expert/experienced engineer" will already be part of the context window.
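You can see the mechanism directly if you call the model through the API yourself: the role instruction is just a system prompt prepended to the conversation, which is what these tools do for you at greater length. A sketch with the Anthropic Python SDK (the model name and wording are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whatever model you target
    max_tokens=1024,
    # The "role play" cue lives here; coding tools ship their own, much longer
    # version of this before your messages ever reach the model.
    system="You are an experienced software engineer. Prefer small, reviewable changes.",
    messages=[{"role": "user", "content": "Refactor this function to remove duplication: ..."}],
)
print(response.content[0].text)
```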
How has it ever worked? I have thousands of threads with various LLMs, none of which have that role-play cue, yet the responses always sound authoritative and similar to what one would find in literature written by experts in the field.
What does work is to provide clues for the agent to impersonate a clueless idiot on a subject, or a bad writer. It will at least sound like it in the responses.
Those models have been heavily trained with RLHF; if anything, today's LLMs are even more likely to throw out authoritative-sounding predictions, if not in accuracy, then at least in tone.
I also don't tell CC to think like expert engineer, but I do tell it to think like a marketer when it's helping me build out things like landing pages that should be optimized for conversions, not beauty. It'll throw in some good ideas I may miss. Also when I'm hesitant to give something complex to CC, I tell that silly SOB to ultrathink.
I'm still not sure whether specifying a role made a difference in terms of performance. In a different but similar instance, when I tried to create an agent in Claude to play a specific role (frontend / design engineer expert), I found that this seemed to perform worse vs. just using default Claude, but this is all very anecdotal.
Maybe it is cargo culting at this point, idk. When I first started experimenting with this, about two generations of models back, the role play prompt made a noticeable difference.
Example: with early Claude (pre-Claude Code) if you asked for a Rust program you’d get something that only resembled Rust syntax but was a mix of different languages. “You are a senior software engineer that develops solely with the Rust programming language” or something like this made it generate syntactically correct Rust.
Similar prompts led to better, more focused tests. I find that such prompts are not as necessary anymore, but anecdotally I’ve still felt a difference.
And it will completely ignore the instructions, because user input cannot affect it, but it will waste even more context space fooling you into thinking that it did.