
In my experience, "memory" is really not that helpful in most cases. For all of my projects, I keep the documentation files and feature specs up to date, so that LLMs always know where to find what and which coding style guidelines the project follows.

Maintaining the memory is a considerable burden, and you have to make sure that a simple "fix this linting" doesn't end up in memory as "we always fix that type of issue in that particular way." That's also the major problem I have with ChatGPT's memory: it starts to respond from the perspective of "this is correct for this person."

I am curious who actually sees benefits from memory in coding. Is it that the model "learns how to code better", or that it learns "how the project is structured"? Either way, to me, this sounds like an easy project-setup thing.


I think it cuts both ways - for example, I've definitely had the experience where, when typing into ChatGPT, I know ahead of time that whatever "memory" they're storing and injecting is likely going to degrade my answer, so I hop over to incognito mode. I've also had the experience of having a loosely related follow-up question and not wanting to dig through my chat history to find the exact convo, so it's nice to know that ChatGPT will probably pull the relevant details into context.

I think similar concepts apply to coding - in some cases, you have all the context you need up front (good coding practices help with this), but in many cases, there's a lot of "tribal knowledge" scattered across various repos that a human veteran working in the org would certainly know, but an agent wouldn't (of course, there's somewhat of a circular argument here: if the agent eventually learned this tribal knowledge, it could just write it down into a CLAUDE.md file ;)). I also think there's a clear separation between procedural knowledge and learned preferences: the former is probably better represented as skills committed to a repo, whereas I view the latter more as a "system prompt learning" problem.


ChatGPT's implementation of Memory is terrible. It quickly fills up with useless garbage and sometimes even plainly incorrect statements that are usually only relevant to one obscure conversation I had with it months ago.

A local, project-specific llm.md is absolutely something I require, though. Without it, language models kept "fixing" random things in my code that they considered incorrect, despite comments on those lines literally telling them to NOT CHANGE THIS LINE OR THIS COMMENT.

My llm.md is structured like this:

- Instructions for the LLM on how to use it

- Examples of a bad and a good note

- LLM editable notes on quirks in the project

It helps a lot with making an LLM understand when things are unusual for a reason.
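For illustration, a stripped-down version of that layout might look something like this (the headings and notes here are invented examples, not my actual file):

    # llm.md

    ## How to use this file
    Read this before editing code. Append new notes under "Quirks"; never delete existing notes.

    ## Example notes
    Bad: "Fixed the date parsing."
    Good: "Date parsing in utils/dates.py deliberately ignores time zones to match the upstream API."

    ## Quirks (LLM-editable)
    - The retry loop in sync.py looks redundant, but it works around a vendor rate limit.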

Besides that file, I wrap every prompt in a project-specific intro and outro. I use these to take care of common undesirable LLM behavior, like removing my comments.

I also tell it to use a specific format on its own comments, so I can make it automatically clean those up on the next pass, which takes care of most of the aftercare.
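As a rough sketch of that flow (not my actual tooling; the "[llm]" comment marker and the intro/outro wording are just placeholders):

    # Hypothetical sketch: wrap each prompt with a project-specific intro/outro,
    # then strip model-added comments tagged with an agreed "[llm]" marker.
    INTRO = "You are working on this project. Never delete or rewrite existing comments.\n\n"
    OUTRO = "\n\nPrefix every comment you add with [llm] so it can be removed later."

    def wrap_prompt(user_prompt: str) -> str:
        return INTRO + user_prompt + OUTRO

    def strip_llm_comments(source: str) -> str:
        # cleanup pass: drop any line carrying the [llm] marker
        return "\n".join(line for line in source.splitlines() if "[llm]" not in line)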


I'm curious - how do you currently manage this `llm.md` in the tooling you use? E.g., do you symlink `AGENTS/CLAUDE.md` to `llm.md`? Also, is there any information you duplicate across your project-specific `llm.md` files that could potentially be shared globally?

The way I work with LLMs is a bit different.

I use a custom tool that basically merges all my code into a single prompt. Most of my projects are relatively small, usually maxing out at 200k tokens, so I can just dump the whole thing into Gemini Pro for every feature set I am working on. It's a more manual way of working, but it ensures full control over the code changes.
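The idea is roughly this (a minimal sketch, not the actual tool; the file extensions and the characters-per-token estimate are assumptions):

    # Sketch: concatenate the whole project into one prompt for a long-context model.
    from pathlib import Path

    def build_prompt(root: str, exts=(".py", ".md", ".toml")) -> str:
        parts = []
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix in exts:
                parts.append(f"===== {path} =====\n{path.read_text(encoding='utf-8')}")
        prompt = "\n\n".join(parts)
        if len(prompt) / 4 > 200_000:  # crude estimate: ~4 characters per token
            raise ValueError("project too large for a single context window")
        return prompt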

For new projects I usually just copy the llm.md from the tool itself and strip out the custom part. I might make generating it a feature of the tool in the future.

A few days ago I tried AntiGravity (on default settings) and that was an awful experience: slow, ponderous, continuously making dumb mistakes, and only responding to feedback with code. It took at least 3 hours (and a lot of hand-holding) to end up with a broken version of what I wanted.

I gave up, tried again using my own tool and was done in half an hour. Not sure if it will work as well for other people, but it definitely does for me.


I think the problem with ChatGPT and other RAG-based memory solutions is that it's not possible to collaborate with the agent on what its memory should look like - so it makes sense that it's much easier to just have a stateless system and message queue, to avoid mysterious pollution. But Letta's memory management is primarily text/file based, so it's very transparent and controllable.

An example of how this kind of memory can help is learned skills (https://www.letta.com/blog/skill-learning) - if your agent takes the time to reflect/learn from experience and create a skill, that skill is much more effective at making it better next time than just putting the raw trajectory into context.


context poisoning is a real problem that these memory providers only make worse.

IMO context poisoning is only fatal when you can't see what's going on (e.g. black-box memory systems like ChatGPT memory). The memory system used in the OP is fully white box - you can see every raw LLM request (and exactly how the memory influenced the final prompt payload).

That's significant - you can then improve it in your own environment.

Yeah, exactly - it's all just tokens that you have full control over (you can run CRUD operations on them). No hidden prompts, no hidden memory.
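To make the "just tokens you control" point concrete, here's a generic sketch of a white-box memory block with plain CRUD operations (an illustration of the idea, not Letta's actual API):

    # Generic illustration: every memory entry is plain text you can list,
    # add, update, or delete before it is rendered into the prompt.
    class MemoryBlock:
        def __init__(self):
            self.entries = {}
            self.next_id = 0

        def create(self, text):
            self.next_id += 1
            self.entries[self.next_id] = text
            return self.next_id

        def read(self):
            return dict(self.entries)  # inspect exactly what will be injected

        def update(self, entry_id, text):
            self.entries[entry_id] = text

        def delete(self, entry_id):
            self.entries.pop(entry_id, None)

        def render(self):
            # rendered verbatim into the prompt payload, so nothing is hidden
            return "\n".join(f"- {t}" for t in self.entries.values())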


