hckrnws
From the article:
> We recommend keeping task-specific instructions in separate markdown files with self-descriptive names somewhere in your project. Then, in your CLAUDE.md file, you can include a list of these files with a brief description of each, and instruct Claude to decide which (if any) are relevant and to read them before it starts working.
I've been doing this since the early days of agentic coding though I've always personally referred to it as the Table-of-Contents approach to keep the context window relatively streamlined. Here's a snippet of my CLAUDE.md file that demonstrates this approach:
# Documentation References
- When adding CSS, refer to: docs/ADDING_CSS.md
- When adding assets, refer to: docs/ADDING_ASSETS.md
- When working with user data, refer to: docs/STORAGE_MANAGER.md
Full CLAUDE.md file for reference: https://gist.github.com/scpedicini/179626cfb022452bb39eff10b...
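For anyone who wants to keep that list in sync with the docs folder, here is a minimal sketch that renders the same section from a mapping. The three entries come from the snippet above; the names `DOC_REFERENCES` and `render_references` are made up for illustration and are not part of the commenter's setup.

```python
# Hypothetical mapping of task descriptions to doc files. The three entries
# mirror the snippet above; the helper itself is purely illustrative.
DOC_REFERENCES = {
    "adding CSS": "docs/ADDING_CSS.md",
    "adding assets": "docs/ADDING_ASSETS.md",
    "working with user data": "docs/STORAGE_MANAGER.md",
}


def render_references(references: dict[str, str]) -> str:
    """Render a 'Documentation References' section for CLAUDE.md."""
    lines = ["# Documentation References"]
    for task, path in references.items():
        # One short pointer per doc keeps the always-loaded file small.
        lines.append(f"- When {task}, refer to: {path}")
    return "\n".join(lines)


print(render_references(DOC_REFERENCES))
```

The point of the design is that only the one-line pointers live in CLAUDE.md, while the detailed docs are read on demand.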
There is a far easier way to do this, and one that is perfectly aligned with how these tools work.
It is called documenting your code!
Just write what this file is supposed to do in a clear concise way. It acts as a prompt, it provides much needed context specific to the file and it is used only when necessary.
Another tip is to add README.md files where possible and where it helps. What is this folder for? Nobody knows! Write a README.md file. It is not rocket science.
What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
You don't have to "prompt it just the right way".
What you have to do is to use the same old good best practices.
So how exactly does one "write what this file is supposed to do in a clear concise way" in a way that is quickly comprehensible to AI? The gist of the article is that when your audience changes from "human" to "AI" the manner in which you write documentation changes. The article is fairly high quality, and presents excellent evidence that simply "documenting your code" won't get you as far as the guidelines it provides.
Your comment comes off as if you're dispensing common-sense advice, but I don't think it actually applies here.
For the record I do think the AI community tries to unnecessarily reinvent the wheel on crap all the time.
Sure, readme.md is a great place to put content. But there are things I'd put in a readme that I'd never put in a claude.md if we want to squeeze the most out of these models.
Further, claude/agents.md files have special quality-of-life mechanics in the coding agent harnesses, e.g. `injecting this file into the context window whenever an agent touches this directory, no matter whether the model wants to read it or not`
> What people often forget about LLMs is that they are largely trained on public information which means that nothing new needs to be invented.
I don't think this is relevant at all - when you're working with coding agents, the more you can finesse and manage every token that goes into your model and how it's presented, the better results you can get. And the public data that goes into the models is near useless if you're working in a complex codebase, compared to the results you can get if you invest time into how context is collected and presented to your agent.
This is missing the point. If I want to instruct Claude to never write a database query that doesn't hit a preexisting index, where exactly am I supposed to document that? You can either choose:
1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)
2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)
Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".
CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.
Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything it's learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs in a way so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".
You put a warning where it is most likely to be seen by a human coder.
Besides, no amount of prompting will prevent this situation.
If it is a concern then you put a linter or unit tests to prevent it altogether, or make a wrapper around the tricky function with some warning in its doc strings.
I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.
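To make the "put a unit test on it" idea concrete for the index example upthread, here is a rough sketch using SQLite's `EXPLAIN QUERY PLAN`. The table, index, and the helper name `full_scans` are all hypothetical, and the plan text it matches on is SQLite-specific and can differ between versions, so treat this as an illustration rather than a general-purpose check.

```python
import sqlite3


def full_scans(conn, sql, params=()):
    """Return the query-plan steps that scan a whole table or index."""
    plan = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    # The last column of each plan row is a human-readable description,
    # e.g. "SEARCH users USING INDEX ..." or "SCAN users".
    return [row[-1] for row in plan if row[-1].startswith("SCAN")]


def make_test_db():
    """Build a throwaway schema with one indexed and one unindexed column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    return conn


conn = make_test_db()
indexed = full_scans(conn, "SELECT * FROM users WHERE email = ?", ("a@example.com",))
unindexed = full_scans(conn, "SELECT * FROM users WHERE name = ?", ("Ada",))
print("indexed query scans:", indexed)
print("unindexed query scans:", unindexed)
```

A test suite could assert that `full_scans` returns an empty list for every query the application issues, which fails deterministically no matter who (or what) wrote the query.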
Documenting for AI exactly like you would document for a human is ignoring how these tools work
You can make affordances for agent abilities without deviating from what humans find to be good documentation. Use hyperlinks, organize information, document in layers, use examples, be concise. It's not either/or unless you're being lazy.
But they are right, Claude routinely ignores stuff from CLAUDE.md, even with warning bells etc. You need a linter preventing things. Like drizzle `sql` templates: it just loves them.
I think you’re missing that CLAUDE.md is deterministically injected into the model’s context window.
This means that instead of behaving like a file the LLM reads, it effectively lets you customize the model’s prompt.
I also didn’t write that you have to “prompt it just the right way”. I think you’re missing the point entirely.
Probably a lot of people here disagree with this feeling. But my take is that if setting up all the AI infrastructure and onboarding it to my code is going to take this amount of effort, then I might as well code the damn thing myself, which is what I'm getting paid to do (and enjoy doing anyway).
I strongly disagree with the author not using /init. It takes a minute to run and Claude provides surprisingly good results.
Perhaps. But keep in mind that the setup work is typically mostly delegated to LLMs as well.
Whether it's setting up AI infrastructure or configuring Emacs/vim/VSCode, the important distinction to make is if the cost has to be paid continually, or if it's a one time/intermittent cost. If I had to configure my shell/git aliases every time I booted my computer, I wouldn't use them, but seeing as how they're saved in config files, they're pretty heavily customized by this point.
Don't use AI if you don't want to, but "it takes too much effort to set up" is an excuse printf debuggers use to avoid setting up a debugger. Which is a whole other debate though.
I think this could work really well for infrastructure/ops style work where the LLM will not be able to grasp the full context of say the network from just a few files that you have open.
But as others are saying this is just basic documentation that should be done anyway.
I’m sure I’m just working like a caveman, but I simply highlight the relevant code, add it to the chat, and talk to these tools as if they were my colleagues and I’m getting pretty good results.
About 6 to 12 months ago this was not the case (with or without .md files). I was getting mainly subpar results, so I’m assuming that the models have improved a lot.
Basically, I found that they do not make that much of a difference: the model is either good enough or not…
I know (or at least I suppose) that these markdown files could bring some marginal improvements, but at this point, I don’t really care.
I assume this is an unpopular take because I see so many people treat these files as if they were black magic or a silver bullet that 100x their already 1000x productivity.
> I simply highlight the relevant code, add it to the chat, and talk to these tools
Different use case. I assume the discussion is about having the agent implement whole features or research and fix bugs without much guidance.
Yep it is opinionated for how to get coding agents to solve hard problems in complex brownfield codebases which is what we are focused on at humanlayer :)
Matches my experience also. I bothered to set up a proper CLAUDE.md file only once, and now never do it. Simply referring to the context properly for surgical recommendations and edits works relatively well.
It feels a lot like bikeshedding to me, maybe I’m wrong
How about a list of existing database tables/columns so you don't need to repeat it each time?
Claude code figures that out at startup every time. Never had issues with it.
I've gotten quite a bit of utility out of my current setup[0]:
Some explicit things I found helpful: Have the agent address you as something specific! This way you know if the agent is paying attention to your detailed instructions.
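If you log the agent's replies, the "address me as something specific" trick can be checked mechanically. A tiny sketch, assuming you already have the replies as strings; the function name and the example form of address are invented for illustration.

```python
def missed_canary(replies: list[str], canary: str) -> list[int]:
    """Return the indexes of replies that never mention the agreed name."""
    needle = canary.casefold()
    # Case-insensitive match, since models vary capitalisation freely.
    return [i for i, reply in enumerate(replies) if needle not in reply.casefold()]


# Hypothetical transcript: the instructions asked for "Captain".
replies = ["Sure thing, Captain. Running the tests now.", "Done."]
print(missed_canary(replies, "Captain"))
```

A growing list of misses is a cheap signal that the detailed instructions are being dropped from attention.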
Rationality, as practiced on early Less Wrong, gives a great language for constraining the agent. Since it has read The Sequences and everything else, you can include pointers, and the more you do, the more you nudge it into that mode of thought.
The explicit "This is what I'm doing, this is what I expect" pattern has been hugely useful for both me monitoring it/coming back to see what it did, and it itself. It makes it more likely to recover when it goes down a bad path.
The system reminder this article mentions is definitely there but I have not noticed it messing much with adherence. I wish there were some sort of power user mode to turn it off though!
Also, this is probably too long! But I have been experimenting and iterating for a while, and this is what is working best currently. Not that I've been able to hold any other part constant -- Opus 4.5 really is remarkable.
[0]: https://gist.github.com/ctoth/d8e629209ff1d9748185b9830fa4e7...
"Here's how to use the slop machine better" is such a ridiculous pretense for a blog post or article. You simply write a sentence and it approximates it. That is hardly worth writing any literature about, as it is so self-evident.
I have found enabling the codebase itself to be the “Claude.md” to be most effective. In other words, set up effective automated checks for linting, type checking, unit tests etc and tell Claude to always run these before completing a task. If the agent keeps doing something you don’t like, then a linting update or an additional test often is more effective than trying to tinker with the Claude.md file. Also, ensure docs on the codebase are up to date and tell Claude to read relevant parts when working on a task and of course update the docs for each new task. YMMV but this has worked for me.
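A rough sketch of what "always run these before completing a task" could look like as a single entry point. The specific tools in `DEFAULT_CHECKS` (ruff, mypy, pytest) are assumptions for illustration, not something the comment above specifies; swap in whatever the project already uses.

```python
import subprocess
import sys

# Hypothetical check commands; replace with the project's real tooling.
DEFAULT_CHECKS = {
    "lint": [sys.executable, "-m", "ruff", "check", "."],
    "types": [sys.executable, "-m", "mypy", "."],
    "tests": [sys.executable, "-m", "pytest", "-q"],
}


def run_checks(checks: dict[str, list[str]]) -> list[str]:
    """Run every check and return the names of the ones that failed."""
    failed = []
    for name, command in checks.items():
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
            # Print the tool output so the agent can see what to fix.
            print(f"[{name}] failed:\n{result.stdout}{result.stderr}")
    return failed
```

Calling `sys.exit(1 if run_checks(DEFAULT_CHECKS) else 0)` from a script gives the agent one command to run and a clear pass/fail signal.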
The advice here seems to assume a single .md file with instructions for the whole project, but the AGENTS.md methodology, as supported by agents like GitHub Copilot, is to break out more specific AGENTS.md files in the subdirectories of your code base. I wonder whether and how the tips shared change assuming a flow with a bunch of focused AGENTS.md files throughout the code.
Hi, post author here :)
I didn’t dive into that because in a lot of cases it’s not necessary and I wanted to keep the post short, but for large monorepos it’s a good idea
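For the monorepo case, one way to picture nested instruction files is a lookup that walks from the repo root down to the file being edited and collects every AGENTS.md along the way. This is only a sketch of that idea; how any particular harness actually resolves nested files is an assumption here, and `applicable_agent_files` is a made-up name.

```python
from pathlib import Path


def applicable_agent_files(repo_root: Path, target: Path, name: str = "AGENTS.md") -> list[Path]:
    """List instruction files from the repo root down to the target's directory."""
    root = repo_root.resolve()
    resolved = target.resolve()
    start = resolved if resolved.is_dir() else resolved.parent
    # relative_to raises ValueError if the target is outside the repo.
    relative = start.relative_to(root)

    found = []
    current = root
    for part in ("", *relative.parts):
        if part:
            current = current / part
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
    # Ordered general to specific, so deeper files can refine broader ones.
    return found
```

For example, editing `services/api/handlers.py` would pick up the root file and `services/api/AGENTS.md`, but nothing from unrelated subtrees.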
Interesting selection of models for the "instruction count vs. accuracy" plot. Curious when that was done and why they chose those models. How well does ChatGPT 5/5.1 (and codex/mini/nano variants), Gemini 3, Claude Haiku/Sonnet/Opus 4.5, recent grok models, Kimi 2 Thinking etc (this generation of models) do?
Guessing they included some smaller models just to show how they dump accuracy at smaller context sizes
Sure - I was more commenting that they are all > 6 months old, which sounds silly, but things have been changing fast, and instruction following is definitely an area that has been developing a lot recently. I would be surprised if accuracy drops off that hard still.
I imagine it’s highly correlated with parameter count, but the research is a few months old and frontier model architecture is pretty opaque, so it’s hard to draw too many conclusions about newer models that aren’t in the study besides what I wrote in the post
None of this should be necessary if these tools did what they say on the tin, and most of this advice will probably age like milk.
Write readmes for humans, not LLMs. That's where the ball is going.
Hi, post author here :)
Yes README.md should still be written for humans and isn’t going away anytime soon.
CLAUDE.md is a convention used by claude code, and AGENTS.md is used by other coding agents. Both are intended to be supplemental to the README and are deterministically injected into the agent’s context.
It’s a configuration point for the harness, it’s not intended to replace the README.
Some of the advice in here will undoubtedly age poorly as harnesses change and models improve, but some of the generic principles will stay the same - e.g. that you shouldn’t use an LLM to do a linter and formatter’s job, or that LLMs are stateless and need to be onboarded into the codebase, and having some deterministically-injected instructions to achieve that is useful instead of relying on the agent to non-deterministically derive all that info by reading config and package files
The post isn’t really intended to be super forward-looking as much as “here’s how to use this coding agent harness configuration point as best as we know how to right now”
It seems overall a good set of guidelines. I appreciate some of the observations being backed up by data.
What I find most interesting is how a hierarchical / recursive context construct begins to emerge. The authors' note of "root" claude.md as well as the opening comments on LLMs being stateless ring to me like a bell. I think soon we will start seeing stateful LLMs, via clever manipulation of scope and context. Something akin to memory, as we humans perceive it.
"You can investigate this yourself by putting a logging proxy between the claude code CLI and the Anthropic API using ANTHROPIC_BASE_URL" I'd be eager to read a tutorial about that. I never know which tool to favour for this when you're not a system or network expert.
Hi, post author here
We used cloudflare’s AI gateway which is pretty simple. Set one up, get the proxy URL and set it through the env var, very plug-and-play
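If you would rather not set up a hosted gateway, a local stand-in can be sketched with the Python standard library. This is not what the post author used: it is a simplified alternative that only handles POST requests, buffers streamed responses instead of relaying them incrementally, and assumes the upstream is `https://api.anthropic.com`. The names `LoggingProxy`, `redact`, and `serve` are invented for this sketch.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "https://api.anthropic.com"  # assumed upstream API host
SENSITIVE = {"x-api-key", "authorization"}
SKIPPED = {"host", "content-length", "accept-encoding"}


def redact(headers: dict[str, str]) -> dict[str, str]:
    """Hide credentials before anything is written to the log."""
    return {k: ("<redacted>" if k.lower() in SENSITIVE else v) for k, v in headers.items()}


class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Log the full request so the injected prompt text is visible.
        print(json.dumps({
            "path": self.path,
            "headers": redact(dict(self.headers)),
            "body": body.decode("utf-8", "replace"),
        }))
        forward = {k: v for k, v in self.headers.items() if k.lower() not in SKIPPED}
        request = urllib.request.Request(UPSTREAM + self.path, data=body, headers=forward, method="POST")
        try:
            with urllib.request.urlopen(request) as upstream:
                status, payload = upstream.status, upstream.read()
                content_type = upstream.headers.get("Content-Type", "application/json")
        except urllib.error.HTTPError as error:
            status, payload = error.code, error.read()
            content_type = error.headers.get("Content-Type", "application/json")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the proxy on localhost; blocks until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), LoggingProxy).serve_forever()
```

With `serve()` running, starting the CLI with `ANTHROPIC_BASE_URL=http://127.0.0.1:8080` should route requests through the proxy, and each logged body shows exactly what the harness put into the context window.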
Have you considered just asking claude? I'd wager you'd get up and running in <10 minutes.
agree - i've had claude one-shot this for me at least 10 times at this point cause i'm too lazy to lug whatever code around. literally made a new one this morning
I was expecting the traditional AI-written slop about AI, but this is actually really good. In particular, the "As instruction count increases, instruction-following quality decreases uniformly" section and associated graph is truly fantastic! To my mind, the ability to follow long lists of rules is one of the most obvious ways that virtually all AI models fail today. That's why I think that graph is so useful -- I've never seen someone go and systematically measure it before!
I would love to see it extended to show Codex, which to my mind is by far the best at rule-following. (I'd also be curious to see how Gemini 3 performs.)
I looked when I wrote the post but the paper hasn’t been revisited with newer models :/
Ha, I just tell Claude to write it. My results have been generally fine, but I only use Claude on a simple codebase that is well documented already. Maybe I will hand-edit it to see if I can see any improvements.
Looking for a similar GEMINI.md
It might support AGENTS.md, you could check the site and see if it’s there
It would be nice to see an actual example of what a good claude.md that implements all of these recommendations looks like.
What is a good Claude.md?
Claude.md - A markdown file you add to your code repository to explain how things work to Claude.
A good Claude.md - I don’t know, presumably the article explains.