This resonates hard. From a physician-scientist perspective, what you’re describing is the classic “unregulated supplement market” problem, except the substrate is context budget and the adverse effect is cognitive impairment of the agent (hallucination, goal drift, brittle behavior) rather than liver enzymes.
A few parallels that feel useful:
1. Skills need a dosage form + labeling. Token count is dose, and right now we’re handing people 4,000-token “capsules” that claim to be “concise” while quietly consuming a meaningful fraction of effective context. Your <150-line rule reads like a safety standard. 
2. We’re missing the equivalent of pharmacovigilance. The market is optimizing for “exists on GitHub” not “improves outcomes.” What we need is post-market surveillance: regression tests for agent behavior, A/B evals, and public “side effect” reporting (when a skill reliably increases error rate). 
3. Curation beats indexing. A package manager that surfaces trustworthy skills (with benchmarks, failure modes, and versioning) is basically an FDA + formulary + Cochrane combo for agent workflows, and your point about the trust gap is spot on.
Love the “description = when to use” reframing, and the emphasis on thin integrations where the tool does the talking.
"There is a need for an AI ... package manager that is an ecosystem backbone, one that does a good job surfacing the best tools instead of simply indexing all of them without care for quality. ...the big players have dropped the ball and left a big gap to fill."
100 percent. Thanks for articulating this problem and the resulting frustration so clearly, AND for taking a first crack at mitigation that's worth exploring.
Highest skill value to token ratio:
“Write this code like Kent Beck”
I think this got linked or shared somewhere, given the huge influx of views and subscribers 😂 I'd appreciate it if someone would let me know where people are finding this post! (feel free to comment or DM)
One thought this article sparked for me is that a lot of current “Agent Skill” slop may not just be a tooling problem, but a *meta-skill* problem.
If we think of “skills that manage other skills” as **meta-skills**, then letting those meta-skills have side effects feels inherently unstable.
Once a skill that governs skill evolution can freely mutate shared state, the agent system tends to collapse under its own complexity.
By analogy to functional programming, this suggests that the meta-layer needs to be **pure**.
In pure functional languages, even self-reference (e.g. recursion) is handled via fixed-point–like structures rather than stateful mutation. The system can reason about itself without breaking referential transparency.
That led me to explore the idea of a **Pure Agent Skill**:
a self-referential skill that can "reason about, constrain, and guide the evolution of other skills", while remaining side-effect-free itself.
In this framing:
* Skills perform actions.
* Meta-skills reason about skills.
* But the meta-layer must be **pure**, or everything downstream turns into slop.
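As a rough sketch of what "pure" could mean here in practice, here is a minimal illustration in TypeScript. The `Skill`, `Recommendation`, and `MetaSkill` names are hypothetical, not taken from KairosChain; the point is only that the meta-layer reads skill definitions and returns recommendations, while applying any change stays a separate, explicit step.

```typescript
// A minimal sketch of a "pure" meta-skill (hypothetical types, not KairosChain's API).
// The meta-layer never mutates skills, files, or shared state: it only reads the
// current set of skills and returns recommendations for someone else to apply.

interface Skill {
  readonly name: string;
  readonly description: string; // the "when to use" line
  readonly body: string;        // full SKILL.md content
}

interface Recommendation {
  readonly skill: string;
  readonly issue: string;
  readonly suggestedChange: string;
}

// A meta-skill is just a pure function over the current skills.
type MetaSkill = (skills: readonly Skill[]) => readonly Recommendation[];

// Example meta-skill: flag skills whose bodies are large enough to eat a
// meaningful chunk of the context budget.
const flagOversizedSkills: MetaSkill = (skills) =>
  skills
    .filter((s) => s.body.split("\n").length > 150)
    .map((s) => ({
      skill: s.name,
      issue: "body exceeds 150 lines",
      suggestedChange: "move detail into reference files loaded on demand",
    }));
```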
I’ve been sketching a framework around this idea (calling it "Pure Agent Skill" within a project named "KairosChain", https://github.com/masaomi/KairosChain_2026), not as a replacement for Agent Skills, but as a way to explain "why" so many implementations degrade, and what kind of constraints might actually prevent that.
Curious if others here are thinking along similar lines.
For anyone looking for a human-tinkered and tested skill creator: tested on Claude Desktop, but it should work in Claude Code just as well. I didn't solve the skill-invoking problem, but it's intended to be a one-off skill anyway: https://github.com/vnicolescu/claude-expert-skill-creator
do you have examples of skills you've made using the skill creator?
I made a LEDES compliance checker for law firm invoices. There are multiple levels of compliance, and vanilla Claude only caught the basic ones, nothing related to firm specifics or complex regulation.
It uses a script for syntax errors, LLM logic for semantic errors, and reference documents for business rules and hourly-rate tables for specific individuals. Vanilla Claude didn't even ask about these.
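For the script layer specifically, a minimal sketch of the kind of syntax check involved might look like the following. This assumes a simplified pipe-delimited, 1998B-style layout and is not the commenter's actual implementation; real LEDES validation has many more rules, and the semantic and firm-specific layers are intentionally left to the LLM plus reference documents.

```typescript
// Rough sketch of a script-level LEDES syntax check (simplified assumption of a
// pipe-delimited 1998B-style file). Semantic rules like hourly-rate tables are
// out of scope here; those belong to the LLM/reference-document layers.
function checkLedesSyntax(invoice: string): string[] {
  const errors: string[] = [];
  const lines = invoice.trim().split(/\r?\n/);

  if (lines.length < 3) {
    return ["file too short to contain a format line, field list, and data rows"];
  }
  if (!lines[0].startsWith("LEDES")) {
    errors.push("line 1: missing LEDES format identifier");
  }

  // Line 2 declares the field names; every data row must have the same width.
  const fieldCount = lines[1].split("|").length;
  lines.slice(2).forEach((row, i) => {
    const cols = row.split("|");
    if (cols.length !== fieldCount) {
      errors.push(`line ${i + 3}: expected ${fieldCount} fields, found ${cols.length}`);
    }
  });

  return errors;
}
```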
In your claude.md file, do you have a description that states when to use the skills? Is that necessary if the description is already inside the skill.md? I'm curious.
As you might imagine, our CLAUDE.md files are also very opinionated. You can see my personal CLAUDE.md here (https://raw.githubusercontent.com/tilework-tech/nori-skillsets/refs/heads/main/src/cli/features/claude-code/profiles/config/amol/CLAUDE.md) though note that there is a little bit added onto the end that comes from the nori installer.
The short version is: yes, I explicitly call out specific skills to follow as part of a 'canonical workflow' that is the default behavior for my Claude agents. I find this necessary to get good behavior out of the model over long, highly autonomous periods. In my experience, the single-line note from the skill.md description is rarely enough to get the model to successfully and consistently call the skills it is supposed to call.
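For readers wondering what "explicitly call out specific skills" can look like, a hypothetical CLAUDE.md fragment might read something like the following. This is not taken from the linked file, and the skill names are made up; it only illustrates the idea of naming skills inside a canonical workflow rather than relying on each skill's one-line description.

```markdown
## Canonical workflow

For any non-trivial task, follow these skills in order, every time:

1. Use the `plan-before-code` skill to write and confirm a plan.
2. Use the `tdd-loop` skill while implementing.
3. Use the `self-review` skill on the final diff before reporting done.

Do not skip a step because the task "seems simple".
```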