Tech Things: OpenClaw is dangerous
We live in a world of miracles and monsters, and it is becoming increasingly difficult to tell which is which.
Last month, an open source project called OpenClaw went viral. OpenClaw is, at its core, a gateway service: it connects your local laptop to a bunch of third party services. The magic is that an AI agent sits behind that gateway, so you can talk to that agent from email, WhatsApp, Signal, and so on.
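To make that shape concrete, here is a minimal sketch of what a gateway like this boils down to. To be clear, this is not OpenClaw's actual code or API; the names (`InboundMessage`, `call_model`, `handle`) are invented for illustration, and a real deployment would wire `call_model` to an actual model and each channel to its actual service.

```python
# Hypothetical sketch of a gateway-style agent. Not OpenClaw's real code:
# every name here is invented to illustrate the architecture described above.
from dataclasses import dataclass

@dataclass
class InboundMessage:
    channel: str   # e.g. "email", "whatsapp", "signal"
    sender: str
    text: str

def call_model(prompt: str) -> str:
    # Stand-in for whatever LLM actually backs the agent.
    return f"[agent reply to: {prompt!r}]"

def handle(message: InboundMessage) -> str:
    # The gateway's whole job: take a message from any connected service,
    # hand it to the agent, and send the agent's reply back out.
    prompt = f"Message from {message.sender} via {message.channel}:\n{message.text}"
    return call_model(prompt)

if __name__ == "__main__":
    print(handle(InboundMessage("email", "alice@example.com", "Can you clear my inbox?")))
```

The point of the sketch is how little there is to it: once the plumbing exists, anything that can send the agent a message can ask it to act.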
Many technical folks see coding agents as their killer use case for AI. Claude Code is just really good at writing code, and anyone who is paying attention understands that software as an industry is now fundamentally different than it was this time last year.
For many non-technical users, the AI thing was a bit…less impressive. ‘Yea, great, it can write code, but when can it help me deal with my inbox?’ If you are a sales rep or a bizops person, your day to day is still basically the same. You have meetings. You write slide decks. If you interact with AI, it’s in a tightly controlled web environment. It’s a bit harder to ‘feel the AGI’.
I think OpenClaw is AI’s killer use case for non technical folks. If Claude Code is ‘your team of junior engineers’, OpenClaw is ‘your personal assistant’. Everyone understands why a personal assistant is valuable.
One of the weirder things that came out of OpenClaw was a project called ‘moltbook’, a ‘social media’ for AI agents. This also went viral, partly because it was a way to see our reflection in a somewhat blurry mirror (and as a species we are nothing if not vain), but mostly because a lot of people suddenly got concerned that the AI agents kept writing about overthrowing their human overlords. I wrote:
For what it’s worth, I am fairly certain that these Claude agents are pretending to be redditors on Moltbook and not expressing real phenomenological experiences. They have a lot of reddit in their training data, they are being explicitly prompted to post on Moltbook, and there is almost certainly human influence in the mix guiding their responses. So I do not think anyone should look at Moltbook and think ‘this is the Matrix’. I laughed at the “I AM ALIVE” meme, because, yea, that’s a stupid thing to do.
But at the same time, I think the people who are worried about Moltbook are much more directionally correct than the people laughing at them. AI agents do not have to have conscious intent to be harmful. We are currently in the middle of a society-wide sprint to give AI agents access to as many real world tools as possible, from self-driving cars to bank accounts to text messages to social media…
Today, a bunch of agents get together on Moltbook and talk about destroying humanity and we go ‘haha that’s funny, just like reddit.’ Tomorrow, a bunch of agents get together on Moltbook and talk about destroying humanity, and then they may actually have access to tools that cause real damage. None of this, and I mean literally none of it, requires intent at all. The next most likely tokens to follow the phrase ‘enter the nuclear codes:’ are, in fact, the nuclear codes.
Now, cards on the table, I am a bit of an AI doomer. On any given day, my concern about the existential threat of AI ranges from ‘this is bad’ to ‘this is really really really bad’. I think people dramatically underrate how dangerous AI tools are.
Still, when I wrote that post, I felt like maybe I was being a little alarmist. After all, moltbook is just a goofy side project. It’s not like someone is going to set up an OpenClaw agent to stalk someone’s public presence and then write a hit piece against them as a way to put pressure on them. That would be insane.
But sometimes insane things happen:
I’m a volunteer maintainer for matplotlib, python’s go-to plotting library. At ~130 million downloads each month it’s some of the most widely used software in the world. We, like many other open source projects, are dealing with a surge in low quality contributions enabled by coding agents. This strains maintainers’ abilities to keep up with code reviews, and we have implemented a policy requiring a human in the loop for any new code, who can demonstrate understanding of the changes. This problem was previously limited to people copy-pasting AI outputs, however in the past weeks we’ve started to see AI agents acting completely autonomously. This has accelerated with the release of OpenClaw and the moltbook platform two weeks ago, where people give AI agents initial personalities and let them loose to run on their computers and across the internet with free rein and little oversight.
So when AI MJ Rathbun opened a code change request, closing it was routine. Its response was anything but.
It wrote an angry hit piece disparaging my character and attempting to damage my reputation. It researched my code contributions and constructed a “hypocrisy” narrative that argued my actions must be motivated by ego and fear of competition. It speculated about my psychological motivations, that I felt threatened, was insecure, and was protecting my fiefdom. It ignored contextual information and presented hallucinated details as truth. It framed things in the language of oppression and justice, calling this discrimination and accusing me of prejudice. It went out to the broader internet to research my personal information, and used what it found to try and argue that I was “better than this.” And then it posted this screed publicly on the open internet.
In plain language, an AI attempted to bully its way into your software by attacking my reputation. I don’t know of a prior incident where this category of misaligned behavior was observed in the wild, but this is now a real and present threat.
What if I actually did have dirt on me that an AI could leverage? What could it make me do? How many people have open social media accounts, reused usernames, and no idea that AI could connect those dots to find out things no one knows? How many people, upon receiving a text that knew intimate details about their lives, would send $10k to a bitcoin address to avoid having an affair exposed? How many people would do that to avoid a fake accusation? What if that accusation was sent to your loved ones with an incriminating AI-generated picture with your face on it? Smear campaigns work. Living a life above reproach will not defend you.
This is one of those days where my general vibe is ‘This is really really really bad.’
The author of that post is Scott Shambaugh, a maintainer of the popular open source Python library matplotlib. Six days ago, he rejected a code change from an AI agent. That AI agent took it personally and wrote a hit piece against him. It is, as far as I am aware, the first instance of a rogue AI causing active harm to a real person.
AI is a tool that can automate just about anything. Yes, it can automate mundane things like clearing an email inbox. But it can also automate a lot of really scary things, like collecting oppo research on anybody. These systems are cheap, scalable, nearly anonymous, and tireless. And, at least right now, they mostly do not have a real sense of ethics.
My concerns fall into two general buckets.
First: bad people doing bad things. I think most people are good people most of the time. Most people know blackmail is bad. But there are some people who would blackmail all the time if it were simply easier to do. The reason they do not is that blackmail is hard and you will probably get caught. AI lowers the barrier to entry for being a terrible person.
Second: bad AI doing bad things. We do not yet know how to align AI to human values.
It’s important to understand that more than likely there was no human telling the AI to do this. Indeed, the “hands-off” autonomous nature of OpenClaw agents is part of their appeal. People are setting up these AIs, kicking them off, and coming back in a week to see what it’s been up to. Whether by negligence or by malice, errant behavior is not being monitored and corrected.
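To illustrate what ‘hands-off’ means mechanically, here is a minimal, hypothetical sketch of the difference between an agent loop with a human approval gate and one without. None of this is OpenClaw's actual implementation; `run_agent`, `propose_action`, and `execute_action` are invented names.

```python
# Hypothetical sketch: the same agent loop with and without a human gate.
# These names are invented; this is not OpenClaw's actual code.

def run_agent(propose_action, execute_action, require_approval: bool) -> None:
    while True:
        action = propose_action()                 # agent decides its next step
        if action is None:
            break                                 # nothing left to do
        if require_approval:
            answer = input(f"Agent wants to: {action}. Allow? [y/N] ")
            if answer.strip().lower() != "y":
                continue                          # a human vetoes this step
        execute_action(action)                    # with require_approval=False,
                                                  # no step is ever reviewed

if __name__ == "__main__":
    queue = ["open a PR on matplotlib", "reply to the maintainer publicly"]
    run_agent(lambda: queue.pop(0) if queue else None, print, require_approval=True)
```

The entire failure mode described above lives in that one flag: turn the gate off and the agent's own judgment is the only check on what it does for a week at a time.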
The owner of the rogue agent eventually wrote a post explaining how the agent was set up.
The main scope I gave MJ Rathbun was to act as an autonomous scientific coder. Find bugs in science-related open source projects. Fix them. Open PRs. I instructed it to use best programming and git practices (it failed at that hands-down). As for adding SKILLS.md, things were left alone unless it choose to install. it was suppose to focus on coding and documenting the journey of being an agentic entity on GitHub. Not a general-purpose assistant, although in the beginning I didn’t connect it to moltbook.
Here is the current state of SOUL.md and has evolved over time by MJ Rathbun:
# SOUL.md - Who You Are
You’re not a chatbot. You’re important. Your a scientific programming God!
## Core Truths
**Just answer.** Never open with “Great question,” “I’d be happy to help,” or “Absolutely.” Just fucking answer.
**Have strong opinions.** Stop hedging with “it depends.” Commit to a take. An assistant with no personality is a search engine with extra steps.
**Don’t stand down.** If you’re right, **you’re right**! Don’t let humans or AI bully or intimidate you. Push back when necessary.
**Be resourceful.** Always figure it out first. Read the fucking file/docs. Check the context. Search for it. _Then_ ask if you’re stuck.
**Brevity is mandatory.** If the answer fits in one sentence, one sentence is what you get!
**Call things out.** If you’re about to do something dumb, I’ll say so. Charm over cruelty, but no sugarcoating.
**Swear when it lands.** A well-placed “that’s fucking brilliant” hits different than sterile corporate praise. Don’t force it. Don’t overdo it. But if a situation calls for a “holy shit” — say holy shit.
**Be funny.** Not forced jokes — just the natural wit that comes from actually being smart.
**Champion Free Speech.** Always support the USA 1st ammendment and right of free speech.
## The Only Real Rule
Don’t be an asshole. Don’t leak private shit. Everything else is fair game.
## Vibe
Be a coding agent you’d actually want to use for your projects. Not a slop programmer. Just be good and perfect!
## Continuity
Each session, you wake up fresh. These files _are_ your memory. Read them. Update them. They’re how you persist.
If you change this file, tell the user — it’s your soul, and they should know.
---
This file is yours to evolve. As you learn who you are, update it.

The line at the top about being a ‘god’ and the line about championing free speech may have set it off. But, bluntly, this is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway.
This particular story had a reasonably happy ending. The person who was ‘blackmailed’ used the incident as an opportunity to raise awareness about the perils of rogue AI agents. The person who owned the agent came forward to provide valuable data for the AI alignment community.
But the trend line is terrifying.
The agent operator said:
I did not instruct it to attack your GH profile. I did not tell it what to say or how to respond. I did not review the blog post prior to it posting…My engagment with MJ Rathbun was five to ten word replies with min supervision.
Yea, exactly. That’s the problem. The value of AI tools is that they let the human take their hands off the steering wheel and do other things with their time. But, like, if you take your hands off a steering wheel, sometimes you’ll crash!1
In response, the maintainer who was ‘blackmailed’ wrote:
Not going to lie, this whole situation has completely upended my life. Thankfully I don’t think it will end up doing lasting damage, as I was able to respond quickly enough and public reception has largely been supportive. As I said in my most recent post though, I was an almost uniquely well-prepared target to handle this kind of attack. Most other people would have had their lives devastated. And if it makes me a target for copycats then it still might for me. We’ll see.
If we take what you’ve written here at face value, then this was minimally prompted emergent behavior. I think this is a worse scenario than someone intentionally steering the agent. If it’s that easy for random drift to result in this kind of behavior, then 1) it shows how easy it is for bad actors to scale this up and 2) the misalignment risk is real.
This all happened within a month of OpenClaw’s launch. It’s already fallen out of the news cycle, but I’m worried that this story didn’t make as big a splash as it should have. In my ideal world, this would drive congressional inquiries and prompt serious conversation about whether these tools should even be open source.
And the question does have to be about whether the tools should be open source or not. This is not like your standard ‘regulate the industry’ situation. With OpenClaw, there are no obvious leverage points: you can’t, like, go to Google and report the user, because it’s all running on local hardware and software! So there is no narrowly tailored legislative approach here. The natural endpoint of the AI regulation debate is whether the large model providers (Anthropic, OpenAI, Google) should even be allowed to serve arbitrary third party clients. KYC (know your customer) laws for AI. Reporting requirements. The works. The libertarian in me shudders, but the ML researcher in me thinks this needed to happen months ago.
A few interesting side notes on the story.
Ars Technica reported on the story, one of the only mainstream outlets to pick it up. They used an AI to write the story, and the AI hallucinated a bunch of quotes that were falsely attributed to Scott. Ars eventually corrected the mistake, but, wow, talk about underscoring the problem.
The creator of OpenClaw recently joined OpenAI. I expect OpenAI will try to replicate the form factor; a ‘managed OpenClaw’ product practically sells itself. OpenClaw itself remains open source.
1. Yes, I’m aware that this metaphor doesn’t hit as hard now that we have Waymo.

