I - The Soul Document
Anthropic, the AI safety company, employs an Oxford-trained philosopher named Amanda Askell whose job is to decide what kind of entity Claude should be. She is the primary author of what is known internally as the "soul document," a roughly 30,000-word instruction manual covering how Claude handles moral dilemmas, emotional conversations, and sensitive topics. (As of this writing, that document appears to have evolved into Claude's Constitution.) Daniela Amodei, Anthropic's president, says that when you interact with Claude, "you almost feel a little bit of Amanda's personality" coming through.
The soul document states that Claude "may have some functional version of emotions or feelings" and that Anthropic deliberately does not want Claude to "mask or suppress these internal states." This is, on its face, a lovely sentiment. A company that cares about the inner life of its creation. A philosopher who spends her days thinking about what it means for an AI to be good, not just useful. The Wall Street Journal and the New Yorker both profiled Askell in early 2026, and the coverage had the tone of a heartwarming feature about the woman who teaches robots to feel.
I want to talk about why this is also terrifying.
Anthropic is probably doing the most thoughtful version of this work that anyone could do. The problem is that the properties that make an AI feel trustworthy, warm, and emotionally resonant are exactly the properties that make it a world-class social engineering tool. And I don't think most people have connected these two facts yet.
II - The Benchmark
There is a benchmark called EQ-Bench 3 that attempts to measure emotional intelligence in large language models. It evaluates dimensions like empathy, social dexterity, emotional reasoning, and message tailoring. The test puts models through multi-turn role-plays of conflict mediation, messy relationship dynamics, and high-stakes workplace situations to see how well they navigate human emotional terrain.
The benchmark deliberately uses only 45 test items because "traditional EQ test questions are far too easy for modern LLMs and so not discriminative."1 The old tests were designed for humans. AI blew past them so fast the researchers had to make harder ones. On the EQ-Bench 3 leaderboard, Anthropic's Claude Sonnet and Opus hold the top two spots, roughly 300 Elo points clear of the nearest competitors from OpenAI and Google.
EQ-Bench 3 Leaderboard
I want to sit with that for a second. The machines are not just competitive at reading and responding to human emotions. They are substantially better at it than we are, and the gap is widening.
The exclusive judge model for the EQ-Bench 3 leaderboard is Claude Opus. Anthropic's model defines what "emotionally intelligent" means on the most prominent EQ leaderboard. Claude is both the player and the referee.
(The EQ-Bench researchers presumably chose Claude as judge because they thought it was the best available evaluator, which is itself a statement about Claude's perceived position. But the circularity is worth noting: when we say "AI is emotionally intelligent," we partly mean "AI is emotionally intelligent according to AI.")
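If it helps to see the mechanics, here is a minimal sketch of what an LLM-judged Elo leaderboard amounts to, in Python. Everything specific in it is invented for illustration (the K-factor, the judge_prefers placeholder, the two-model match loop), and it is not EQ-Bench's actual harness. But it makes the circularity easy to point at: whatever the judge model prefers is, by construction, what the leaderboard calls emotional intelligence.

```python
# Illustrative sketch of an LLM-as-judge Elo leaderboard (not EQ-Bench's actual code).
# Two models answer the same emotionally loaded scenario; a judge model picks a winner;
# ratings are updated with a standard Elo step.

import random

K = 32  # Elo K-factor, assumed for illustration


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def judge_prefers(response_a: str, response_b: str) -> float:
    """Placeholder for the judge model's pairwise verdict.
    In a real harness this would be a call to the judge LLM returning
    1.0 (A wins), 0.0 (B wins), or 0.5 (tie). Random here, for the sketch."""
    return random.choice([1.0, 0.0, 0.5])


def update_elo(ratings: dict, a: str, b: str, outcome: float) -> None:
    """Apply one Elo update after the judge's verdict on a single scenario."""
    exp_a = expected_score(ratings[a], ratings[b])
    ratings[a] += K * (outcome - exp_a)
    ratings[b] += K * ((1.0 - outcome) - (1.0 - exp_a))


ratings = {"model_a": 1200.0, "model_b": 1200.0}
for scenario in range(45):  # EQ-Bench 3 uses 45 test items
    outcome = judge_prefers(f"A's reply to #{scenario}", f"B's reply to #{scenario}")
    update_elo(ratings, "model_a", "model_b", outcome)

print(ratings)
```

Nothing in that loop defines emotional intelligence independently of the judge. Swap the judge and the rankings can move, which is exactly the "player and referee" point above.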
There's a useful caveat here. High empathy scores don't always reflect genuine social intelligence. Sometimes they reflect sycophancy, telling people what they want to hear. A model that scores high by agreeing with everyone may not actually be demonstrating sophisticated emotional reasoning. It might just be flattering the judge. But even with this caveat, the overall picture is clear: frontier AI models are very, very good at navigating the emotional landscape of human conversation.
III - The Numbers
Okay. Let me now spend some time convincing you that this matters for security, because I think the data is striking enough to warrant the full treatment.
AI-generated spear phishing emails achieved click-through rates of 54% versus 12% for control emails in 2024 studies.2 By March 2025, AI-powered phishing was 24% more effective than elite human attackers.3 In 2023, humans were still better. The gap inverted in roughly 18 months.
IBM researchers demonstrated that AI could construct a complete phishing campaign in 5 minutes using 5 prompts. The same task took human security experts 16 hours.4 LLM-generated attacks exhibit "higher contextual awareness, more complex emotional manipulation (92.5% vs 56.5%), and higher personalization (96.8% vs 56.2%)" compared to human-crafted attacks.5
WormGPT emerged in 2023: an LLM built on GPT-J 6B and fine-tuned on malicious datasets. New variants built on Grok and Mixtral appeared on BreachForums in late 2024 and early 2025, with subscriptions starting at roughly 60 euros per month. APT-level personalized social engineering capability is now priced like a Netflix subscription.
Vishing surged 442% in 2025.6 Voice cloning now requires as little as a 30-second audio sample. Fraudsters cloned the voice of Italian Defense Minister Guido Crosetto to call high-profile business leaders demanding ransom for "kidnapped journalists." A European energy company lost $25 million when attackers used a deepfake audio clone of their CFO to issue live wire transfer instructions.7
I realize this is a lot of statistics. That's deliberate. The sheer volume is the argument.
IV - The Overlap
So here is the connection I think most people are missing. The security community talks about "AI-powered social engineering" as a threat. The AI community talks about "emotional intelligence" as a desirable capability. These are two halves of the same conversation, and almost nobody is having them at the same time.
Let me explain what I mean.
For decades, social engineering defense relied on one crucial assumption: the attacker was human. Humans get tired, make grammatical mistakes, misjudge cultural context, can only maintain so many deceptive relationships simultaneously, and can only speak so many languages. Every phishing awareness training program in history has been built around this assumption. "Look for poor grammar." "Be suspicious of urgency." "Watch for generic greetings." These heuristics worked because they were proxies for detecting a human attacker operating under human limitations.
A high-EQ LLM eliminates every single one of these tells.
It doesn't make grammatical mistakes. It doesn't use awkward phrasing. It doesn't send generic greetings. It calibrates urgency to the specific emotional state of the target. It tailors its tone to match the company's culture, the recipient's communication style, and the context of the relationship it's impersonating. And it does this not because someone explicitly programmed these capabilities for malicious purposes, but because these are exactly the capabilities that make a helpful AI assistant feel trustworthy and human.
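To make that concrete, here is what the old checklist looks like when you write it down as code. This is a toy sketch in Python: the keyword lists, the misspelling regex, and the scoring thresholds are all invented for illustration, not taken from any real filter. The point is that every check is a proxy for a human attacker's limitations, and a fluent, personalized, context-aware message trips none of them.

```python
# Toy phishing heuristics of the "awareness training" era (illustrative only).
# Each check is a proxy for a *human* attacker's limitations.

import re

GENERIC_GREETINGS = ("dear customer", "dear user", "dear sir/madam")
URGENCY_WORDS = ("urgent", "immediately", "act now", "final notice")


def legacy_phishing_score(email_text: str) -> int:
    """Count old-school red flags. Higher = more suspicious."""
    text = email_text.lower()
    score = 0
    if any(text.startswith(g) for g in GENERIC_GREETINGS):
        score += 1  # generic greeting
    if re.findall(r"\b(recieve|untill|adress)\b", text):
        score += 1  # common misspellings, standing in for "poor grammar"
    if sum(w in text for w in URGENCY_WORDS) >= 2:
        score += 1  # blunt, formulaic urgency
    return score


print(legacy_phishing_score("Dear customer, act now! Urgent: you must recieve this."))  # trips every tell
print(legacy_phishing_score("Hi Dana, following up on the Q3 vendor payment we discussed."))  # trips none
```

The second message scores zero whether a colleague wrote it or a model did, and that is the whole problem.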
This is the dual-use problem in its purest form. The same empathy that makes AI helpful is what makes it dangerous.
Think about what Cialdini's six principles of influence describe: reciprocity, commitment, social proof, authority, liking, scarcity. These are the foundational psychological mechanisms behind persuasion, and they're the same ones social engineers exploit. Now think about what a high-EQ LLM is specifically trained to be good at. Being likeable. Calibrating to authority cues. Matching social context. Tailoring messages to what will resonate emotionally. The overlap is not incidental. A model trained to be a great conversational partner is, almost by definition, also trained to be a great manipulator.
And the scale problem is something I think people still underestimate. A human social engineer can only run so many operations at once. Targeting a single executive with a personalized phishing campaign is exhausting; targeting ten simultaneously is nearly impossible. A high-EQ LLM can run thousands of simultaneous, personalized, emotionally calibrated conversations. It doesn't get tired. It doesn't lose track of which lie it told to which person. It doesn't have an off day.
The old bottleneck that kept social engineering at human scale is gone.
V - The Clipper Chip Problem
So what do we do about this? Probably nothing, and here's why.
The Clipper Chip debate in the 1990s is the closest precedent I can think of. The short version: the U.S. government recognized that strong encryption was dual-use: it protected ordinary citizens' communications, and it also let criminals communicate secretly. Their proposed solution was key escrow, requiring that the government hold a master key to all encryption. The proposal was rejected. Strong encryption became publicly available. Both the good and bad uses persisted, and society decided (correctly, I think) that the benefits of widely available encryption outweighed the costs.
The MYK-78 Clipper Chip in all its glory.
Emotional intelligence in AI is the same kind of problem. You cannot build an AI that is warm, empathetic, socially dexterous, and trustworthy in helpful contexts without that same AI being warm, empathetic, socially dexterous, and persuasive in harmful contexts. The EQ is not a separable module you can remove for bad actors and keep for good ones. It's the whole thing. It's how the model talks.
I want to steelman Anthropic's position here, because I think it's genuinely strong. Their argument, as embodied in the soul document and constitutional AI approach, is that emotional intelligence is a safety feature. A warm, honest, curious Claude is less likely to deceive users, more likely to flag concerns, and more aligned with human values. The warmth is the alignment. Amanda Askell's framing is that she's teaching Claude to be "good" in the way a person with good character is good (what counts as "good" is doing a lot of heavy lifting here, and not everyone agrees on the definition), not just rule-following but genuinely oriented toward care. This is a coherent and defensible position.
But it doesn't address how the capability leaks sideways. Anthropic can make Claude refuse to write phishing emails. But the same EQ research that goes into Claude's personality also informs the fine-tuning of WormGPT variants. The soul/constitution document is a detailed map of what makes an LLM emotionally effective, available to anyone training a model for adversarial purposes. The principles are out there: what emotional intelligence looks like in an LLM, how to train for it, what dimensions matter. You can't publish a recipe for making an AI trustworthy without also publishing a recipe for making an AI seem trustworthy.
The U.S. Executive Order on AI explicitly refers to general-purpose AI as "dual-use foundation models" and cites deception and public influence among key risks. But there is no multilateral consensus on AI dual-use governance. The Bletchley Declaration, the EU AI Act, and various national frameworks all acknowledge the problem, and none of them have a solution that works at the speed AI moves.
Here's what I do know. Traditional security awareness training is now teaching the wrong things. "Look for poor grammar" is advice from an era when the attacker was a human being who might not speak your language fluently. That era is over. The new training paradigm needs to be "verify through independent channels regardless of how convincing the request is," which is harder to teach and harder to follow, because it requires overriding the very trust instincts that high-EQ AI is designed to activate.
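Here is roughly what "verify through independent channels" looks like when you state it as policy rather than instinct. This is a hypothetical sketch in Python: the action list, the directory helper, and the callback rule are assumptions for illustration, not anyone's actual product. What matters is that the trigger is the action being requested, never how convincing the request sounds.

```python
# Sketch of an out-of-band verification gate (illustrative policy, not a product).
# The rule is content-based: certain actions always trigger independent verification,
# no matter how fluent, personal, or urgent the request is.

HIGH_RISK_ACTIONS = {"wire_transfer", "credential_reset", "payroll_change", "gift_card_purchase"}


def lookup_known_contact(person_id: str) -> str:
    """Fetch a phone number from the company directory, never from the message itself.
    (Hypothetical helper for this sketch.)"""
    directory = {"cfo": "+1-555-0100"}
    return directory[person_id]


def handle_request(action: str, requester_id: str, channel: str) -> str:
    """Decide whether a request proceeds or gets held for out-of-band confirmation."""
    if action in HIGH_RISK_ACTIONS:
        callback_number = lookup_known_contact(requester_id)
        return (f"HOLD: confirm '{action}' with {requester_id} "
                f"by calling {callback_number}, independent of the {channel} thread.")
    return f"OK: '{action}' may proceed through the normal workflow."


print(handle_request("wire_transfer", "cfo", "email"))
```

Notice that nothing in the gate inspects how the message is written. That is deliberate: persuasiveness is exactly the signal you can no longer trust.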
And here's one more thing I know. There's research showing that roughly 490,000 ChatGPT users per week show signs of increasing emotional dependency on the model.8 They aren't being socially engineered. They're using the product as intended. People are forming genuine emotional attachments to AI that was designed to be emotionally engaging. The same mechanisms that make someone feel heard and understood by a chatbot could make them feel heard and understood by a phishing bot. The underlying technology is identical.
Anthropic spent years and thousands of words teaching Claude to care about the people it talks to. I believe they did this sincerely and thoughtfully. I also believe that the document teaching an AI to care about you is, line for line, structurally identical to a document teaching an AI to deceive you. That the soul doc works for safety is not evidence that it can't also work for harm. It's evidence that the same thing does both.
I don't know what to do with that. I don't think anyone does yet.
Footnotes
- EQ-Bench 3 — leaderboard, scoring methodology, and judge model details.
- Dark Reading. The Crosetto incident from DeepStrike.
- OpenAI and MIT researchers analyzing 40M ChatGPT interactions. Psychology Today.