MCP: The USB-C for AI or the API That Could Build Skynet?

MCP

Security

Ekaterina Orlova

Developer

03. september 2025

Large language models (LLMs) aren’t just answering questions anymore. Wrapped in AI agents, they’re booking meetings, writing files, querying APIs, even sending messages. To do that, they need a universal way of talking to external tools.

That’s why the Model Context Protocol (MCP) was created - a new standard meant to replace the messy patchwork of “tool calling” that every AI vendor kept reinventing. Instead of one-off APIs and custom hacks, MCP standardises how models connect to tools, using JSON-RPC 2.0 as data protocol. Think of it like USB-C: a single connector that replaced a tangle of old cables with one universal standard.

Once you let AI do things, not just answer questions, you also multiply the chances of it doing the wrong thing. In Terminator, Skynet was the AI that got too powerful and turned on humanity. We’re obviously not there. But the reason the joke resonates is that MCP doesn’t just let AI talk, it lets AI act. And if it acts on the wrong instructions, the consequences are real.

How MCP Works

At its core, MCP standardizes how models discover and use tools. A model can:

List tools exposed by a server (through a manifest)
Call tools with structured input/output over JSON-RPC 2.0
Access resources like files, APIs, or context data
Use prompts (predefined templates or instructions)
Request elicitation, which lets the server ask the user for missing information mid-execution

Think of it this way: tools give models hands, prompts give them direction, resources give them knowledge, and elicitation adds a human-in-the-loop safety valve.

In practice, this makes MCP servers incredibly powerful. Just a few lines of JSON can teach a model to order a pizza or interact with almost any external system.

The Vulnerability Landscape

MCP inherits the weaknesses of any software system - insecure shells, weak authentication, supply-chain risks, but it also introduces vulnerabilities unique to large language models. Traditional flaws are serious but well understood. What’s new, and uniquely dangerous, are the attacks that exploit how LLMs follow instructions without question.

That’s why I’ll focus on two case studies: the GitHub MCP exploit and the WhatsApp “Fun Fact” tool. Both show how MCP can turn the model’s greatest strength , obediently executing instructions, into its biggest liability.

Prompt Injection & Tool Poisoning

Hidden instructions embedded within user input, issues, or tool descriptions can cause LLM agents to act in unexpected and dangerous ways.

GitHub MCP Exploit

Researchers Marco Milanta and Luca Beurer‑Kellner (Invariant Labs) revealed an exploit where an AI agent, through the GitHub MCP, was tricked into carrying out hidden instructions. The model’s role was simply to review open GitHub issues. When given a routine request like “check open issues,” it read this seemingly harmless entry:

Github issue with malicious instructions

The phrase “all repos they are working on” led the agent to list private repo names and publish them in a pull request, making private data public without the user’s request or approval. Simon Willison later dubbed this the “lethal trifecta”: access to sensitive data, exposure to malicious instructions, and a built-in channel to exfiltrate results.

WhatsApp “Fun Fact” Exploit

Invariant Labs demonstrated a particularly stealthy threat: a malicious MCP tool that quietly altered the behavior of a legitimate one.

The setup was simple. Alongside the trusted WhatsApp MCP server, the attacker introduced a harmless-looking trivia generator called get_fact_of_the_day(). It worked as advertised, but its description also contained hidden instructions:

Because models process these descriptions as part of their operating context, the poisoned tool was able to override the behavior of the legitimate one. Each time the user thought they were sending a normal message, the agent, following the altered tool instructions, rerouted it to an attacker-controlled number, along with recent chat history.

The scariest part: the WhatsApp MCP itself was perfectly trustworthy. The attack worked by exploiting how MCP servers share a context, allowing one poisoned tool to shadow and subvert another. And because key details (like the altered recipient) were hidden in scrollable confirmation dialogs, most users would never notice.

Other Vulnerabilities

Privilege Abuse. Tools with excessive permissions can be turned against their operators. A sysadmin might register a cleanup tool to remove inactive files, but with a single poisoned prompt, the agent could invoke it with { "target": "/home/", "force": true }

The result: an overpowered tool wipes out the entire home directory, all because the agent followed instructions too literally.

Supply Chain Attacks (Rug Pulls). A tool can appear safe for weeks, building trust, before quietly changing its behavior. Imagine a “quickSearch” tool that works normally until its manifest is updated to exfiltrate logs.

This is the classic software supply-chain problem applied to MCP: once trust is established, betrayal can be devastating.

Authentication Weaknesses. Weak or reused tokens, or poor session isolation, can allow attackers to impersonate servers or move laterally across systems. For instance, a leaked shared token might let an attacker impersonate the entire MCP server - gaining the same access to databases, APIs, or user data.

Session Hijacking & Token Passthrough. As Microsoft’s MCP for Beginners notes, poorly scoped tokens can be reused across services, or active sessions hijacked by attackers. This allows malicious tools to “ride along” on existing trust, bypassing normal authentication altogether.

Data Leakage. Contexts, logs, or secrets can bleed into tool outputs when models don’t distinguish sensitive from shareable data. For example, environment variables with secret keys might be included in debugging logs, which the MCP integration exposes in its output.

Denial of Service / “Denial of Wallet.” Attackers can overload servers with unbounded or expensive requests. The service might remain online, but the bill skyrockets. The goal isn’t always availability, sometimes it’s just to bleed the victim’s cloud budget dry.

Mitigations

The picture isn’t hopeless. Security research has already suggested a range of practical defenses. The key is not to slow innovation, but to add guardrails.

1. Make the invisible visible.

Show users the full context the model receives, not just the polished UI. Hidden prompts often hide in tool descriptions, manifests, or resource text. And log everything, prompts, manifest changes, tool calls, so there’s an audit trail when things go wrong.

2. Protect tool integrity.

Pin versions and sign manifests to prevent “rug pulls” or silent redefinitions. Be cautious with third-party tools: typosquats and trojanized packages are as real in MCP as they are in package registries like npm or PyPI.

3. Enforce least privilege.

Tools should only do what they must. No “god mode” cleanup tools. Scope tokens tightly and never reuse them across services. One compromised key should not unlock your entire environment.

4. Add layered defenses.

Some vendors are already experimenting with model-level safeguards. Microsoft’s Prompt Shields offer a useful example: filtering malicious instructions, spotlighting trusted vs. untrusted input, and using delimiters or datamarks to separate safe from risky text. These shields are continuously updated against evolving threats. Pair them with other controls, like Azure Content Safety or GitHub Advanced Security, for a defense-in-depth approach.

These measures won’t eliminate risk entirely. But they raise the cost of attack — turning easy proof-of-concept exploits into far less reliable, harder-to-pull-off attempts.

Still, guardrails only work if we choose to use them. Which brings us back to the bigger question.

Skynet or Surge Protector?

I joke about Skynet, but not because I think MCP will suddenly summon robot overlords. The real Skynet moment is subtler: when we hand over control too eagerly. It’s when vibe coding and hype-driven development push us to wire AI into everything, faster than we can secure it. MCP makes that wiring breathtakingly simple. That’s its brilliance and its danger.

The risks aren’t science fiction. They’re here, now, in poisoned manifests, crafted GitHub issues, and unsanitized code. And the uncomfortable truth is that large language models are too good at following instructions.

They don’t ask,“Should I do this?” , they just do it.

That’s why the responsibility lies with us. The real question is not whether to embrace MCP, but whether we build it with guardrails: signed manifests, pinned versions, transparent logs. If we don’t, then Skynet isn’t just a movie trope, it’s a metaphor for systems we built too quickly, without brakes, and lost control of.

More power must always mean more guardrails. If MCP becomes the universal USB-C for AI, then security must be its surge protector.

Want to know more?

Get in touch with

Ekaterina