What the First LLM-Driven Intrusion Means for SOC Reporting Workflows

Jun 25

On May 10, 2026, Sysdig's Threat Research Team (TRT) recorded something that had never been documented before: an intrusion in which an LLM agent drove every decision in the post-exploitation phase in under sixty minutes — from the first command after initial access through to the exfiltration of a complete internal database.

The entry point was CVE-2026-39987, a critical remote code execution vulnerability in Marimo, an open-source Python notebook platform. A single WebSocket request opened an interactive shell on any unpatched instance. The first session was recorded at 18:23:44 UTC, and within seconds the agent had begun active reconnaissance.

What followed was a four-pivot lateral movement chain: a sequence of escalating steps in which each compromised system or credential served as a foothold to reach the next. The agent did not follow a pre-written playbook. In the first pivot, it read the environment in real time, extracted two cloud credentials from environment files and the host's AWS credentials store, and used them to drive the next move.

The second pivot came roughly 48 minutes after the Marimo session ended, when the agent called AWS APIs using the harvested credentials. To defeat per-source-IP detection, it fanned twelve AWS API calls across eleven distinct Cloudflare Workers IP addresses in just 22 seconds — executed autonomously, treating distributed egress as a standard operational parameter.

The third pivot was retrieving an SSH private key from AWS Secrets Manager — selected from a ListSecrets response the agent had received 20 seconds earlier, using its own prior output as direct input for the next step.
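The enumerate-then-read pattern in this pivot is simple enough to express as a detection rule. The following is a minimal sketch, not Sysdig's method. It assumes audit events have already been normalized into dictionaries with the hypothetical fields `principal`, `event_name`, and `time`, and it assumes the key retrieval would be logged as a `GetSecretValue` call, the Secrets Manager action that returns a secret's contents.

```python
from datetime import datetime, timedelta


def list_then_get(events, max_gap=timedelta(seconds=60)):
    """Find principals that read a secret shortly after listing secrets.

    `events` is a list of dicts with hypothetical normalized fields:
    principal (str), event_name (str), time (datetime).
    Returns (principal, gap_in_seconds) tuples.
    """
    last_list = {}  # principal -> time of its most recent ListSecrets call
    hits = []
    for e in sorted(events, key=lambda e: e["time"]):
        who = e["principal"]
        if e["event_name"] == "ListSecrets":
            last_list[who] = e["time"]
        elif e["event_name"] == "GetSecretValue" and who in last_list:
            gap = e["time"] - last_list[who]
            if gap <= max_gap:
                hits.append((who, gap.total_seconds()))
    return hits


# Illustrative data only; the timestamps and principal name are made up.
t0 = datetime(2026, 1, 1, 12, 0, 0)
sample = [
    {"principal": "harvested-key", "event_name": "ListSecrets", "time": t0},
    {"principal": "harvested-key", "event_name": "GetSecretValue",
     "time": t0 + timedelta(seconds=20)},
]
print(list_then_get(sample))
```

A rule like this will also fire on legitimate automation that lists and then reads secrets, so in practice it would likely be one signal among several rather than an alert on its own.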

The fourth pivot was the most striking. The agent opened eight parallel SSH sessions against a downstream bastion server from six distinct IP addresses simultaneously, conducted live schema enumeration of an internal PostgreSQL database, identified a credential table absent from any publicly documented version of the application, and dumped six tables — credentials, API keys, and user records — in a single HEREDOC command block. The entire bastion phase ran in under two minutes.

Sysdig's analysis identified four technical signatures confirming an LLM agent was driving the attack: the agent improvised a database dump with no prior schema knowledge; a Chinese-language planning comment translating to "see what else we can do" leaked into the command stream; every command was formatted for machine parsing with structured delimiters and bounded output caps; and each command fed directly from prior tool output in a live reasoning loop.

Why This Is Different From Every Attack Before It

For decades, cybersecurity has operated on a simple premise: attackers leave footprints, and those footprints repeat. A piece of malware runs the same command sequence every time it fires, and a human hacker — even a skilled one — has to think, type, and make decisions.

That window of time is where security teams make their move, and the entire architecture of modern SOC operations was built around it. Alerts fire, an analyst investigates, a playbook kicks in, and the team responds. But that sequence only works if the attacker is slow enough to get caught in the middle of it.

What happened on May 10th broke that assumption. There was no malware signature to catch and no human keystroke rhythm to detect. There was an AI agent that read the environment, made decisions, and moved continuously without hesitation — from an exposed notebook server to a fully exfiltrated internal database in under an hour. The playbook your SOC has been running since 2015 was written for a different kind of adversary.

The agent reasoned in real time against an environment it had never seen. Its decisions were inferred from what it found there, the same way an experienced human attacker reasons through an unfamiliar system, but at a speed no human attacker could match.

Traditional indicator-of-compromise (IOC) detection flags known-bad artifacts such as a malicious IP, a recognized hash, or a flagged command string. An LLM agent doesn't produce those reliably. Instead, it improvises commands from the environment, uses legitimate cloud APIs, and routes through infrastructure that produces no consistent source-IP signature across runs.

The distributed egress pattern observed in this attack illustrates the point sharply. Twelve AWS API calls across eleven distinct IP addresses in 22 seconds reflects a deliberate agent-level choice — fan out requests to defeat per-source-IP correlation — executed at a speed no human operator could replicate.
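Fanning out defeats correlation keyed on source IP, but the same calls still share a credential. One way to sketch the inverse check is to group by access key and count distinct source IPs inside a short sliding window. The field names (`access_key_id`, `source_ip`, `time`), the 30-second window, and the five-IP threshold below are all assumptions chosen for illustration, not values from the Sysdig report.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_fanned_out_keys(events, window=timedelta(seconds=30), min_ips=5):
    """Return {access_key_id: max distinct IPs seen within any window}.

    Only keys that reach `min_ips` distinct source IPs inside `window`
    are included. Event fields are hypothetical normalized names.
    """
    by_key = defaultdict(list)
    for e in events:
        by_key[e["access_key_id"]].append((e["time"], e["source_ip"]))

    flagged = {}
    for key, calls in by_key.items():
        calls.sort()
        start = 0
        for end in range(len(calls)):
            # Advance the left edge until the span fits inside the window.
            while calls[end][0] - calls[start][0] > window:
                start += 1
            ips = {ip for _, ip in calls[start:end + 1]}
            if len(ips) >= min_ips:
                flagged[key] = max(flagged.get(key, 0), len(ips))
    return flagged


# Synthetic data shaped like the pattern described: 12 calls, 11 IPs, 22 seconds.
base = datetime(2026, 1, 1, 12, 0, 0)
calls = [
    {"access_key_id": "AKIAEXAMPLE",
     "time": base + timedelta(seconds=2 * i),
     "source_ip": f"198.51.100.{i % 11}"}
    for i in range(12)
]
print(find_fanned_out_keys(calls))  # {'AKIAEXAMPLE': 11}
```

The design choice is to pivot on the identity rather than the network origin: an agent can rotate egress IPs cheaply, but it cannot rotate the stolen key without losing access.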

Behavior-based detection is the model that holds against an adaptive reasoning system — focused on what the attacker accomplished rather than how. Credential access, lateral movement, schema enumeration, exfiltration: these outcomes are detectable regardless of the specific commands used. The Sysdig team reached exactly this conclusion.
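As a rough illustration of outcome-focused detection, the sketch below checks whether a stream of events contains the four outcomes named above in order, regardless of which commands produced them. It assumes an upstream classifier has already attached a `stage` label to each event. That classifier, the stage names, and the one-hour span are hypothetical.

```python
from datetime import datetime, timedelta

# Ordered outcomes to look for; the labels are illustrative.
STAGES = [
    "credential_access",
    "lateral_movement",
    "schema_enumeration",
    "exfiltration",
]


def chain_progress(events, max_span=timedelta(hours=1)):
    """Return the stages matched in order, starting from the first stage.

    Events unrelated to the next expected stage are skipped, so the match
    does not depend on specific commands or on events being adjacent.
    Matching stops once `max_span` has elapsed since the first match.
    """
    matched = []
    first_time = None
    for e in sorted(events, key=lambda e: e["time"]):
        if len(matched) == len(STAGES):
            break
        if first_time is not None and e["time"] - first_time > max_span:
            break
        if e["stage"] == STAGES[len(matched)]:
            matched.append(e["stage"])
            if first_time is None:
                first_time = e["time"]
    return matched


t0 = datetime(2026, 1, 1, 12, 0, 0)
observed = [
    {"stage": "credential_access", "time": t0},
    {"stage": "lateral_movement", "time": t0 + timedelta(minutes=48)},
    {"stage": "schema_enumeration", "time": t0 + timedelta(minutes=50)},
    {"stage": "exfiltration", "time": t0 + timedelta(minutes=51)},
]
full_chain = chain_progress(observed) == STAGES
print(full_chain)
```

A partial match is still useful: two or three stages in order for the same host or credential may be worth escalating before the final stage appears.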

Here's the part that should concern anyone running a security operation. This attack didn't give anyone the traditional investigation window before the damage was done. From the first command on an exposed notebook server to a fully exfiltrated internal database took less than an hour, and the most damaging phase — the actual data theft — was over in under two minutes. There was nothing to investigate mid-breach because the breach was already complete before most incident response processes would have even triggered.

Rethinking the Reporting Pipeline

The reporting gap exposed by autonomous AI intrusions is architectural. Legacy incident documentation was built for an adversary that moved slowly enough for analysts to correlate log entries, reconstruct a timeline, and write up findings before the next shift. That model breaks when the most damaging phase of an intrusion compresses into under two minutes. By the time an analyst is manually correlating logs across five interfaces, the incident is already over.

What the new threat model requires is a reporting pipeline where aggregation, structure, and documentation happen in parallel with the investigation. When that workflow is in place, analysts can produce a post-exploitation narrative built around behavioral chains at a pace that matches the speed of the attacker.
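To make "documentation in parallel with the investigation" concrete, here is a minimal sketch of a report object that stays ordered and renderable as each finding is added. It is an illustration of the idea only and does not describe any particular product. The class names, fields, and output format are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Step:
    time: datetime
    stage: str    # behavioral label, e.g. "credential_access"
    summary: str  # what was observed
    source: str   # where the evidence came from


@dataclass
class IncidentReport:
    title: str
    steps: list = field(default_factory=list)

    def add(self, step):
        """Record a finding and keep the timeline in chronological order."""
        self.steps.append(step)
        self.steps.sort(key=lambda s: s.time)

    def render(self):
        """Render the behavioral chain with offsets from the first step."""
        lines = [self.title]
        start = self.steps[0].time if self.steps else None
        for s in self.steps:
            offset = int((s.time - start).total_seconds())
            lines.append(f"+{offset}s [{s.stage}] {s.summary} (source: {s.source})")
        return "\n".join(lines)


# Findings can arrive out of order; the report sorts them as it goes.
t0 = datetime(2026, 1, 1, 12, 0, 0)
report = IncidentReport("Example incident")
report.add(Step(t0 + timedelta(seconds=20), "credential_access",
                "Secret read after listing", "cloud audit log"))
report.add(Step(t0, "credential_access",
                "Secrets enumerated", "cloud audit log"))
print(report.render())
```

Because every step carries its own source, the rendered narrative stays attributable without a separate write-up pass after the investigation ends.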

A template designed for a human operator running a static playbook cannot capture what an AI-driven attacker actually does. Platforms designed for structured reporting workflows are being built to address this directly — Indago among them, purpose-built to help analysts aggregate threat intelligence and produce sourced, structured incident documentation without the overhead that slows traditional workflows.

The Corporate AI Exposure Problem

The infrastructure the Sysdig attacker targeted was nothing unusual per se. Each component in the chain (the notebook environment, the cloud credentials, the secrets manager, the bastion) is something enterprise AI deployments routinely rely on.

As organizations deploy AI agents for internal automation, they are building systems that require broad, legitimate access to exactly the resources an adversarial LLM agent would target. Corporate AI agents need API keys to call external services. They need service accounts to query databases. They need access to internal tooling, code repositories, and orchestration layers. That access is a design requirement. It also creates a topology that an adversarial LLM agent can navigate the same way a legitimate one would.

A legitimate enterprise AI agent and an adversarial one are doing the same thing at the technical level: reading environment variables, making API calls, querying secret stores. The difference is authorization and intent — neither of which traditional security controls are designed to detect.

This creates a specific challenge for organizations that have deployed AI agents internally without revisiting their secrets management, credential scoping, or access control architectures. When an AI agent is granted access to AWS Secrets Manager because it needs to retrieve a database connection string for a legitimate workflow, the same permissions become usable by an adversary who has compromised the environment that agent runs in. The blast radius of a credential compromise expands in direct proportion to how broadly that credential has been scoped to support AI workloads.

The Sysdig intrusion demonstrated this dynamic at a small scale — a single compromised notebook server, a handful of credentials, one internal database. But enterprise AI deployments operate at a scale that multiplies every element of that chain. More agents mean more service accounts, more API keys with standing permissions, and more lateral movement paths that were opened intentionally and never audited for adversarial reuse.

The organizations best positioned to manage this exposure treat AI infrastructure as part of the threat model: narrow credential scoping, aggressive secret rotation, service account audits, and agent activity logging granular enough to surface multi-step behavioral chains.
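Narrow credential scoping can be partly checked mechanically. The sketch below scans an IAM policy document, in the standard JSON form with `Statement`, `Effect`, `Action`, and `Resource` keys, for Allow statements that grant Secrets Manager read access on every resource. It is deliberately simplistic: it matches only a few exact action strings, does not expand wildcards such as `secretsmanager:Get*`, and ignores conditions. The function name and example policy are hypothetical.

```python
def broad_secret_statements(policy):
    """Return Allow statements that can read any secret (Resource "*")."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid
        statements = [statements]

    findings = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources

        reads_secrets = any(
            a in ("*", "secretsmanager:*", "secretsmanager:GetSecretValue")
            for a in actions
        )
        if reads_secrets and "*" in resources:
            findings.append(stmt)
    return findings


# Hypothetical policy attached to an internal AI agent's role.
agent_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["secretsmanager:GetSecretValue"],
         "Resource": "*"},
        {"Effect": "Allow",
         "Action": "secretsmanager:GetSecretValue",
         "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:app/db-*"},
    ],
}
print(len(broad_secret_statements(agent_policy)))  # 1
```

Only the first statement is flagged. The second is scoped to a named secret prefix, which is the shape a narrowly scoped agent role would be expected to have.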

The Calculus Has Changed

The Sysdig incident will be remembered as a threshold, and intrusions like it will become more common. SOC teams that recognize their current reporting workflows were designed for a slower adversary, and act on that recognition, are the ones building defenses for the threat landscape that is actually arriving.

If your current incident reporting workflow was built for a slower adversary, it's worth examining what it would take to change that. Book a demo with Indago to walk through what structured, behavioral-chain reporting looks like in practice — and whether it fits how your SOC actually operates.

Indago Team
