Tuesday, April 14, 2026

Autonomic Security: Stop Waiting for AI to Save Your SOC

The Adversarial Podcast's RSA episode is worth your time — CISOs talking candidly about autonomic security and where the industry needs to go. It got me thinking: if our CISOs are ready to have this conversation, how do we get our SOCs ready to meet the challenge?

This is my take on that question.


What autonomic security actually means

Your autonomic nervous system keeps you alive without asking permission. Heart rate, immune response, reflexes — they don't wait for a conscious decision. They execute on signal.

Autonomic security is the same idea: security responses that execute on policy without requiring a human decision at the moment they fire. This is not the same as autonomous security — AI making novel judgment calls in novel situations. Autonomic security executes well-defined responses to well-understood conditions. The question isn't whether the AI is ready. It's whether your organization is structured to act.

This is not a futures concept. It's a description of what your team should already be doing with tools they already have.


We've been doing this for 40 years

Antivirus has been quarantining files without asking since 1987. Intrusion prevention systems have been dropping packets inline since the mid-2000s. EDR kills processes autonomously — nobody files a change request first. Okta and Entra block or step up authentication on probabilistic risk signals right now, with no human in that loop.

All of these act on non-deterministic intelligence. We accepted that tradeoff at the endpoint and perimeter because the blast radius was bounded and the action space was understood.

The SOC has never been granted the same latitude. That's a leadership decision, not a technology constraint.

There's a good version of this and a bad version. Good security is responsive and proportionate — it acts fast and it knows enough about your users and assets to act intelligently. Bad security builds walls that valid business functions quietly route around. The difference is context, not capability.


A throwback: we already answered the false positive objection

When IPS was supplanting IDS in the mid-2000s, the resistance was identical to what you hear today against agentic containment: what if it blocks something legitimate?

Here's the counterintuitive finding from that era: tolerance for IPS false positives was higher than tolerance for IDS false positives — not lower.

An IDS false positive means an analyst spent time triaging an alert on traffic that was benign. Wasted time, and the attack may still be running. An IPS false positive means a legitimate connection got blocked. Disruptive, but cheap: restore access, done, move on.

Which failure mode is cheaper? Price them both honestly and the math favors action.

We settled this question twenty years ago. IPS became standard NGFW furniture within a decade. We are re-litigating a closed argument.


What's actually in the way

We never stopped treating every case as a snowflake

SOAR promised automation. Most implementations delivered expensive alert routing with a human checkpoint on every node. The technology wasn't the problem — the culture was. Playbooks require repetition to build confidence, and teams never let them run enough to develop that confidence. Every alert got individualized treatment not because it warranted it, but because nobody trusted the playbook.

At triage the action space feels unbounded and most options are undertested. So the default becomes: document and escalate. The playbook exists. Nobody trusts it. SOAR ends up automating ticket creation and nothing else.

This is a rehearsal problem and a permission problem, not a technology problem.

We fear what we don't know about our own environment

We don't fear the containment action. We fear the unknown downstream effect. Disabling a service account is trivially reversible. What isn't reversible is payroll failing at 2am because nobody mapped that account to a scheduled job.

The standard response is to go build a perfect CMDB before you automate anything. That project never finishes. Here's a better model: flip the default.

The SOC can and will touch anything not on the protected list. Business owners of critical applications get a defined window to communicate their special containment requirements. If they don't, they get the vanilla containment policy. The burden of knowing what's sensitive shifts to the people who actually know — the application owners — and off the security team trying to reverse-engineer dependencies from the outside.

This reframes the governance dynamic entirely. The SOC is no longer on the back foot trying to prove it's safe to act. The default is action. The business earns an exception by doing the work. If payroll breaks because finance never registered that service account as critical, that's not a security failure — the SOC acted on policy. The business opted into the default by not opting out.
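The flipped default fits in a few lines of policy code. A minimal sketch, where the registry name, asset IDs, and requirement strings are all illustrative, not anything from a real system:

```python
# Default-action policy: anything a business owner has not registered
# gets the vanilla containment playbook. All names here are hypothetical.
PROTECTED = {
    # asset id -> special containment requirement registered by its owner
    "svc-payroll-batch": "notify-finance-oncall-before-disable",
}

def containment_policy(asset_id: str) -> str:
    # The burden of opting out sits with the application owner;
    # absence from the registry means the default applies.
    return PROTECTED.get(asset_id, "vanilla-containment")
```

The point of the sketch is the `get` default: the lookup never fails open into "ask a committee." It fails into action.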

The "tipping our hand" excuse is real but misapplied

There are situations where premature containment burns an investigation — targeted intrusions, sophisticated adversaries, high-value environments where watching the attacker move tells you something you need to know. That's legitimate tradecraft.

It describes a fraction of what most SOCs actually handle. Commodity attackers don't care what you know. They care how fast you move. Giving them rope costs more than the intelligence value of observation in the overwhelming majority of cases. We've cited a niche justification as general policy.

The accountability asymmetry

A containment action that causes a brief outage generates a ticket, a postmortem, and an angry VP. A failure to contain that enables lateral movement and exfiltration generates an incident report in which the analyst made a reasonable judgment call under uncertainty.

Your team has read this incentive structure correctly and behaved accordingly. Of course they're conservative. The career risk is asymmetric. This is a leadership problem — you set the incentive structure, and right now it rewards inaction.

The organizational boundary problem

Security operations often sits outside the production support chain of command. No change management authority, no on-call ownership. Reversible containment is technically low risk but organizationally it creates an IT incident that someone else has to resolve.

There's history here too. For a decade the SOC has been the first call on every IT outage and every "are we breached?" bridge call, spending hours proving a negative before anyone looks at the actual root cause. Of course teams are reluctant to build a direct line between their automation and the IT ticket queue. That's not obstruction — it's learned self-preservation.

SOC teams have grown comfortable distancing themselves from users and the production environment. Agentic tools won't fix this. They'll expose it.


The embarrassingly simple example

You don't need a new platform to start. A Claude agent, a SIEM query, a firewall API, and a written policy is enough to build the muscle.

claude -p "IP {ip} is triggering a brute force alert. 
Query the SIEM for legitimate traffic from this IP 
in the last 24 hours. If none exists, block it at 
the perimeter firewall and log your reasoning. 
If legitimate traffic exists, escalate for analyst review."

The agent checks context before it acts. It queries the SIEM, assesses whether the IP has any legitimate traffic history, makes a judgment, executes the block if appropriate, and logs its reasoning. Observe, assess, act.

This is not impressive. That's the point. The value isn't the technology — it's that your team just defined a policy, encoded a judgment into it, and executed a containment action without a committee meeting.

Start with the lowest blast radius, most reversible action you have. Run it. Build from there. The first autonomic action is the hardest one. Make it trivial on purpose.
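The same observe-assess-act loop can be sketched without any agent at all. A minimal illustration, with `query_siem`, `block_ip`, and `escalate` standing in for whatever SIEM and firewall APIs you actually have:

```python
def handle_brute_force_alert(ip, query_siem, block_ip, escalate):
    """Observe, assess, act: block only when the SIEM shows no
    legitimate traffic history for the offending IP."""
    # Observe: pull 24 hours of context before touching anything.
    legit = query_siem(f"src_ip={ip} AND disposition=legitimate", hours=24)
    if legit:
        # Assess says ambiguous: hand it to a human with the context attached.
        return escalate(ip, reason=f"{len(legit)} legitimate flows in 24h")
    # Act, and log the reasoning alongside the action.
    return block_ip(ip, reason="brute force alert, no legitimate history")
```

Swapping the injected functions for real API clients is the whole integration; the judgment encoded in the policy doesn't change.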


The release valve your SOC needs

When automation blocks something, two questions hit the bridge call immediately: did security cause this outage? and are we breached? Your automation needs to answer both before anyone asks.

The block needs a paper trail that is instant, readable, and complete: what was detected, what was assessed, what action was taken, and why. Not a ticket — a log entry that surfaces immediately to whoever needs it.

The unblock path matters as much as the block. A SOC that responds to threat alerts in minutes may take days to respond to an IT operational ticket — those are different queues, different teams, different SLAs. If the release path for a blocked IP or disabled account routes through the helpdesk queue, the business remembers the slow unblock and forgets the fast detection.

Build the release valve into the automation itself. The SOC isn't hiding behind the agent — they're signing their name to it. A containment your team can't explain or can't reverse in 60 seconds isn't ready to be automated yet. One that comes with a full audit trail and an instant release path changes the SOC's relationship with the business: from the team that might have caused the outage to the team that prevented the breach and can prove it.
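A release valve is just two things bolted onto the block itself: a complete, readable record written at action time, and a reversal that is one call rather than a ticket. A sketch, with the record fields and function names being assumptions, not any product's schema:

```python
import datetime

def block_with_audit(ip, firewall, audit_log):
    """Execute the block and write the paper trail in the same motion:
    what was detected, what was assessed, what was done, how to undo it."""
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "detected": f"brute-force source {ip}",
        "assessed": "no legitimate traffic in the prior 24h",
        "action": f"perimeter block of {ip}",
        "release": f"unblock_with_audit('{ip}')",  # the release path ships with the block
    }
    firewall.block(ip)
    audit_log.append(record)
    return record

def unblock_with_audit(ip, firewall, audit_log):
    # Reversal is a single call, not a trip through the helpdesk queue.
    firewall.unblock(ip)
    audit_log.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "action": f"unblock of {ip}",
    })
```

When the bridge call asks "did security cause this?", the answer is already sitting in `audit_log`, timestamped, with the undo instruction in the same record.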


What you need to do

If you lead the SOC: You already have more authority than you're using. The blocker isn't the technology or the CISO — it's the absence of a written policy that tells your team what they're allowed to do autonomously and at what threshold. Write that policy. Run the playbooks. Own the production impact when something goes wrong, and expect to be supported when you do.

If you're the CISO: Define which containment actions your SOC can execute autonomously, at what confidence level, against what asset classes. Put it in writing and sign it. Then change the accountability structure — make inaction as reviewable as action. Your team will not act boldly if the only career risk is acting.

Both of you: establish the protected list and set a deadline for business owners to register their critical systems. Give them a reasonable window and a clear format. Whatever isn't on the list by the deadline gets the default containment policy. Run the process once, maintain it on a cadence, and stop waiting for a perfect CMDB that will never exist.

Demand better primitives from your vendors too. Action confidence scores, blast radius estimates, rollback APIs. Detection without actionability is a dashboard. If your vendors aren't exposing these, ask why.


What it looks like when it's working

The SOC defines policy and audits outcomes. It does not approve every action. Reversible, low-blast-radius containment executes automatically against clear signal. High-blast-radius or irreversible actions escalate with a recommendation attached, not a blank slate.
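That split is expressible as a tiny routing function, built on the primitives worth demanding from vendors: a confidence score, a blast radius estimate, a reversibility flag. The thresholds below are placeholders, not recommendations; the real numbers belong in your written policy:

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    reversible: bool
    blast_radius: int   # rough count of affected users or systems

def route(action: Action, confidence: float):
    """Reversible, low-blast-radius containment executes on clear signal;
    everything else escalates with the recommendation already attached."""
    # Placeholder thresholds; your signed policy supplies the real ones.
    if action.reversible and action.blast_radius <= 5 and confidence >= 0.9:
        return ("execute", action.name)
    return ("escalate", {"recommended": action.name, "confidence": confidence})
```

Note what the escalation path carries: not a blank slate, but a recommendation and the confidence behind it. The analyst starts from a proposed answer.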

The team's job shifts from triage queue management to policy refinement and exception handling. The automation signs its name to every action it takes. The business has a release path that doesn't require working a ticket.

Your security program starts to behave like an immune system: fast, proportionate, and mostly invisible to the people it protects.

The AI isn't what's in the way. You are. That's actually good news — you can fix it.
