Closing the Loop in AI Code Fixes

CTEM Was Built for a Different Problem
Continuous Threat Exposure Management gave security teams something they badly needed: a systematic way to move from reactive alert-handling to proactive exposure reduction. Scope your environment. Discover what’s exposed. Prioritize by real risk. Validate that risk. Mobilize a fix.
The framework is sound. But every stage of it was designed with one assumption so obvious nobody wrote it down: a human wrote the remediation.
When a developer fixed a vulnerability in 2019, a few things were true. They understood the code they were changing – or at least had a working mental model of it. A peer reviewed the diff. The review process was the verification layer. It was imperfect, but it existed, and it was independent of the process that found the vulnerability.
That independence mattered more than anyone realized.

What Changes When AI Proposes the Fix

Your CTEM program is working. Findings are being discovered, prioritized, and closed. Your metrics look clean. Your mobilization rate is up.
And somewhere in your codebase right now, there is unverified code that your vulnerability management workflow put there.


Walk through what the current toolchain actually does.
MDASH – Microsoft’s multi-model agentic scanning engine – finds a vulnerability. It doesn’t rely on pattern matching. It reasons about code logic the way an attacker would, catches what traditional static analysis misses, and produces a finding with full context. So far, this is discovery working better than it ever has.
Then Security Copilot or GitHub Copilot Autofix proposes a remediation. It analyzes the vulnerability, reads the surrounding code, and rewrites the affected lines. The diff appears in the PR. It looks clean. It addresses the finding. Tests pass.
The developer reviews it. Approves it. Merges it.
Finding closed.
But here’s what didn’t happen: nobody scanned the fix.
The new code that Copilot wrote – the remediation itself – entered your codebase as unverified input. It passed a code review, which is a human judgment about whether the logic looks correct. It did not pass through the agentic scanning process that caught the original vulnerability.
You closed one loop. You opened another. Most CTEM programs have no process for the second one.

Why This Is Different From “Review Your Code Reviews”


This is not an argument that code reviews are insufficient, or that developers are careless. Both of those arguments are old and mostly unhelpful.
This is a structural problem with a specific cause.
When MDASH finds a vulnerability in AI-generated code, it is often finding something that exists because the original code contained logic that no human fully modeled. That’s the point – it catches what pattern matching misses, and it can do so because detection doesn’t depend on a human having understood the code.
But when Copilot proposes a fix for that vulnerability, the reviewer’s ability to independently verify correctness depends on their comprehension of the original logic. If that logic was generated by AI and never fully understood by a human, the reviewer is evaluating an AI’s answer to a question they never completely understood in the first place.
The diff looks right. The fix is syntactically correct. The tests pass.
None of that answers the question: did the fix introduce something new?

The Recursive Problem


AI-proposed fixes are code. Code has vulnerabilities. The system that found your original vulnerability did not scan the remediation.
This is not a theoretical edge case. It is the default state of every AI-assisted remediation workflow that hasn’t explicitly addressed it. Which, right now, is most of them.
Remediation code has its own attack surface. A fix that parameterizes an XPath query is easy to reason about – any practitioner can verify it. But as AI-generated code grows as a share of your codebase, and as the vulnerabilities being fixed involve increasingly complex logic, the gap between a fix that is syntactically correct and one that is semantically correct widens.
Your CTEM program is not built to see that gap. It was designed to close findings, not to treat remediations as new inputs requiring their own discovery cycle.

The Question Your CTEM Program Can’t Currently Answer


Go look at your last ten closed findings that were remediated by AI-proposed fixes.
For each one, ask: was the remediation code scanned before it merged?
If the answer is no – or if you’re not sure – your mobilization stage has a gap that your metrics aren’t showing you.
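The audit above can be sketched in a few lines of Python. This is a minimal illustration, not a real integration: the ClosedFinding record, its field names, and the sample data are all hypothetical, and you would map them to whatever your ticketing or vulnerability management tool actually exports.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ClosedFinding:
    # Every field here is hypothetical; map them to whatever your tracker exports.
    finding_id: str
    fix_author: str                      # "ai" or "human"
    merged_at: datetime                  # when the fix PR merged
    fix_scanned_at: Optional[datetime]   # when the fix diff was scanned, if ever


def unverified_ai_fixes(findings):
    """Return AI-remediated findings whose fix was not scanned before merge."""
    gaps = []
    for f in findings:
        if f.fix_author != "ai":
            continue
        # A scan that ran after the merge does not count as a pre-merge check.
        if f.fix_scanned_at is None or f.fix_scanned_at > f.merged_at:
            gaps.append(f)
    return gaps


# Made-up sample records, for illustration only.
sample = [
    ClosedFinding("F-101", "ai", datetime(2025, 1, 10, 12), datetime(2025, 1, 10, 9)),
    ClosedFinding("F-102", "ai", datetime(2025, 1, 11, 12), None),
    ClosedFinding("F-103", "human", datetime(2025, 1, 12, 12), None),
    ClosedFinding("F-104", "ai", datetime(2025, 1, 13, 12), datetime(2025, 1, 14, 8)),
]

print([f.finding_id for f in unverified_ai_fixes(sample)])  # ['F-102', 'F-104']
```

F-104 is flagged even though it was eventually scanned, because the scan ran after the merge. If your records can't distinguish those two cases, that is the "not sure" answer.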
The framework works. The assumption embedded in it doesn’t hold anymore.
Part 2 of this series covers what fix verification actually requires, and why “have a human review the diff” is not a sufficient control when the code complexity exceeds the reviewer’s mental model.

What “Mobilization Complete” Needs to Mean Now
The traditional definition is straightforward: the finding is resolved, the fix is deployed, the ticket is closed.
That definition needs one additional condition: the remediation has been scanned.
Not reviewed. Scanned. By the same class of tooling that found the original vulnerability. Because a human code review and an agentic scan are answering different questions. The review asks whether the fix looks correct. The scan asks whether the fix introduced something new.
You need both answers before you close the loop.
This is achievable today. MDASH can run as a post-fix gate in your pipeline before PR merge. That’s not a significant workflow change – it’s a pipeline step. But it only happens if you build it in deliberately. It will not happen by default.
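As a rough sketch of what that pipeline step checks, here is the gate logic in Python. It covers only the pre-merge conditions (review and scan), not deployment. The Remediation record and both function names are hypothetical, and nothing here calls a real scanner: it assumes an earlier step has already scanned the fix and recorded what it found.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Remediation:
    # Hypothetical record of one fix; none of these names come from a real tool.
    finding_id: str
    review_approved: bool = False
    fix_scanned: bool = False
    new_findings: List[str] = field(default_factory=list)


def mobilization_complete(fix: Remediation) -> bool:
    """A finding closes only when the fix is both reviewed and scanned clean."""
    reviewed = fix.review_approved                            # does the fix look correct?
    scanned_clean = fix.fix_scanned and not fix.new_findings  # did it add anything new?
    return reviewed and scanned_clean


def gate_exit_code(fix: Remediation) -> int:
    """Exit code for a pipeline step: 0 lets the merge proceed, 1 blocks it."""
    return 0 if mobilization_complete(fix) else 1


approved_only = Remediation("F-201", review_approved=True)
scanned_clean = Remediation("F-202", review_approved=True, fix_scanned=True)
scanned_dirty = Remediation("F-203", review_approved=True, fix_scanned=True,
                            new_findings=["possible injection in rewritten query"])

print(gate_exit_code(approved_only))  # 1: reviewed but never scanned
print(gate_exit_code(scanned_clean))  # 0: reviewed and scanned clean
print(gate_exit_code(scanned_dirty))  # 1: the fix introduced something new
```

The design choice that matters is that an approved review alone never returns 0. The scan is a separate condition, not a substitute for the review or a detail inside it.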
Most teams haven’t built it in. Most teams don’t know they need to.
