Thoughts on Aardvark and Why We Need a Second Loop
AI is quickly entering vulnerability detection. OpenAI’s new Aardvark — an “agentic security researcher” built on GPT-5 — is the latest example. This autonomous AI agent analyzes source code repositories to identify vulnerabilities, rank them by severity, and help fix them.
It’s an interesting approach — an LLM that reads repositories, builds a threat model, tries to reproduce findings in a sandbox, and even proposes code fixes.
I discussed this idea with ChatGPT earlier this week, trying to unpack what it means for how we think about vulnerability detection.
First of all, the way I see it, Aardvark is clearly built for developers.
It’s part of the engineering workflow: review the code, identify the issue, write a unit test, patch, and push. It can even understand some runtime context by reading Dockerfiles, Helm charts, or Terraform, spinning up a local container to run a repro.
But even then, it still operates in a closed universe — the one defined by the repository.
It assumes the environment described in code is the real environment, but it doesn’t actually test or deploy the app.
That’s the gap: what the repository describes vs. what actually runs in production.
And in reality, deployed environments behave in ways that only an outside-in system will see:
- An exposed service or endpoint that wasn’t supposed to be reachable, but is responding from the internet due to routing, firewall rules, or a proxy misalignment.
- A live authentication or session-logic flaw (IDOR, missing header checks, cookie misconfig) that only appears in the deployed stack, not in the code snapshot.
- A cross-origin or header-based weakness (CORS, CSP, cache-poisoning vectors) that only becomes visible in the real HTTP/TLS response chain — after the CDN, WAF, load balancer, and edge rules have done their work.
- A cloud or infrastructure behavior that leaks externally (open ports, public buckets, metadata exposures, externalized IAM mistakes), regardless of what the Terraform repo claims the environment “should” look like.
- A multi-step path an attacker could take — SSRF pivoting to internal metadata, or combining a low-impact injection on one asset with exposed credentials on another — that only becomes visible when viewing the entire external surface as a graph instead of isolated findings.
These exposures don’t exist in code. But they’re the things attackers actually see and exploit.
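To make the header-based class of exposures concrete, here is a minimal sketch of an outside-in check: it inspects the response headers a live edge actually returns (after the CDN, WAF, and load balancer have done their work) and flags a few weaknesses a code-only review would never surface. The function name and the specific rules are illustrative, not a complete policy.

```python
# Hypothetical outside-in header check. Input: the raw response headers
# returned by the deployed endpoint; output: a list of findings.

def check_live_headers(headers: dict[str, str]) -> list[str]:
    """Flag a few common header-based weaknesses in a single HTTP response."""
    h = {k.lower(): v for k, v in headers.items()}
    findings = []
    # A wildcard CORS origin combined with allowed credentials signals a
    # broken CORS policy that external scanners routinely flag.
    if (h.get("access-control-allow-origin") == "*"
            and h.get("access-control-allow-credentials", "").lower() == "true"):
        findings.append("CORS: wildcard origin with credentials allowed")
    # No CSP means injected script runs unconstrained in the browser.
    if "content-security-policy" not in h:
        findings.append("CSP: no Content-Security-Policy header")
    # Cache-poisoning risk: publicly cacheable response with no Vary header.
    if "public" in h.get("cache-control", "") and "vary" not in h:
        findings.append("Cache: publicly cacheable response with no Vary header")
    return findings
```

The point is the vantage: these headers are a property of the deployed stack, so the same repository can yield clean or vulnerable results depending on what the edge layers add or strip.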
That’s where external scanners and CTEM platforms live — outside the code, scanning the live perimeter, validating what’s exploitable in the wild, and mapping what’s really exposed.
Why We Need Two Loops
The way I see it, we need two distinct but connected loops:
Loop #1: The Defender Loop (Inside the Repo)
The domain of Aardvark and other code scanning and ASPM tools.
It catches vulnerable code before deployment.
Repo ingestion → threat model → unit test → patch → merge.
Loop #2: The Attacker Loop (Outside in the Wild)
The domain of external scanners.
External discovery → scan → exploit validation → re-test after each fix.
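The attacker loop above can be sketched as a small driver: discover external assets, scan each one, keep only findings that validate as exploitable, and re-run the whole loop after each fix. The three callables are placeholders for a real ASM/DAST engine; their names and signatures are assumptions for illustration.

```python
from typing import Callable

def attacker_loop(
    discover: Callable[[], list[str]],     # e.g. DNS / cert-transparency enumeration
    scan: Callable[[str], list[str]],      # raw findings per live asset
    validate: Callable[[str, str], bool],  # is this finding exploitable right now?
) -> dict[str, list[str]]:
    """Return, per asset, only the findings confirmed exploitable from outside."""
    confirmed: dict[str, list[str]] = {}
    for asset in discover():
        exploitable = [f for f in scan(asset) if validate(asset, f)]
        if exploitable:
            confirmed[asset] = exploitable
    return confirmed
```

Re-running `attacker_loop` after a patch ships is the “re-test after each fix” step: a finding only disappears when validation stops succeeding against the live asset, not when the code changes.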
The second loop matters because even if the code is “secure,” the deployed system might not be. You don’t really know until you attack it.
How I Think About Vulnerability Research
If I had to describe how vulnerability detection should work, it would look something like this:
- The LLM or static tool finds and fixes the issue.
- CI runs local repro and unit tests.
- The fix is deployed to staging or a prod-like environment.
- External scanning (ASM/DAST) runs automatically — DNS, ports, headers, TLS, cloud drift.
- Only when both tests (internal + external) pass is the issue marked as resolved.
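The steps above amount to a dual gate, which can be sketched as a tiny state model: an issue reaches “resolved” only when both the internal check (CI repro and unit tests) and the external check (ASM/DAST against the deployed stack) pass. The status names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    id: str
    ci_passed: bool = False        # Loop #1: repro + unit tests in CI
    external_passed: bool = False  # Loop #2: re-scan of the live surface

    @property
    def status(self) -> str:
        if self.ci_passed and self.external_passed:
            return "resolved"
        if self.ci_passed:
            return "fixed-in-code"  # the developer thinks it's fixed...
        return "open"               # ...but the attacker may still reach it
```

The intermediate “fixed-in-code” state is the whole argument in miniature: it is exactly the gap between what the repository says and what the perimeter still serves.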
That’s how you close the loop between what the developer thinks is fixed and what the attacker can still reach.
AI Needs a Second Loop to Reflect Reality
AI-powered code analysis is advancing quickly, but it still operates on intent — on what the repository says should exist. To truly understand risk, we need the second loop: continuous monitoring of what actually runs in production.
The future of AI in security lies in combining both views. One loop prevents vulnerable code from being written. The other ensures nothing exploitable is exposed after deployment.
When those two come together, that’s when AI will actually start telling us the truth about our security posture — not just the truth as written in code.
That, to me, is where this space is heading: AI that reflects reality, not just intent.

