Agentic AI pentesting: how autonomous agents test web apps

Yanis Grigy, CEO · 4 min read

What "agentic" means here

A scanner runs a fixed list of checks against a target. An agent decides what to do next, based on what it just observed. The difference matters: real intrusions are sequences of reasoning, not signature matches.

An agentic pentest system is a coordinated set of LLM-powered agents, each specialised, that share a model of the target and take turns deciding the next attack step. It is closer in shape to a junior red team led by a senior than to a Nessus scan.

The core architecture

Three layers.

1. The discovery layer. Crawlers, fuzzers, and recon agents map the attack surface: endpoints, parameters, subdomains, authentication flows, third-party libraries, error patterns. Output is a structured representation of the target. We call ours the Coverage Graph: a hierarchical data structure that tracks what has been seen, what has been tested, and what remains unexplored.

2. The reasoning layer. A planning agent reads the Coverage Graph and proposes attack chains. "This endpoint accepts a UUID and returns user data; the auth token is a JWT signed with HS256; the session resumes via a refresh token that is not bound to the device. Try a horizontal IDOR plus a refresh-token replay." That is reasoning, not a rule.

3. The execution and validation layer. Specialised agents execute: an injection agent, an auth agent, a logic-flaw agent, an SSRF agent. Each one tries the planned attack, observes the response, refines, and either validates the finding with a working PoC or marks the hypothesis as failed. No PoC, no finding.
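
To make the shared model of the target concrete, here is a minimal sketch of a hierarchical coverage structure of the kind the discovery layer could produce and the reasoning layer could read. It is not Fleuret's actual Coverage Graph schema: the names `CoverageNode`, `Status`, and `unexplored`, and the choice to treat "unexplored" as "seen but not yet tested", are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator


class Status(Enum):
    SEEN = "seen"      # discovered by recon, no attack attempted yet
    TESTED = "tested"  # at least one attack attempt has completed


@dataclass
class CoverageNode:
    """One element of the attack surface: a host, endpoint, parameter, etc."""
    name: str
    kind: str
    status: Status = Status.SEEN
    children: list["CoverageNode"] = field(default_factory=list)

    def add(self, child: "CoverageNode") -> "CoverageNode":
        """Attach a child node and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["CoverageNode"]:
        """Depth-first traversal of this node and everything below it."""
        yield self
        for child in self.children:
            yield from child.walk()

    def unexplored(self) -> list["CoverageNode"]:
        """Leaf nodes that have been seen but not yet tested (assumed meaning)."""
        return [n for n in self.walk()
                if not n.children and n.status is Status.SEEN]


# Hypothetical target used only to exercise the structure.
app = CoverageNode("app.example.test", "host")
users = app.add(CoverageNode("/api/users/{id}", "endpoint"))
id_param = users.add(CoverageNode("id", "parameter"))
login = app.add(CoverageNode("/login", "endpoint"))
id_param.status = Status.TESTED

remaining = [n.name for n in app.unexplored()]
print(remaining)
```

A planner working from a structure like this would query `unexplored()` to decide where to spend the next attack step, and execution agents would flip nodes to `TESTED` as they finish.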

A scanner alerts. A pentester proves. Agentic systems prove because validation is part of the loop, not a downstream step.
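
The "no PoC, no finding" rule can be sketched as a loop in which a hypothesis only becomes a finding once an attempt returns a proof of concept. This is an illustrative sketch, not the system's real code: `Hypothesis`, `Finding`, `validate`, and the stub attempt function are all hypothetical names, and a real execution agent would send requests and inspect responses where the stub simply returns a string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    description: str


@dataclass
class Finding:
    hypothesis: Hypothesis
    poc: str  # evidence that reproduces the issue


# An attempt receives the hypothesis and the attempt number (so it can refine
# its approach) and returns a PoC string on success, or None on failure.
Attempt = Callable[[Hypothesis, int], Optional[str]]


def validate(hypothesis: Hypothesis, attempt: Attempt,
             max_tries: int = 3) -> Optional[Finding]:
    """Try, observe, refine; emit a finding only when a PoC exists."""
    for try_number in range(1, max_tries + 1):
        poc = attempt(hypothesis, try_number)
        if poc is not None:
            return Finding(hypothesis, poc)
    return None  # hypothesis marked as failed: no PoC, no finding


def stub_attempt(hypothesis: Hypothesis, try_number: int) -> Optional[str]:
    """Stand-in for an execution agent: succeeds on the second refinement."""
    return "reproduction steps recorded on attempt 2" if try_number == 2 else None


def never_works(hypothesis: Hypothesis, try_number: int) -> Optional[str]:
    """Stand-in for an attack that cannot be demonstrated."""
    return None


idor = Hypothesis("horizontal IDOR on the user endpoint")
confirmed = validate(idor, stub_attempt)
rejected = validate(idor, never_works)
print(confirmed is not None, rejected is None)
```

The design point is that the return type carries the rule: without a PoC there is no `Finding` object to report.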

Why open-weight models matter

Many AI pentest systems wrap a closed-model API. That works in a demo but fails in regulated environments, for three reasons:

  1. Data residency. Sending source code or production traffic samples to a US-hosted model conflicts with the data-location expectations regulated firms derive from DORA and with most NIS2 critical-infrastructure operator policies.
  2. Cost at scale. Closed-model token costs make per-engagement economics impossible at the €3,000 per webapp price point. Open-weight inference on dedicated GPUs runs at €20 to €25 of compute per pentest.
  3. Fine-tuning. A pentest agent gets better when you train on real engagement traces. You cannot do this on a closed API.
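
As a quick check on the cost argument, the figures quoted above put open-weight compute at well under 1% of the engagement price. The snippet uses only the numbers from the text (€3,000 per webapp, €20 to €25 of compute); the variable and function names are illustrative, and no closed-model cost is assumed because the text does not give one.

```python
PRICE_PER_WEBAPP_EUR = 3000      # price point quoted in the text
COMPUTE_EUR_RANGE = (20, 25)     # open-weight compute per pentest, from the text


def compute_share(compute_eur: float, price_eur: float) -> float:
    """Fraction of the engagement price consumed by inference compute."""
    return compute_eur / price_eur


low, high = (compute_share(c, PRICE_PER_WEBAPP_EUR) for c in COMPUTE_EUR_RANGE)
summary = f"{low:.2%} to {high:.2%} of price"
print(summary)
```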

We run on open-weight models (gpt-oss-120b, gpt-oss-20b, Kimi K2.5) hosted on Scaleway in France. Sovereign by construction.

What it does well, where it is still developing

Strong in 2026:

  • Web application logic (auth, authorization, IDOR, injection families, business-logic chains on standard CRUD).
  • REST and GraphQL APIs (introspection, broken object-level auth, mass assignment).
  • External infrastructure (subdomain takeover, exposed services, misconfigured TLS).

Still developing:

  • Active Directory and Kerberos-heavy internal networks.
  • Social engineering and phishing simulations.
  • Bespoke business logic on multi-actor industrial workflows.

Honest map. Pair the continuous AI layer with one annual human engagement for the third bucket.

What you get out

A pentest report that looks like a senior consultant wrote it: scope, methodology, findings, reproduction steps, screenshots, remediation, retest plan. Plus the things a human cannot ship: structured machine-readable findings, signed evidence, integration with Jira and your compliance platform, weekly cadence at marginal cost.
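
One way to picture "structured machine-readable findings" with "signed evidence" is a JSON record plus a signature over its canonical form. The post does not say how evidence is signed, so this sketch makes assumptions: it uses HMAC-SHA256 with a shared key from Python's standard library, where a production system might well use asymmetric signatures, and the field names (`id`, `title`, `severity`, `poc`) are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(finding: dict) -> bytes:
    """Serialise deterministically so the same finding always signs the same."""
    return json.dumps(finding, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(finding: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical finding."""
    return hmac.new(key, canonical(finding), hashlib.sha256).hexdigest()


def verify(finding: dict, signature: str, key: bytes) -> bool:
    """Constant-time check that the finding has not changed since signing."""
    return hmac.compare_digest(sign(finding, key), signature)


key = b"demo-key-not-for-production"  # assumption: shared secret for the demo
finding = {
    "id": "F-001",
    "title": "Horizontal IDOR on user endpoint",
    "severity": "high",
    "poc": "reproduction steps go here",
}
signature = sign(finding, key)

tampered = dict(finding, severity="low")
print(verify(finding, signature, key), verify(tampered, signature, key))
```

A record in this shape can be pushed to a ticketing or compliance tool as-is, and anyone holding the key can confirm that the evidence was not edited after the test.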

If you want to see one running on your own surface, request a demo.

