Build the orchestration layer that lasts when GPT-7 ships.

Foundation-model labs ship a new SOTA every quarter. We don't compete on the 2% of value that lives in the model. We build the 98% that lives in agentic orchestration: the system that turns "scan this app" into a full pentest report with chained exploits, audit-ready evidence, and a PDF a DORA-regulated CISO can hand to their auditor. That orchestration layer is Émile. It is the moat. It is what we hire for.

Apply via Yanis: yanis@fleuret.ai

What you'll build

Agentic pentest orchestrator (Émile)

Émile coordinates a roster of specialist agents on every webapp pentest. Your job: extend the agent roster, ship the supervisor's memory layer so it stops re-asking the same questions on long horizons, and improve the termination oracle so long runs know when they're done.

Stack: Python, open-weight LLMs on EU sovereign infra, the Coverage Graph data structure, in-house fine-tune pipeline.
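To make the memory-layer idea concrete, here is a minimal sketch of a supervisor that answers repeated questions from memory instead of asking an agent again. Everything in it is an assumption for illustration: `SupervisorMemory`, `ask`, and the `ask_agent` callback are hypothetical names, and a real layer would likely need semantic matching rather than simple text normalization.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def _normalize(question: str) -> str:
    """Collapse case and whitespace so near-identical questions share a key."""
    return " ".join(question.lower().split())


@dataclass
class SupervisorMemory:
    """Hypothetical memory layer: keeps answers the supervisor already has."""

    _answers: Dict[str, str] = field(default_factory=dict)
    asks_saved: int = 0  # how many agent calls the cache avoided

    def recall(self, question: str) -> Optional[str]:
        return self._answers.get(_normalize(question))

    def remember(self, question: str, answer: str) -> None:
        self._answers[_normalize(question)] = answer

    def ask(self, question: str, ask_agent: Callable[[str], str]) -> str:
        """Return a remembered answer, or call ask_agent once and store it."""
        cached = self.recall(question)
        if cached is not None:
            self.asks_saved += 1
            return cached
        answer = ask_agent(question)
        self.remember(question, answer)
        return answer
```

The `asks_saved` counter is one cheap way to measure whether the layer actually reduces repeated questions on a long run.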

Offensive ML benchmark and eval harness

The agent stack is only as good as how you measure it. We run benchmark targets (intentionally vulnerable apps with known findings) to grade every agent version. Your job: industrialize it. Production-grade eval harness, regression detection per release, anti-cheat. Your weekly question becomes "did this commit regress us against the ground-truth set?"

Stack: Python, containerized benchmark targets, custom IaC, EU sovereign cloud, relational store for the finding ledger.
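As a rough illustration of per-release regression detection, the sketch below scores one agent run against a ground-truth set and flags a drop relative to a baseline. The finding identifiers, the `Score` type, and the 2-point tolerance are assumptions chosen for the example, not the harness's actual metrics.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Score:
    recall: float     # share of known findings the agent reported
    precision: float  # share of reported findings that are in the ground truth


def score_run(reported: Set[str], ground_truth: Set[str]) -> Score:
    """Grade one run against the known findings of a benchmark target."""
    true_positives = len(reported & ground_truth)
    recall = true_positives / len(ground_truth) if ground_truth else 1.0
    precision = true_positives / len(reported) if reported else 1.0
    return Score(recall=recall, precision=precision)


def regressed(baseline: Score, candidate: Score, tolerance: float = 0.02) -> bool:
    """True if the candidate is worse than the baseline beyond the tolerance."""
    return (
        candidate.recall < baseline.recall - tolerance
        or candidate.precision < baseline.precision - tolerance
    )
```

Run in CI, a check like `regressed(baseline, candidate)` could block a release whenever a commit loses findings it used to catch.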

Coverage Graph and termination oracle

The Coverage Graph is the data structure that tracks which endpoints, params, and auth contexts have been tested. Without it, agents loop forever and CISOs don't trust the result. Your job: make it the system of record for "is this app done." Includes graph merge across pentests, gap-detection heuristics, exhaustiveness scoring that converts cleanly to an audit PDF.

Stack: TypeScript and Python, graph data store, custom visualizer.
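One way to picture the Coverage Graph is as a ledger of (endpoint, param, auth context) combinations, each either planned or tested. The sketch below is a deliberately simplified assumption: it uses plain sets rather than a graph store, and `CoverageKey`, `gaps`, `exhaustiveness`, and `merge` are hypothetical names standing in for gap detection, exhaustiveness scoring, and cross-pentest merge.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass(frozen=True)
class CoverageKey:
    endpoint: str      # e.g. "GET /orders/{id}"
    param: str         # e.g. "id"
    auth_context: str  # e.g. "anonymous", "user", "admin"


@dataclass
class CoverageGraph:
    planned: Set[CoverageKey] = field(default_factory=set)
    tested: Set[CoverageKey] = field(default_factory=set)

    def plan(self, endpoint: str, params: Iterable[str], auth_contexts: Iterable[str]) -> None:
        """Register every param x auth-context combination for an endpoint."""
        contexts = list(auth_contexts)
        for param in params:
            for ctx in contexts:
                self.planned.add(CoverageKey(endpoint, param, ctx))

    def mark_tested(self, key: CoverageKey) -> None:
        self.planned.add(key)  # anything tested counts as in scope
        self.tested.add(key)

    def gaps(self) -> Set[CoverageKey]:
        """Combinations that were planned but never exercised."""
        return self.planned - self.tested

    def exhaustiveness(self) -> float:
        """Fraction of planned combinations tested; 0.0 if nothing was planned."""
        return len(self.tested) / len(self.planned) if self.planned else 0.0

    def merge(self, other: "CoverageGraph") -> "CoverageGraph":
        """Union of two pentests' coverage for the same application."""
        return CoverageGraph(self.planned | other.planned, self.tested | other.tested)
```

A termination oracle could then treat "no gaps left" as one of its signals that the app is done.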

Production infra at scale

Pentests have a human in the loop today. The gate to scale is closing that loop while growing concurrent capacity as channel partners ramp. Your job (if infra is your wedge): GPU pool management, orchestration queue, observability that catches a stuck agent before the customer notices.

Stack: EU sovereign cloud, Kubernetes, IaC, full observability stack.
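"Catching a stuck agent" can be approximated with a progress heartbeat: each agent reports when it last made progress, and a watchdog flags the ones that have gone quiet. This is a sketch under assumptions, not a description of the production observability stack; `StuckAgentWatchdog` and the five-minute default are hypothetical, and timestamps are passed in explicitly so the logic is easy to test (a live system might use `time.monotonic()`).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StuckAgentWatchdog:
    max_silence_s: float = 300.0  # assumed threshold, tune per agent type
    _last_progress: Dict[str, float] = field(default_factory=dict)

    def record_progress(self, agent_id: str, now: float) -> None:
        """Call whenever an agent completes a tool call or records a result."""
        self._last_progress[agent_id] = now

    def stuck_agents(self, now: float) -> List[str]:
        """Agents whose last progress is older than the allowed silence."""
        return sorted(
            agent_id
            for agent_id, last in self._last_progress.items()
            if now - last > self.max_silence_s
        )
```

Alerting on the output of `stuck_agents` is what lets an operator intervene before the customer sees a stalled run.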

This isn't a wrapper. Here's the architecture.

A pentest isn't a single LLM call. It's hundreds of decisions chained over hours: enumerate the attack surface, prioritize endpoints, choose an exploit class, craft a payload, observe the response, decide whether to escalate. We model each decision as an agent's tool-call, and we orchestrate the chain with a supervisor agent that holds the Coverage Graph (the ledger of what's been tested, what's been found, what's left).
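The supervisor pattern described above can be sketched as a loop that dispatches tool calls, records each one in a ledger, and stops when nothing is left untested. This toy version is an assumption-heavy simplification: the tools are plain functions rather than LLM agents, the ledger is a set rather than the Coverage Graph, and `run_supervisor` and its step budget are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Tool = Callable[[str], List[str]]  # takes a target, returns finding strings


def run_supervisor(
    targets: List[str],
    tools: Dict[str, Tool],
    max_steps: int = 1000,
) -> Dict[str, object]:
    """Run every tool against every target once, tracking what was tested."""
    tested: Set[Tuple[str, str]] = set()
    findings: List[str] = []
    pending = [(target, name) for target in targets for name in tools]
    steps = 0
    while pending and steps < max_steps:
        target, tool_name = pending.pop(0)
        if (target, tool_name) in tested:
            continue  # the ledger prevents repeating work
        findings.extend(tools[tool_name](target))  # one decision, one tool call
        tested.add((target, tool_name))
        steps += 1
    # "done" is the toy termination signal: nothing remains untested
    return {"findings": findings, "tested": tested, "done": not pending}
```

The step budget matters as much as the ledger: without it, a run that keeps discovering new work has no upper bound.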

Every time a new frontier model ships, our agents get smarter for free. When open-weight models catch up to closed-source ones, we swap models with a config change. The model is the commodity layer. The orchestration plus Coverage Graph plus termination oracle plus audit-PDF generator is the moat. That's what we own. That's where the 98% of customer-perceived value lives.
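Swapping models with a config change usually means the orchestration code depends on a narrow interface and a registry, never on a specific model. The sketch below illustrates that idea only; the backend names, the `register_backend` decorator, and the `{"model": ...}` config shape are all hypothetical, and the "models" are stub functions.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]  # prompt in, completion out

_BACKENDS: Dict[str, ModelFn] = {}


def register_backend(name: str) -> Callable[[ModelFn], ModelFn]:
    """Decorator that makes a model backend selectable by name."""
    def decorator(fn: ModelFn) -> ModelFn:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("open-weight-a")  # hypothetical backend name
def _open_weight_a(prompt: str) -> str:
    return f"[open-weight-a] {prompt}"


@register_backend("open-weight-b")  # hypothetical backend name
def _open_weight_b(prompt: str) -> str:
    return f"[open-weight-b] {prompt}"


def load_model(config: Dict[str, str]) -> ModelFn:
    """Pick the backend named in the config; orchestration code stays unchanged."""
    name = config["model"]
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown model backend: {name!r}") from None
```

With this shape, moving from one model to another is a one-line edit to the config passed to `load_model`.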

We're not a foundation-model research lab. If you want to publish on RLHF or fight Anthropic on alignment, this is the wrong job. We're an applied AI engineering team building infrastructure that compounds with the model frontier instead of competing with it. The right candidate finds the orchestration problem more interesting than the model problem.

Open positions

Paris, hybrid, with 2-3 remote days per week.

AI Engineer (orchestration)

Extend the Émile agent roster. Build the supervisor's memory layer and the termination oracle.

Likely background: 3+ years of applied ML, comfortable with agent loops (LangGraph or your own), deep Python, has shipped production systems on top of foundation models.

Apply (AI Engineer) → yanis@fleuret.ai

Senior Offensive Security Engineer

Own the agent payload library and the human-in-the-loop review layer.

Likely background: 5+ years pentest or offensive ops, OSCP or equivalent (we don't gate on certs), comfortable writing the agent's logic, not just using it. PASSI helpful, not required.

Apply (Offensive Sec Eng) → yanis@fleuret.ai

Founding Engineer (open scope)

Pick your wedge: infra, Coverage Graph, benchmark harness, or the agent stack itself. Highest-agency seat on the team.

Likely background: 5+ years shipping production systems end-to-end. You'd join the founding team.

Apply (Founding Eng) → yanis@fleuret.ai

Hiring process: R1 with Yanis (30 min, fit and motivation). R2 tech with Augustin and Gabriel (30 min, architecture conversation). R3 take-home or working session (depends on role, 2 to 4 hours, paid). Decision in 1 to 2 weeks. Salary discussed at R1.
