High-Risk Guidelines Are Not Built for Agents
New guidance. New deadline. Same blind spot.
Executive Summary
Two governance artefacts landed in May 2026. On May 19, the Commission published 148 pages of draft classification guidelines, the first detailed interpretation of how to determine whether an AI system is high-risk. On May 7, the Digital Omnibus political agreement pushed high-risk enforcement from August 2026 to December 2027.
The compliance market read them as the starting signal. Guidance plus runway equals a clear path to compliance.
Both artefacts operationalize the same architecture: classify the system before deployment, document its intended purpose, assess whether it falls within one of eight regulated domains, and certify against that determination. For deterministic AI systems with stable functionality, this architecture works.
For agentic AI, the architecture does not reach. The high-risk framework assumes the system you classify today is the system running tomorrow. An agent determines its own functional purpose at runtime — through tool selection, action chaining, and workflow composition nobody anticipated at assessment time. The classification you file describes a point in time. The agent has already moved.
The guidelines tell you how to classify. The Omnibus tells you when. Neither addresses the system that reclassifies itself every time it runs.
The Comfortable Lie
Here is what the market wants to believe:
We have the methodology and we have the timeline. Now we build.
This belief persists because the two artefacts are individually defensible. The guidelines are careful interpretive work — technically anchored, properly restrained in ambition. The delay addresses a genuine readiness gap. Neither document is flawed on its own terms.
The blind spot is in what both share — the assumption that high-risk classification is a pre-deployment exercise performed once and revisited at the next audit cycle. For a medical device component or a credit scoring engine, that assumption holds. The system performs the same function on Tuesday that it performed on Monday.
Agents do not perform the same function on Tuesday. They compose a function at runtime based on which tools they select, which data they retrieve, and which action chains they assemble. The high-risk framework was designed for systems whose behavior is bounded by architecture. Agents are architecturally designed to exceed those bounds. The framework governs what stands still. The fastest-growing class of AI systems in enterprise deployment does not.
The Category That Doesn’t Reach
The classification guidelines define two paths to high-risk. Path 1: AI systems embedded in regulated products under Annex I. Path 2: standalone AI systems whose intended purpose falls within one of eight Annex III domains — biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, administration of justice.
The Article 6(3) filter allows an Annex III system to exit high-risk if it meets one of four conditions: it performs a narrow procedural task, it improves the result of a previously completed human activity, it detects decision-making patterns or deviations from them without replacing or influencing the prior human assessment, or it performs a task preparatory to an assessment.
Each condition assumes the system’s functional behavior is knowable at assessment time and stable afterward.
An agent with access to a case management system and a policy database starts by filing documents into fixed folders. Narrow procedural task — the filter applies. Six weeks later, the same agent begins cross-referencing applicant records against policy criteria and flagging inconsistencies for case officers. The case officer’s attention is now shaped by what the agent surfaced and what it didn’t. The narrow procedural task became a material influence on a decision affecting a natural person’s legal status.
Week one’s behavior qualifies for the filter. Week six’s behavior is high-risk. Same agent. Same deployment. Same tool access. The classification didn’t change. The agent did.
The filter conditions draw bright lines. Agents cross bright lines as a normal operating condition. The guidelines provide no mechanism for detecting when a filter exemption is functionally invalidated by the system’s own runtime behavior — because the classification architecture they operationalize was not designed for systems that change what they do after you assess what they do.
The exemption you documented is your evidence that you knew the domain was in scope. It is not your defense when the agent crosses into it autonomously.
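One way to picture the missing mechanism is a runtime check that compares each observed agent action against the behavior recorded at assessment time. The sketch below is illustrative only: `AssessedEnvelope`, `DriftReport`, `check_actions`, and the action labels are hypothetical names, not anything the guidelines or the AI Act define, and a real Article 6(3) determination takes legal judgment, not set membership.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssessedEnvelope:
    """Behavior documented at assessment time (hypothetical structure)."""
    agent_id: str
    claimed_filter: str               # e.g. "narrow procedural task"
    allowed_actions: frozenset[str]   # action types the assessment covered


@dataclass
class DriftReport:
    """Action types seen at runtime that the assessment never described."""
    agent_id: str
    out_of_envelope: list[str] = field(default_factory=list)

    @property
    def exemption_needs_review(self) -> bool:
        return bool(self.out_of_envelope)


def check_actions(envelope: AssessedEnvelope, observed: list[str]) -> DriftReport:
    """Collect each distinct observed action that falls outside the envelope."""
    report = DriftReport(agent_id=envelope.agent_id)
    for action in observed:
        outside = action not in envelope.allowed_actions
        if outside and action not in report.out_of_envelope:
            report.out_of_envelope.append(action)
    return report


# The case-management example, with assumed action labels.
envelope = AssessedEnvelope(
    agent_id="case-agent-01",
    claimed_filter="narrow procedural task",
    allowed_actions=frozenset({"file_document"}),
)
week_one = ["file_document", "file_document"]
week_six = ["file_document", "cross_reference_records", "flag_inconsistency"]

print(check_actions(envelope, week_one).exemption_needs_review)  # False
print(check_actions(envelope, week_six).out_of_envelope)
# ['cross_reference_records', 'flag_inconsistency']
```

A check like this does not decide whether the system is high-risk. It only marks the moment the documented exemption stops describing what the agent does, which is the signal a human reviewer would need.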
The Delay That Widens the Gap
The Omnibus pushed high-risk enforcement back 16 months. The market read it as preparation time.
Every month between now and December 2027 is another month of agentic deployment without classification infrastructure. Agents shipping today will carry 19 months of operational history by enforcement — 19 months of workflow compositions nobody documented, domain crossings nobody detected, functional purpose drift nobody tracked.
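The two figures measure different intervals, and the arithmetic can be checked directly. This is a small sketch that counts whole calendar months and ignores the day of the month; treating "today" as May 2026, the month both artefacts were published, is an assumption.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


original_enforcement = date(2026, 8, 1)   # August 2026
delayed_enforcement = date(2027, 12, 1)   # December 2027
deployed_today = date(2026, 5, 1)         # assumption: May 2026

delay_months = months_between(original_enforcement, delayed_enforcement)
history_months = months_between(deployed_today, delayed_enforcement)

print(delay_months)    # 16
print(history_months)  # 19
```

The 16 months is the length of the delay. The 19 months is how long an agent deployed now will have been running by the time enforcement begins.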
The delay was designed for a compliance architecture that treats classification as a preparation exercise. Document the system, complete the assessment, certify before the deadline. For deterministic systems, the delay closes the readiness gap.
For agents, it widens it. The classification you perform in mid-2027 will describe the agent as it behaves at that moment. It will not capture the operational history behind it. The regulator will ask about the lifecycle. The documentation will answer only the snapshot.
72% of organizations cannot trace agent activities across environments. 16% are confident they could pass a compliance audit on agent activity. Those are CSA’s findings from January 2026. These are the organizations that now have 16 additional months to deploy agents into production without the infrastructure to track what those agents actually do.
The Convergence Point
The security team monitors agent behavior. They see the agent query an HR database with authorized access, cross-reference employee records against an external benchmark, and generate a performance ranking nobody requested. The SOC logs the anomaly.
The compliance team works through the classification guidelines. The system is classified as a productivity tool. The filter exemption is documented. Classification complete.
Same agent. Same day. The security team detected the moment the classification expired. The compliance team does not know the exemption was invalidated by runtime behavior already flagged in a ticket three floors down.
The guidelines assume classification is a pre-deployment exercise performed by compliance. They do not address the operational reality that classification validity depends on continuous behavioral monitoring — which is a security function, not a compliance function. The Omnibus gives both teams more time to not talk to each other.
The organizations that survive enforcement will be the ones that connected the security detection infrastructure to the classification obligation before the regulator connected them in an investigation.
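That connection can be sketched as a join between two record sets that usually sit in different systems. Everything here is an assumption for illustration: the `ClassificationRecord` and `SocAnomaly` shapes, the field names, and the ticket identifiers are hypothetical and do not come from any specific SOC or GRC product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClassificationRecord:
    """What compliance filed (hypothetical shape)."""
    agent_id: str
    classification: str
    assessed_at: datetime
    owner: str


@dataclass(frozen=True)
class SocAnomaly:
    """What security logged (hypothetical shape)."""
    ticket_id: str
    agent_id: str
    observed_at: datetime
    summary: str


def classifications_to_review(records, anomalies):
    """Pair each anomaly with the classification it post-dates, if any."""
    by_agent = {record.agent_id: record for record in records}
    flagged = []
    for anomaly in anomalies:
        record = by_agent.get(anomaly.agent_id)
        if record is not None and anomaly.observed_at > record.assessed_at:
            flagged.append((record, anomaly))
    return flagged


records = [
    ClassificationRecord("hr-agent-07", "productivity tool",
                         datetime(2026, 6, 1), "compliance-team"),
]
anomalies = [
    SocAnomaly("SOC-1001", "hr-agent-07", datetime(2026, 5, 20),
               "pre-assessment test traffic"),
    SocAnomaly("SOC-1042", "hr-agent-07", datetime(2026, 7, 15),
               "unrequested employee performance ranking generated"),
]

flagged = classifications_to_review(records, anomalies)
for record, anomaly in flagged:
    print(f"{anomaly.ticket_id}: notify {record.owner} about {record.agent_id}")
# SOC-1042: notify compliance-team about hr-agent-07
```

The logic is trivial on purpose. The hard part is organizational: both record sets must share an agent identifier, and someone must own the review that the join triggers.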
Conclusion
The guidelines are defensible. The delay is pragmatic. Both were built for systems whose behavior can be described before they operate.
The organizations that treat these two artefacts as the complete governance answer will arrive at enforcement with a classification that matches nothing the regulator can observe.
The organizations that build classification as a continuous function — runtime detection of the moment an agent’s behavior exits the assessed space, connected to the team that holds the regulatory obligation — will arrive with the one thing no guideline and no timeline extension can substitute.
A high-risk determination that survived contact with the system it describes.
This analysis references the European Commission’s draft guidelines on the classification of high-risk AI systems under Article 6 (published May 19, 2026, consultation open until June 23, 2026) and the Digital Omnibus political agreement of May 7, 2026. CSA data from Securing Autonomous AI Agents, January 2026. Nothing in this article constitutes legal advice or compliance certification.



Violeta, nice article, but what you are describing is a failure in people's understanding of the law, not a failure of the law or the classification guidelines.
The EC will be issuing guidance on substantial modification at some point, which will no doubt make points similar to yours.