The AI Decision Your Board Got Half Right
What happens when the AI investment that impressed the board meets the regulator who doesn't care about your ROI.
This piece has two authors because the problem it describes sits in a gap between two worlds that rarely talk to each other.
Neha Kabra has spent eighteen years inside the rooms where AI deployment decisions get made — at McKinsey and Standard Chartered, working with boards, CXOs, and PE operating partners on the tradeoffs that determine whether AI programs create value or stall. She writes from inside the business architecture.
Violeta Klein, CISSP, CEFA writes from the other side of the table. Where the governance the board approved meets the regulator who wasn’t in the room. Where the documentation that satisfied the risk committee fails the enforcement examination it was never designed for.
We kept running into the same pattern from opposite directions. CXOs building governance that passes the board but wouldn’t survive a regulator. Compliance teams building frameworks that satisfy the regulation but have no connection to how the business actually makes decisions. Two architectures. Same organization. No shared vocabulary.
Neither of us could write this piece alone.
In this article
The Decision That Starts It All — How a board approves an AI credit decisioning deployment — and what it actually creates
What the Regulator Sees — The same deployment examined from the enforcement side
The Five Governance Tradeoffs — Five decisions every board makes that create regulatory exposure
The Ownership Gap — Four functions, four partial views, no single owner
What a Defensible Architecture Looks Like — Meeting the standard and building something that works
1. The Decision That Starts It All
Neha Kabra
Credit decisioning is where a retail bank makes money and where it creates liability. When AI runs that process at scale, the board isn’t approving a technology investment. It’s approving a system that will make millions of decisions about people’s access to credit — with the bank’s operating license sitting behind every one of them.
“The board sees a business case. A market opportunity. A cost reduction. A competitive necessity. They approve it the way they approve most technology investments — on the strength of the return. The liability lasts the life of the system.”
What AI actually changes
Analytics-led credit decisioning has been standard practice in retail banking for over a decade. Scorecards, risk models, digital intake — the infrastructure existed long before AI entered the conversation. What AI changes is not the data. It's who handles the steps between data and decision.
Traditional decisioning moved information between humans. AI moves decisions between systems — with humans inserted at specific points rather than present throughout. That distinction is where the exposure sits.
The six steps — and where AI changes the equation
The step that matters most — and why it’s the hardest
Step 3 is where the exposure concentrates. AI synthesizes data, summarizes financials, and produces a credit memo faster than any analyst. The problem is that AI is probabilistic. Run the same case twice and you can get two different memos. The output always looks authoritative. It is not always accurate.
The human approver at Step 4 relies on that memo to make a final decision. If something is wrong and it looks right, the override function exists on paper but not in practice. The glossiness of AI output is not a feature in a credit context. It is a risk that has to be designed around before deployment — not discovered at enforcement.
The question the board didn’t ask
The board approved the system. They approved the business case, the risk framework, and the governance structure that sat behind it. What they didn’t approve — because nobody asked — is whether any of that would satisfy the person who wasn’t in the room.
2. What the Regulator Sees
Violeta Klein, CISSP, CEFA
The same deployment. Different table.
The regulator does not ask whether the investment was sound. They do not ask about the competitive landscape, the board’s risk appetite, or the deployment timeline. They ask whether the system is safe to operate — and whether the organization can prove it.
A retail bank deploys an AI credit decisioning system. The system evaluates loan applications, scores applicants, and generates recommendations that relationship managers act on. The business case was compelling. The board approved it. The regulator arrives and opens a different file.
Show us the intended purpose documentation.
The regulation requires a description of what the system does, what it was designed to do, and the boundaries of its operation — documented before deployment. The bank’s documentation describes “an AI-assisted credit assessment tool.”
The regulator asks what “assisted” means operationally. Does the system recommend, or does it decide? Does the relationship manager review every output, or only exceptions? If the system scores an applicant and the manager follows the score 95% of the time, the system is not assisting. It is directing.
The documentation says one thing. The operational reality says another.
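The "assisting or directing" question is measurable. A minimal sketch, assuming decision logs that pair each system recommendation with the manager's final action; the log schema is illustrative, not any specific platform's:

```python
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    # Hypothetical log schema: one row per credit decision.
    application_id: str
    system_recommendation: str  # e.g. "approve" / "decline"
    human_decision: str         # what the relationship manager actually did

def follow_rate(records: list[DecisionRecord]) -> float:
    """Share of cases where the human decision matched the system's
    recommendation. Near 1.0, the system is directing, not assisting,
    whatever the documentation says."""
    if not records:
        return 0.0
    followed = sum(r.system_recommendation == r.human_decision for r in records)
    return followed / len(records)

records = [
    DecisionRecord("A-001", "approve", "approve"),
    DecisionRecord("A-002", "decline", "decline"),
    DecisionRecord("A-003", "decline", "approve"),  # a genuine override
]
print(f"Follow rate: {follow_rate(records):.0%}")
```

A follow rate approaching 100% is the operational evidence a regulator reads against the word "assisted."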
Show us the risk assessment.
A system that evaluates the creditworthiness of natural persons operates in high-risk territory under the EU AI Act. That classification is not discretionary — it follows from the system’s function in a regulated domain. The regulation requires a risk management system that identifies known and reasonably foreseeable risks, conducted before deployment and maintained throughout the lifecycle.
The bank’s risk assessment was conducted once, by the model risk team, using the framework they use for all models. It evaluated accuracy, bias, and performance metrics. It did not evaluate whether the system’s outputs materially influence decisions about people’s access to financial services — which is the question the regulation actually asks.
The assessment answered the bank’s question. It did not answer the regulator’s.
Show us the conformity assessment.
High-risk AI systems require a conformity assessment before placement on the market. For credit decisioning systems, this is a self-assessment — but self-assessment does not mean self-certification. It means the organization must document that it has verified, against specific regulatory requirements, that the system meets the standard.
Model validation asks one question: does the model perform as intended? Conformity assessment asks a fundamentally different set of questions: does the system comply with the regulation’s requirements for risk management, data governance, transparency, human oversight, accuracy, robustness, and cybersecurity?
One question. Seven requirements. Model validation answers the first. The regulation asks all of them.
Show us the human oversight mechanism.
The regulation requires human oversight by designated, competent personnel with the authority and ability to intervene. The bank’s process assigns oversight to the relationship managers who use the system daily.
The relationship managers were not trained on the system’s limitations. They were not given the authority to override the system’s recommendations without escalation. They do not have visibility into why the system scored an applicant the way it did.
Oversight exists on the organizational chart. It does not exist in practice.
“The regulator does not audit the chart. They audit the practice.”
Show us how you would know if the system changed.
This is the question that separates governance built for the boardroom from governance that survives enforcement.
The credit decisioning system was assessed at deployment. The documentation describes the system as it was designed. But the system in production is not static. Its recommendations shift as the underlying patterns shift. If the system’s behavior drifts enough that it functionally changes what it does — scoring applicants on criteria that were never assessed, weighting factors that were never documented — that drift constitutes a potential substantial modification under the regulation.
A substantial modification triggers a new conformity assessment.
The bank has no mechanism to detect it. No process to flag it. No owner responsible for watching.
“The governance the board approved was designed to launch the system. It was never designed to follow it.”
3. The Five Governance Tradeoffs That Create Regulatory Exposure
Five governance decisions every board makes on AI deployment. Each one creates a regulatory exposure the board never priced in.
3a. Cost of Governance
Violeta Klein, CISSP, CEFA
The regulation does not price governance as an optional investment. It prices it as a legal minimum.
A high-risk AI system under the EU AI Act requires, before deployment:
A risk management system maintained throughout the lifecycle.
Technical documentation describing the system's purpose, capabilities, limitations, and performance.
A data governance framework ensuring training data meets quality criteria.
Human oversight by designated, competent personnel.
Accuracy, robustness, and cybersecurity measures maintained continuously.
Post-market monitoring.
Automatic logging of system operations.
That is not a governance framework layered on top of a deployment. It is a parallel infrastructure that must exist before the system goes live. The cost of not building it is statutory: up to €15 million or 3% of global annual turnover.
The gap between what the board funded and what the regulation requires is where the exposure sits.
Neha Kabra
Governance operates across three lines — and the cost of getting each one wrong is different.
The first line is the relationship manager using the workbench to submit a credit proposal — the point where AI output meets human judgement for the first time. If the RM can’t challenge what the system produces, the first line isn’t functioning. It’s performing.
The second line is the risk officer reviewing what the first line produces. Their effectiveness depends entirely on the quality of what comes from below. Reviewing AI-generated credit memos without visibility into how they were produced is signing off on a process you can’t actually see.
The third line is the board and the technology stack behind both — the systematic auditability that makes the first two lines defensible when the regulator arrives.
Perfect governance across all three lines isn’t achievable in the near term. The CXO’s decision isn’t perfect versus imperfect. It’s where to strengthen first — and how to build a trajectory the regulator can follow.
Governance cost isn’t a line item problem. It’s a lines of defense problem. Fund the first line properly, and the rest of the architecture has something real to build on.
3b. Reputational Risk
Violeta Klein, CISSP, CEFA
Boards frame AI risk as reputational exposure. The governance model gets built around a single question: how bad could this look?
The regulation asks a different question entirely. It prices non-compliance, not reputation. The penalty framework operates independently of whether anyone noticed, whether the media reported it, whether customers complained. A system operating in a high-risk domain without conformity assessment is non-compliant whether or not it produces a headline.
The more dangerous inversion: a system that works flawlessly and generates no complaints can still be operating illegally. If nobody assessed whether a credit decisioning system constitutes a high-risk AI system under the regulation, the system’s operational success is irrelevant to its compliance status.
The board optimized for the wrong risk. The reputational risk they priced was the one where the system fails publicly. The regulatory risk they missed was the one where the system works perfectly — in a domain nobody classified.
Neha Kabra
Reputational risk lands on the business — regardless of where the governance failure originated.
Reputational risk in financial services almost always gets reported at the business unit level — not the risk function level. When a mortgage portfolio deteriorates, the headline reads “Bank X’s retail lending division posts $200 million loss.” Not “Bank X’s risk committee missed a model assumption.”
The 2008 subprime crisis made this pattern permanent. Countrywide’s mortgage business became synonymous with the crisis. Bank of America’s mortgage unit absorbed tens of billions in losses in the years following its acquisition. The business took the hit — publicly, permanently, and regardless of where the governance failure actually originated inside the organization.
For the head of retail banking deploying AI in credit decisioning today, that history is directly relevant. The reputational exposure doesn’t sit with the function that approved the governance framework. It sits with the function whose name is on the product.
The investment calculus changes when you own the consequences.
The question isn't whether governance is the business leader's responsibility — formally, it often isn't. It's whether the business leader is prepared to own the consequences when something goes wrong at scale. Because the consequences will land on the business, not on the framework.
The practical trade-off isn’t perfect governance versus no governance. It’s building a defensible first line now — even imperfectly — versus carrying undisclosed exposure on a book that, if it deteriorates, will be reported under your division’s name.
“Invest in governance because it protects the business. Not just because the regulation requires it.”
3c. Model Risk Management
Violeta Klein, CISSP, CEFA
Financial services organizations have decades of model risk management infrastructure. Validation frameworks. Backtesting. Governance committees. The assumption is natural: existing model risk frameworks cover AI.
They cover the model. They do not cover the system.
Model risk management asks whether the model performs as intended. The EU AI Act asks whether the system — not the model, the system — makes decisions about people in a way that requires regulatory oversight. A credit scoring model that passes validation can still be operating as an unregistered high-risk AI system if nobody assessed whether it falls within the regulation’s scope, nobody documented the intended purpose at the system level, and nobody built the human oversight mechanism the regulation requires.
Model validation and regulatory conformity assessment are not the same process, do not ask the same questions, and do not produce the same evidence.
Neha Kabra
Tiered model routing: governance architecture, not just cost management.
A well-governed financial services organization deploying AI in credit decisioning doesn’t run every case through the same model. It builds a tiered routing architecture — simpler, lower-risk applications processed through faster, less expensive models; complex cases escalated to more capable ones; the highest-sensitivity decisions, perhaps the top five percent, routed to the most capable model available. This isn’t cost optimization alone. It’s a documented decision logic for which model touches which decision — an auditable governance architecture, not just a deployment choice.
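A minimal sketch of that routing logic. The tier names, thresholds, and inputs are illustrative assumptions, not a reference implementation; the point is that the decision logic is explicit, versioned, and auditable:

```python
def route_case(exposure: float, complexity_score: float) -> str:
    """Return the model tier for a credit case."""
    if exposure >= 500_000 or complexity_score >= 0.9:
        return "tier-3-most-capable"   # highest-sensitivity decisions, roughly the top 5%
    if exposure >= 100_000 or complexity_score >= 0.6:
        return "tier-2-standard"       # complex cases escalated to a more capable model
    return "tier-1-fast"               # simple, low-risk applications

def route_with_audit(case_id: str, exposure: float, complexity: float) -> dict:
    """Attach an auditable record to every routing decision:
    which model tier touched which case, and under which policy."""
    tier = route_case(exposure, complexity)
    return {
        "case_id": case_id,
        "tier": tier,
        "inputs": {"exposure": exposure, "complexity": complexity},
        "policy_version": "routing-policy-v1",  # versioned, reviewable decision logic
    }

print(route_with_audit("A-104", exposure=750_000, complexity=0.4))
```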
Model checks model: building quality control into the process.
The second layer is quality control. Because AI is probabilistic — the same case can produce a different credit memo on every run — a well-governed deployment uses one model to check another model’s output before it reaches the human approver. This doesn’t eliminate variance. It creates a systematic check on it before the output influences a consequential decision.
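A sketch of that gate, with placeholder functions standing in for whichever drafting and verification models a deployment actually uses:

```python
def draft_memo(case_facts: dict) -> str:
    # Placeholder for the drafting model's probabilistic output.
    return f"Memo: stated income {case_facts['income']}; recommend approve."

def verify_memo(memo: str, case_facts: dict) -> list[str]:
    # Placeholder for the checking model (or deterministic rules) that
    # compares the memo's claims against the source data.
    issues = []
    if str(case_facts["income"]) not in memo:
        issues.append("stated income does not match source data")
    return issues

def gated_memo(case_facts: dict) -> dict:
    """Only verified memos flow to the human approver; flagged ones
    are rerouted for review instead of silently passing through."""
    memo = draft_memo(case_facts)
    issues = verify_memo(memo, case_facts)
    status = "release-to-approver" if not issues else "route-to-review"
    return {"memo": memo, "issues": issues, "status": status}

print(gated_memo({"income": 85_000}))
```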
Human in the loop: the trigger has to be jointly owned.
The third layer is the human in the loop. But here is where most deployments get it wrong. The trigger that routes a case to human review — the definition of what constitutes an edge case — is typically set by the model risk team in isolation, using technical parameters that have no connection to what a genuinely complex credit decision looks like in practice.
The business knows what a hard case looks like. The risk function knows where the model’s failure modes sit. Neither can define the human review trigger alone without creating a gap. A jointly owned definition — business and risk sitting in the same room, calibrating the trigger against real operational context — is what makes the human oversight mechanism function in practice rather than just on paper.
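One shape a jointly owned trigger can take: the risk function owns the model-side conditions, the business owns the case-side conditions, and a case escalates when either fires. All thresholds and segment names here are illustrative:

```python
RISK_CONDITIONS = {
    # Owned by model risk: where the model's failure modes sit.
    "max_score_uncertainty": 0.15,
}
BUSINESS_CONDITIONS = {
    # Owned by the business: what a genuinely hard case looks like.
    "hard_case_segments": {"self-employed-irregular-income", "thin-file"},
}

def needs_human_review(score_uncertainty: float, segment: str) -> bool:
    """Escalate to human review if either side's condition fires."""
    risk_fires = score_uncertainty > RISK_CONDITIONS["max_score_uncertainty"]
    business_fires = segment in BUSINESS_CONDITIONS["hard_case_segments"]
    return risk_fires or business_fires

print(needs_human_review(0.08, "thin-file"))  # True: business-side trigger
print(needs_human_review(0.20, "salaried"))   # True: risk-side trigger
```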
The governance architecture is only as strong as the collaboration that defined its boundaries.
3d. Scaling Governance for Repeatable Use Cases
Violeta Klein, CISSP, CEFA
One governance template. Standardized risk assessments. Reusable compliance artifacts. Governance-as-a-platform. The CFO’s dream.
The regulation requires a risk assessment for each system, not each template. Scaling governance through standardization creates documentation that describes a generic system. The regulator examines the specific system — the specific data it processes, the specific decisions it influences, the specific population it affects, the specific operational context it operates in.
A templated assessment that says “this system processes personal data in a financial services context” does not satisfy a regulation that asks “what specific risks does this specific system pose to the specific natural persons whose creditworthiness it evaluates?”
Templated governance looks efficient internally. It looks like a gap externally.
The insurance parallel makes the pattern visible. A life and health insurance risk assessment system deployed across multiple product lines with the same governance template faces the same structural problem. Each product line affects a different population, processes different health data, and operates under different actuarial assumptions. One template cannot document risks it was never designed to differentiate.
Neha Kabra
Templates are not the problem — missing context is.
Templates have always existed in financial services. The relationship manager workbench, the credit assessment process, the underwriting checklist — standardized steps and documented workflows are not new. They are not the problem. In an AI-first world, the differentiator is not whether you have a template. It is whether the AI system has a deep enough understanding of business intent and context to differentiate between cases that look identical on the surface but require completely different responses.
The contact center test: same classification, different intent.
A contact center handling bill payment calls illustrates the point. Every call carries the same surface classification — bill payment. But the intent behind each call is different. A customer calling because their limit is exceeded needs a different resolution path than one calling because digital authentication failed, or because they need to register a new payee, or because they want to query a charge. If the AI system routes every call on the surface classification, it is technically functioning and operationally failing. The wrong solution gets deployed at scale — consistently, repeatedly, and invisibly.
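A minimal sketch of the difference. The intent labels and resolution paths below are invented for illustration:

```python
# Surface classification alone would send every call down one path.
RESOLUTION_PATHS = {
    ("bill_payment", "limit_exceeded"): "offer-limit-review",
    ("bill_payment", "auth_failure"):   "digital-authentication-support",
    ("bill_payment", "new_payee"):      "payee-registration-flow",
    ("bill_payment", "charge_query"):   "dispute-and-query-flow",
}

def route_call(classification: str, intent: str) -> str:
    """Route on (classification, intent), not classification alone."""
    path = RESOLUTION_PATHS.get((classification, intent))
    if path is None:
        # Unknown intent: escalate rather than deploy the wrong fix at scale.
        return "human-triage"
    return path

print(route_call("bill_payment", "auth_failure"))  # digital-authentication-support
print(route_call("bill_payment", "unmapped"))      # human-triage
```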
In credit decisioning, outcome without context is not a decision.
Extrapolate that to credit decisioning. A credit application can be declined for twenty different reasons. The right next step — refer to a specialist, request additional documentation, trigger a manual review, offer an alternative product — depends on which reason applies. An AI system that classifies the outcome without understanding the causal context will produce a decision. It will not produce the right decision.
Building a repeatable AI capability requires investing in the context layer first — the shared, aligned understanding of business logic that allows the system to differentiate at the intent level, not just the classification level. That understanding cannot be templated. It has to be built, validated, and documented specifically for each deployment context.
A template governs the process. Only context governs the outcome.
3e. ROI vs. Governance Tradeoffs
Violeta Klein, CISSP, CEFA
The board-level tension is structural: governance slows deployment, deployment drives returns. The pressure to defer governance to “phase two” is real, rational, and dangerous.
“There is no phase two.”
Regulatory obligations attach the moment the system is deployed. A high-risk AI system placed on the EU market without conformity assessment is non-compliant from day one — not from the day the organization gets around to completing the assessment. Deferring governance does not defer liability. It creates undocumented liability that compounds with every day the system operates.
The business case priced the upside. Nobody told the board the downside had a statutory price tag.
Neha Kabra
Governance gets deferred because it arrives too late in the wrong room.
The reason governance gets deferred is structural, not intentional. The business case is built by the team that owns the ROI. Governance cost sits in a different function, with a different budget and a different reporting line. By the time the governance estimate arrives, the business case is already approved and the deployment already sequenced. Governance becomes a retrofit.
“Retrofits are always more expensive than builds.”
The practical fix is uncomfortable: governance has to be in the room when the business case is being written. Not reviewed after. Not added as a phase two workstream. A line in the original investment case, with a named owner, before the board sees the number.
AI is new enough that nobody gets it right the first time.
But there is a harder problem underneath the sequencing problem. AI is new enough that nobody gets the governance right the first time. The system will encounter cases it wasn’t designed for. The context layer will need enriching. The model routing thresholds will need recalibrating as model costs and capabilities shift. The intent-level logic will need updating as new case types emerge. The human-in-the-loop trigger will need revalidating as the system’s failure modes become better understood.
That is not a governance failure. That is the nature of the technology.
The regulator doesn’t object to learning. They object to undocumented learning.
The question is what iteration looks like inside a regulatory framework. Every change — the routing logic, the context layer, the decisioning tools, the oversight mechanism — needs to be documented, assessed, approved, and auditable. The governance architecture has to be alive, not static.
This is what changes the investment calculus. The ROI calculation that excludes governance cost is incomplete. But so is the governance budget that funds a one-time build. The real cost is the operating rhythm — the ongoing discipline of reviewing, revising, and revalidating as the system evolves.
Governance isn’t a project. It’s how you run the system.
4. The Ownership Gap
Violeta Klein, CISSP, CEFA
Four functions. Four partial views. No single owner of the question the regulator actually asks.
The CISO understands the security exposure but not the regulatory reporting obligation. The compliance lead understands the obligation but not the system’s technical behavior. The CTO understands the architecture but not the regulatory classification logic. The CRO understands the risk framework but not how the system’s outputs materially influence decisions about people.
The regulation holds the organization liable — not the function.
Until the internal accountability structure matches the external liability structure, governance is a fiction distributed across an org chart that nobody owns.
The serious incident reporting obligation makes this concrete. If the credit decisioning system produces an outcome that constitutes a serious incident — discriminatory lending at scale, a breach of fundamental rights — the reporting clock starts at awareness. The CISO’s incident response playbook does not include a regulatory filing step. The compliance team’s reporting workflow does not start with a SOC alert. The clock runs while three teams debate which process applies.
Neha Kabra
Violeta has described the structural problem precisely. Four functions, four partial views, one organization holding the liability. The instinct is to solve it with a new committee, a new governance workstream, a new reporting line. That instinct produces more paper and less accountability.
The ownership gap doesn’t close through org structure. It closes through a shared definition of what you are trying to achieve — and what failure looks like.
The four questions that close the gap.
Before any governance framework is designed, the business leader and the risk function need to agree on four things.
What does a well-governed AI credit decision look like — not in regulatory language, but in terms the business actually operates by?
What is an unacceptable outcome — not just a compliance breach, but a business failure the organization cannot absorb?
What is the tolerance for AI error at each step of the process, expressed in business consequence terms, not model accuracy metrics?
And how will we monitor, review, and revise as the system evolves — because the governance that is sufficient at deployment will not be sufficient at month eighteen.
The fourth question is the one that makes the other three honest. Without it, you have a static definition that becomes obsolete the moment the system encounters something it wasn’t designed for.
When these four questions have clear, jointly owned answers, the four functions Violeta describes have something to govern toward. The CISO knows which workflows carry the highest consequence. The compliance lead understands the business logic behind the routing decisions. The CRO can assess model risk in business outcome terms. The CTO knows which edge cases carry real operational risk.
The business leader’s job is not to coordinate all four functions. It is to ensure the shared outcome definition exists — and is specific enough that governance has a target, not just a perimeter.
5. What a Defensible Architecture Looks Like
Violeta Klein, CISSP, CEFA
What the regulator needs to see:
A documented classification decision. Not a spreadsheet. Not an internal memo. A formal determination that this system operates in a high-risk domain, with documented reasoning connecting the system’s function to the regulatory category. Who made the determination. What methodology they used. Why the determination is defensible.
A risk assessment that addresses the specific system. The specific data it processes. The specific decisions it influences. The specific population it affects. The specific risks it creates. Updated when the system changes — not annually, not at the next audit cycle, but when the system changes.
Human oversight that exists in practice. Designated personnel. Trained on the system’s limitations. With the authority to override. With visibility into why the system produced the output it produced. Oversight that a regulator can verify through evidence, not through an org chart.
A detection mechanism for behavioral drift. The ability to identify when the system's operational behavior has departed from the documented intended purpose. And a documented response protocol for what happens when the boundary is crossed — reassessment, not incident response. One form such a detector can take is sketched below.
A reporting capability. The operational infrastructure to detect a serious incident within the mandatory reporting window and file the required notification. Not a policy document. An operational capability with named owners, tested procedures, and evidence that it works.
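On the drift-detection requirement above, one possible mechanism (there are others) is the population stability index, or PSI, over the system's score distribution, compared against the distribution documented at conformity assessment. A minimal sketch; the 0.25 threshold is a common industry rule of thumb, not a regulatory figure:

```python
import math

def psi(baseline: list[float], current: list[float]) -> float:
    """Population stability index between two score histograms,
    each expressed as bin proportions summing to 1."""
    eps = 1e-6  # guard against empty bins
    return sum(
        (c - b) * math.log((c + eps) / (b + eps))
        for b, c in zip(baseline, current)
    )

baseline_bins = [0.20, 0.30, 0.30, 0.20]  # distribution documented at assessment
current_bins  = [0.05, 0.20, 0.30, 0.45]  # distribution observed in production

drift = psi(baseline_bins, current_bins)
if drift > 0.25:
    # Crossing the boundary triggers reassessment, not incident response.
    print(f"PSI {drift:.2f}: flag for substantial-modification review")
```

PSI on output scores is only one lens; the same comparison applies to input feature distributions and routing-tier shares.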
Neha Kabra
Violeta has mapped what the regulator needs to see. Meeting that standard and building something that actually works are not the same exercise. Three business elements make the difference.
Context first.
The governance architecture is only as strong as the shared understanding of business intent sitting underneath it. In a credit decisioning deployment, that means the routing logic, the oversight trigger, and the review criteria are all traceable back to a jointly owned definition of what a good decision looks like — and what an unacceptable one looks like. Without that definition, the five requirements Violeta describes become documentation exercises. With it, they become a coherent system.
Prioritize the first line.
With finite governance budget and multiple competing demands, sequencing matters. In credit decisioning, the first line — the relationship manager, the workbench, the point where AI output meets human judgement — gets funded first. If the first line fails, everything downstream fails with it. Start there. Build the audit trail around it. Extend outward as capacity allows.
Build for change, not for launch.
The system approved today will not be the same system running in eighteen months. A defensible architecture has a review rhythm built in — regular, documented, jointly owned by business and risk. Every change to the routing logic, the context layer, or the oversight mechanism is assessed and approved before it goes live. Not because the regulation requires it. Because an ungoverned change to a live credit decisioning system is a liability the business will own.
The credit decisioning system that passes both the board and the regulator is not more complex than the one that passes only the board. It is more deliberate.
Closing
Violeta Klein, CISSP, CEFA
The governance that passes the board and the governance that survives the regulator are not two different programs. They are two tests of the same architecture.
The organizations that build for both will not spend more. They will build something that evolves with the system it governs.
The ones that build for the board alone will discover the gap when someone arrives who wasn’t in the room.
Neha Kabra
The distinctive contribution a business leader makes to governance is demanding a different kind of signal — one that translates model behavior into business consequence early enough to act.
Here is what lands in most governance meetings today on a credit decisioning deployment:
Model validation — pass. Bias testing — no significant deviation. Human override rate — 4.2%. Audit trail — complete. Regulatory status — compliant.
Here is what should land instead:
In the last 30 days, the model routed 23% more cases to the highest-risk tier against a stable application volume. The portfolio is absorbing more complexity than the governance was designed for. RM override rate on AI-generated memos in the £250k–£500k bracket has dropped from 8% to 1.2% over six months — that is not improved confidence, that is eroded judgement. And three case types appearing in the last 60 days — among them self-employed applicants with irregular income patterns — were not present in the training data. The model is making decisions on cases it was never assessed against.
None of those signals appear in a compliance report. All of them are actionable before the portfolio absorbs the cost.
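What producing signals like these might look like, assuming decision logs that carry a routing tier, an exposure, an override flag, and a case type. Field names and windows are illustrative:

```python
def tier_shift(prev_logs: list[dict], curr_logs: list[dict], tier: str) -> float:
    """Relative change in the share of cases routed to a given tier."""
    prev = sum(r["tier"] == tier for r in prev_logs) / len(prev_logs)
    curr = sum(r["tier"] == tier for r in curr_logs) / len(curr_logs)
    return (curr - prev) / prev

def override_rate(logs: list[dict], lo: int, hi: int) -> float:
    """RM override rate on AI-generated memos within an exposure bracket."""
    in_bracket = [r for r in logs if lo <= r["exposure"] < hi]
    if not in_bracket:
        return 0.0
    return sum(r["overridden"] for r in in_bracket) / len(in_bracket)

def novel_case_types(logs: list[dict], assessed_types: set[str]) -> set[str]:
    """Case types appearing in production that were never assessed."""
    return {r["case_type"] for r in logs} - assessed_types

logs = [
    {"tier": "tier-3", "exposure": 300_000, "overridden": False,
     "case_type": "salaried"},
    {"tier": "tier-1", "exposure": 60_000, "overridden": True,
     "case_type": "self-employed-irregular-income"},
]
print(tier_shift(logs, logs, "tier-3"))       # 0.0: no shift in this toy sample
print(override_rate(logs, 250_000, 500_000))  # 0.0 in this toy sample
print(novel_case_types(logs, {"salaried"}))   # the unassessed case type
```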
Regulations exist to protect consumers and facilitate sound business. The governance architecture that serves both goals surfaces signals like these — in business language, at the right moment. Not compliance status after the fact. A live read on whether the system is still doing what the board approved.
That is the shift. Not more governance. Better signal.
Regulatory Disclaimer: This article provides educational analysis of the EU Artificial Intelligence Act (Regulation (EU) 2024/1689) and related governance frameworks. Nothing in this article constitutes legal advice, regulatory interpretation, or compliance certification. Organizations should consult qualified legal counsel specializing in EU AI Act compliance before making classification determinations or deployment decisions.
The views expressed by Neha Kabra are in her personal capacity and do not represent the views of McKinsey & Company or any other institution.