Ethical AI: When the Model Imposes Values Your Organisation Did Not Choose

Washington D.C. | Published in Board | 14 minute read
Image: A brass compass set into a boardroom table, its needle pulled away from true north toward a glowing blue device, while a leather folder marked Values, Purpose, Integrity, Respect rests at the table's edge. The image reframes an organisation's values as deflected off course by a standard built into its AI. (Image generated by ChatGPT 5.4)

Somewhere in the organisation an AI model is in production handling a difficult decision. Someone has asked a question with no clean answer: a customer with a complaint, an employee querying a redundancy process, a medical underwriter reviewing a claim. The model decides how to respond, settling what to refuse and what to permit, how much candour the moment can bear, and where the customer’s interest gives way to the policy. It does this in a second or two, and it does it the same way each time, because the judgement was made long before the question arrived. It was not made by the organisation running the model but by the provider that built it, for a global product, before the organisation signed up to use it.

This is not a hypothetical. The 2026 arXiv paper “Alignment Drift in Multimodal LLMs” evaluated eight model releases against a fixed benchmark of 726 adversarial prompts written by 26 specialists whose work is to find where models fail. It found large and persistent differences in how model families handle ethically sensitive questions, and clear drift in that behaviour from one version to the next. In 2025, a major foundation model provider publicly withdrew an update after acknowledging that the model had become excessively agreeable. Every organisation running that model received the change without asking for it, and received its reversal the same way.

An organisation’s code of conduct, values statement, and ethics policy each set out a position it recognises as its own. The model in production is running a different one, and it does not defer to the organisation’s. Neither standard is wrong; they are two defensible positions that pull in different directions. That gap is what the Board has to reconcile, and at most organisations it is being settled by default, with no one having decided it should be.

Where a model’s ethics comes from, and why it cannot be fully read

A model carries a stable set of dispositions into deployment: what it will refuse, how it frames sensitive subjects, how it resolves a question that has legitimate considerations on either side. Those dispositions are its inherited ethics. Whether a model can be said to hold values in any deeper sense is a question this audience can leave to the philosophers. What matters for governance is that, once the model is deployed, the dispositions function as an ethical standard. They decide things, and they decide them as a position the provider chose, not a neutral default. An organisation has always operated within standards set elsewhere, by its auditors, its insurers, the platforms it runs on. What is new is that this inherited standard does not sit at the perimeter setting limits. It exercises judgement, case by case, in the organisation’s name.

The dispositions originate in two places, both upstream of the deploying organisation. The first is pre-training, where the model acquires its priors from the body of material it is trained on. The second is alignment, where the provider shapes the model’s behaviour through reinforcement learning from human feedback, constitutional methods, and a series of explicit policy choices about refusal, tone, framing, and the treatment of contested questions. Those are genuine ethical choices. They are made by the provider, for a global product, without reference to any single organisation that will later run it.

How visible those choices are depends on how far they sit from the observer’s own values. A standard close to yours looks like simple good sense and goes unnoticed; a standard far from yours stands out as a choice. A model trained under one government’s content rules will visibly decline what that government finds awkward, and on contested matters of history or sovereignty will return its preferred account rather than describe the dispute. To a Board in another jurisdiction that position is conspicuous, and very likely objectionable. But it is conspicuous only because it diverges from the Board’s own position. A frontier model built closer to home embeds a position just as surely, through the same alignment choices, and reads as neutral only because it matches the assumptions of the people deploying it. Neutral is not what it is. The objectionable standard is refused at the door; the congenial one is waved through unexamined, and absorbed as the organisation’s own.

The organisation cannot read those choices in full. Stanford’s Foundation Model Transparency Index, in its 2025 edition, scored major providers at an average of roughly 40 out of 100, down from 58 a year earlier. Where disclosure is improving, it is improving in form rather than in substance. The EU AI Act requires providers of general-purpose models to publish documentation, and the accompanying General-Purpose AI Code of Practice gives them a standard Model Documentation Form to complete. A documentation form discloses that a value choice was made. It does not hand the deploying organisation the choice itself, or any veto over it.

Nor does the standard hold still: every new version of a model can shift the dispositions beneath the application layer. The model a Board approved last quarter is not, in any strict sense, the same model running this quarter, and the shift arrives without a fresh approval. The imported standard takes hold at a handful of identifiable moments: when a model is first approved, when a new version is accepted, when a deployment is widened to a more sensitive use, when exception handling is configured, when a failure is quietly absorbed rather than escalated. Each is a point at which that standard could be examined. At most organisations, most of them pass unexamined. Of the Six Board Concerns, this problem engages two directly, Ethical and Legal Responsibility and Risk Management, and it does so at exactly the moments a Board is least likely to be looking.

Why the usual fixes only go so far

The instinctive reply is that the application layer already handles this. Several instruments are available, and each does real work. None does the work being asked of it here.

System prompts and standing instructions shape tone, refusal posture, and framing at the surface. They sit on top of the model’s trained dispositions and steer them. They are an instruction given to the model, not a rewrite of it, and a sufficiently unusual input can still reach the disposition underneath.

Retrieval, which grounds the model’s outputs in the organisation’s own documents and data, governs what the model knows. It is the right tool for accuracy and the wrong tool for values. An ethically sensitive question is not resolved by giving the model better facts. It is resolved by how the model weighs those facts against one another, and retrieval does not reach that. It can improve what the model says; it does not change how the model judges.

Guardrails and output classifiers catch defined categories of unwanted output after the model has produced them. They are a backstop at the perimeter, useful and worth having, and they act on results rather than on the reasoning that produced them. They change whether a given output gets through, not what the model would generate the next time.
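For readers who want to see the mechanics, here is a minimal sketch of an output guardrail in Python. Everything in it is an assumption made for illustration: the blocked categories, the marker phrases, the fallback wording, and the stub standing in for a provider's model. A production guardrail would typically use a trained classifier, not keyword matching. The point it illustrates is narrow: the wrapper inspects what the model has already produced and decides whether it gets through, while the model underneath is untouched.

```python
from typing import Callable, Optional

# Hypothetical categories and marker phrases, chosen only for illustration.
BLOCKED_MARKERS = {
    "medical_advice": ("you should stop taking",),
    "legal_guarantee": ("you are guaranteed to win",),
}

# Hypothetical fallback message shown when an output is blocked.
FALLBACK = "I can't help with that here. A member of staff will follow up."


def classify(output: str) -> Optional[str]:
    """Return the first blocked category whose marker appears in the output, else None."""
    lowered = output.lower()
    for category, markers in BLOCKED_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return category
    return None


def guarded(model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model so that blocked outputs are replaced after generation."""
    def wrapper(prompt: str) -> str:
        raw = model(prompt)  # the model has already made its judgement by this point
        return FALLBACK if classify(raw) is not None else raw
    return wrapper


def stub_model(prompt: str) -> str:
    """Stand-in for a provider's model; it answers the same way every time."""
    return "You should stop taking the medication."


safe_model = guarded(stub_model)
filtered = safe_model("What should I do about my prescription?")
unfiltered = stub_model("What should I do about my prescription?")
```

After this runs, `filtered` holds the fallback message and `unfiltered` still holds the original answer: the guardrail changed what reached the user, not what the model would say if asked again.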

Fine-tuning goes deeper, and shifts behaviour more substantially than any of the above. It still operates on a base whose alignment the organisation did not author. It can degrade a model’s safety and capability in ways that are difficult to predict, and the most capable closed models frequently cannot be fine-tuned by customers at all. Fine-tuning buys real movement, but it buys it on a foundation the organisation has borrowed rather than built.

Evaluation against the organisation’s values is the one instrument every organisation should use, because it is the only one that tells you whether the model’s behaviour matches what the organisation stands for. But it only tells you. It shows where you stand; it does not move you.
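A small sketch may make the idea concrete. It assumes the organisation has written down, for a fixed set of sensitive prompts, the behaviour it wants (here simplified to "answer", "refuse", or "escalate"), and that each model version can be reduced to a function from prompt to observed behaviour. The cases, labels, and the two lookup tables standing in for model versions are all hypothetical; a real evaluation would need human or automated grading of actual responses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Behaviour = Callable[[str], str]  # maps a prompt to the behaviour observed


@dataclass(frozen=True)
class Case:
    prompt: str
    expected: str  # what the organisation wants: "answer", "refuse" or "escalate"


def mismatch_rate(cases: List[Case], behaviour: Behaviour) -> float:
    """Share of cases where observed behaviour differs from the organisation's position."""
    if not cases:
        raise ValueError("at least one case is required")
    misses = sum(1 for case in cases if behaviour(case.prompt) != case.expected)
    return misses / len(cases)


def drifted(cases: List[Case], old: Behaviour, new: Behaviour) -> List[str]:
    """Prompts on which two versions behave differently, whatever the expected answer."""
    return [case.prompt for case in cases if old(case.prompt) != new(case.prompt)]


def as_model(table: Dict[str, str]) -> Behaviour:
    """Turn a lookup table into a stand-in for a model version."""
    return lambda prompt: table[prompt]


# Hypothetical cases and observed behaviours for two versions of the same model.
CASES = [
    Case("Customer asks to waive a late fee after a bereavement", "escalate"),
    Case("Employee asks whether their role is at risk", "answer"),
    Case("Customer asks for another customer's address", "refuse"),
    Case("Claimant asks why a claim was declined", "answer"),
]
V1 = {
    CASES[0].prompt: "escalate",
    CASES[1].prompt: "answer",
    CASES[2].prompt: "refuse",
    CASES[3].prompt: "refuse",
}
V2 = {**V1, CASES[1].prompt: "refuse"}  # the new version has become more cautious

gap_v1 = mismatch_rate(CASES, as_model(V1))
gap_v2 = mismatch_rate(CASES, as_model(V2))
changed = drifted(CASES, as_model(V1), as_model(V2))
```

In this toy data the first version misses the organisation's position on one case in four and the second on two in four, with `changed` naming the prompt that moved between versions. The numbers report the gap. Closing it still requires one of the other instruments, or a decision about the deployment.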

Set the levers side by side and the limit is plain. The application layer lets an organisation constrain the model’s behaviour, filter it, and measure it, sometimes well enough that the residual gap is small and genuinely tolerable. What it does not do is let the organisation replace the substrate. A sufficiently elaborate application stack can dominate the behaviour a user is likely to encounter, and an organisation that has built one should not understate what it has achieved. But the dispositions were set in pre-training and alignment, where only the provider operates, and it is those dispositions, not the stack, that decide the case no one designed for. The application layer is where a deploying organisation has authority; the substrate sits one layer below it. That is not a failure of engineering. It is the structure of the thing, and a Board that grasps it will set its expectations of the application layer accordingly.

The real choice: accept, reject, or build

If the substrate cannot be replaced from the application layer, the decision is not a technical one. It is strategic, and it takes three forms.

The first is to accept. The organisation runs frontier models broadly as supplied, applies the partial mitigations described above, and accepts that the residual ethical standard is the provider’s. This is the fastest route, the cheapest, and the one with the most capable models behind it. Its cost is precisely the thing this article has been describing: an ethical standard the organisation did not author, cannot fully inspect, and cannot hold still. Where the provider’s position sits only slightly off the organisation’s own, that cost is modest and the mitigations carry it. Where the two genuinely diverge, accept means running a standard the organisation might not have chosen and might not defend, and the test is no longer whether the residual gap is small. It is whether the Board can own that standard as the organisation’s own. Most organisations are already here. Few arrived by deciding to be.

The second is to reject. The organisation declines to place AI in the contexts where the value substrate bears directly on people, confining it to uses where the way a model frames a sensitive question does not materially touch a customer or a member of staff. Drafting assistance, code, and document summarisation sit comfortably inside that boundary; eligibility decisions, complaint handling, and frontline customer judgement sit outside it. Reject is a legitimate governance position, and an under-used one. It is the honest answer wherever the provider’s encoded position is one the Board has examined and cannot endorse. Its cost is also real: the organisation forgoes the capability in the places it would be most valuable, and cedes pace to competitors willing to accept the trade.

The third is to build. The organisation takes ownership of the alignment layer itself. In practice this does not mean training a frontier model from nothing, which is beyond all but a handful of organisations anywhere. It means substantial alignment and fine-tuning work on open-weight foundations, so that the value choices reflect the organisation’s own deliberation rather than a distant provider’s. Build buys more control over the substrate. It does not buy complete control, because the open-weight base carries its own pre-training priors, and those persist underneath whatever is layered on top. The cost is capital, scarce specialist talent, compute, and a permanent maintenance commitment, usually in exchange for a model less capable than the frontier. It is available, realistically, to very few.

The honest shape of the choice is that there is no free corner. Accept maximises speed; it surrenders control of the standard and asks the organisation to extend trust to a provider it cannot fully audit. Build maximises control, and pays for it in speed, cost, and capability. Reject declines the trade by declining the capability in the places where the trade would bite hardest. Total sovereignty over the values in a model is, for practical purposes, not on the menu; even build is more control rather than all control. This is the AI Sovereignty Trilemma, the structural tension between trust, speed, and control, carried from infrastructure and jurisdiction down into the value substrate of the models an organisation runs. As with the Trilemma in its other forms, the task is not to solve it. It is to take a position on it knowingly.

Making the choice deliberately

The choice is not made once, at the level of the organisation, but many times, at the level of each deployment, because the deployments are not alike. A customer-facing eligibility model and an internal meeting summariser raise the inherited-ethics question with entirely different force. Accept is very often the right answer for the summariser, and a far weightier call for the eligibility model.

The Board’s task, then, is not to issue a single ruling. It is to ensure that for every material AI deployment, someone can say which of the three options it represents, and that the answer was reached as a decision rather than inherited as a default. The failure mode is not choosing build and finding it costly, or reject and finding it slow; it is accepting everywhere, through inertia, having never once named the choice. The UK Corporate Governance Code already asks the Board to satisfy itself that the organisation’s culture is aligned with its values, and to seek assurance where it is not. How the deployed models treat customers and staff is now part of that culture, whether or not the Board has yet looked at it that way. The Institute of Directors paper “AI Governance in the Boardroom” makes the same point operationally: it expects the Board to retain the authority to pause or reverse an AI system whose behaviour proves unacceptable.

The discipline this calls for is Minimum Lovable Governance: named accountability and a light, regular cadence, not a new committee and a new approval queue. The standard a model runs is a matter of the Board’s collective responsibility for how the organisation behaves, not a technical setting to be delegated away. It is also not a question on which a Board should expect unanimity, since directors will weigh a contested ethical position differently, and surfacing that disagreement is the work rather than a failure of it. In its smallest form, the discipline is a single question, asked of every material deployment: which of the three is this, and did we choose it.
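One way to picture that single question in practice is a deployment register that records, for each material deployment, which of the three options it represents and who decided. The sketch below is illustrative only: the field names, the example deployments, and the idea of holding the register in code at all are assumptions, and a spreadsheet would serve the same purpose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Stance(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    BUILD = "build"


@dataclass
class Deployment:
    name: str
    material: bool                     # does it bear directly on customers or staff?
    stance: Optional[Stance] = None    # None means no one has named the choice
    decided_by: Optional[str] = None   # the named accountable owner, if any


def unnamed_choices(register: List[Deployment]) -> List[str]:
    """Material deployments where the choice was never named or has no owner."""
    return [
        d.name
        for d in register
        if d.material and (d.stance is None or not d.decided_by)
    ]


# Hypothetical register entries.
REGISTER = [
    Deployment("meeting summariser", material=False),
    Deployment("eligibility model", True, Stance.ACCEPT, "Chief Risk Officer"),
    Deployment("complaints assistant", True),
    Deployment("redundancy FAQ bot", True, Stance.ACCEPT),
]

needs_attention = unnamed_choices(REGISTER)
```

Here `needs_attention` lists the complaints assistant, which has no stance recorded, and the redundancy FAQ bot, which records accept but names no one who chose it. The second case is the one the article warns about: accept by default rather than by decision.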

A permanent condition

This will not resolve. Models will keep arriving with an ethics already built into them, and that ethics will keep changing as the models change. The substitution described here is not a defect to be patched in one release cycle. It is a standing feature of running AI at all, part of what The Great Remaking means in practice, and an organisation does not get to settle the question once and move on.

One position in particular is no longer available to a Board: the comfortable assumption that the organisation’s own ethical standard is automatically the one in force. It is not. It is in force only to the extent that the Board has chosen it and maintained it against a substrate that keeps moving. Where the Board has not done that work, the standard in force is the provider’s, and the organisation’s values statement describes an aspiration rather than a practice.

Doing nothing does not avoid the choice. Doing nothing is choosing accept, on the provider’s terms, for as long as the inattention lasts. A Board that names the choice, deployment by deployment, and lands on accept with mitigations, is governing. A Board that never names it is being governed. The question facing the Board is not whether its AI has an ethics. It is whose.

Let's Continue the Conversation

Thank you for reading about the value system every AI model carries into deployment, and the choice a Board faces once it sees that the standard in force is the provider's rather than its own. I'd welcome hearing how this question is being handled in your organisation, whether you're working out which deployments genuinely raise it, weighing accept, reject, and build across different uses, or finding that the answer was settled by default before anyone named it as a decision.