Ontologies and Knowledge Graphs: Why Structure is the Next Data Frontier

Seattle | Published 31 May 2026 in Data | 15 minute read

An open leather-bound accounting ledger on a dark polished desk in warm lamplight, its hand-ruled columns of figures dissolving at the right-hand edge into a glowing blue three-dimensional network of connected nodes that hovers above the desk surface, a visual reframe of the same information shifting from a flat record to be read into an explicit structure that can be traversed and reasoned over (Image generated by ChatGPT 5.4)

When we digitised a group of regional newspapers in 1998, the first thing we adopted was a metadata standard. Dublin Core gave us fifteen fields that provided a standardised way to summarise our newspaper articles, making it possible to find and organise our digital archives dating back over 200 years. It was enough to earn syndication revenue, because the aggregators could now retrieve and index content we had told them how to read.

Then we built something more ambitious. Every article that named a company linked to our business directory. Every car review linked to local car dealers, every property feature to estate agents, every job story to recruiters. Every named professional, whether accountant, lawyer, or banker, linked to a people database. Each link routed traffic, generated referrals, and earned a commission. We did not yet have today's vocabulary for it, but we had built a knowledge graph: the newspaper at the centre, advertiser and referral revenue on the spokes, every node typed, every relationship made explicit. The first revenue stream came from metadata, which made the content discoverable. The second came from structure, which allowed information to be connected, traversed, and monetised. The editorial content remained the same. What changed was the visibility of the relationships between things.

The principle is the same today, at vastly larger scale and higher stakes. The organisations creating durable AI advantage are the ones making their latent graph explicit and machine-traversable: the customers, contracts, products, suppliers, employees, and geographies whose relationships already exist but remain hidden inside documents, systems, and people’s heads. Metadata describes what a document is, which is enough to retrieve it. Structure makes explicit how things relate to one another, which is what makes it possible to reason over them. Two terms describe how that is done. An ontology is the shared agreement about what entities exist in a domain, what they may be called, what properties they hold, and how they are permitted to relate: the schema. A knowledge graph is the working network of typed entities and explicit relationships built to that schema: the manifestation.

From quality to structure

Ask a Board about its data and the conversation usually turns to quality. Completeness, accuracy, timeliness, lineage, governance: these are the properties Boards have been asked to oversee, and the investment has followed. Master data management, data catalogues, and data quality programmes have absorbed real budget and real attention. The work was necessary and remains so, but quality is no longer where the next advantage lies.

The binding constraint on AI value is increasingly structure. The distinction is simple but decisive: data quality tells the organisation whether information is reliable; data structure tells the machine what that information means. The first is a question about the data itself. The second is a question about the relationships between things, and relationships are where reasoning lives. A large language model with access to perfectly accurate but flat data can summarise a document competently. What it cannot do is reason over how that document relates to the seventeen others that bear on the same decision unless those relationships have been made explicit. The summary may be accurate. The reasoning is absent.

The first sign of the shift is linguistic. Organisations increasingly talk about knowledge rather than data. Data is what sits in tables. Knowledge is what an organisation can act on. The transformation from one to the other is structural, not a matter of cleaning. This is where the data loop from The Great Remaking does its work: advantage accrues when proprietary data is shaped by AI-integrated workflows and structured to be reasoned over. A larger pile of unstructured records is simply a larger pile. The question for the Board, then, is not whether the organisation has a data strategy. Most do. The question is whether that strategy treats structure as seriously as it treats quality. The relationships already exist. They live in what the organisation’s people understand and its systems do not. Structure is the work of making that latent graph explicit.

How you make the graph explicit

You make the latent graph explicit by agreeing what things are. That sounds trivial. It is not.

Take a single word every organisation uses: customer. Does it include a prospect who has never bought anything? A former customer who churned last year? A user of a free product who has never paid? A contact at a partner firm? Ask three departments and you will get three answers. To sales, a customer is anyone in the pipeline. To finance, it is an entity that has been invoiced. To support, it is whoever is entitled to raise a ticket. Each definition is correct for the work that produced it, and the organisation has functioned for years without ever reconciling them, because nothing forced the question. The relationships exist, but they exist three times over, in three incompatible forms, none of them written anywhere a machine can use.
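A minimal Python sketch makes the divergence concrete. It assumes a hypothetical account record carrying one flag per department's working test (`in_pipeline`, `invoiced`, `support_entitled`); these names and the five sample accounts are illustrative, not drawn from any real system. Applying each definition to the same records yields three different sets of customers.

```python
from dataclasses import dataclass


# Hypothetical account record; field names are illustrative only.
@dataclass(frozen=True)
class Account:
    name: str
    in_pipeline: bool       # tracked by sales as an opportunity
    invoiced: bool          # has been invoiced by finance
    support_entitled: bool  # may currently raise a support ticket


ACCOUNTS = [
    Account("Prospect Ltd", in_pipeline=True, invoiced=False, support_entitled=False),
    Account("Churned plc", in_pipeline=False, invoiced=True, support_entitled=False),
    Account("Free-tier user", in_pipeline=False, invoiced=False, support_entitled=True),
    Account("Active Corp", in_pipeline=True, invoiced=True, support_entitled=True),
    Account("Partner contact", in_pipeline=False, invoiced=False, support_entitled=False),
]

# Three departmental definitions of "customer", each correct for its own work.
DEFINITIONS = {
    "sales": lambda a: a.in_pipeline,
    "finance": lambda a: a.invoiced,
    "support": lambda a: a.support_entitled,
}


def customers(definition: str) -> set[str]:
    """Return the names that count as customers under one definition."""
    rule = DEFINITIONS[definition]
    return {a.name for a in ACCOUNTS if rule(a)}


if __name__ == "__main__":
    for dept in DEFINITIONS:
        print(dept, sorted(customers(dept)))
```

Only Active Corp is a customer under all three definitions, and the partner contact is a customer under none. Any report or AI system built on one of these filters inherits that choice without saying so.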

Agreeing that definition, and writing it down in a form a machine can use, is the first act of making the graph explicit. The set of those agreements is the ontology: the shared statement of what entities exist, what they are called, what properties they hold, and how they are allowed to relate. A customer relates to contracts. A contract relates to products and to a term. A supplier operates in jurisdictions and depends on commodities. None of this is exotic. It is the organisation writing down what it already half-knows, in a form that no longer depends on who happens to be in the room.

The knowledge graph is what those agreements produce once they are populated with real data. Every customer, contract, product, supplier, and jurisdiction becomes a typed node, and every relationship between them becomes an explicit, traversable link, sitting on top of the clean, governed data that the quality work already produced. This is the join between the two: quality gives you trustworthy nodes; structure gives you the relationships between them. One without the other is incomplete. Accurate data with no relationships cannot be reasoned over. Rich relationships built on unreliable data simply produce confident, traceable nonsense.
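For readers who want to see the schema and the manifestation side by side, here is a minimal Python sketch. It assumes a toy ontology expressed as permitted (subject type, relation, object type) triples; the type and relation names are illustrative, and real deployments often use standards such as RDF and OWL, or a property-graph database, instead. What it shows is the division of labour described above: the ontology decides which links may exist, and the graph refuses any link the ontology does not permit.

```python
from collections import defaultdict

# A toy ontology, assumed for illustration: the relationships permitted
# between entity types, written as (subject type, relation, object type).
ONTOLOGY = {
    ("Customer", "holds", "Contract"),
    ("Contract", "covers", "Product"),
    ("Supplier", "operates_in", "Jurisdiction"),
    ("Supplier", "provides", "Commodity"),
}


class KnowledgeGraph:
    """Typed nodes and explicit links, each checked against the ontology."""

    def __init__(self, ontology):
        self.ontology = ontology
        # Every entity type the ontology mentions on either side of a triple.
        self.entity_types = {t for subj, _, obj in ontology for t in (subj, obj)}
        self.node_type = {}            # node id -> entity type
        self.edges = defaultdict(set)  # (node id, relation) -> target node ids

    def add_node(self, node_id, entity_type):
        if entity_type not in self.entity_types:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.node_type[node_id] = entity_type

    def add_edge(self, subject, relation, obj):
        # Both ends must already exist as typed nodes (KeyError otherwise).
        triple = (self.node_type[subject], relation, self.node_type[obj])
        if triple not in self.ontology:
            raise ValueError(f"not permitted by the ontology: {triple}")
        self.edges[(subject, relation)].add(obj)

    def neighbours(self, node_id, relation):
        return set(self.edges.get((node_id, relation), set()))


if __name__ == "__main__":
    kg = KnowledgeGraph(ONTOLOGY)
    kg.add_node("acme", "Customer")
    kg.add_node("c-101", "Contract")
    kg.add_edge("acme", "holds", "c-101")  # permitted by the ontology
    try:
        kg.add_edge("c-101", "holds", "acme")  # a contract cannot hold a customer
    except ValueError as err:
        print("rejected:", err)
    print(kg.neighbours("acme", "holds"))
```

Note what the sketch does not check: whether "acme" is a real, correctly recorded customer. That is the quality work; the graph only adds the relationships.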

What makes this hard is not the technology. It is that the definitions are choices, and the choices have owners, histories, and consequences. That is the part the Board needs to understand, and it is where this stops being a data exercise and becomes a governance one.

What the Board is actually being asked to govern

Those choices do not stay where they are made. Whichever definition of customer the organisation settles on, sales or finance or support or some negotiated blend, that choice now propagates. It decides who appears in the retention numbers the Board reviews, who falls inside a regulatory obligation, and who an AI system treats as a target for outreach or excludes from it. The same is true of every core entity. How a contract is defined determines what the organisation believes it is committed to. How an employee is defined determines who workforce and compliance systems account for. How an incident is defined, where the line falls between an incident, a near-miss, and an operational hiccup, determines what gets reported, what gets disclosed, and what every downstream system learns from.

None of these is a technical question. Each is a question about what the organisation officially knows, and they used to be resolved informally: by precedent, by departmental practice, by long-tenured people who simply knew how the organisation counted things. That worked because the definitions stayed implicit and local. Once they are encoded into a graph that AI systems reason over, they stop being implicit and local. They become explicit, queryable, and binding on everything downstream. The organisation no longer has three working definitions of a customer that quietly coexist. It has one, and it is acting on it at scale.

The Board’s role is not to author these definitions; that is rightly delegated. Its role is to recognise that the choices are being made, by someone, somewhere, and to ensure the ones that matter are visible, defensible, and aligned with the organisation’s strategy and risk appetite. This is where Minimum Lovable Governance applies. Not every entity definition warrants Board attention. The handful that touch revenue, risk, regulatory exposure, and customer trust do, and the rest can be delegated, but only once the definitional layer is understood to be something the Board can see and govern in principle.

There is a precedent every director already understands. A chart of accounts is nothing more than an agreed set of definitions: what counts as revenue, what counts as a cost, where one category ends and the next begins. It is guarded ferociously, because every figure in every report inherits those definitions, and a quiet change to one of them changes what the numbers mean without touching a single number. No Board would let the finance function silently redefine revenue. The definitions inside the knowledge graph now carry the same weight, across a far wider surface than finance ever touched, because they determine what every AI system in the organisation treats as true. The chart of entities is the new chart of accounts.

The graph answering a question

Get those definitions right and the reward is not abstract. It is the ability to answer questions the organisation has never been able to answer well.

Consider the kinds of question Audit and Risk Committees ask. What is our exposure to a particular country? Which of our suppliers depend on a contested source of energy? Where do we carry concentration risk on a single commodity, and which of our suppliers’ suppliers operate in jurisdictions newly subject to sanctions? These are not exotic questions. They are routine. Yet most Boards receive only partial, hedged, weeks-late answers.

The reason is structural, not a failure of effort. Procurement produces a spreadsheet. Risk produces an assessment. Legal produces a memo. The three disagree at the edges, the Board accepts a directional read, and the meeting moves on. Supplier data, in most organisations, lives in flat tables: identifier, name, contract value, payment terms, primary contact. That is enough to pay an invoice. It cannot answer the Board’s question, because the question is not about any single supplier. It is about the relationships between suppliers, jurisdictions, commodities, and contracts, and a flat table holds none of them.

Put the same information into a graph and the question changes shape. Each supplier, jurisdiction, commodity, contract, and regulatory framework becomes a typed node, and the dependencies between them become links the machine can follow. “What is our exposure to a particular country” stops being a research project and becomes a query: start at the country, traverse to every supplier that operates there, follow each one to the commodities it provides and the contracts it sits under, then follow those suppliers to their suppliers, and surface the concentrations. The answer is not a spreadsheet assembled by hand over two weeks. It is a path through the graph, generated on demand.
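The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical in-memory tables in place of a graph database; every supplier, country, commodity, and contract identifier is invented. One interpretive choice is flagged: the sketch walks the supply chain from suppliers based in the country to every supplier that buys from them, directly or through intermediaries, because that is the direction in which indirect exposure reaches the suppliers the organisation actually contracts with.

```python
from collections import Counter, deque

# Hypothetical supply-chain data; every identifier is illustrative.
OPERATES_IN = {       # supplier -> jurisdictions it operates in
    "S1": {"UK"},
    "S2": {"CountryX"},
    "S3": {"Germany"},
    "S4": {"CountryX", "France"},
}
SUPPLIES = {          # upstream supplier -> suppliers it sells to
    "S2": {"S1"},
    "S4": {"S3"},
}
PROVIDES = {          # supplier -> commodities it provides
    "S1": {"steel"},
    "S2": {"steel"},
    "S3": {"batteries"},
    "S4": {"lithium"},
}
CONTRACTS = {         # direct supplier -> our contracts with it
    "S1": {"C-001"},
    "S3": {"C-002"},
}


def country_exposure(country):
    """Trace every supplier exposed to `country`, with the path that links them."""
    # Step 1: start at the country and find the suppliers operating there.
    seeds = sorted(s for s, places in OPERATES_IN.items() if country in places)
    paths = {s: [country, s] for s in seeds}
    # Step 2: breadth-first walk to every supplier that buys from an exposed
    # supplier. Each path is kept so the answer can be audited; visited
    # suppliers are skipped, so cycles in the supply chain still terminate.
    queue = deque(seeds)
    while queue:
        upstream = queue.popleft()
        for downstream in sorted(SUPPLIES.get(upstream, set())):
            if downstream not in paths:
                paths[downstream] = paths[upstream] + [downstream]
                queue.append(downstream)
    # Step 3: follow each exposed supplier to its commodities and our contracts,
    # counting how often each commodity appears (the concentration signal).
    concentration = Counter(c for s in paths for c in PROVIDES.get(s, set()))
    contracts = {c for s in paths for c in CONTRACTS.get(s, set())}
    return paths, contracts, concentration


if __name__ == "__main__":
    paths, contracts, concentration = country_exposure("CountryX")
    for supplier, path in paths.items():
        print(supplier, " -> ".join(path))
    print("contracts at risk:", sorted(contracts))
    print("commodity concentration:", concentration.most_common())
```

The stored paths are what make the answer auditable: each exposed supplier comes with the chain of links that put it on the list, which is the "path it travelled" an auditor would want to inspect.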

It is not a perfect answer. Every graph is only as honest as the definitions inside it, and the exposure it reports is the exposure the organisation has chosen to model, no more. But it is a traceable answer and a repeatable one. The auditor can see the path it travelled. The Risk Committee can interrogate the definitions it rests on. The next Board pack does not start from a blank page. Every Board keeps a list of questions it asks repeatedly and never gets a satisfying answer to, and that list is almost always a list of questions the underlying data cannot structurally answer. This is how the list gets shorter.

Why reasoning needs structure

A brief note on scope, because the claim is easy to overstate. Rule engines, expert systems, and constraint solvers have produced reliable inferences for decades without anything anyone would call an enterprise ontology. The claim is not that proof requires a knowledge graph. It is narrower: reasoning across everything an organisation knows, at the scale and speed AI now makes possible, requires that knowledge to be structurally explicit. Without structure, a system can still produce confident output. It cannot produce reasoning anyone can check.

Not all AI output is the same kind of thing, and the difference decides what structure you need. The four indicator types that underpin the Verification Premium separate them cleanly. A lagging indicator is a matter of fact and needs only reliable data. A leading indicator is a signal, drawn from patterns in that data. A predictive indicator is a probability, a model’s read on what might happen. A reasoned indicator is different in kind: it is a conclusion that follows from premises, and producing one requires the system to know what things exist, how they may relate, and what follows from what. That knowledge is the ontology. It is the only one of the four that cannot be faked with more data, because it does not rest on data volume at all. It rests on structure.

This is what the Verification Premium rests on. An organisation can verify only what its systems can reason over, and at scale, reasoning is a function of structure. The premium does not go to whoever generates the most AI output. It goes to whoever can show their output holds. Everything else is probability dressed up as proof.

One qualification, and the whole argument turns on it. No ontology is reality. The graph is the organisation’s official account of what it knows, not the world itself, and the proof it produces is proof within that account. This is not a weakness. It is the point. A probability, however confident, gives the Board nothing it can govern. An explicit model of the domain, however imperfect, gives it a defined artefact: something it can inspect, challenge, and stand behind. Governance needs an object, and structure is what produces one.

This matters now because the demand has stopped being voluntary. The FRC’s 2024 UK Corporate Governance Code presses on explainability, the IoD’s work on AI governance presses on decision traceability, and the Data (Use and Access) Act 2025, in force since 5 February 2026, gives a person subject to a solely automated significant decision the right to contest it and to demand a human review. Consider what answering that demand requires. A regulator, or a customer’s lawyer, asks why a particular decision was made. Under flat retrieval, the organisation can produce the policy that ought to have applied and an assurance that it generally does. Under a knowledge graph, it can reconstruct the specific decision: the authority it fell under, the policy that defined that authority, the factors the model was permitted to weigh and the ones it was forbidden, and the inputs it actually saw. The first is a defence. The second is a record.
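The reconstruction can be sketched as a short walk over linked records. This minimal Python example assumes the organisation captured, at decision time, which authority each decision fell under and which inputs it saw; the node names, relation names, and factors are hypothetical, and no statute prescribes this particular format. Simple set arithmetic then flags any input the governing policy forbade or never permitted.

```python
# Hypothetical decision-provenance records. Node names, relation names and
# factors are illustrative; no statute prescribes this particular format.
GRAPH = {
    ("decision:D-42", "made_under"): "authority:credit-limits",
    ("decision:D-42", "saw_input"): {"payment_history", "income"},
    ("decision:D-43", "made_under"): "authority:credit-limits",
    ("decision:D-43", "saw_input"): {"income", "postcode"},
    ("authority:credit-limits", "defined_by"): "policy:CP-7",
    ("policy:CP-7", "permits_factor"): {"payment_history", "income"},
    ("policy:CP-7", "forbids_factor"): {"postcode"},
}


def reconstruct(decision):
    """Follow the links from one decision back to the policy that governed it."""
    authority = GRAPH[(decision, "made_under")]
    policy = GRAPH[(authority, "defined_by")]
    permitted = GRAPH[(policy, "permits_factor")]
    forbidden = GRAPH[(policy, "forbids_factor")]
    inputs = GRAPH[(decision, "saw_input")]
    return {
        "decision": decision,
        "authority": authority,
        "policy": policy,
        "permitted_factors": sorted(permitted),
        "forbidden_factors": sorted(forbidden),
        "inputs_seen": sorted(inputs),
        # Flag any input the policy forbade, or simply never permitted.
        "forbidden_inputs_used": sorted(inputs & forbidden),
        "unpermitted_inputs": sorted(inputs - permitted),
    }


if __name__ == "__main__":
    for d in ("decision:D-42", "decision:D-43"):
        print(reconstruct(d))
```

The first decision reconstructs cleanly; the second surfaces a forbidden input. Without the recorded links, neither answer would be available after the fact.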

That is the difference between accuracy and accountability. Quality makes an answer correct. Structure makes it defensible. A Board promised explainable AI without that structural layer has been promised something the organisation cannot deliver, and will discover the gap at the worst possible moment, with a decision already made and someone on the outside asking it to be explained.

What this means for the Board’s agenda

Three questions follow, and none requires technical fluency to ask.

The first is who owns the chart of entities. Most organisations cannot answer it. Master data management is the closest existing function but rarely operates at the level of definitions and relationships, and data architecture reports too far down to carry governance weight. The Chief Data Officer is usually the right home, though the remit may not yet stretch to structural governance. Asking makes the ownership gap visible, which is the first step to closing it.

The second is whether the definitions of core entities are treated as Board-relevant decisions. Not all of them; the Minimum Lovable Governance principle holds. The definitions that touch revenue, risk, regulatory exposure, and customer trust are the ones that warrant Board visibility, and the discipline is to govern those few deliberately rather than to govern everything badly.

The third is what the organisation’s AI strategy assumes about structure. Most AI strategies assume good data. Very few assume structured data. The gap between the two is where most AI programmes stall, and where the next decade of competitive separation will form.

The direction of travel has been consistent for thirty years: no structure, then metadata, then domain ontology, now enterprise-scale knowledge graphs. What has changed is the size of the prize. A regional newspaper that monetised typed relationships in 1998 was solving the same problem the modern enterprise is solving now, at smaller scale and with worse tooling, and it was paid handsomely for getting there first. The latent graph already exists inside every organisation. The only choice is whether to make it explicit and govern it, or leave it to form by default. This is an advantage competitors cannot purchase or shortcut, because it is built from what the organisation officially knows about itself. The Board that grasps this can move. The Board that treats structure as a technical detail will discover, too late, that the advantage it failed to govern is the one its competitors quietly accrued.

Let's Continue the Conversation

Thank you for reading about why data structure, not data quality, is becoming the constraint on durable AI value. I'd welcome hearing about your organisation's experience with the relationships hidden inside its data: whether you're working out who actually owns the definitions your AI systems reason over, finding that a question the Board keeps asking is one the underlying data cannot structurally answer, or discovering that your AI strategy quietly assumes good data where it really needs structured data.
