Enterprise Agents Don’t Need Better Models. They Need a Substrate.
Manifesto
By Nishant Srivastav
The first time you ask an enterprise question of an enterprise agent, you notice it.
“Which of our top-tier accounts have a renewal in the next 90 days, are owned by a CSM who joined less than six months ago, and have at least one open Sev-1 incident from the last quarter?”
The agent retrieves passages from your wiki. It pulls account summaries from Salesforce notes. It produces a confident answer that names three accounts. Two of them are wrong. The CSM tenure check was inferred from a Slack message. The Sev-1 list came from an outdated runbook. The renewal dates are off by a quarter. And the answer is unauditable — you cannot see, in a chain you can defend, why the agent named those three.
This is not a bad agent. This is the architecture working exactly as designed.
The architecture is wrong.
The Runtime Shape We Inherited
Every enterprise agent stack in production today inherits the same runtime: a query arrives, retrieval runs against an index, a context window is assembled, the model generates, the context is discarded, the next query starts over. This is RAG — retrieval-augmented generation — and it works beautifully for what it was designed for: answering questions about a corpus of unstructured text.
It was never designed for enterprises.
Enterprises do not run on a corpus. They run on entities and the relationships between them — customers and contracts, employees and roles, incidents and severity, products and SKUs, transactions and accounts. The questions that matter are the ones that cross these relationships. The runtime we use today does not see relationships. It sees similarity.
This is the foundation. From it, three failures follow.
Failure One: The Retrieval Primitive Is Wrong
Vector retrieval finds chunks that resemble a query. Enterprise questions ask what is connected to the entities the query names. These are different operations, and one cannot substitute for the other.
A semantic-similarity search over chunks can return three passages that all mention “renewal” and “Sev-1” and “new CSM,” but it cannot tell you whether the renewal, the Sev-1, and the new CSM belong to the same account. The relationship — the join — does not survive the embedding. The model is left to guess from co-occurrence, and it guesses wrong.
You can patch this with rerankers, with query rewriting, with hybrid sparse/dense retrieval, with multi-step retrieval chains. None of them reaches the underlying problem: you have asked a relational question of a similarity engine. The answer will be probabilistic when the question is structural.
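To make the contrast concrete, here is a minimal sketch of the opening question expressed as a join over entities rather than a search over passages. The record types (Account, Csm, Incident), their field names, the tier label "top", the 182-day approximation of "six months", and all of the data are assumptions invented for illustration; they do not describe any real system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical record shapes; real systems of record will differ.
@dataclass(frozen=True)
class Account:
    account_id: str
    tier: str
    renewal_date: date
    csm_id: str

@dataclass(frozen=True)
class Csm:
    csm_id: str
    start_date: date

@dataclass(frozen=True)
class Incident:
    incident_id: str
    account_id: str
    severity: str
    status: str
    opened: date

def at_risk_accounts(accounts, csms, incidents, today, quarter_start, quarter_end):
    """Return ids of accounts that satisfy every condition on the SAME account."""
    csm_by_id = {c.csm_id: c for c in csms}
    # Accounts with at least one open Sev-1 opened during the last quarter.
    open_sev1 = {
        i.account_id for i in incidents
        if i.severity == "Sev-1" and i.status == "open"
        and quarter_start <= i.opened <= quarter_end
    }
    matches = []
    for a in accounts:
        csm = csm_by_id.get(a.csm_id)
        if csm is None:
            continue
        renews_soon = today <= a.renewal_date <= today + timedelta(days=90)
        new_csm = (today - csm.start_date) < timedelta(days=182)  # ~six months (assumption)
        if a.tier == "top" and renews_soon and new_csm and a.account_id in open_sev1:
            matches.append(a.account_id)
    return sorted(matches)

# Invented data: every account matches SOME of the conditions, only A1 matches all.
TODAY = date(2025, 1, 15)
Q_START, Q_END = date(2024, 10, 1), date(2024, 12, 31)
CSMS = [Csm("C1", date(2024, 10, 1)), Csm("C2", date(2022, 5, 1))]
ACCOUNTS = [
    Account("A1", "top", date(2025, 3, 1), "C1"),
    Account("A2", "top", date(2025, 2, 10), "C2"),       # veteran CSM
    Account("A3", "standard", date(2025, 2, 1), "C1"),   # not top tier
    Account("A4", "top", date(2025, 3, 20), "C1"),       # Sev-1 already closed
]
INCIDENTS = [
    Incident("I1", "A1", "Sev-1", "open", date(2024, 11, 20)),
    Incident("I2", "A2", "Sev-1", "open", date(2024, 12, 5)),
    Incident("I3", "A3", "Sev-1", "open", date(2024, 10, 10)),
    Incident("I4", "A4", "Sev-1", "closed", date(2024, 11, 2)),
]

print(at_risk_accounts(ACCOUNTS, CSMS, INCIDENTS, TODAY, Q_START, Q_END))
```

In this toy data, text about all four accounts would mention a renewal, a Sev-1, and in three cases a new CSM, so co-occurrence alone cannot separate them. The join can, because each condition is checked against the same account key.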
Failure Two: Context Is Reconstructed from Scratch on Every Invocation
This is the failure most operators do not see, because it is hard to notice the absence of a thing.
Every time the agent runs, it starts over. Whatever it retrieved last time is gone. Whatever it figured out about the business — that this customer’s contract was rewritten last quarter, that this CSM owns the strategic accounts, that this incident class always gets escalated — is gone. The next invocation re-derives everything from scratch by retrieving again.
The agent never accumulates. It is, every single time, a new hire on its first day, reading whatever the retrieval index hands it and trying to look competent. Multi-agent systems compound the problem: each agent independently retrieves, independently constructs its own context, and the agents work as a group of strangers who do not share a worldview.
There is no model improvement that fixes this. Longer context windows let you stuff more in; they do not give you persistent, shared, accumulating understanding. Bigger models read more carefully; they do not remember between calls. The problem is not how much the agent sees in a single invocation. The problem is that there is nothing for it to come back to.
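As a rough illustration of what "something to come back to" could look like, the sketch below keeps learned facts in a store that outlives a single invocation and is shared between agents. It uses Python's standard sqlite3 module; the table layout, the function names (open_substrate, record_fact, known_facts), and the agent and account identifiers are hypothetical, and a real substrate would need far more than a three-column fact table.

```python
import sqlite3

def open_substrate(path=":memory:"):
    """Open (or create) a shared fact store. A file path would persist across runs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "subject TEXT, predicate TEXT, object TEXT, learned_by TEXT, "
        "PRIMARY KEY (subject, predicate, object))"
    )
    return conn

def record_fact(conn, subject, predicate, obj, learned_by):
    """Store a fact once; repeated discoveries of the same fact are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO facts VALUES (?, ?, ?, ?)",
        (subject, predicate, obj, learned_by),
    )
    conn.commit()

def known_facts(conn, subject):
    """Return everything any agent has recorded about a subject."""
    rows = conn.execute(
        "SELECT predicate, object, learned_by FROM facts "
        "WHERE subject = ? ORDER BY predicate, object",
        (subject,),
    )
    return rows.fetchall()

# One agent learns something during its invocation...
substrate = open_substrate()
record_fact(substrate, "account:A1", "contract_rewritten_in", "2024-Q4", "renewals-agent")

# ...and a different agent, on a later invocation, starts from it instead of from zero.
print(known_facts(substrate, "account:A1"))
```

The point of the sketch is only the shape: the state lives outside any one context window, and every agent reads and writes the same store.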
Failure Three: There Is No Evidence Chain
When the agent gives you an answer, can you reconstruct how it got there?
In the current runtime, the trail looks like this: a question, a set of retrieved chunks scored by a vector index, a model generation, an answer. The chunks are documents; the documents are content; the content is what someone wrote at some point about something. There is no canonical reference for the entities the answer names, no governed source of truth for the relationships it asserts, and no audit log that says the renewal date came from this row in this system at this time, the CSM ownership came from this record in HR, and the incident came from this ticket.
For an internal Slack bot, this does not matter. For an agent operating inside a regulated industry — financial services, healthcare, defence, pharma, legal — it is the only thing that matters. Without an evidence chain, the agent cannot be deployed where the answers carry consequences. The architecture cannot produce one because the architecture is not connected to the systems of record. It is connected to a similarity index over documents.
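A minimal sketch of what one link in such an evidence chain might record, assuming a hypothetical Evidence and Assertion shape: each claim in an answer carries the source system, record, field, and observation time it was read from, and a claim with no evidence is rejected rather than emitted. The system labels ("crm", "hr", "ticketing"), record ids, and claims are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple

@dataclass(frozen=True)
class Evidence:
    source_system: str     # hypothetical label for a system of record
    record_id: str
    field: str
    observed_at: datetime  # when the value was read from the source

@dataclass(frozen=True)
class Assertion:
    claim: str
    evidence: Tuple[Evidence, ...]

def audit_trail(assertions):
    """Render one line per (claim, evidence) pair; refuse claims with no evidence."""
    lines = []
    for a in assertions:
        if not a.evidence:
            raise ValueError(f"unsupported claim: {a.claim}")
        for e in a.evidence:
            lines.append(
                f"{a.claim} <- {e.source_system}:{e.record_id}.{e.field} "
                f"@ {e.observed_at.isoformat()}"
            )
    return lines

# Invented example answer about one account.
OBSERVED = datetime(2025, 1, 15, 9, 0, tzinfo=timezone.utc)
ANSWER = [
    Assertion("A1 renews on 2025-03-01",
              (Evidence("crm", "account/A1", "renewal_date", OBSERVED),)),
    Assertion("A1 is owned by C1, who started 2024-10-01",
              (Evidence("hr", "employee/C1", "start_date", OBSERVED),)),
    Assertion("A1 has an open Sev-1",
              (Evidence("ticketing", "INC-1", "status", OBSERVED),)),
]

for line in audit_trail(ANSWER):
    print(line)
```

The design choice worth noting is the hard failure on an empty evidence tuple: an answer that cannot point at a record is treated as an error, not as a lower-confidence answer.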
The Misdiagnosis
The industry’s current answer to all of this is: bigger models, longer context windows, better RAG.
None of it will work.
Bigger models read the wrong context more eloquently. Longer context windows stuff more chunks into the same broken loop at higher cost and higher latency. Better RAG — rerankers, query decomposition, GraphRAG — improves retrieval inside an architecture whose problem is not retrieval. You can polish a runtime that reconstructs ephemeral context on every call until it shines, and at the end of the polishing you still have a runtime that reconstructs ephemeral context on every call.
The architecture has to change. Not the components plugged into it.
The Inversion
The change is structural, and it is simple to state.
The enterprise agent does not retrieve context per invocation. The enterprise agent reasons over a persistent substrate that is the enterprise’s context. That substrate is a knowledge graph: a canonical model of the entities and relationships that constitute the business, built from the systems of record themselves, governed at the substrate layer, queried by traversal rather than by similarity, and accumulating over time as the business changes.
More precisely: the substrate is an entity-relationship model of the enterprise. The knowledge graph is its most general form. A semantic layer over a data warehouse (dbt Semantic Layer, Cube, AtScale, the metric layers in Snowflake and Databricks) is a special case of the same idea, restricted to the structured slice: entities as modelled tables, relationships as joins, attributes as dimensions and measures. For the rare enterprise where every question lives inside a clean warehouse, the semantic layer is enough. For the typical enterprise, where the questions worth asking cross from CRM rows into contract clauses into support tickets into Slack threads, only the graph generalises far enough. The argument is not that graphs beat semantic layers. The argument is that an agent without an entity-relationship substrate of some form is structurally broken, and for any real enterprise the only substrate that scales is the graph.
Documents do not disappear. They hang off the graph as evidence. Vectors do not disappear. They index the documents that the graph points to. But the runtime is no longer RAG. The runtime is a graph the agent thinks with.
Every one of the three failures dissolves at the substrate. Retrieval becomes traversal — relational questions are answered by relational queries. Context stops being reconstructed because it stops being ephemeral; the graph is the persistent state, shared across every agent and every invocation. Evidence chains become graph paths — every assertion in an answer is anchored to a node and an edge, every node anchored to a source system, every traversal auditable end to end.
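The claim that evidence chains become graph paths can be sketched with a toy in-memory graph. In the example below, every edge carries the source record it was built from, and a traversal returns the edges it walked, so the answer and its audit trail are the same object. The Edge shape, the relation names (OWNED_BY, HAS_INCIDENT, REPORTS_TO), and all identifiers are hypothetical; a production substrate would sit on a real graph store and query language.

```python
from collections import namedtuple

# Each edge records which source record asserted it (hypothetical identifiers).
Edge = namedtuple("Edge", "src relation dst source")

EDGES = [
    Edge("account:A1", "OWNED_BY", "csm:C1", "crm:account/A1"),
    Edge("account:A1", "HAS_INCIDENT", "incident:I1", "ticketing:INC-1"),
    Edge("account:A2", "OWNED_BY", "csm:C2", "crm:account/A2"),
    Edge("csm:C1", "REPORTS_TO", "manager:M1", "hr:employee/C1"),
]

def traverse(edges, start, relations):
    """Follow the given relations in order from start.

    Returns every complete path as a list of edges, so each hop in the
    answer can be traced to the source record on that edge.
    """
    frontier = [(start, [])]
    for relation in relations:
        next_frontier = []
        for node, path in frontier:
            for e in edges:
                if e.src == node and e.relation == relation:
                    next_frontier.append((e.dst, path + [e]))
        frontier = next_frontier
    return [path for _, path in frontier]

# "Who does the owner of account A1 report to, and how do we know?"
for path in traverse(EDGES, "account:A1", ["OWNED_BY", "REPORTS_TO"]):
    print(path[-1].dst, "via", [e.source for e in path])
```

For account A2 the same traversal returns no paths, because no REPORTS_TO edge exists for its owner. The graph reports the missing relationship as missing instead of guessing from nearby text.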
Where the Market Actually Is
It is fair to ask: if this is the right architecture, who is building it?
The honest answer is that the category is forming, and the first wave of products is already in market. None of them have assembled the full architecture this argument describes. Each has located one or two of its properties and stopped there.
Palantir got there first. AIP has explicitly inverted around the Ontology: what Palantir calls Ontology-Augmented Generation is the move this argument describes, made by an incumbent, in production. The shape is right. The limitation is the access model: the substrate is proprietary, deployment-heavy, and locked inside a platform that takes months of professional services to install. Most enterprises will never buy Palantir. An architecture cannot become a category if it is reachable only through one vendor and one deployment model.
Atlan has built the cleanest articulation of “context layer for AI agents” in the market. Their Enterprise Data Graph, Active Ontology, and Context Engineering Studio are a direct rendering of the inversion, and the substrate is real. The limitation is closer to perimeter than to shape: Atlan’s centre of gravity is the structured warehouse and the metadata that governs it. The unstructured surface — contracts, tickets, conversations, regulatory text — sits at the edge of their model rather than as first-class entities alongside the structured ones. The substrate is right; the perimeter is narrower than the agent ultimately needs.
Cognee has built the automatic-construction half of the problem. Their pipeline assembles knowledge graphs from heterogeneous sources without hand-modelling. Zep / Graphiti sits in the same shape with a temporal twist — strong on how entities and facts change over time. Both are real, both are being adopted, and both are positioned as agent-memory libraries that plug into agent SDKs — not as full enterprise substrates with governance, lineage, and audit native at the substrate layer.
Stardog and others in the enterprise graph-database lineage offer agentic interfaces over their graphs. Stardog's Voicebox, for example, grounds LLM output in SPARQL queries against the graph. The shape is closer to substrate than to memory, but the graph is treated as a database to query, not as the runtime context the agent thinks within.
GraphRAG remains what it was. A smarter retrieval index inside the same RAG runtime. Not a substrate.
The architecture this argument is pointing at — canonical, automatically built, governed, full-enterprise, agent-native, accessible — has not yet been assembled in one product by one vendor. Each of these players has located a piece. Whoever puts the pieces together first wins the next decade of enterprise AI.
The Stake
Most enterprise agent stacks shipping today will have to be rebuilt. Not extended, not improved — rebuilt. The runtime is wrong, the substrate is wrong, the foundation cannot carry the weight of what enterprises actually need from agents: relational reasoning, accumulating understanding, defensible evidence.
The teams that recognise this early will win the next decade of enterprise AI. The teams that bet on longer context windows and smarter retrievers will spend the next decade explaining why their agents keep getting accounts wrong.