Writing & Ideas

The Data Architect's Notebook

On enterprise data systems, AI infrastructure, the hardware-software frontier, and how machines learn.

◆ Featured Essay
The KV Cache Is the Problem: Why Enterprise AI Reliability Requires a Hardware Fix
May 2026  ·  12 min read  ·  AI Infrastructure

Every time an enterprise LLM hallucinates a fact, forgets context mid-conversation, or produces inconsistent outputs across identical prompts — the instinct is to blame the model. Retrain it. Tune the prompt. Add retrieval. But after 17 years of building data systems, I've come to a different conclusion: the root cause is architectural, and it lives in the hardware. Specifically, in how Key-Value caches work — or fail to.


The Symptom We Keep Misdiagnosing

Enterprise AI deployments fail in predictable ways. The model that worked in the pilot hallucinates in production. The chatbot that answered correctly in testing contradicts itself three turns later. The summarization tool that impressed stakeholders in the demo produces subtly wrong outputs on real documents. The failure mode is consistent. The diagnosis is usually wrong.

The industry has converged on a set of explanations: the model needs fine-tuning, the prompts need engineering, the retrieval pipeline needs improvement, the guardrails need tightening. These interventions help at the margins. They don't fix the problem.

What KV Cache Actually Does — and Why It Breaks

When a large language model processes a long context window (a 50-page contract, a multi-turn customer service history, a complex data migration specification), it doesn't re-read every token at every inference step. It caches the intermediate attention inputs (the key and value projections) for previously processed tokens in GPU memory: primarily HBM, which is a form of DRAM, with on-chip SRAM serving as a small working buffer during the attention computation itself.
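To make the mechanism concrete, here is a minimal single-head sketch in NumPy. It is an illustration under simplifying assumptions (one head, no batching, random weights), not how any particular inference engine implements it; the names `KVCache`, `decode_step`, and `full_recompute` are hypothetical. It shows that appending each token's key and value once gives the same result as recomputing them for the whole context.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

class KVCache:
    """Append-only store of per-token key and value vectors for one head."""
    def __init__(self, head_dim):
        self.keys = np.empty((0, head_dim))
        self.values = np.empty((0, head_dim))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(x, Wq, Wk, Wv, cache):
    """Project only the newest token, then attend over everything cached."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    cache.append(k, v)
    weights = softmax(cache.keys @ q / np.sqrt(q.shape[0]))
    return weights @ cache.values

def full_recompute(X, Wq, Wk, Wv):
    """Reference path: recompute keys and values for every token."""
    K, V = X @ Wk, X @ Wv
    q = X[-1] @ Wq
    return softmax(K @ q / np.sqrt(q.shape[0])) @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
X = rng.normal(size=(5, d))  # five token embeddings (made-up data)

cache = KVCache(d)
for x in X:
    out_cached = decode_step(x, Wq, Wk, Wv, cache)
out_full = full_recompute(X, Wq, Wk, Wv)

print(np.allclose(out_cached, out_full))  # True
```

The saving is that each step projects one token instead of the whole prefix. The cost is that the cache grows by one key and one value per token, per layer, per head, for as long as the context lasts.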

This Key-Value cache is the mechanism that makes long-context inference computationally feasible. It is also, I'll argue, the mechanism that makes long-context inference epistemically unreliable.

The KV cache is a write-once, sequential memory structure. It has no associative retrieval, no importance weighting, no mechanism to surface relevant distant context when new queries arrive. It simply fills up, and then — in most production deployments — it evicts.

When context exceeds the cache budget, existing systems use heuristic eviction policies: recency-based (discard oldest tokens), attention-score-based (discard lowest-attended tokens), or sliding window approaches. None of these are semantically aware. A critical constraint stated on page 3 of a contract is just as likely to be evicted as a boilerplate header — because the cache doesn't understand what it's storing.
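A toy sketch of two of these policies shows the failure mode. The attention scores below are made up for illustration, and the function names are hypothetical; production eviction schemes are more elaborate, but the selection logic has this general shape. Position 2 stands in for the critical constraint, which has received little attention so far.

```python
import numpy as np

def keep_most_recent(n_tokens, budget):
    """Recency / sliding-window policy: retain only the last `budget` positions."""
    return list(range(max(0, n_tokens - budget), n_tokens))

def keep_most_attended(cum_attention, budget):
    """Attention-score policy: retain the `budget` positions with the highest
    accumulated attention, returned in position order."""
    top = np.argsort(cum_attention)[-budget:]
    return sorted(int(i) for i in top)

# Hypothetical accumulated attention for a 10-token context.
cum_attention = np.array([0.9, 0.1, 0.05, 0.4, 0.3, 0.6, 0.2, 0.7, 0.8, 1.0])
CONSTRAINT_POS = 2  # the early constraint a later query will need
budget = 4

recent = keep_most_recent(len(cum_attention), budget)
attended = keep_most_attended(cum_attention, budget)

print(recent)                      # [6, 7, 8, 9]
print(attended)                    # [0, 7, 8, 9]
print(CONSTRAINT_POS in recent)    # False
print(CONSTRAINT_POS in attended)  # False
```

Both policies drop the constraint, for different reasons: one because it is old, the other because nothing has asked about it yet. Neither decision can be revisited once the entry is gone.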

The Neural-Holographic Cache Controller: A Different Architecture

The NHCC (U.S. Patent Application No. 19/656,853) proposes replacing this architecture with a holographic encoding and phase-conjugate retrieval system. Instead of storing key-value pairs as discrete addressable tokens, the NHCC encodes contextual information as distributed interference patterns — analogous to how holographic film stores light as phase relationships rather than pixel intensities.

The result: any stored piece of context can be retrieved by partial query — not just sequential access. The model can reconstruct a relevant early constraint from a late-arriving query without that constraint needing to survive in the active cache. This is what human working memory does. It's not how current KV caches work at all.
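For readers who want intuition for distributed storage with partial-cue retrieval, the closest well-known software analogy is holographic reduced representations, which bind pairs of vectors by circular convolution and superimpose them in one trace. The sketch below is that classical technique, not the NHCC design, which this essay describes only at a high level; the item names, vector dimension, and the "zero out half the key" cue are all assumptions chosen for illustration.

```python
import numpy as np

def bind(a, b):
    """Circular convolution (computed via FFT) binds a key to a value."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def unbind(trace, cue):
    """Circular correlation: an approximate inverse of bind for a given cue."""
    return np.fft.irfft(np.fft.rfft(trace) * np.conj(np.fft.rfft(cue)),
                        n=len(trace))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dim = 2048

def random_vector():
    """Random vector with expected unit norm."""
    return rng.normal(0.0, 1.0 / np.sqrt(dim), dim)

# Hypothetical items standing in for pieces of context.
names = ["sec4_constraint", "header", "sec47_field"]
keys = {n: random_vector() for n in names}
values = {n: random_vector() for n in names}

# One distributed trace holds every key-value pair superimposed.
trace = sum(bind(keys[n], values[n]) for n in names)

# Partial query: only the first half of the key survives.
cue = keys["sec4_constraint"].copy()
cue[dim // 2:] = 0.0

noisy = unbind(trace, cue)
best = max(values, key=lambda n: cosine(noisy, values[n]))
print(best)
```

The retrieved vector is noisy, so the last step cleans it up by comparing against known values. With these dimensions and only three stored pairs, the correct item should win the comparison by a wide margin even though half the cue is missing. The noise grows as more pairs are superimposed, which is the capacity trade-off any scheme of this kind has to manage.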

This matters enormously for enterprise applications. When you're running an LLM over a 500-page data migration specification, the model needs to recall a field-level constraint from section 4 when it's processing section 47. Sequential KV cache eviction makes this structurally impossible under real memory budgets. The NHCC makes it architecturally trivial.
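To put a number on "real memory budgets": a conventional KV cache stores one key and one value vector per token, per layer, per KV head. The back-of-envelope calculation below uses a hypothetical model configuration (32 layers, head dimension 128, 16-bit values) and assumes roughly 600 tokens per page; none of these figures come from a specific deployment.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Size of a dense KV cache: keys plus values (the factor of 2),
    for every layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

TOKENS_PER_PAGE = 600            # assumption; varies with layout and tokenizer
seq_len = 500 * TOKENS_PER_PAGE  # the 500-page specification

# Hypothetical configuration with one KV head per attention head.
full = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128)

# Same model with grouped-query attention sharing 8 KV heads.
gqa = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128)

print(f"{full / 2**30:.1f} GiB")  # 146.5 GiB
print(f"{gqa / 2**30:.1f} GiB")   # 36.6 GiB
```

Under these assumptions a single 500-page session needs more cache than a typical single accelerator holds, before counting model weights or concurrent users. That gap is what forces eviction in the first place.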

Implications for Enterprise Data Architecture

As a data architect, I think about this in terms of access patterns. Every data system I've built has had to answer the question: what retrieval semantics does this system need? Relational databases give you set-based retrieval. Graph databases give you traversal-based retrieval. Document stores give you content-addressable retrieval.

KV caches, as currently designed, give you sequential retrieval with heuristic eviction. For most memory subsystems in software, that would be considered a severe design limitation. In the memory substrate of our most powerful AI systems, it's accepted as standard.

The NHCC is an argument that this doesn't have to be true. That AI reliability is, at its foundation, a data architecture problem — and data architecture problems have architectural solutions.

About This Post

This essay draws on research developed for a Harvard Business Review practitioner article and on the NHCC patent application (U.S. App. No. 19/656,853). Contact me for speaking, advisory, or collaboration inquiries.

More Writing
Apr 2026
HNSKT vs. DKT: Why Symbolic Reasoning Changes What Knowledge Tracing Can Do
Deep knowledge tracing (DKT) models are impressive at predicting the next correct answer. They are poor at explaining why. HNSKT's ILP layer changes that relationship fundamentally — and the difference matters for anyone building intelligent tutoring systems that need to act on their predictions.
Mar 2026
Zfields, Mapping Actions, and the Hidden Intelligence of Syniti ADMM
Most data migration practitioners treat Syniti's ADMM as a sophisticated ETL tool. It's more than that. The zfield architecture and mapping action framework encode transformation logic as metadata — which means your migration logic becomes reusable, auditable, and composable in ways that traditional ETL pipelines simply aren't.
Feb 2026
The Underpowered Pilot Problem: What Small Educational Datasets Can and Can't Tell You
Academic AI research in education frequently runs into a structural problem: the datasets that are available for experimentation are small. Here's how to design experiments, report findings, and honestly caveat conclusions when your pilot dataset is real but limited — without undermining the research.
Jan 2026
Why Your SAP Cutover Is Already Failing (and How to Fix It Before Go-Live)
I've seen the same pattern on a dozen enterprise SAP programs: the cutover plan looks complete on paper, but three things are always underestimated. Delta load sequencing. Reconciliation tolerances. And the human approval loop on data exceptions. Here's a framework that addresses all three.
Dec 2025
Cross-Domain Transfer in Knowledge Tracing: Can a Model Trained on Math Help Teach Physics?
One of the most provocative experiments in HNSKT is the cross-domain transfer evaluation: train on ASSISTments mathematics data, test on physics and reading comprehension. The results are surprising, humbling, and clarifying about what knowledge tracing actually models.
Nov 2025
From ECC to S/4HANA: The Data Migration Decisions That Actually Matter
S/4HANA migration projects fail at the data layer more often than at the technical layer. Here are the five data-side decisions that determine whether your program succeeds: legacy data harmonization scope, object hierarchy redesign, cleansing-vs-migrate-as-is tradeoffs, simulation cycles, and reconciliation strategy.