EGAF

Enterprise GenAI Architect Framework

Designing generative and agentic AI systems that survive enterprise complexity.

Status: developing

Vision

A demo that works is not a system that ships. The Enterprise GenAI Architect Framework treats context, evaluation, human oversight, and operating model as the load-bearing structure of enterprise AI - the parts that decide whether a capable model becomes a dependable system or an unaccountable liability.

The gap between the demo and the deployment

Generative AI demos exceptionally well and ships poorly. The distance between the two is not model quality - frontier models are extraordinary. The distance is everything around the model: the context it operates on, the evidence it acts from, the oversight that catches it when it’s wrong, and the operating model that keeps all of it accountable over time.

Enterprises feel this gap acutely because every constraint is real simultaneously. Compliance is not optional. Latency budgets are fixed. Cost scales with usage. And being wrong has consequences that land on real customers, real money, and real regulators.

In a consumer demo, a wrong answer is a curiosity. In an enterprise system, a wrong answer is an incident.

The four load-bearing layers

The framework organizes enterprise AI into four layers, each of which must be designed deliberately.

Context

A model is only as good as what it knows about your organization. Context - the documents, decisions, constraints, and provenance that give an answer its meaning - is the layer most often left to chance. Designing it as governed infrastructure is the difference between a system that reasons about your business and one that improvises. Systems like PrivateGPT exist to make this layer durable rather than ad hoc.
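As a rough illustration of what treating context as governed infrastructure could look like, the sketch below attaches provenance to every passage and filters out anything unowned or stale before it reaches the model. The `ContextItem` fields, the function names, and the example document IDs are assumptions made for this sketch; they are not taken from PrivateGPT or any other product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContextItem:
    """One governed piece of organizational context (all fields hypothetical)."""
    text: str
    source: str        # where the passage came from, e.g. a document ID
    owner: str         # who is accountable for its accuracy
    valid_until: date  # after this date the passage is treated as stale


def select_context(items: list[ContextItem], today: date) -> list[ContextItem]:
    """Keep only passages with known provenance that are still valid."""
    return [
        item for item in items
        if item.source and item.owner and item.valid_until >= today
    ]


def render_prompt_context(items: list[ContextItem]) -> str:
    """Render passages so each one carries its source into the prompt."""
    return "\n".join(f"[{item.source}] {item.text}" for item in items)


if __name__ == "__main__":
    items = [
        ContextItem("Refunds are allowed within 30 days.",
                    "policy/refunds-v3", "finance-ops", date(2026, 12, 31)),
        ContextItem("Refunds are allowed within 14 days.",
                    "policy/refunds-v2", "finance-ops", date(2024, 1, 1)),
        ContextItem("Refunds are always allowed.", "", "", date(2030, 1, 1)),
    ]
    print(render_prompt_context(select_context(items, date(2026, 1, 15))))
```

The point of the design is that exclusion is the default: a passage with no source, no owner, or an expired validity date never becomes part of an answer.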

Evaluation

Public benchmarks say little about whether the system works on your tasks. Enterprise evaluation means measuring correctness, regression, and drift on the actual work, continuously, with a harness that fails loudly when a score drops. Without it, you are flying on vibes.
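A minimal sketch of a harness that fails loudly might look like the following. It assumes exact-match scoring against a small set of task-specific cases and a stored baseline accuracy; `run_eval`, `RegressionError`, and the tolerance value are hypothetical names and choices for illustration. It covers correctness and regression only; tracking drift would additionally require recording scores across runs over time.

```python
from typing import Callable


class RegressionError(Exception):
    """Raised when accuracy falls below the accepted baseline."""


def run_eval(system: Callable[[str], str],
             cases: list[tuple[str, str]],
             baseline_accuracy: float,
             tolerance: float = 0.02) -> float:
    """Score exact-match accuracy; raise instead of silently reporting a drop."""
    if not cases:
        raise ValueError("evaluation set is empty")
    correct = sum(
        1 for prompt, expected in cases
        if system(prompt).strip() == expected
    )
    accuracy = correct / len(cases)
    if accuracy < baseline_accuracy - tolerance:
        raise RegressionError(
            f"accuracy {accuracy:.2f} is below baseline {baseline_accuracy:.2f}"
        )
    return accuracy
```

Run in a build pipeline, the exception turns a quality regression into a failed build rather than a number on a dashboard that nobody reads.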

Human oversight

The question is not whether humans stay in the loop but where, and with what authority. For agentic systems that take actions rather than produce text, oversight has to be designed around reversibility, approval thresholds, and a complete record of what was done and why. IRIS is built on exactly this principle.
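The three design inputs named above (reversibility, approval thresholds, and a complete record) can be sketched as a small gate that every agent action passes through. This is an illustrative sketch under stated assumptions, not a description of how IRIS works; `Action`, `OversightGate`, and the single numeric `amount` used as a proxy for impact are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Action:
    name: str
    amount: float     # hypothetical measure of impact, e.g. money moved
    reversible: bool
    reason: str       # why the agent wants to take this action


@dataclass
class OversightGate:
    approval_threshold: float
    audit_log: list = field(default_factory=list)

    def needs_approval(self, action: Action) -> bool:
        """Irreversible or high-impact actions require a human approver."""
        return (not action.reversible) or action.amount > self.approval_threshold

    def submit(self, action: Action, approved_by: Optional[str] = None) -> bool:
        """Return True if the action may run; record the decision either way."""
        allowed = (not self.needs_approval(action)) or approved_by is not None
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action.name,
            "reason": action.reason,
            "approved_by": approved_by,
            "allowed": allowed,
        })
        return allowed
```

Note that denied requests are logged as well as allowed ones, so the record shows what the system attempted and not only what it did.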

Operating model

A system is not just code and weights. It is the people who own it, the process that changes it, and the accountability that surrounds it. The operating model is what keeps the other three layers honest after launch.
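One way to make ownership checkable rather than implied is to require a small manifest alongside each deployed system and to refuse releases when accountability is incomplete. The sketch below assumes a manifest represented as a plain dictionary; the role names are hypothetical and would differ by organization.

```python
# Hypothetical roles every production system must have filled.
REQUIRED_ROLES = ("system_owner", "change_approver", "incident_contact")


def missing_roles(manifest: dict[str, str]) -> list[str]:
    """Return the required roles that are absent or blank in the manifest."""
    return [
        role for role in REQUIRED_ROLES
        if not manifest.get(role, "").strip()
    ]


def can_release(manifest: dict[str, str]) -> bool:
    """A release is allowed only when every required role is assigned."""
    return not missing_roles(manifest)
```

A check like this does not create accountability by itself, but it makes a gap visible at the moment a change is about to ship.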

Why the architecture is the product

The temptation is to treat the model as the system and everything else as plumbing. The reverse is true. The model is a commodity capability that improves with each provider release, whether or not your organization does anything. The architecture around it - context, evaluation, oversight, operating model - is what your organization actually owns, and what determines whether the capability is dependable.

For the broader stance on building AI that survives contact with the enterprise, see enterprise AI thinking and the Architecture Atlas.

Roadmap

How this framework evolves

2025 Q4 - done

Reference architecture

A layered model separating the capability layer, the context layer, the evaluation layer, and the oversight layer.

2026 Q1 - active

Evaluation harness patterns

Patterns for measuring correctness, regression, and drift on enterprise tasks rather than public benchmarks.

2026 Q3 - planned

Agentic oversight model

Control patterns for systems that take actions, not just produce text - approvals, reversibility, and audit.

2026 Q4 - planned

Operating model playbook

How the people, processes, and ownership around the system are structured to keep it accountable in production.