Converting Tribal Knowledge into Institutional Intelligence
Multi-agent analysis of Brandon Speweik's AIEOM White Paper. Three parallel research streams, nine advisory board reviewers, four analysis matrices, 40+ industry sources.
Executive Verdict
No framework, product, or theory in the current landscape integrates: decision memory as a governed asset + automated capture from governance + outcome tracking and learning loops + multi-agent governance with structured dissent + temporal awareness + cross-session institutional continuity. This gap is the paper's opportunity — and its current weakness.
Score Roadmap — 48% → 92%
The board score is not a judgment — it's a navigation tool. Each gap has a specific fix, a technical feasibility rating, and a projected score impact.
Per-Reviewer Gap → Fix → Projected Score
| Reviewer | Now | Primary Gap | Required Deliverable | Projected | Feasibility |
|---|---|---|---|---|---|
| Karpathy Engineering Chair | 42% | "What would I build?" — no implementable architecture, no schema, no components | Reference architecture + entity schema + building blocks diagram | 88% | DONE: Diagrams 02, 04; Architecture section in this report |
| Hawkins Architecture Chair | 50% | Sequential pipeline (L1→L2→L3→L4) is fragile — one layer fails, all fail | Hub-and-spoke architecture with parallel domain spokes | 90% | DONE: Diagram 03; ADI already uses this pattern |
| Sutskever Intelligence | 40% | "Not an AI paper" — no AI memory model, no learning mechanism, no emergence | AI Architecture section: agent memory model, learning from decisions, emergent capabilities | 82% | READY: Architecture section addresses; POC validates |
| Barrett Predictive Brain | 45% | Memory treated as storage, not prediction — no prediction error loop | Add Prediction Error as 6th core entity + learning loop mechanism | 90% | DONE: PredictionError entity defined; Learning Loop in architecture |
| Li Perception | 48% | "Execution Truth" is a perception problem — no sensing architecture | Sensing Layer specification: what to instrument, how to capture, event schemas | 88% | READY: Sensing Layer defined in building blocks; POC Week 2 validates |
| Pearl Research Chair | 52% | No causal graph — all "leads to" claims are correlation, not causation | Causal DAG: entities as nodes, confounders identified, testable interventions | 92% | 4 HRS: draw causal graph + identify confounders; add as appendix |
| LeCun Design Chair | 55% | Manual knowledge capture won't scale — supervised learning for organizations | Self-supervised pathway: 80% auto-capture from workflows, humans validate not create | 92% | READY: Enterprise Integration diagram shows auto-capture; POC Week 2 proves it |
| Ochoa Trading Reality | 58% | Entirely theoretical — no deployment evidence, no real-world validation | At least one case study: ADI as proof + POC on real GFT initiative | 95% | POC: ADI validation ready now; GFT POC needs 4-week sprint |
Technical Feasibility Summary
Every reviewer concern maps to a buildable deliverable. Three are already done (in this report's architecture section and diagrams). Three are ready to build with existing tools and patterns. Two require dedicated effort — a causal graph (4 hours) and a POC case study (4-week sprint). Nothing requires research breakthroughs or unproven technology.
Step 1 (Ready now): Add this report's architecture section, entity schema, and diagrams as Appendix A to the paper. This alone moves Karpathy (42→88%), Hawkins (50→90%), Barrett (45→90%). Estimated new score: ~72%.
Step 2 (4 hours): Draw Pearl's causal graph and add LeCun's auto-capture specification. Estimated new score: ~85%.
Step 3 (4-week POC): Run validation sprint on one GFT initiative. Satisfies Ochoa (evidence) and Sutskever (working AI). Estimated new score: ~92%.
Contributor Analysis
Ranked by contributor convergence × severity × actionability:
| Rank | Issue | Raised By | Severity | Action |
|---|---|---|---|---|
| #1 | Institutional memory not engineered — no architecture, entities, schema, or system components | Seba, Tom, Brandon (deferred to KS) | Critical | Add reference architecture section |
| #2 | Paper too abstract — "reads like talk," no proof of what to build | Seba, Tom | Critical | Add vignettes, templates, concrete examples |
| #3 | Operating model never concretely defined — "governance framework? delivery lifecycle?" | Seba, Tom | Critical | Define explicitly: what IS this? |
| #4 | Enterprise system integration missing — CRM, ERP, HR, SDLC connections | Seba, Tom | High | Add integration mapping diagram |
| #5 | Archetypes feel detached — not connected to memory engineering | Seba, Tom | High | Add pilotability scorecard + archetype→memory mapping |
| #6 | No maturity model with observable criteria | Tom | High | Add 3-5 level model |
| #7 | RACI / ownership clarity by archetype | Tom | Medium | Standard RACI table |
| #8 | No "when not to use" section | Tom | Medium | Add boundary conditions |
| #9 | Framing tension: "AI model" vs "knowledge model" | Seba | High | Author decision required |
| #10 | Executive summary needs sharpening | Tom (provided rewrite) | Medium | Merge Tom's version |
Where do the three contributors agree and disagree?
"Where does memory live? In what schema? In what system? How does it tie into identity, approvals, lineage, HR SSOT, CRM, ERP? Right now memory feels metaphorical, and not engineered."
Sebastian's Feedback — Chronological
| Date | Concern | Severity | Our Response |
|---|---|---|---|
| Feb 18 | Operating model not defined — "governance framework? delivery lifecycle?" | Critical | See Framing section — recommend Framing C |
| Feb 18 | Archetypes don't tie to institutional memory story | High | Connect archetype → memory engineering requirements |
| Feb 18 | Sections 1-6 read like data contextualization, not AI | Critical | Reframe as "Institutional Intelligence" (not just "AI") |
| Feb 18 | Suggests: "enterprise knowledge model that enables AI" | High | Aligned with Framing C recommendation |
| Feb 20 | "Reads too much like talk" | Critical | Add: schema, architecture, POC, vignettes |
| Feb 20 | "What do I build?" | Critical | See Architecture section — full schema + building blocks |
| Feb 20 | No entities (Decision, Evidence, Workflow, Policy, Outcome) | Critical | Defined — see Core Entities section |
| Feb 20 | No building blocks (workflow engine, semantic layer, KG, governance engine) | Critical | Mapped — see Architecture Diagram |
| Feb 20 | "Stops too early — strong on philosophy, weak on architecture" | Critical | This entire analysis exists to fill that gap |
All 9 board reviewers agree with Sebastian's core critique. Whether expressed as "no causal graph" (Pearl), "no prediction loop" (Barrett), "no system components" (Karpathy), or "no sensors" (Li) — same structural absence, different vocabularies.
Tom provided an alternate executive summary ("Tom's Take") and 8 strategic improvement suggestions. All boards endorsed these.
1. Failure case study. Show a pilot that "worked" and then failed during governance/integration/ownership transfer, and map what knowledge was lost. Board endorsement: Ochoa says this single addition would transform the paper from theory to practice.
2. Archetype selection scorecard. A decision tree or scorecard that routes a use case to an archetype and outputs required evidence, governance posture, and minimum foundation requirements. Should take under 5 minutes. Directly addresses Sebastian's concern about archetypes feeling detached.
3. First-class templates. Make these "first-class deliverables" of the operating model: decision record template, evidence map template, versioned workflow policy, outcome-to-recommendation linkage model. Karpathy endorses: these are the most practical bridge between vision and implementation.
4. Maturity model. 3-5 levels with metrics: % of workflow steps with evidence, exception capture rate, traceability coverage, time to reproduce a decision rationale, audit cycle-time reduction. Partially addresses the "what IS the operating model?" gap.
5. RACI / ownership clarity. Product owner, ops owner, risk/compliance, data stewardship, platform team. Clarify what "human approved" means operationally in decision and automation scenarios.
6. Governance patterns that scale. Include a "do this early" checklist, tiered controls by archetype, pre-approved evidence standards, and reusable guardrails that prevent bespoke reviews.
7. Mapping to existing constructs. So leaders can adopt without treating it as a parallel system: a short section mapping the model to product operating model, SRE/observability, data governance, risk management, SDLC.
8. A "when not to use" section. For low-risk internal assistants, short-lived experiments, or single-team utilities. Keeps the model pragmatic and avoids sounding universally heavyweight. Easy to write, high impact on credibility.
Industry Landscape
Eight domains analyzed across 40+ sources. The market is fragmented — each domain solves a piece of the puzzle, but nobody integrates them all.
| Domain | Key Players | Maturity | What They Solve | What's Missing |
|---|---|---|---|---|
| AI Agent Memory | Letta, Mem0 ($24M), Zep, LangGraph | Emerging | Persistent agent memory across sessions | Agent-centric, not organization-centric |
| Enterprise Knowledge Graphs | Palantir Ontology, Neo4j, GraphRAG | Established | Decision modeling at enterprise scale | No "why" — captures what, not rationale |
| AI Operating Models | McKinsey, Gartner, Forrester, Deloitte | Emerging | Descriptive frameworks for AI governance | No "how" — all description, zero implementation |
| Decision Records (ADRs) | UK Gov GDS (Dec 2025), AWS, arc42 | Established | Capturing architecture decisions with rationale | Static, manual, no outcome tracking |
| Semantic Layer / Data Products | dbt, Atlan, Collibra, Databricks | Established | Consistent metric definitions (data truth) | Metric truth ≠ decision truth |
| AI Governance Frameworks | NIST AI RMF, ISO 42001, EU AI Act | Established | Compliance, risk management, accountability | Process compliance ≠ organizational learning |
| Digital Thread / Digital Twin | Siemens, PTC, GE Digital | Established | Execution truth for physical products | Physical artifacts, not decisions |
| Organizational Learning Theory | Nonaka SECI, Argyris, Senge | Mature | Theoretical foundation for org learning | 30-year theory, ZERO AI implementation |
What Nobody Has — The Integration Gap
No framework, product, or theory integrates all six of these capabilities: decision memory as a governed asset, automated capture from governance, outcome tracking and learning loops, multi-agent governance with structured dissent, temporal awareness, and cross-session institutional continuity.
The recommended positioning: sit at the intersection of all eight domains. Connect agent memory (Letta/Mem0) with governance (NIST/ISO); knowledge graphs (Neo4j/Palantir) with decision records (ADRs) and outcome tracking; operating model theory (McKinsey) with a working implementation; and organizational learning theory (Argyris/Nonaka/Senge) with modern AI systems.
Market Validation Signals
| Signal | Source | Implication |
|---|---|---|
| 40%+ of agentic AI projects will be canceled by end 2027 | Gartner 2025 | Organizations need institutional learning, not just agents |
| 75% of firms will fail at advanced agentic architectures independently | Forrester 2026 | Complexity requires operating model guidance |
| 63% of organizations lack AI-ready data management | Gartner 2025 | Operational truth is unbuilt in most enterprises |
| Mem0 raised $24M Series A for agent memory | 2025 | Market validates the memory problem |
| UK Government published ADR framework | GDS Dec 2025 | Institutional adoption of decision recording accelerating |
| McKinsey published "Agentic Organization" model | 2025 | Top firms framing this as organizational, not technical |
| AI knowledge management market: 47% CAGR ($5.2B→$7.7B) | 2024-2025 | Growing market with no clear winner |
Academic Foundations the Paper Should Reference
Argyris (double-loop learning). Connection: single-loop = fix the error; double-loop = question the assumptions that caused the error. The paper needs a double-loop mechanism. ADRs are single-loop (record the decision); outcome tracking with assumption revision is double-loop.
Nonaka (SECI model). Connection: the paper's governance process naturally implements SECI. Board reviews = Externalization (tacit→explicit); knowledge base = Combination; learning loops = Internalization. The GRAI Framework (2025) extends SECI for generative AI.
Senge (the learning organization). Connection: the paper's title echoes this. But Senge warns (2023 interview) that AI may undermine genuine learning motivation. The system must produce learning, not just compliance artifacts.
ADI Board Review
Nine reviewers across three advisory boards. Each reviewed independently. Dissent documented, never suppressed.
Key Insights by Reviewer
The five-component reference stack that recurs across the reviews:
1. Decision Record Store — PostgreSQL with JSONB for decisions, evidence, and outcomes
2. Workflow Instrumentation — Event bus (Kafka/OpenTelemetry for organizations)
3. Knowledge Graph — Neo4j mapping decisions↔workflows↔policies↔outcomes
4. AI Agent Runtime — RAG baseline + agentic workflows with KG access
5. Governance Engine — Policy-as-code enforcement + evidence requirements + audit trails
The Framing Problem
Sebastian is asking for a knowledge architecture, not an AI architecture. His entities (Decision, Evidence, Workflow, Policy, Outcome) are knowledge graph entities. His building blocks are knowledge infrastructure. The paper's title ("AI-Enabled Operating Model") creates expectations the content cannot meet.
Three Possible Framings
| Framing | What It Implies | Architecture Required | Who It Serves |
|---|---|---|---|
| A: "AI Operating Model" | AI is the system; knowledge is input | Agent runtime, model orchestration, learning loops | CTOs, AI leads |
| C: "Institutional Intelligence Operating Model" ★ | Intelligence is the system; knowledge and AI are components | Full stack: sensing → storage → reasoning → prediction → learning | CIOs, transformation leaders |
| B: "Knowledge Operating Model + AI" | Knowledge is the system; AI is one consumer | Knowledge graph, semantic layer, decision records | COOs, delivery leads |
The paper is strongest when describing the intelligence system — the full loop from sensing execution truth through predicting and learning. AI is a capability within that system, not the system itself. This resolves Sebastian's tension while preserving the paper's ambition.
Architecture Map
This is the architecture section the paper is missing, answering Sebastian's "what do I build?" directly.
Core Entities
The atomic units of institutional memory: Decision, Evidence, Workflow, Policy, Outcome, and Prediction Error (the sixth entity, added per Barrett's review).
Building Blocks
The six building blocks, grouped by layer; implementation details for each follow below:
Sensing Layer
- Workflow instrumentation
- Event bus / stream
- Process mining
- Evidence auto-capture
Memory Layer
- Decision Record Store
- Knowledge Graph
- Semantic Layer
- Temporal versioning
Reasoning Layer
- AI Agent Runtime
- Pattern Detection
- Prediction Engine
- Recommendation
Governance Engine
- Policy-as-code
- Access control / RBAC
- Approval workflows
- Evidence requirements
- Audit trails
Learning Loop
- Outcome tracking
- Prediction error calculation
- Double-loop analysis
- Knowledge deprecation
- Pattern reuse library
Enterprise Integration
- ERP / CRM connectors
- HR SSOT sync
- SDLC hooks (Jira, GitHub)
- Identity (SSO, RBAC)
- Data Catalog (Atlan, dbt)
Sensing Layer — Implementation
Li's critique addressed: "How does the system see execution truth?"
Technology options: Apache Kafka / AWS EventBridge for event streaming. OpenTelemetry for instrumentation. Celonis or custom process mining from event logs.
Key principle (LeCun): Minimize manual capture. Instrument workflows to capture execution data automatically. AI infers patterns without human labeling. Humans validate, not create.
Auto-capture sources: Jira ticket state changes, GitHub PR reviews + commits, Slack decision threads, Calendar meeting patterns, CI/CD deployment events, monitoring alerts.
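A minimal sketch of that auto-capture path in Python: a webhook payload from one source is normalized into a common execution event and queued for human validation. The ExecutionEvent fields, the normalize_jira helper, and the payload shape are illustrative assumptions, not a schema from the paper or Atlassian's actual webhook format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ExecutionEvent:
    """Common envelope for auto-captured workflow events (illustrative schema)."""
    source: str            # e.g. "jira", "github", "slack"
    kind: str              # e.g. "ticket_state_change", "pr_review"
    actor: str             # who performed the action
    subject: str           # ticket key, PR number, thread id
    occurred_at: datetime
    payload: dict          # raw source fields, kept for later evidence linking

def normalize_jira(webhook: dict) -> ExecutionEvent:
    # Hypothetical payload shape; a real connector would follow Atlassian's
    # documented webhook format.
    return ExecutionEvent(
        source="jira",
        kind="ticket_state_change",
        actor=webhook["user"]["name"],
        subject=webhook["issue"]["key"],
        occurred_at=datetime.now(timezone.utc),
        payload={"from": webhook["from_status"], "to": webhook["to_status"]},
    )

# Humans validate, not create: events queue for review instead of being
# written straight into the decision record store.
review_queue: list[ExecutionEvent] = []
review_queue.append(normalize_jira({
    "user": {"name": "a.engineer"},
    "issue": {"key": "GFT-42"},
    "from_status": "In Review",
    "to_status": "Done",
}))
```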
Memory Layer — Implementation
Sebastian's question answered: "Where does memory live? In what schema?"
Decision Record Store: PostgreSQL with JSONB for flexibility + ACID for governance. See schema in the full analysis document.
Knowledge Graph: Neo4j for complex traversal or PostgreSQL with Apache AGE for simpler setups. Maps Decision↔Evidence↔Workflow↔Policy↔Outcome relationships.
Temporal versioning: All entities are versioned. Policies have effective_from/effective_until. Decisions can be superseded. Knowledge has a half-life.
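As one possible concretization, a sketch of the decision record table in PostgreSQL DDL, held here as a Python constant. Table and column names are assumptions, not the paper's canonical schema; the point is JSONB for flexible evidence and dissent plus explicit temporal columns for versioning and supersession.

```python
# Illustrative PostgreSQL DDL for the Decision Record Store. Names are
# assumptions for illustration only.
DECISION_RECORD_DDL = """
CREATE TABLE decision_record (
    id                UUID PRIMARY KEY,
    title             TEXT NOT NULL,
    rationale         TEXT NOT NULL,      -- the "why", not just the "what"
    evidence          JSONB NOT NULL,     -- links to captured execution events
    dissent           JSONB DEFAULT '[]', -- documented disagreement, first-class
    predicted_outcome JSONB,              -- enables later prediction-error scoring
    effective_from    TIMESTAMPTZ NOT NULL,
    effective_until   TIMESTAMPTZ,        -- NULL = still current
    superseded_by     UUID REFERENCES decision_record (id)
);
"""
```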
Reasoning Layer — Implementation
Sutskever's critique addressed: "Where is the actual AI architecture?"
AI Agent Runtime: LangGraph for stateful agents with checkpointing. RAG over knowledge graph for decision context. Agentic workflows for multi-step reasoning.
Memory model: Persistent (cross-session via Letta/Mem0 patterns), not per-request. Agents maintain organizational context across interactions.
Emergent capability: Pattern detection across domains — AI synthesizes insights that no individual human possesses by connecting decisions across workflows.
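To make the RAG-over-decisions idea concrete, a dependency-free sketch: rank stored decision records by embedding similarity to a question, then assemble a prompt that quotes them so rationale stays traceable. The function names are hypothetical, and embedding is assumed to happen upstream with whatever model the runtime uses; this is not LangGraph's or Letta's actual API.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_decision_context(question_vec: list[float],
                              records: list[tuple[list[float], str]],
                              k: int = 3) -> list[str]:
    """Rank stored decision records by similarity to the question embedding."""
    ranked = sorted(records, key=lambda r: cosine(question_vec, r[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question: str, context: list[str]) -> str:
    # Quote retrieved decision records so the agent's answer stays traceable
    # to the institutional memory it was derived from.
    joined = "\n---\n".join(context)
    return f"Past decisions:\n{joined}\n\nQuestion: {question}\nAnswer, citing decisions:"

# Hypothetical two-dimensional embeddings, purely for illustration.
records = [([0.9, 0.1], "DEC-003: chose PostgreSQL for the record store"),
           ([0.1, 0.9], "DEC-007: selected vendor X for monitoring")]
print(build_prompt("Why PostgreSQL?", retrieve_decision_context([1.0, 0.0], records, k=1)))
```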
Governance Engine — Implementation
The paper's key concept, now specified:
Policy-as-code: Open Policy Agent (OPA) or custom rules engine. Policies define evidence requirements, approval thresholds, and access controls per archetype.
Tiered governance (Tom's suggestion): Knowledge Assistants = lightweight review. Decision Accelerators = evidence + traceability required. Automation Enablers = full governance with monitoring + rollback.
Dissent preservation: Governance records include documented disagreements. Dissent is a first-class entity, never suppressed.
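A compact sketch of tiered policy-as-code, following Tom's three tiers above. The thresholds and field names are invented for illustration; a production system would express these as OPA policies or an equivalent rules engine.

```python
# Illustrative tiered governance rules keyed by archetype. Thresholds and
# field names are assumptions.
GOVERNANCE_TIERS = {
    "knowledge_assistant":  {"evidence_required": False, "approvers": 0, "rollback_plan": False},
    "decision_accelerator": {"evidence_required": True,  "approvers": 1, "rollback_plan": False},
    "automation_enabler":   {"evidence_required": True,  "approvers": 2, "rollback_plan": True},
}

def check_decision(archetype: str, evidence_count: int,
                   approvals: int, has_rollback: bool) -> list[str]:
    """Return the unmet governance requirements (empty list = compliant)."""
    tier = GOVERNANCE_TIERS[archetype]
    gaps = []
    if tier["evidence_required"] and evidence_count == 0:
        gaps.append("evidence missing")
    if approvals < tier["approvers"]:
        gaps.append(f"needs {tier['approvers'] - approvals} more approval(s)")
    if tier["rollback_plan"] and not has_rollback:
        gaps.append("rollback plan missing")
    return gaps

print(check_decision("automation_enabler", evidence_count=2, approvals=1, has_rollback=False))
# -> ['needs 1 more approval(s)', 'rollback plan missing']
```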
Learning Loop — Implementation
Barrett's prediction error + Argyris's double-loop:
Single-loop: Decision → Outcome → Was outcome as expected? → If not, what was the error? → Fix the process.
Double-loop: Decision → Outcome → Prediction error → What assumption was wrong? → Update the assumption → Update the governance policy → Deprecate stale knowledge.
Implementation: Outcome tracking table links decisions to measured results. Prediction error is quantified. Lessons learned trigger knowledge graph updates and policy revisions.
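A sketch of the quantified loop, with a single relative-error metric and an arbitrary threshold standing in for whatever per-decision-type metrics a real deployment would choose: small errors trigger single-loop process fixes, large ones flag the underlying assumption for double-loop review.

```python
from dataclasses import dataclass

@dataclass
class OutcomeRecord:
    decision_id: str
    predicted: float   # e.g. predicted ramp-up weeks for the next initiative
    measured: float    # what actually happened

def prediction_error(rec: OutcomeRecord) -> float:
    # Simple relative error; a real system would choose metrics per decision type.
    return abs(rec.measured - rec.predicted) / max(abs(rec.predicted), 1e-9)

def learning_action(rec: OutcomeRecord, threshold: float = 0.25) -> str:
    """Route small errors to single-loop fixes and large errors to double-loop
    assumption review. The 25% threshold is an arbitrary illustration."""
    err = prediction_error(rec)
    if err <= threshold:
        return f"single-loop: log {err:.0%} error, refine the process"
    return f"double-loop: {err:.0%} error, revisit the assumption and policy behind {rec.decision_id}"

print(learning_action(OutcomeRecord("DEC-017", predicted=4.0, measured=7.0)))
# -> double-loop: 75% error, revisit the assumption and policy behind DEC-017
```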
Enterprise Integration — Implementation
Sebastian's "How does it tie into CRM, ERP, HR SSOT?":
| System | Pattern | What Flows |
|---|---|---|
| ERP (SAP, Oracle) | Event-driven (CDC) | Operational decisions, process changes |
| CRM (Salesforce) | Webhook / API | Customer decisions, sales rationale |
| HR SSOT (Workday) | API | Role changes, team rotations (knowledge risk) |
| SDLC (Jira, GitHub) | Webhook | Architecture decisions, code changes, ADRs |
| Identity (Okta) | SAML/OIDC | Who approved what, role-based access |
| Monitoring (Datadog) | Event stream | Operational outcomes, anomaly detection |
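One way to express the table in code: a connector registry mapping each source system to its integration pattern and the knowledge that flows from it. System keys, pattern labels, and the sink signature are illustrative assumptions.

```python
from typing import Callable

# Illustrative connector registry mirroring the integration table above.
CONNECTORS: dict[str, dict] = {
    "sap":        {"pattern": "cdc",     "captures": "operational decisions, process changes"},
    "salesforce": {"pattern": "webhook", "captures": "customer decisions, sales rationale"},
    "workday":    {"pattern": "api",     "captures": "role changes, team rotations"},
    "github":     {"pattern": "webhook", "captures": "architecture decisions, ADRs"},
    "datadog":    {"pattern": "stream",  "captures": "operational outcomes, anomalies"},
}

def route(system: str, event: dict, sink: Callable[[str, dict], None]) -> None:
    """Tag an inbound event with its integration pattern and push it to the memory layer."""
    meta = CONNECTORS[system]
    sink(system, {"pattern": meta["pattern"], "captures": meta["captures"], "event": event})

# Example: a merged PR flows in via webhook and lands in the memory layer's intake.
route("github", {"pr": 123, "action": "merged"}, lambda src, evt: print(src, evt))
```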
Alternative: Hub-and-Spoke (Hawkins)
Instead of sequential layers (L1→L2→L3→L4), consider parallel domain spokes — each maintaining its own complete knowledge model — with a convergence hub that detects agreement, conflict, and novelty. Conflict between domains IS the signal — it reveals where institutional knowledge is inconsistent. More resilient (one domain failing doesn't cascade), more scalable (add domains independently), more informative.
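A toy illustration of the convergence hub: each domain spoke answers independently, and the hub classifies the result as agreement, conflict, or novelty. Domain names and the classification rule are assumptions; the point is that disagreement is surfaced as a signal rather than averaged away.

```python
# Toy convergence hub for Hawkins's hub-and-spoke pattern. Each spoke keeps
# its own complete knowledge model; the hub compares their independent answers.
def converge(spoke_answers: dict[str, str]) -> dict:
    distinct = set(spoke_answers.values())
    if len(distinct) == 1:
        return {"status": "agreement", "answer": distinct.pop()}
    if len(distinct) == len(spoke_answers):
        return {"status": "novelty", "detail": "no two domains share a model here"}
    # Conflict IS the signal: it pinpoints where institutional knowledge diverges.
    return {"status": "conflict", "detail": spoke_answers}

print(converge({
    "sales":   "discount requires VP approval",
    "finance": "discount requires CFO approval",
    "legal":   "discount requires VP approval",
}))
# -> conflict: finance's model diverges from sales and legal.
```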
POC Proposal — Technical + Business
Objective
Validate the paper's thesis with a working implementation. Pick one real GFT AI initiative (ideally one currently in pilot) and implement institutional memory around it. Measure whether the second initiative starts faster because knowledge was preserved.
Technical Build — 4-Week Sprint
Week 1: Decision Record Store + Basic UI
PostgreSQL schema for decisions, evidence, outcomes. Simple web UI for capturing/viewing. Hook into existing Jira/GitHub.
Success metric: Capture 10 decisions in first week, <5 min each.
Week 2: Workflow Instrumentation
Event capture from Jira (ticket states), GitHub (PR reviews, commits), Slack (decision threads). Auto-link workflow traces to decisions.
Success metric: 70%+ of workflow events auto-captured without manual entry.
Week 3: Knowledge Graph + AI Agent
Build relationship graph between decisions, evidence, traces. RAG agent that answers: "What decisions were made about X?" "What evidence supports Z?"
Success metric: Agent correctly retrieves relevant decisions 80%+ of the time.
Week 4: Outcome Tracking + Learning Loop
Record outcomes against predictions. Calculate prediction error. Generate "lessons learned." Connect back to knowledge graph for updates.
Success metric: At least 3 decisions have measured outcomes with documented learning.
Business Case
| KPI | How Measured | Target | Business Value |
|---|---|---|---|
| Decision capture rate | # decisions recorded / # estimated | >80% | Institutional knowledge preserved vs. lost |
| Capture effort | Average time to record a decision | <5 min | Low friction = adoption; high friction = abandonment |
| Auto-capture rate | % workflow events captured automatically | >70% | LeCun's self-supervised principle — scalability |
| Retrieval accuracy | Agent correctly answers decision questions | >80% | Can AI actually USE institutional memory? |
| Knowledge reuse | Initiative #2 references decisions from #1 | Qualitative | Compounding demonstrated vs. pilot reset |
| Time-to-start reduction | How much faster does initiative #2 ramp up? | Measurable delta | Direct cost savings, the paper's core promise |
| Decision quality trend | Prediction error over time | Decreasing | Organization is learning, not just filing |
How the POC maps to the paper's claims:
| Paper Claim | POC Evidence | How We Prove It |
|---|---|---|
| "Pilots don't compound" | Measure initiative #2 ramp time vs #1 | Side-by-side comparison with and without decision memory |
| "Execution truth captures how work is done" | Auto-captured traces match reality | Compare instrumented workflow vs. documented process |
| "Governance as memory" | Decisions + dissent are queryable assets | Demonstrate a new team querying past decisions and rationale |
| "AI systems that remember" | Agent uses decision history to improve recommendations | Agent's second recommendation is better than first because of learned context |
| "Transparency by design" | Evidence→decision→outcome traceability without extra reporting | Generate audit report automatically from system, no human compilation needed |
Conservative ROI Model
Average enterprise AI initiative costs $500K-$2M. If 40%+ are canceled (Gartner) and the pilot reset loop adds 30-50% cost to survivors, a system that reduces ramp time by even 20% across 10 initiatives saves $1M-$4M/year at a mid-size enterprise. The POC cost is ~$50K (4 weeks x 2-3 engineers). ROI if it works: 20-80x.
| Scenario | Initiatives/Year | Avg Cost | Reset Overhead | Overhead Reduction | Annual Savings |
|---|---|---|---|---|---|
| Conservative | 5 | $500K | 30% | 15% | $112K |
| Moderate | 10 | $750K | 40% | 25% | $750K |
| Aggressive | 20 | $1M | 50% | 35% | $3.5M |
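The table's arithmetic, made explicit: annual savings = initiatives/year × average cost × reset overhead × overhead reduction. A quick check in Python reproduces all three rows (the conservative row is $112.5K, rounded to $112K above).

```python
# Savings formula implied by the table:
#   annual_savings = initiatives_per_year * avg_cost * reset_overhead * overhead_reduction
def annual_savings(initiatives: int, avg_cost: float, overhead: float, reduction: float) -> float:
    return initiatives * avg_cost * overhead * reduction

for name, n, cost, overhead, reduction in [
    ("Conservative", 5,  500_000,   0.30, 0.15),  # -> $112,500
    ("Moderate",     10, 750_000,   0.40, 0.25),  # -> $750,000
    ("Aggressive",   20, 1_000_000, 0.50, 0.35),  # -> $3,500,000
]:
    print(f"{name}: ${annual_savings(n, cost, overhead, reduction):,.0f}")
```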
Excludes: risk reduction from better governance, audit cost reduction from automated traceability, talent retention value from reduced knowledge loss during turnover.
Project ADI as Validation Use Case
ADI already implements several of the paper's concepts — showing it's not just theory:
| Paper Concept | ADI Implementation | Status |
|---|---|---|
| Decision memory | PostgreSQL Knowledge Base (kb_decisions, kb_actions, kb_reviews) | Working |
| Governance as memory | Advisory Board reviews with documented dissent | Working |
| Multi-agent governance | 26 agents across 4 boards + 3 overseers | Working |
| Structured dissent | Board reviews preserve disagreements, never suppress | Working |
| Execution truth | Session hooks, protocol gates, doc tracker | Working |
| Cross-session memory | Letta subconscious (persistent memory agent) | Working |
| Prediction error loop | Identified as P0 action item | Planned |
| Self-supervised capture | Hooks auto-capture; decisions still manual | Partial |
ADI proves the concept works for AI development governance — with working code, real data, and measured outcomes. The POC extends this to enterprise operations at a client, validating generalizability.
Architecture Diagrams
The diagrams are provided as .drawio files, editable in draw.io.
Priority Actions for GFT Team
Deliverables for the Team
| Priority | Deliverable | Effort | Addresses |
|---|---|---|---|
| P0 | Share this analysis with Brandon, Tom, Sebastian | Ready now | Frames the entire conversation |
| P0 | Reference architecture diagram (clean version of building blocks) | 2 hours | Sebastian #7-#10, Tom #4, #8 |
| P1 | Core entity schema (SQL + relationship diagram) | 1 hour | Sebastian #10 "what do I build?" |
| P1 | POC proposal for team review (technical + business) | 1 hour | Tom #2 (proof), Ochoa (evidence) |
| P2 | ADI implementation mapping (paper concepts → working code) | 30 min | Shows it's not just theory |
| P2 | Industry positioning summary for Marketing | 1 hour | Brandon's publication needs |
Key Conversation Points for the Working Session
Not "AI Operating Model" (sets wrong expectations) and not just "Knowledge Operating Model" (too narrow). Framing C — "Institutional Intelligence" — resolves Sebastian's tension while preserving ambition. AI is a capability within the system, not the system itself.
Single highest-value addition. The building blocks view from this document. Core entities + building blocks + enterprise integration points. This is Brandon's proposed Diagram #9 (adding to his 8).
Even preliminary results dramatically strengthen the paper. Pick one real initiative. Measure: does initiative #2 start faster? ROI model shows 20-80x return on $50K investment.
Memory is necessary but insufficient. The learning loop (predict → act → measure → compare → update) is what makes memory compound. Propose adding prediction error as a core mechanism in Section 7.
The paper assumes manual knowledge capture. The practical answer to "who maintains all this?" is: auto-capture from Jira, GitHub, Slack, process mining. Humans validate, not create. Self-supervised > supervised.
Don't rewrite the paper. Brandon owns it. Provide architectural supplements.
Don't prescribe specific products. Recommend categories, not vendors.
Don't make it about ADI. Use ADI as ONE validation case.
Don't block Marketing timeline. Architecture as appendix, not prerequisite.