Transcript

The Six-Month Window Your Board Needs to Understand

11 May 2026/21 min

## The Implementation Paradox

[A] "But then you look at what is happening with artificial intelligence in the enterprise right now, and that entire equation is just completely broken."

[B] "Perfectly put. We're looking at this landscape full of incredibly smart technology, but the business results are, honestly, shocking. It is the absolute definition of an implementation paradox. I mean, the underlying technology is demonstrably more capable than ever before. We have AI models passing the bar exam and writing complex code."

[A] "Right."

[B] "Yet the actual bottom line business value is just flatlining for the vast majority of companies trying to use it."

[A] "Well, welcome to the deep dive. Today, we're exploring a massive contradiction that you, listening right now, are probably feeling in your own industry. We have this staggering level of machine intelligence, yet according to the data we're looking at today, 95% of enterprise AI initiatives are delivering absolutely zero return on investment."

[B] "Zero."

[A] "Zero. After all the hype, all the pilot programs, the massive budgets. So, our guiding source today is a comprehensive 2026 report from Spinout. It's titled, The Six-Month Window Your Board Needs to Understand. And we've got a collection of their technical insights on AI infrastructure, too."

[B] "Which are fascinating."

[A] "They really are. And the core thesis here is something every leader needs to hear. Currently, your back office costs are linear. Every time your company grows, you have to add headcount to handle the invoices, the orders, the data entry."

[B] "Yeah, more revenue means more staff."

[A] "Exactly. AI has the power to completely shatter that linear equation. But most companies are doing it entirely wrong because they're buying shiny tools instead of fixing their underlying messy processes."

## The Intelligence Race Is Over

[B] "The fundamental shift happening right now is that the AI intelligence race is officially over. The last few years, everyone was just battling over who had the smartest model or the biggest context window."

[A] "Right, which is just the amount of text or data an AI can hold in its short-term memory before it basically forgets what you were talking about."

[B] "Exactly. But that technological race is no longer the bottleneck. We are now entirely in an organizational race."

[A] "And what fascinates me about that shift is the specific timeline Spinout highlights. They point to a massive capability jump in January of 2026 where the bottleneck literally moved overnight."

[B] "Yeah, that was a huge turning point. It wasn't just that the models got slightly faster. The fundamental architecture of how AI operates completely changed. For years, the constraint was the model itself. Like, could it understand your prompt? Could it generate a coherent response? You would ask a question, it would give an answer, and it would just stop."

[A] "Right. One and done."

[B] "But in early 2026, developers started putting AI agents into continuous loops. Think about it like baking. Previously, an AI would guess a recipe once, bake the cake, and even if it was completely burnt, it would serve it to you and stop working."

[A] "Here's your burnt cake. Enjoy."

[B] "Exactly. With agent loops, the AI tries baking the cake, takes a digital taste test, realizes it used way too much sugar, adjusts the recipe, and bakes it again."

[A] "Wow."

[B] "And it does this trial and error adjustment in milliseconds, continuously, until the test passes."
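To make the "try, taste, adjust, retry" loop concrete, here is a minimal Python sketch. It assumes two hypothetical callables: `propose` stands in for the model's attempt and receives the last failure message, and `check` stands in for the automated test, returning `None` on a pass. The names and the `max_attempts` cap are illustrative assumptions, not anything described in the Spinout report.

```python
from typing import Callable, Optional


def agent_loop(
    propose: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> tuple[str, int]:
    """Propose a result, test it, feed the failure back in, and retry.

    `propose` gets the previous failure message (None on the first try).
    `check` returns None when the candidate passes, otherwise what went wrong.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, attempt
    # A cap keeps a hopeless loop from running forever.
    raise RuntimeError(f"no passing candidate after {max_attempts} attempts: {feedback}")


# Toy example: the "recipe" is a sugar amount in grams, and the
# "taste test" passes when the sugar is at most 150 g.
sugar = [300]


def propose(feedback: Optional[str]) -> str:
    if feedback is not None:
        sugar[0] -= 100  # too sweet last time: cut the sugar and bake again
    return f"{sugar[0]} g sugar"


def check(recipe: str) -> Optional[str]:
    grams = int(recipe.split()[0])
    return None if grams <= 150 else f"too sweet ({grams} g)"


result, attempts = agent_loop(propose, check)
print(result, attempts)  # 100 g sugar 3
```

The key difference from the "one and done" style is that the test result flows back into the next attempt instead of the first answer being served as-is.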

[A] "And they started spawning sub-agents too, which honestly blew my mind when I read it."

[B] "Oh, it's wild. Like, if an AI hit a complex problem, it wouldn't just give up or ask the human for help. It would essentially spin up a specialized digital worker, a piece of code designed to handle just one specific chunk of the task independently."

[A] "Yeah, it delegates."

[B] "Right. The main AI waits for the result from that sub-agent and then continues. Problems that used to require a human project manager coordinating a team of three people were suddenly being handled by the system itself."
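The delegation pattern described here (spin up specialists, wait for their results, then continue) can be sketched with Python's standard `concurrent.futures`. This is a minimal illustration under stated assumptions: `make_sub_agent` and `main_agent` are hypothetical names, and each "sub-agent" is a stub that labels its input rather than calling a real model.

```python
from concurrent.futures import ThreadPoolExecutor


def make_sub_agent(specialty: str):
    """Build a hypothetical sub-agent that handles exactly one narrow chunk."""
    def sub_agent(chunk: str) -> str:
        # A real sub-agent would call a model or a tool here; this stub just
        # labels its output so the delegation flow is visible.
        return f"{specialty}: handled '{chunk}'"
    return sub_agent


def main_agent(task: str, specialties: list[str]) -> dict[str, str]:
    """Delegate one chunk per specialty, wait for all results, then continue."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(make_sub_agent(name), f"{task} ({name} part)")
            for name in specialties
        }
        # .result() blocks: the main agent waits for every sub-agent to finish.
        results = {name: fut.result() for name, fut in futures.items()}
    return results


results = main_agent("refund order A-77", ["billing", "shipping", "support"])
```

Running the chunks concurrently is a design choice for independent sub-tasks; if one chunk depends on another's output, the main agent would call them in sequence instead.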

## Why the New Constraint Is Organizational

[A] "Which fundamentally alters where the failure point lies. Because human employees are incredibly good at reading between the lines."

[B] "Oh, sure."

[A] "If you give a human a vague instruction, they'll figure it out. They'll ask a colleague. They'll check an old email. They'll just use their intuition."

[B] "AI agents, even with sub-agents and self-correcting loops, cannot do that, because they don't have intuition."

[A] "Exactly. They need explicitly structured, mathematically testable instructions."

[B] "So the new constraint is organizational readiness. You know, it feels like people treat modern AI like a simple chatbot you ask trivia questions to. It's like hiring a PhD physicist just to stand in your living room and flip the light switch on and off for you."

[A] "That's a perfect analogy. You have this incredibly powerful engine, but you're barely utilizing its actual capacity because of how you're directing it."

[B] "But I do have a question here. If the AI is so incredibly smart and it has these self-correcting loops, why can't it just figure out what a company wants?"

[A] "That's a good question."

[B] "Like, why is the burden suddenly back on the human operators to be perfect communicators?"

[A] "Well, an AI only knows the world you explicitly show it. If you drop that genius physicist into a pitch black room with no map, they still can't find the door."

## The 95% Failure Mechanism

[B] "Right. And this brings us right to the mechanism behind that 95% failure rate. Liam Ottley at Morningside AI spent two and a half years implementing AI for massive global brands, and MIT independently confirmed his findings."

[A] "Yeah. The Deloitte survey from 2026 backs this up, too."

[B] "Exactly. It gives us the exact why behind the failure. 84% of companies have not redesigned jobs around AI capabilities. They're basically treating AI like a magic wand. They just wave it and hope for the best."

[A] "Yeah. They take a frontier model, the absolute cutting-edge, state-of-the-art AI, and they point it at a disorganized department with scattered data, contradictory policy documents, and processes literally nobody fully understands."

[B] "They're just plugging a futuristic brain into the exact same chaotic, undocumented legacy systems they've been using since 2015. And that is why 74% of companies report zero tangible value."

[A] "The AI fails the moment it hits reality. It just breaks."

[B] "Right. The AI goes into its loop, tries to find a customer's billing history, and realizes the data is split across three different systems that don't talk to each other. It doesn't know what to do, so it hallucinates an answer or just errors out."

## The Klarna Signal

[A] "Okay, so if plugging a highly advanced AI into a chaotic system guarantees failure, I want to look at the opposite approach. The Klarna example."

[B] "Yes. The Spinout report highlights what they call the Klarna signal, and the sheer scale of what Klarna pulled off is staggering. Sebastian Siemiatkowski, their CEO, revealed they dropped their headcount from 7,000 employees down to below 3,000."

[A] "Which is a massive reduction."

[B] "Massive. And they didn't announce some buzzword-heavy transformation program. They didn't raise billions in new capital to do it. Their AI simply started handling the equivalent of 600 human agents' work."

[A] "Yeah. But here is the part that stood out to me. The AI wasn't doing high-level, complex, strategic thinking. It was mostly handling incredibly simple, repetitive questions. Things like, did I pay my bill? Or, can I extend my payment deadline?"

[B] "And to make that seemingly simple automation work, Klarna had to do something pretty radical under the hood. They rebuilt their tech stack from scratch to become genuinely AI native. Because AI needs unified context."

[A] "Exactly. Let's look at how almost every company operates today. Your data is spread across a dozen different SaaS systems. You know, software as a service."

[B] "Right, like Salesforce, Zendesk, whatever."

[A] "Yeah. You might have one rented cloud program for billing, a totally different one for shipping, and a third for customer support. These systems have separate data models. They structure their spreadsheets differently. And a human can kind of figure that out."

[B] "Right. When a human looks for a lost order, they mentally bridge the gap between those different screens. An AI cannot. If your billing software doesn't perfectly talk to your shipping software, the AI has poor context. And poor context mathematically produces poor results."

[A] "Precisely. Klarna had to eliminate those silos so the AI could see one continuous thread of data."
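In practice, "one continuous thread of data" means joining records that each system keys differently. The sketch below is a minimal illustration, assuming three hypothetical SaaS exports whose customer identifiers differ in both field name and format; it is not a description of Klarna's actual stack.

```python
# Hypothetical exports from three separate SaaS tools. Each one names and
# formats the customer key differently, which is the "silo" problem.
billing = [{"cust_id": 1042, "last_payment": "2026-04-30", "balance": 0.0}]
shipping = [{"customerNumber": "C-1042", "order": "A-77", "status": "delayed"}]
support = [{"customer": "1042", "ticket": "T-9", "topic": "where is my order?"}]


def normalize_id(raw) -> int:
    """Map each system's customer key onto one canonical integer ID."""
    return int(str(raw).removeprefix("C-"))


def unified_context(customer_id: int) -> dict:
    """Join all three systems into the single view an agent would read."""
    return {
        "customer_id": customer_id,
        "billing": [r for r in billing if normalize_id(r["cust_id"]) == customer_id],
        "shipping": [r for r in shipping if normalize_id(r["customerNumber"]) == customer_id],
        "support": [r for r in support if normalize_id(r["customer"]) == customer_id],
    }


ctx = unified_context(1042)
```

With `ctx`, an agent answering "where is my order?" can see in one place that the customer is paid up and the order is delayed, which is exactly the bridge a human makes mentally between screens.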

## The Boring Stuff Wins

[B] "It's so ironic to me. The AI revolution has been sold to us for years as this sci-fi utopia of artificial general intelligence solving the mysteries of the universe."

[A] "Oh, absolutely."

[B] "But the actual secret to surviving the AI revolution is literally just really efficient invoice reconciliation."

[A] "It is. It's almost funny. The key to success is entirely avoiding the impressive board presentations about synergy and just aggressively focusing on the most unglamorous parts of the business. The top 5% of companies that actually succeed with AI start strictly with the boring stuff. Manual data entry. Invoice matching. Document lookup. Product data updates."

[B] "Plumbing."

[A] "Exactly. The plumbing. These processes are perfect for AI because they have well-defined inputs and expected outputs. The rules are clear."

[B] "Running well-defined transactions at high volume with high precision is the exact environment where an agent loop thrives."

## Intent Engineering

[A] "Okay. So it's all about the unglamorous plumbing. But even if you isolate a boring process, let's say customer service tickets, and you point your freshly unsiloed AI at it, things can still go horribly wrong."

[B] "It can go very wrong. Spinout outlines that you need to build specific organizational foundations first. And the first one is intent engineering."

[A] "And Klarna serves as the prime cautionary tale for this as well. When they first deployed their AI on customer conversations, the initial metrics looked incredible. Ticket resolution times plummeted from an average of 11 minutes down to just two minutes. The CEO was looking at projections of $40 million in savings."

[B] "I mean, a massive win on paper."

[A] "On paper, yeah. Until the customers started loudly complaining."

[B] "Oh."

[A] "The AI achieved the exact goal it was mathematically given, which was to resolve tickets fast. But the company's actual underlying intent wasn't just raw speed."

[B] "It rarely is."

[A] "Right. The true intent of the customer service department was to build lasting customer relationships in a highly competitive financial market. And those are profoundly different objectives."

[B] "Very different. It reminds me of those Amelia Bedelia children's books where the maid takes every instruction completely literally."

[A] "Yep."

[B] "It's like telling an AI, hey, clean my house. And you come back and the AI has thrown all of your belongings, your furniture, your clothes into a dumpster."

[A] "Right. Technically, the house is completely clean. The goal was achieved. But the human intent was completely missed."

[B] "But here is where I get stuck. How do you actually code that? How do you mathematically translate something as vague and human as make the customer happy into code an AI can understand?"

[A] "Well, you have to translate human culture into mathematical guardrails, specifically using reward functions and weighted algorithmic scoring. In a human organization, tradeoffs happen thousands of times a day. When speed conflicts with quality, who wins? Humans make those decisions intuitively. To engineer intent for an AI, you have to explicitly define the decision boundaries."

[B] "So you give it a point system."

[A] "Exactly. You might tell the AI's algorithm, speed gives you a plus 10 score, but an unhappy tone flag from the customer gives you a minus 50 score."

[B] "Oh, I see."

[A] "Suddenly, the AI mathematically realizes that hanging up on a frustrated customer to save three minutes of time results in a heavily negative score. You are literally encoding the escalation thresholds."

[B] "That makes so much sense. And this is why a mediocre AI model operating within a crystal clear intent infrastructure will beat a frontier model with fragmented knowledge every single time."

[A] "The frontier model is smarter, but the mediocre model knows exactly what good looks like."
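The point system from this exchange can be written down directly. This is a minimal sketch: the +10 and -50 weights come from the conversation, while `TARGET_MINUTES`, `ESCALATE_BELOW`, and the helper names are hypothetical values chosen only to show how an explicit score settles the speed-versus-quality tradeoff and encodes an escalation threshold.

```python
from dataclasses import dataclass

# The +10 / -50 weights come from the example in the conversation.
# TARGET_MINUTES and ESCALATE_BELOW are hypothetical values for illustration.
SPEED_BONUS = 10
UNHAPPY_PENALTY = -50
TARGET_MINUTES = 3.0
ESCALATE_BELOW = 0


@dataclass
class Outcome:
    minutes: float       # time the agent spent on the ticket
    unhappy_tone: bool   # was the customer's tone flagged as unhappy?


def score(o: Outcome) -> int:
    """Weighted score: reward speed, but penalize unhappy customers far more."""
    s = 0
    if o.minutes <= TARGET_MINUTES:
        s += SPEED_BONUS
    if o.unhappy_tone:
        s += UNHAPPY_PENALTY
    return s


def choose(options: dict[str, Outcome]) -> str:
    """Pick the candidate action whose predicted outcome scores highest."""
    return max(options, key=lambda name: score(options[name]))


def should_escalate(o: Outcome) -> bool:
    """Encoded escalation threshold: hand off to a human below this score."""
    return score(o) < ESCALATE_BELOW


options = {
    "close_fast": Outcome(minutes=2, unhappy_tone=True),     # 10 - 50 = -40
    "stay_and_fix": Outcome(minutes=5, unhappy_tone=False),  # 0
}
best = choose(options)  # "stay_and_fix"
```

Saving three minutes earns the speed bonus but loses 50 points to the unhappy-tone flag, so the agent prefers staying on the call; the same score also decides when a human takes over.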

## Process Archaeology

[B] "Okay. So once you've successfully coded what good looks like, the AI still needs to know the actual steps to take to get there. And according to the source material, the official company handbook is almost never how the work actually gets done."

[A] "Nope, never. Which brings us to the next concept: process archaeology. Process archaeology is the work of excavating the undocumented workarounds and the tacit knowledge hidden deep inside legacy systems. Because if you want to automate a process, you can't just read the standard operating procedure."

[B] "The report gives such brilliant, honestly, painfully realistic examples of this. The official process manual says one thing, but the warehouse team knows from experience that supplier X is literally always two days late."

[A] "Always."

[B] "So the humans secretly order early to compensate. Or the finance team knows that one specific major client gets furious if they receive an automated collections email."

[A] "Yeah, so someone manually sends a softer, personalized invoice reminder instead."

[B] "Just to save the account."

[A] "Exactly. Or the classic corporate reality. An entire multimillion-dollar department is actually being run by an old Excel file that Linda created back in 2019."

[B] "Oh, Linda's spreadsheet. It has broken macros, it's color coded in a way only she understands, and if she goes on vacation, the billing department basically grinds to a halt."

[A] "And before a business can become a futuristic AI powerhouse, it essentially has to hire corporate anthropologists to dig through its own messy habits."

[B] "It feels like we are collectively realizing that our global businesses have been held together by duct tape and Linda's spreadsheet this whole time."

[A] "It is a terrifying realization for a lot of executive boards. But if you automate without excavating all of that tacit knowledge first, you are guaranteeing a system failure. Because the AI agent will strictly follow the official documented process."

[B] "Exactly. It will ignore Linda's spreadsheet because it's not official. It will treat supplier X like everyone else, the supply chain will jam, and the AI will send the automated email that angers your biggest client."

[A] "The AI breaks the company because it missed the unwritten rules."
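Once excavated, the unwritten rules have to become explicit rules the agent can follow. The sketch below encodes the two examples from the conversation (the supplier that is always two days late and the client who must not get an automated collections email). The identifiers `supplier_x` and `major_client_a`, the function names, and the five-day standard lead time in the usage are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical rules recovered by interviewing the warehouse and finance teams.
# None of them appear in the official process manual.
SUPPLIER_EXTRA_LEAD_DAYS = {"supplier_x": 2}   # "always two days late"
NO_AUTOMATED_COLLECTIONS = {"major_client_a"}  # gets a personal reminder instead


def order_date(supplier: str, needed_by: date, standard_lead_days: int) -> date:
    """Official rule: order standard_lead_days ahead; tacit rule adds a buffer."""
    buffer = SUPPLIER_EXTRA_LEAD_DAYS.get(supplier, 0)
    return needed_by - timedelta(days=standard_lead_days + buffer)


def collections_action(client: str) -> str:
    """Official rule: send the automated email; tacit rule routes some clients to a human."""
    if client in NO_AUTOMATED_COLLECTIONS:
        return "queue_personal_reminder"
    return "send_automated_email"


late_supplier_order = order_date("supplier_x", date(2026, 6, 10), standard_lead_days=5)
```

Keeping the exceptions in named tables rather than buried in logic makes them reviewable, so the next "Linda's spreadsheet" discovery is one new entry instead of a rewrite.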

## The Digital Twin

[B] "Right. But how does a company fix that without breaking their live functioning business? I mean, you can't just let an AI practice on real customer orders while it learns Linda's unwritten rules."

[A] "No, you absolutely can't. That's why you build a digital twin. This is a crucial mechanism."

[B] "Okay, what is a digital twin exactly?"

[A] "A digital twin is a completely isolated, simulated version of your ERP and CRM systems. And just to clarify those terms for everyone, ERP, or enterprise resource planning, is essentially the central nervous system of a company's finances, inventory, and supply chain. And CRM, or customer relationship management, holds all the client history and communications."

[B] "Exactly. So the company creates a fake mirrored version of these massive databases. You map every human exception, every workaround, and you run the automated AI agent inside this simulated environment."

[A] "So it's essentially a sandbox where the AI can fail safely."

[B] "Exactly. You test VAT calculations. You test how it handles different status flags and software dependencies. If the AI hallucinates or makes a mistake, it's only ruining fake simulated data."

[A] "That's incredibly smart."

[B] "Everything must pass perfectly in the digital twin, covering every possible edge case, before that agent is ever allowed an API key to touch production data. Narrow domain execution plus full edge case coverage is the formula for reliability."
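A toy version of the "pass everything in the twin before production" gate might look like this. It is a minimal sketch under loud assumptions: the invoice records, VAT rates, expected values, and `agent_process` logic are all hypothetical, and a real digital twin would mirror entire ERP and CRM systems rather than a three-row list. It uses Python's `decimal` module so VAT rounding is exact.

```python
import copy
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical production snapshot; the twin works only on a deep copy.
PRODUCTION_INVOICES = [
    {"id": "INV-1", "net": Decimal("100.00"), "vat_rate": Decimal("0.21"), "status": "open"},
    {"id": "INV-2", "net": Decimal("0.00"), "vat_rate": Decimal("0.21"), "status": "open"},
    {"id": "INV-3", "net": Decimal("59.99"), "vat_rate": Decimal("0.09"), "status": "on_hold"},
]

# Expected results per edge case, written by the people who know the process.
EXPECTED = {
    "INV-1": {"vat": Decimal("21.00"), "status": "closed"},
    "INV-2": {"vat": Decimal("0.00"), "status": "closed"},   # zero-value invoice
    "INV-3": {"vat": Decimal("5.40"), "status": "on_hold"},  # held invoices stay held
}


def agent_process(invoice: dict) -> dict:
    """Hypothetical agent under test: compute VAT and close open invoices."""
    vat = (invoice["net"] * invoice["vat_rate"]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    status = "closed" if invoice["status"] == "open" else invoice["status"]
    return {**invoice, "vat": vat, "gross": invoice["net"] + vat, "status": status}


def run_in_twin() -> list[str]:
    """Run the agent on a copy of the data and collect every mismatch."""
    twin = copy.deepcopy(PRODUCTION_INVOICES)  # the agent never touches real records
    failures = []
    for inv in twin:
        out = agent_process(inv)
        for field, want in EXPECTED[inv["id"]].items():
            if out[field] != want:
                failures.append(f"{inv['id']}.{field}: got {out[field]}, want {want}")
    return failures


failures = run_in_twin()
production_access_granted = not failures  # gate: every edge case must pass
```

The gate is deliberately all-or-nothing: a single failing edge case keeps the agent away from production credentials.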

## The Dark Office

[A] "Okay, so you've mathematically encoded the real intent. You've mapped the messy, duct-taped reality of how things actually work, and you've tested it safely in a digital twin. Now we arrive at the final state, putting the AI to work in what the report calls the dark office."

[B] "The dark office is a concept borrowed straight from the manufacturing industry. In lights-out manufacturing, the factory floor is fully automated. It runs 24/7 without a single human on site, so you literally don't even need to turn the lights on."

[A] "The factory floor is dark."

[B] "The dark office applies that exact principle to administrative white-collar processes."

[A] "And the report introduces these specific classes of software agents to run the dark office, and they have these ominous names, Ember and Umbra."

[B] "Yeah, the naming is a bit dramatic."

[A] "Oh, a little bit."

[B] "So Ember is the agent handling order management and exception handling, and it's active around the clock, constantly watching the systems. Then Umbra handles the heavy, unglamorous, repetitive tasks. It runs the night shift, basically from 11 p.m. to 6 a.m., batch processing every mundane task that accumulated during the day."

[A] "Think about the underlying economics of this model. Ember and Umbra do not take vacations. They don't need three months of onboarding. They don't need benefits."

[B] "Right."

[A] "They don't quit after 18 months and take all their institutional knowledge to a competitor. Their only compensation is electricity and API credits: the micropennies you pay to OpenAI or Anthropic every time the agent thinks or processes a piece of text."
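Umbra's 11 p.m. to 6 a.m. night shift has one small technical wrinkle: the window wraps past midnight, so a naive "start <= now < end" check never fires. The sketch below handles that and drains a queue of tasks accumulated during the day. The function names and the placeholder "work" are hypothetical; only the shift hours come from the conversation.

```python
from datetime import datetime, time

NIGHT_START = time(23, 0)  # 11 p.m., per the conversation
NIGHT_END = time(6, 0)     # 6 a.m.


def in_night_shift(now: datetime) -> bool:
    """True between 23:00 and 06:00; the window wraps past midnight, hence `or`."""
    t = now.time()
    return t >= NIGHT_START or t < NIGHT_END


def run_night_batch(queue: list[str], now: datetime) -> list[str]:
    """Process the day's accumulated routine tasks, but only inside the window."""
    if not in_night_shift(now):
        return []
    processed = [f"done: {task}" for task in queue]  # placeholder for real work
    queue.clear()  # everything in the queue has now been handled
    return processed


day_queue = ["match invoice 881", "update product data SKU-12"]
processed = run_night_batch(day_queue, datetime(2026, 5, 12, 1, 30))
```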

## Infrastructure for the Dark Office

[B] "But to run a dark office safely, you need serious infrastructure. You can't just unleash Ember and hope it doesn't accidentally delete your customer database."

[A] "No, definitely not. The source details a very specific three-part infrastructure required for this. First is FastTrack, which is a methodology that takes a process from a basic specification to a working agent in just weeks."

[B] "Right. It accelerates deployment."

[A] "Then there's GuardRails, which involves continuous testing pipelines to prevent a piece of code that looked great in a demo from destroying your live database."

[B] "Essential."

[A] "And finally, SafeZone, which provides secure, isolated software environments for these agents to run in, ensuring they can't access parts of the network they shouldn't."

[B] "And this infrastructure creates a completely new mandate for the board of directors. The new message is stop buying massive multi-year IT projects and start buying operational outcomes."

[A] "That's a huge shift."

[B] "It really is. No more 24-month roadmaps from an AI center of excellence. Under the dark office model, you map a boring process, you hand it to an operator, and you pay per outcome. You pay per invoice handled or per customer return resolved with a guaranteed service-level agreement. And the entire thing goes live in six weeks."

## Concept Over Code

[A] "Exactly. I have to admit, the dark office concept feels slightly eerie to me. If Ember and Umbra are running the company while we sleep, perfectly processing invoices and handling returns autonomously, what happens when the market fundamentally shifts? Let's say a new regulation passes, or customer expectations totally change. Who turns the lights back on and tells the machines that the rules of the game have changed?"

[B] "That is the critical question of modern architecture. And it's addressed by an insight in the source called concept over code. We have to understand that because AI can write and generate code instantly, it pushes the actual cost of writing software towards zero. Because the code is essentially free to create, software itself is becoming disposable."

[A] "Disposable software. That is a massive paradigm shift from how businesses currently operate."

[B] "Huge. In the past, if the market changed or a new regulation dropped, you had to spend two years and millions of dollars updating your legacy IT systems to adapt."

[A] "Yeah, you were trapped by the expensive code you had previously paid for."

[B] "In the dark office era, if the market shifts, you don't spend months rewriting the massive legacy system. You literally throw the old agent in the trash."

[A] "Wow."

[B] "Because code is free to generate, you just define your new business intent, map the new process, and spin up a brand new agent in a matter of weeks. The value of your company is no longer in the proprietary code you own. The value is entirely in your human understanding of the business concept."

## The Takeaway

[A] "What a complete inversion of how we think about technology. To summarize the core takeaway for you, listening right now, the window of opportunity to implement this is wide open, but it is closing fast."

[B] "Very fast."

[A] "You need to stop looking at AI as an intelligence race, obsessing over which new chatbot has the highest benchmark scores, and start looking at it as an operational race."

[B] "Exactly. The bottleneck isn't the technology anymore. It is your organization's readiness."

[A] "You have to map your real processes. You have to be willing to act like an archaeologist, unearth Linda's old spreadsheet, and figure out how your company actually functions underneath the official manual."

[B] "You have to know reality."

[A] "Right. You need to explicitly and mathematically encode your true intent, setting up those reward functions to navigate the tradeoffs between fast and good. If you do that, you can stop paying linearly for back-office growth and successfully hand the boring, repetitive work over to the dark office."

## The Lingering Question

[B] "If we step back, though, and connect this to the broader trajectory of the workforce, this rapid transition to the dark office raises a lingering, almost haunting question about the future."

[A] "Oh, how so?"

[B] "If we fully embrace this model and autonomous agents like Ember and Umbra permanently take over all the boring, repetitive tasks, the basic data entry, the simple ticket resolution, the invoice reconciliation."

[A] "Yeah."

[B] "Historically, those mundane tasks served as the crucial training ground for junior employees."

[A] "Wow. That's true."

[B] "That unglamorous work was exactly how human workers learned the foundational, mechanical plumbing of their own industry. If we completely outsource all of that foundational work to the machines today to save on overhead, how will the next generation of human leaders ever learn the ropes?"

[A] "Right. They skip the fundamentals."

[B] "Exactly. If no human is down in the trenches doing the basic plumbing, we have to ask ourselves if we will eventually lose the human institutional memory required to even know what good looks like in the first place."
