Bret Taylor of Sierra on AI agents, outcome-based pricing, and the OpenAI board

Stripe's Cheeky Pint 1h41 7 min #7

Summary

  • Bret Taylor — co-founder of Sierra, chair of the OpenAI board, and veteran of Google Maps, the Like button, Salesforce, and the Twitter board — discusses the rapid shift toward AI agents, how Sierra is transforming customer service, and what it means for the future of software, business models, and work.

The state of consumer AI and agent memory

  • Consumer AI in early 2026 is split between polished mainstream apps (ChatGPT, Gemini) that have no persistent memory and chaotic open-source projects like OpenClaw that achieve memory by writing to markdown files — a janky but surprisingly effective approach.
  • OpenClaw’s memory model — writing observations to markdown files, with imperfect compaction — mirrors how human memory actually works (a mix of context loading and random access), and may be more useful than sophisticated vector databases for general-purpose agents.
  • The broader lesson: code repositories are uniquely suited to AI agents because all context is in one place, in text, with formal feedback loops (tests, code reviews, version history). This makes coding agents far ahead of agents in other domains.
  • Harness engineering, the work of building the scaffolding around an agent (documentation, rules, skills, Model Context Protocol (MCP) servers), is emerging as a critical discipline. Some teams are finding that a directory of markdown files describing architecture and product intent works better than many MCP servers.
  • There’s a growing sense that the future of general-purpose agents may look more like a Unix-style file system than a collection of specialized micro-agents connected by APIs.
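
The markdown-file memory pattern described above can be sketched in a few lines. This is a minimal illustration, not OpenClaw's actual implementation: the file name MEMORY.md, the function names, and the truncation-based compaction are all assumptions made for the example.

```python
from datetime import date
from pathlib import Path

MEMORY_FILE = "MEMORY.md"  # hypothetical file name, chosen for this sketch


def remember(memory_dir: Path, observation: str, day: date) -> None:
    """Append one observation to the memory file as a dated bullet."""
    path = memory_dir / MEMORY_FILE
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {day.isoformat()}: {observation}\n")


def _read_notes(memory_dir: Path) -> list[str]:
    """Return all stored notes, or an empty list if nothing is stored yet."""
    path = memory_dir / MEMORY_FILE
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines()


def load_context(memory_dir: Path, max_lines: int = 50) -> str:
    """Return the newest notes as text that could be prepended to a prompt."""
    notes = _read_notes(memory_dir)
    return "\n".join(notes[max(0, len(notes) - max_lines):])


def compact(memory_dir: Path, keep_last: int = 20) -> int:
    """Crude, lossy compaction: keep only the newest notes.

    An agent would more plausibly ask a model to summarize old notes;
    plain truncation stands in for that step here (an assumption).
    Returns the number of notes dropped.
    """
    notes = _read_notes(memory_dir)
    dropped = max(0, len(notes) - keep_last)
    kept = notes[dropped:]
    (memory_dir / MEMORY_FILE).write_text(
        "".join(line + "\n" for line in kept), encoding="utf-8"
    )
    return dropped
```

Because the store is ordinary text, the same file can be read whole (context loading) or searched line by line (random access), which is the property the summary highlights.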

Sierra: AI agents for customer experience

  • Sierra builds AI agents that handle customer service across phone, chat, WhatsApp, and mobile apps — replacing IVR systems and unifying previously separate digital and call-center teams.
  • The company reached $100M ARR in seven quarters and is now around $165M ARR, making it one of the fastest-growing enterprise software companies.
  • Typical adoption starts with one channel and a few use cases (e.g., first-notice-of-loss calls for insurance, or chat for digital-native companies), but most clients expand to both phone and digital.
  • Rocket Mortgage is a standout example: its agent handles home search on Redfin, mortgage origination, and mortgage servicing — moving beyond customer service into end-to-end product usage.
  • A major shift: Sierra has digitized the telephone, the last remaining analog channel, so companies no longer need separate teams for digital and phone support.

Cost, satisfaction, and second-order effects

  • Sierra’s clients see 70–90% automation rates in customer service cases (Ramp is at 90%).
  • Counterintuitively, average handle time for human agents goes up because the remaining cases are more complex — but agent job satisfaction also rises because solving hard problems is more fulfilling.
  • One retailer saw total conversation volume increase 2–3x after deploying Sierra, because the AI was actually pleasant to talk to (unlike old chatbots). This is a form of the Jevons paradox: when something becomes cheaper and easier to use, total usage goes up rather than down.
  • Most clients care more about top-line metrics (net promoter score, churn reduction, competitive positioning) than cost savings — because if everyone has access to the same technology, the savings become consumer surplus rather than competitive advantage.
  • The ATM analogy: ATMs didn’t reduce the number of bank branches, because banks reinvented what branches were for. Similarly, AI customer service will reshape what companies do with the newly cheap interaction capacity.
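
The rise in average handle time is a mix-shift effect, and a small worked example makes it concrete. The case counts and minutes below are invented for illustration and do not come from Sierra or its clients.

```python
def human_handle_stats(cases):
    """Summarize the human workload.

    cases: list of (case_count, minutes_per_case) pairs handled by humans.
    Returns (average_minutes_per_case, total_minutes).
    """
    total_cases = sum(count for count, _ in cases)
    total_minutes = sum(count * minutes for count, minutes in cases)
    return total_minutes / total_cases, total_minutes


# Illustrative numbers only: 80 easy cases at 5 minutes, 20 hard cases at 20.
before = [(80, 5), (20, 20)]
# Assume the AI agent resolves every easy case (an 80% automation rate),
# so humans see only the hard ones.
after = [(20, 20)]

print(human_handle_stats(before))  # (8.0, 800)
print(human_handle_stats(after))   # (20.0, 400)
```

Under these assumptions the average handle time more than doubles even though total human minutes are halved, so a rising average is not evidence that the human agents got slower.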

Building agents that reason and stay grounded

  • The key breakthrough over old chatbots is reasoning capability — the ability to think across multiple conflicting data sources (e.g., three CRM systems from three acquisitions) the way a human would, rather than being tripped up by inconsistencies.
  • Innate LLM knowledge is unexpectedly valuable: a Sonos support agent works well partly because the model already knows about Wi-Fi problems from training data, even without explicit documentation.
  • For well-known brands, grounding is actually harder because the LLM thinks it already knows the answer. Sierra uses a “constellation of models” approach: a supervisor model inspects the reasoning of the primary agent and sends it back with notes if it goes off-script.
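  • Layering a 90%-accurate reasoning model under a 90%-accurate supervisor yields ~99% effectiveness, provided the supervisor’s misses are independent of the primary’s errors: the supervisor catches 90% of the 10% of cases the primary gets wrong, leaving about 1%. It is a simple but powerful technique.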
  • Clients express goals and guardrails; Sierra’s platform (Agent Studio) handles the complexity of making agents robust without requiring prompt engineering.
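
The supervisor pattern and its arithmetic can be sketched as follows. This is not Sierra's implementation: the callable interfaces, the retry limit, and the function names are hypothetical, and the accuracy formula assumes that the supervisor's misses are independent of the primary's errors and that a rejected draft is eventually corrected or escalated.

```python
from typing import Callable, Optional

# Hypothetical interfaces for this sketch:
#   primary(question, notes) drafts a reply, optionally using reviewer notes.
#   supervisor(question, draft) returns None to approve, or notes to reject.
Primary = Callable[[str, Optional[str]], str]
Supervisor = Callable[[str, str], Optional[str]]


def supervised_reply(
    question: str,
    primary: Primary,
    supervisor: Supervisor,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return an approved draft, or None if every round was rejected."""
    notes: Optional[str] = None
    for _ in range(max_rounds):
        draft = primary(question, notes)
        notes = supervisor(question, draft)
        if notes is None:  # supervisor approved the draft
            return draft
    return None  # caller would escalate to a human


def combined_accuracy(primary_accuracy: float, catch_rate: float) -> float:
    """Share of cases that end up correct if the supervisor independently
    catches `catch_rate` of the primary's errors."""
    return 1 - (1 - primary_accuracy) * (1 - catch_rate)
```

With both values at 0.9, combined_accuracy gives roughly 0.99. If the two models tend to fail on the same inputs, the real figure would be lower, so 99% is better read as the optimistic case.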

Co-evolving with model capabilities

  • Sierra launched in February 2024 and has had to build custom solutions (e.g., Cantonese voice support) knowing they will likely be commoditized within a year or two as models improve.
  • This creates a strange organizational dynamic: teams must build things they plan to throw away, and resist the temptation to treat custom work as precious IP.
  • The company’s thesis: today’s advantage is technological; in three years it will be product and go-to-market. The technology-forward conversation will mature into a product-forward one, just as early SaaS companies stopped marketing multi-tenancy.

The SaaSpocalypse and systems of record

  • Public markets have marked down software companies 20–30% in recent months. This is rational given unprecedented uncertainty, but likely overblown for individual companies.
  • The real question: where will value reside in software? Systems of record (ERP, CRM, etc.) have been the gravitational centers of enterprise software for 30 years because their databases were the authoritative source of truth.
  • AI agents perform valuable labor — generating leads, auditing contracts, optimizing processes. The value may shift from the database (system of record) to the encoded process (system of engagement).
  • The closer a system is to being a literal ledger (e.g., financial ERP), the more durable its value. The closer it is to a workflow tool, the more vulnerable it is to disruption by agents.
  • LLMs are tolerant of messy data — you can paste in poorly formatted text and they’ll work with it. This reduces the advantage of having all data in one system of record.

Outcome-based pricing

  • Sierra uses outcome-based pricing: clients pay when the AI agent fully resolves a case without human intervention; escalations are free. For sales agents, it’s a commission model.
  • This is fundamentally different from usage-based pricing (which charges for tokens/compute). Token usage doesn’t correlate with business value: an agent could use 100x fewer tokens and be no less valuable.
  • Outcome-based pricing aligns incentives: Sierra is motivated to make the product genuinely better, not just to sell more usage. It also forces accountability for the full implementation lifecycle.
  • The model is easiest in customer service (resolved vs. not resolved) and harder in domains like product usage where “success” is fuzzier — but the aspiration is to build agents that drive long-term relationships and outcomes, not just conversations.
  • Stripe uses similar outcome-based pricing (transaction fees) and finds it creates strong alignment with customers — Stripe actively pushes customers to adopt features that increase revenue for both parties.
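
The difference between the two pricing models can be shown with a small sketch. The Conversation type, the function names, and any prices passed in are hypothetical and are not Sierra's actual rates or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    tokens_used: int
    resolved_by_agent: bool  # False means the case was escalated to a human


def usage_bill(conversations, price_per_1k_tokens: float) -> float:
    """Usage-based pricing: charge for tokens, whatever the outcome."""
    total_tokens = sum(c.tokens_used for c in conversations)
    return total_tokens / 1000 * price_per_1k_tokens


def outcome_bill(conversations, price_per_resolution: float) -> float:
    """Outcome-based pricing: charge only for cases the agent fully
    resolved; escalations are free."""
    resolved = sum(1 for c in conversations if c.resolved_by_agent)
    return resolved * price_per_resolution


# Illustrative data: the long, token-hungry conversation ends in an escalation.
conversations = [
    Conversation(tokens_used=2_000, resolved_by_agent=True),
    Conversation(tokens_used=200_000, resolved_by_agent=False),
    Conversation(tokens_used=1_000, resolved_by_agent=True),
]
```

In this example the escalated conversation accounts for almost all of the usage bill while contributing nothing to the outcome bill, which is the misalignment the summary describes.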

Is Sierra short AGI?

  • The question: if models keep improving and absorbing capabilities that Sierra builds, does Sierra have a moat?
  • Bret’s view: applied AI is a massive, underpenetrated market. Most companies want solutions to their problems, not raw models. The buyer for frontier models (CTO/engineering) is different from the buyer for customer experience agents (Chief Customer Officer/CFO).
  • Enterprise software value comes from product nuance, go-to-market relationships, department-specific workflows, and ecosystem effects — not just the ability to write code.
  • If model development paused today, there are still trillions of dollars of unrealized economic value from applying existing models to business processes. The bottleneck is the lack of applied AI companies building purpose-built agents for specific domains.

AI productivity: processes, not people

  • Bret’s core thesis: the atomic unit of AI productivity is a process, not a person. AI won’t replace jobs so much as optimize end-to-end workflows that span multiple departments.
  • Example: onboarding a new supplier involves legal, procurement, IT, and a business sponsor. If the median time is 17 days, AI could compress it to 17 hours — but no single person owns that process today.
  • Most companies are organized by department, not by process, which makes it hard to absorb AI productivity gains. Giving everyone Copilot is incremental; reimagining the company around processes with accountable owners would be transformative.
  • White-collar knowledge work (legal, finance, procurement) should see major gains, but only if approached as narrow, well-defined problems rather than general-purpose AI for a department.
  • The economy is not all digital: flower shops, wet labs, shipping, and physical tasks won’t see the same productivity gains without robotics. Software and finance are the most absorbable domains.

Organizational structure in the AI era

  • The canonical Silicon Valley org chart (functional teams, ratios, pipeline coverage) is under pressure.
  • High-agency generalists — people with taste, infrastructure understanding, and deep customer knowledge — are becoming dramatically more valuable because AI agents (like Codex) give them an exoskeleton to build products with relative autonomy.
  • These people often get sidelined in larger companies because they don’t fit neatly into specialist roles. AI may reverse this, making them among the most valuable employees.
  • The organizational implication: flatter structures, with more empowered individuals who own end-to-end outcomes. The right job title and structure for these “product engineer” hybrids hasn’t been invented yet.

Board drama and reflections

  • Bret was on the Twitter board during the Elon Musk acquisition. He found the public spotlight and conflict unfulfilling — he prefers building things to governance battles.
  • He joined the OpenAI board after the Sam Altman firing crisis, as a mutually agreed-upon mediator, and became chair.
  • OpenAI is his first experience on a not-for-profit board with a fiduciary duty to the mission (“ensure AGI benefits all of humanity”) rather than to shareholders — a clarifying and meaningful difference.
  • He helped rebuild the board from scratch, thinking carefully about composition: safety expertise, economic impact, and infrastructure-finance expertise.

AI predictions for 2026

  • A scientific breakthrough using AI will break into mainstream awareness — something as culturally significant as the Kasparov chess match or AlphaGo, but in a domain people can understand (not just math proofs with incomprehensible titles).
  • Agents will reach mainstream adoption, both consumer (OpenClaw-style) and enterprise: 2026 is the year agents go from niche to normal.
  • Most Silicon Valley companies will stop writing code by hand. This would have been a bold prediction four months ago; now it feels obvious. The diffusion to the rest of the world will take longer.