AI investor + engineer discuss the current state of AI — Unsupervised Learning

This is a crossover episode between Jacob Effron’s Unsupervised Learning and Swyx’s Latent Space podcasts. Swyx (Shawn Wang) is an AI engineer, podcaster, and operator at Cognition who runs major AI engineering conferences. The conversation covers the current state of AI infrastructure, the coding wars, consumer AI, open models, and what’s coming next.

What Top AI Engineers Are Focused On

Swyx curates tracks for his AI engineering conferences (roughly one per quarter) and sees the top topics as:
- OpenClaw (the dominant story of the last 4–5 months)
- Harness engineering and context engineering (closely related topics in agents and RAG)
- Longer-tail evergreen topics: evals, observability, GPUs, LM infra, multi-modality, and generative media

Has AI Infrastructure Finally Stabilized?

Harrison Chase (LangChain CEO) recently said it finally feels like AI infrastructure has reached stability after years of constant reinvention (LangChain → LangGraph → Deep Agents).
Swyx agrees there’s some justification: the consensus pattern for agents is now relatively settled — LLMs with tools in a loop, a file system, retrieval, and skills (a markdown file with attached scripts, described as the “minimal viable format”).
He expects more adaptation around real-time elements, sub-agents, and memory, but the core harness pattern feels stable.
Selling to agents is different from selling to humans: Vercel’s CTO noted that 60% of traffic to Vercel’s admin app is now bots (mostly coding agents using CLIs). If your product doesn’t have an API, it effectively doesn’t exist.
Swyx argues that good “agent experience” is really just good developer experience — good docs, consistent stateless APIs, discoverability. He’s skeptical of gaming AEO/GEO beyond that, though short-term wins can compound.
Compounding advantage concern: Companies whose products were in pre-training data before 2023 have an installed advantage. But Swyx thinks in 3–4 years, better memory and personalization systems will matter more than current frequency-of-mention advantages.

When Does Doing RL / Own-Model Training Make Sense?

Both Cognition and Cursor do their own model training. Swyx describes an “agent lab playbook”: start with state-of-the-art models from big labs, specialize for your domain, then once you have enough workload and high-quality user data, train your own models for cost and latency savings.
There’s a genuine marketing bonus (naming your model, publishing research), but Swyx says there’s real user value too — Cognition’s Composer 2 and Devin 1.6 are top-five models in fair, unsubsidized market conditions.
Domain-specific models (e.g., for search) make clear sense. Infrastructure for this is getting easier (Thinking Machines, Tinker Thing, Prime Intellect).
The logic is a reversal of the bitter lesson: bootstrap on large general-purpose models, then as workloads become high-quantity and low-variance, distill down to smaller specialized models.
DIY RL for pure quality improvement (not cost) is less clearly justified, but Swyx notes the trade-off always involves holding quality constant while drastically reducing cost.
Custom chips (Cerebras, Talas) are increasingly important. Cognition runs on Cerebras; so does OpenAI. The speed improvements are dramatic — thousands of tokens per second vs. under 100 — and every 10x speedup unlocks new usage patterns.

The AI Coding Wars

The market is enormous and was essentially created in the past year:
- Anthropic is at ~$2.5B ARR from Claude Code (recognized ARR methodology is debated)
- OpenAI is estimated at ~$2B (no public number)
- Cursor is rumored at ~$B
Claude Code just celebrated its one-year anniversary.
Swyx pushes back on the “empty space” argument (that founders should bet on non-coding verticals because coding is already saturated). His counter: coding went from ~10% to ~50% of cloud use cases in a year — why can’t it keep going? Betting on mean reversion instead of momentum has been painful.
The current phase is capability exploration, not efficiency. People who spend more and experiment more are being rewarded. Token-maxing leaderboards reflect this — it may look sloppy, but the person spending $10K/day on API tokens discovers new capabilities first.
Anthropic plays the high-price, restricted-access, bundle-product-with-model strategy. OpenAI plays the open-access, subsidized, “come on in” strategy. Both work.
Market structure: Swyx thinks the most likely outcome is two big players (Anthropic and OpenAI) plus a long tail of specialists. For the structure to change materially, something like Microsoft waking up and leveraging GitHub at scale would be needed. Chinese labs (ZAI/GLM) are trying but haven’t broken through yet.
Application companies still have room because the model labs are distracted by expansion into other verticals (finance, healthcare, super-app strategy). Cursor and Cognition are comparatively focused on coding.

Consumer AI Has Hit a Plateau

Consumer AI as a category has plateaued — it’s not that ChatGPT is losing share to competitors, but that the category itself hasn’t figured out how to bring on more users or increase frequency for mainstream audiences.
In contrast, coding has gone parabolic — the entire space is growing rapidly.
First-mover stickiness in coding is surprisingly strong: Claude Code introduced people to the magical AI coding experience, and even though Codex is reportedly as good or better, Claude Code retains significant loyalty. Swyx thinks this may be because we’re still early (Claude Code is only ~1 year old vs. ChatGPT’s 3+ years) and because the current high-volatility phase may not produce the same stickiness as more mature markets.

The Next Frontier: Coding Agents Breaking Containment

Swyx’s central thesis: 2025 was the year of coding agents; 2026 is coding agents breaking containment to do everything else. Because coding agents generate software and software eats the world, coding agents effectively eat the world.
“Dark factories” (a term from Simon Willison) is the next frontier: zero human review of code. You just check in AI-generated code without reviewing it. OpenAI is exploring this. It requires flipping the SDLC — more testing, more automated verification — but unlocks massive software quantity, which in turn enables quality innovation.
Swyx thinks the people who will do best in 2026 are not the cynics who dismiss AI output as slop, but those who engage with it and steer it.

Open Models Are Gaining Ground

Swyx was bearish on open models a year ago (market share appeared to be ~5% and declining). He’s changed his mind — open model usage is going up, even if the capability gap on public benchmarks appears to be increasing (benchmarks are hard to trust due to gaming).
OpenRouter stats show people choosing open models in significant volume, though many are heavily discounted so price-adjusted analysis is needed.
The top 20% of the AI industry is moving toward more open models. Fireworks and Together AI are “crushing it.” Fine-tuning as a service, which Swyx previously thought wouldn’t work, is now viable as a derivative of the open models market.
The shift is driven by workloads scaling to the point where cost and speed matter — moving from “what can these models do?” to “how do we do them cheaper and faster?”

What Swyx Has Changed His Mind On

Open models: Was bearish, now bullish (see above).
Coding: Has come “full 360” — now deeply bullish on the scale and trajectory of AI coding.
RL/post-training for quality: Previously thought it wasn’t worth it since base models improve every 3–6 months. Now thinks it can be justified if it’s the single best thing you can do for a customer outcome in a 3-month window, even if you throw the trained model out later when base models catch up. The raw data and synthetic rubrics (including multi-turn RL work like Dr. GRPO) retain value.
Dark factories / zero human review: Didn’t appreciate this frontier until recently.

Unanswered Questions and Next Frontiers

Memory and personalization: The biggest limiting constraint on current LLMs. Context length has scaled slowly (4K to 1M tokens over ~3 years). Memory systems will likely determine product choice more than current frequency-of-mention advantages.
World models: Swyx sees this as critical for improving intelligence beyond next-token prediction — does the AI understand what a table is, what matter is, what physics is? He references Fei-Fei Li’s essay on spatial intelligence as the right problem statement, even if she doesn’t have the solution yet.
Robotics is the current hype manifestation of world models, but Swyx thinks the deeper stake is a more fundamental conception of intelligence.
His analogy: current LLMs are like Matt Damon’s character in Good Will Hunting — they know everything from books but have never lived it.

Summary

What Top AI Engineers Are Focused On

Has AI Infrastructure Finally Stabilized?

When Does Doing RL / Own-Model Training Make Sense?

The AI Coding Wars

Consumer AI Has Hit a Plateau

The Next Frontier: Coding Agents Breaking Containment

Open Models Are Gaining Ground

What Swyx Has Changed His Mind On

Unanswered Questions and Next Frontiers