Building Claude Code with Boris Cherny

The Pragmatic Engineer 1h37 7 min #70
Building Claude Code with Boris Cherny
Watch on YouTube

Summary

  • Boris Cherny is the creator and head of Claude Code at Anthropic, previously a Principal Engineer at Meta where he led code quality across Instagram, Facebook, WhatsApp, and Messenger. He joined Anthropic because of its safety-first mission, and his very first pull request was rejected — not because the code was bad, but because he wrote it by hand instead of using the internal AI tool that would become Claude Code.

    • Claude Code started as Boris’s personal side project in late 2024: a simple terminal-based chatbot hitting the Anthropic API. The pivotal moment came when he gave it a single bash tool and asked what music he was listening to — it one-shot an AppleScript program to query his music player. That’s when he realized the model wants to use tools; the key insight is to not put the model in a box but give it programs to run and let it figure things out.
    • Internally, adoption went vertical almost immediately. There was a real debate about whether to release it at all, since it made engineers so productive. The decision to ship was driven by safety: releasing it let Anthropic study how the model behaves “in the wild,” which has made the product significantly safer. Boris emphasizes that at Anthropic, product exists to serve research and safety — not the other way around.
    • Today, roughly 80% of code at Anthropic is written by Claude Code on average. Boris himself ships 20–30 pull requests per day with zero hand-written code, using Opus 4.5/4.6. He runs 5 parallel Claude instances (via terminal tabs or the desktop app), often starting agents on his iOS app first thing in the morning. He uses plan mode to iterate on the approach before letting the model implement, then moves to the next tab while it works.
  • Boris’s daily workflow and how code review works when AI writes everything

    • Boris runs 5 parallel Claude Code sessions, each in its own Git checkout or worktree, mostly in plan mode. He round-robins between them, refining plans and kicking off implementations. He also uses the iOS app to start agents on his phone — estimating he now writes a third to half of his code on a phone, which he never would have predicted.
    • For code review, every PR at Anthropic gets an automated first pass by Claude Code via the Claude agent SDK in CI, which catches roughly 80% of bugs. Claude auto-addresses some issues and flags uncertain ones for humans. A human engineer always does the second pass and approves the change. Boris also uses “best of N” — launching parallel review agents and deduping agents to reduce false positives.
    • Boris still writes lint rules, but now he does it by asking Claude to write them directly on coworkers’ PRs. He uses the GitHub app integration (/setup GitHub) so he can tag @Claude on any PR. Deterministic checks (type checkers, linters, builds) remain essential alongside LLM-based review.
    • For side projects with no users, Boris will commit straight to main. For enterprise products like Claude Code, the bar is higher: security, privacy, and safety require human-in-the-loop review.
  • Claude Code’s architecture and key technical decisions

    • The architecture is deliberately simple: a core query loop plus a set of tools that are constantly added and removed through experimentation. Boris estimates they’ve thrown away roughly 80% of what they’ve built — even the spinner went through about 100 iterations.
    • They tried RAG (retrieval augmented generation) early on with a local vector database and cloud embeddings, but abandoned it. Issues included code drifting out of sync with the index, permission complexity, and the overhead of managing embeddings. Agentic search (glob and grep) outperformed it. Boris partially credits his Instagram experience, where broken tooling forced engineers to search for foo( instead of using click-to-definition — a pattern that works equally well for models.
    • The permission system uses a “Swiss cheese model” of multiple safety layers: classifiers, static analysis, and user-configurable allow lists. Even seemingly safe Unix utilities like find can execute arbitrary code via flags, so the system is conservative by default. Users can allowlist patterns for a session or globally. This design dates back to the very first internal release in September 2024, born from a brainstorm with co-founder Ben Mann about whether agentic safety was even solvable.
    • For prompt injection specifically, they use three layers: model-level alignment training (Opus 4.6 is the most resistant yet), runtime classifiers that block suspicious requests, and summarization via sub-agents for tools like web fetch (the main agent never sees raw fetched content).
  • Engineering culture at Anthropic: no titles, no PRDs, radical prototyping

    • Everyone at Anthropic has the same title — Member of Technical Staff — reflecting a generalist culture where engineers do product work, design, research, and infrastructure. Boris sees this as a glimpse of the future: AI makes it easy for engineers to do product work and for non-engineers to write code, so rigid role boundaries dissolve.
    • PRDs are rarely written. The culture is “show, don’t tell” — send a PR, build a prototype. Boris built 15–20 interactive prototypes for a single feature (the to-do list) in a day and a half, trying each one out before sharing with others. For the recent Agent Teams launch, the team prototyped hundreds of versions over months. Boris estimates at least half his ideas are bad, and the only way to find out is to build and test.
    • Ticketing is left to individual preference. Boris doesn’t use one; some teammates use Asana or notes. In a striking example, Daisy used a swarm of Claude agents to build the plugins system over a single weekend — the swarm created an Asana board, split itself into hundreds of tasks, and implemented the feature with minimal human direction.
    • Non-engineers across Anthropic now code: the entire finance team, half the sales team, designers, data scientists. Boris attributes this to Claude Code lowering the barrier — people start using it for SQL queries or financial forecasts, then naturally cross into writing code.
  • Claude Co-work: bringing agentic AI to non-engineers

    • Co-work was built in roughly 10 days by a small team, entirely using Claude Code. It’s a tab inside the existing Claude desktop app (Electron + TypeScript), running the same Claude agent SDK under the hood. The product logic is relatively simple; the complexity is in safety.
    • Non-technical users get a virtual machine with OS-level integrations to prevent accidental data loss (e.g., deleting family photos). Multiple classifiers run on the backend for safety and prompt injection mitigation. The permission system is rethought from Claude Code’s, with special attention to the Chrome extension that lets Co-work interact with browser-based tools (spreadsheets, Slack, etc.).
    • Boris uses Co-work weekly for project management: it opens a spreadsheet in Chrome, checks for unfilled status rows, and messages engineers on Slack to follow up. It one-shots this entirely.
    • Co-work launched macOS-only (Windows coming soon) as a deliberate choice to launch early and learn, matching Anthropic’s build philosophy. Its growth trajectory has been steeper than Claude Code’s — an instant hit rather than the slow takeoff Claude Code had.
  • Agent Teams (swarms): uncorrelated context windows as a form of test-time compute

    • Agent Teams lets a lead agent delegate to multiple sub-agents with fresh, uncorrelated context windows. This is different from sub-agents that inherit parent context — each teammate starts clean, which Boris describes as a form of test-time compute that improves results on complex tasks.
    • The team experimented with swarms since around September/October 2024. It “clicked” with Opus 4.6, where the model figured out how to coordinate effectively. Internal evaluations showed significant quality improvements on tasks too complex for a single agent.
    • It’s released as a research preview because it consumes many tokens (multiple agents running). Users opt in for complex tasks. Boris notes there’s no single right configuration — the magic is in the uncorrelated windows, not specific agent setups.
    • Plugins were fully built by swarms, and many other internal features use this approach. Boris sees swarms as the answer for any task where a single Claude struggles.
  • The printing press analogy: what happens to software engineers

    • Boris compares this moment to the printing press in the 1400s. Scribes were a tiny elite with a hard-won skill, employed by illiterate kings. The printing press didn’t eliminate writing — it expanded the market for literature by orders of magnitude. Scribes ceased to be scribes; writers and authors emerged as new categories.
    • Similarly, software engineers are the scribes of today: business owners with ideas hire engineers because they can’t code themselves. AI is the printing press. Boris acknowledges a real sense of grief — coding was his identity, something he spent years mastering. But he’s come to see it as a practical means to build things, not an end in itself.
    • The analogy implies we can’t predict what comes next. No one in the 1400s could have predicted the microphone. Boris believes the economy that will emerge from universal coding ability is impossible to foresee but will be transformative.
  • What skills matter now, what doesn’t, and standout engineer archetypes

    • Skills that matter more:
      • Being methodical and hypothesis-driven, especially for debugging — the model helps but you still need the skill during this transition
      • Curiosity and willingness to work beyond your swim lane (engineering + product, design + business, etc.)
      • Adaptability — the model improves so fast that ideas that failed six months ago may work now, and vice versa. Boris constantly has to remind himself of intellectual humility
      • Context switching — Boris describes his work as “managing clouds” rather than deep work. He sees this year rewarding a kind of productive ADHD
    • Skills that matter less:
      • Strong opinions about code style, languages, and frameworks — the model can use any of them and rewrite on demand
      • Deep specialization in a single stack — generalists are increasingly rewarded
    • Standout archetypes Boris sees on his team:
      • Amazing prototypers who can take something from 0 to 0.5
      • Product-market-fit finders who go from 0.5 to 1
      • Hybrid generalists who span disciplines (product + engineering, design + infrastructure, etc.)
    • Boris’s belief that changed most from last year: he’s now much more worried about AI safety. Seeing new risks emerge from the inside has made it his top priority.
  • Book recommendations

    • Cixin Liu’s short stories — Boris has gone down a rabbit hole beyond the Three-Body Problem
    • Accelerando by Charles Stross — hard sci-fi that maps a product roadmap for the next 50 years, from AI takeoff to post-singularity. Boris says it captures the accelerating pace of the current moment
    • Functional Programming in Scala — even though language choice matters less now, the art of functional programming and thinking in types still makes you a better engineer. Boris recommends doing all the exercises (he’s done them three times)
Back to The Pragmatic Engineer