Stacked diffs and tooling at Meta with Tomas Reimers — The Pragmatic Engineer

Tomas Reimers, former Meta engineer and co-founder of Graphite, explains why Meta built its own deeply integrated internal developer tooling instead of using industry-standard solutions like GitHub, and how practices like stacked diffs—now gaining wider adoption—emerged from the unique constraints of large-scale engineering organizations.

Meta’s internal tools are highly interconnected, forming a seamless workflow from development to deployment:
- Phabricator: Internal code review tool (changes are called “diffs” rather than pull requests).
- Sandcastle: Internal CI system (analogous to GitHub Actions or Buildkite).
- OnDemand: Internal dev boxes for testing.
- Landcastle: System for rolling out code to users.
- These tools natively share data: a single diff view shows translation status, feature flag experiment results, rollout progress (1% → 10% of users), and A/B test outcomes.
Integration extended beyond code review:
- Meta built its own calendar, task system (like Jira/Linear), and translation platform.
- Example: If a diff uses an untranslated string, Phabricator blocks deployment until translations exist.
This level of integration is rare outside companies like Google; most firms rely on fragmented third-party tools with superficial integrations.
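The translation example above amounts to a pre-deploy gate that joins code review data with translation data. A minimal sketch follows, assuming a hypothetical `Diff` record, a `translations` map from string key to available locales, and a `required_locales` set; none of these names come from Meta's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Diff:
    """Hypothetical diff record: an ID plus the user-facing string keys it adds."""
    diff_id: str
    new_string_keys: list[str] = field(default_factory=list)


def missing_translations(
    diff: Diff,
    translations: dict[str, set[str]],
    required_locales: set[str],
) -> dict[str, set[str]]:
    """Return {string_key: locales still missing} for each new string with gaps."""
    missing: dict[str, set[str]] = {}
    for key in diff.new_string_keys:
        gap = required_locales - translations.get(key, set())
        if gap:
            missing[key] = gap
    return missing


def can_deploy(
    diff: Diff,
    translations: dict[str, set[str]],
    required_locales: set[str],
) -> bool:
    """Deployment stays blocked while any new string lacks a required translation."""
    return not missing_translations(diff, translations, required_locales)


if __name__ == "__main__":
    locales = {"en_US", "es_ES", "ja_JP"}
    translations = {
        "checkout.button": {"en_US", "es_ES", "ja_JP"},
        "checkout.promo": {"en_US"},
    }
    diff = Diff("D123", ["checkout.button", "checkout.promo"])
    print(can_deploy(diff, translations, locales))  # False
    print(sorted(missing_translations(diff, translations, locales)["checkout.promo"]))  # ['es_ES', 'ja_JP']
```

The point of the design is that the gate needs no manual step: because review and translation data live in connected systems, the check can run automatically on every diff.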

Herald (later replaced by Butterfly Bot) was a rules engine triggered by events during code review:
- Example 1: Automatically comment on diffs that use deprecated APIs, suggesting the new syntax.
- Example 2: Notify maintainers if new call sites are added to a deprecated API.
- Example 3: Auto-add specific teams as reviewers or subscribers when sensitive code is modified.
This enabled engineering policies to be enforced at scale without manual oversight.
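At its core, a Herald-style rules engine evaluates a list of (condition, action) pairs against each review event. The sketch below mirrors examples 1 and 3 under loudly stated assumptions: the `ReviewEvent` fields, the rule names, the deprecated `legacy_fetch(` call, its `fetch_v2()` replacement, and the `payments/` path are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewEvent:
    """Hypothetical event emitted when a diff is created or updated."""
    diff_id: str
    changed_files: dict[str, str]  # path -> added source lines (simplified)
    comments: list[str] = field(default_factory=list)
    reviewers: set[str] = field(default_factory=set)


@dataclass
class Rule:
    name: str
    condition: Callable[[ReviewEvent], bool]
    action: Callable[[ReviewEvent], None]


def run_rules(event: ReviewEvent, rules: list[Rule]) -> list[str]:
    """Apply the action of every matching rule; return the names of rules that fired."""
    fired = []
    for rule in rules:
        if rule.condition(event):
            rule.action(event)
            fired.append(rule.name)
    return fired


DEPRECATED_CALL = re.compile(r"\blegacy_fetch\(")  # hypothetical deprecated API

RULES = [
    # Example 1: comment when added code calls a deprecated API.
    Rule(
        name="deprecated-api-comment",
        condition=lambda e: any(DEPRECATED_CALL.search(src) for src in e.changed_files.values()),
        action=lambda e: e.comments.append("legacy_fetch() is deprecated; use fetch_v2() instead."),
    ),
    # Example 3: auto-add a team as reviewer when sensitive code changes.
    Rule(
        name="payments-reviewers",
        condition=lambda e: any(path.startswith("payments/") for path in e.changed_files),
        action=lambda e: e.reviewers.add("payments-team"),
    ),
]
```

For instance, an event touching `payments/charge.py` with the line `resp = legacy_fetch(url)` fires both rules, leaving one comment and adding `payments-team` as a reviewer. Keeping rules as data rather than hard-coded checks is what lets many teams register policies without touching the engine.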

Meta alternated between requiring and removing code ownership:
- Initially required → removed for collaboration → reintroduced for critical systems → later relaxed again.
Contrast with Google’s hierarchical model:
- Ownership is defined per folder, so it's easy to find the right reviewers by navigating the file tree and contact them directly.
Tomas’s view: Code ownership depends on business context:
- High-trust, low-risk environments → rely on culture.
- Low-trust, high-stakes domains (e.g., payments, privacy) → enforce via automation.
- Even within one company, different teams (mobile vs. web, payments vs. UI) may need different constraints.
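Per-folder ownership combined with risk-based enforcement can be sketched as follows. This is an illustration under assumptions, not Google's or Meta's implementation: the `OWNERS` map, team names, and the `ENFORCED_PREFIXES` list are hypothetical. Owners resolve to the nearest ancestor folder with an entry, and approval is mandatory only in high-stakes areas; elsewhere it is left to culture.

```python
from pathlib import PurePosixPath

# Hypothetical ownership map: directory -> owners, in the spirit of per-folder OWNERS files.
OWNERS = {
    "": {"eng-leads"},
    "payments": {"payments-team"},
    "payments/fraud": {"fraud-team"},
    "web/ui": {"ui-team"},
}

# Hypothetical policy: top-level areas where owner approval is enforced by automation.
ENFORCED_PREFIXES = ("payments", "privacy")


def owners_for(path: str) -> set[str]:
    """Return the owners of the nearest ancestor directory that has an entry."""
    for parent in PurePosixPath(path).parents:
        key = "" if str(parent) == "." else str(parent)
        if key in OWNERS:
            return OWNERS[key]
    return set()


def owner_approval_required(path: str) -> bool:
    """High-stakes areas enforce approval; elsewhere, review norms are cultural."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in ENFORCED_PREFIXES


def can_land(changed: list[str], approvers: set[str]) -> bool:
    """Every file in an enforced area needs approval from at least one of its owners."""
    return all(owners_for(p) & approvers for p in changed if owner_approval_required(p))
```

Because the policy is a small table, different parts of the same company (payments vs. UI) can carry different constraints, which matches the point above.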

Problem: Engineers are blocked waiting on review before they can build the next feature, and large PRs lead to rubber-stamping.
Solution: Stacked diffs. Branch off your own unreviewed PR and keep working:
- Each small, logical unit (e.g., server → frontend) becomes a separate diff.
- Early diffs can merge while later ones are still in review.
Benefits:
- Reduces merge conflicts.
- Speeds up time-to-merge.
- Makes bugs easier to isolate (10-line vs. 2,000-line PR).
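The "early diffs can merge while later ones are still in review" property follows from modeling a stack as a chain of parent links. A minimal sketch, assuming a linear stack (each diff has at most one child) and a hypothetical `StackedDiff` record with an `approved` flag:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackedDiff:
    """One small, reviewable unit; `parent` is the diff it branches from (None = trunk)."""
    name: str
    parent: Optional[str]
    approved: bool = False


def order_stack(diffs: list[StackedDiff]) -> list[StackedDiff]:
    """Order a linear stack bottom-to-top by following parent links up from trunk."""
    child_of = {d.parent: d for d in diffs}  # assumes each diff has at most one child
    ordered: list[StackedDiff] = []
    current = child_of.get(None)
    while current is not None:
        ordered.append(current)
        current = child_of.get(current.name)
    return ordered


def landable(diffs: list[StackedDiff]) -> list[str]:
    """Diffs that can merge now: the unbroken run of approved diffs from the bottom."""
    ready: list[str] = []
    for diff in order_stack(diffs):
        if not diff.approved:
            break  # everything above an unapproved diff depends on it
        ready.append(diff.name)
    return ready
```

With a stack of `schema` → `server` → `frontend` where the first two are approved, `landable` returns `["schema", "server"]`: those merge immediately while `frontend` keeps iterating in review.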
Challenges without tooling:
- Git rebases become complex when updating base branches.
- Reviewers confused if UI doesn’t clearly show stacking relationships.
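The rebase problem is concrete: after the bottom branch is amended in response to review, every branch above it still sits on the old commits and must be replayed onto its parent's new tip. The sketch below generates the standard `git rebase --onto <new-base> <old-base> <branch>` sequence; the branch names are hypothetical, and it assumes you recorded each branch's old tip (e.g. with `git rev-parse <branch>`) before amending.

```python
import subprocess


def restack_commands(stack: list[str], old_tips: dict[str, str]) -> list[list[str]]:
    """Build `git rebase --onto` commands that move each branch onto its parent's new tip.

    stack: branch names ordered bottom-to-top, e.g. ["server", "frontend"].
    old_tips: the commit each branch pointed to *before* the stack was edited;
        a child's own commits are the ones after its parent's old tip.
    """
    commands = []
    for parent, child in zip(stack, stack[1:]):
        # Replay commits in old_tips[parent]..child onto wherever `parent` points now.
        commands.append(["git", "rebase", "--onto", parent, old_tips[parent], child])
    return commands


def run(commands: list[list[str]]) -> None:
    """Execute in order, stopping at the first failure (e.g. a conflict to resolve)."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Order matters: each rebase must finish before the next one, because the child's new base is the branch that was just moved. This bookkeeping is exactly what stacking tools automate; newer Git releases also offer `git rebase --update-refs`, which moves intermediate branch refs during a single rebase.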
Why it emerged at big companies:
- High shipping volume + monorepos + resources to build custom tooling.
- Not suited for open source: contributors may disappear, so partial/incomplete work shouldn’t be merged.

Meta’s journey:
- Started with emailing zip files → evolved into several separate repos (web, Instagram, mobile, tooling).
- Moved toward a single monorepo to unify dependencies, simplify cross-team collaboration, and stabilize CI.
Advantages of monorepos:
- Single source of truth.
- Easier dependency management.
- Enforce consistent practices (e.g., minimum reviewers).
Polyrepos still dominate open source due to decentralized authorship and versioning.
GitHub’s design encourages polyrepos (mirroring open source norms), but the industry is shifting:
- Companies like Shopify, Uber, and Meta adopt monorepos as they scale.
- Microservices ≠ multiple repos—Uber runs 5,000+ microservices in per-language monorepos.

Immediate effect of AI coding tools: more code written → more PRs, larger PRs, more bugs.
Second-order effects:
- Code review becomes the critical bottleneck.
- “Vibe coding” (generating code without reading it) increases review burden.
How top companies will adapt:
- Adopt practices from high-volume orgs (Google, Meta, Uber): smaller PRs, better tooling, clear ownership.
- Use AI to handle mechanical review tasks (correctness, style), freeing humans for higher-level concerns:
  - Intent alignment.
  - System-wide implications.
  - Knowledge sharing and tribal knowledge transfer.
Testing evolution:
- AI can already generate unit and integration tests.
- End-to-end “sanity testing” (e.g., payment flows) increasingly automated.
Accountability remains human:
- Business logic decisions must be reviewed by people.
- Linux model as inspiration: contributors own their patches; responsibility flows up through maintainers.

Stacking didn’t spread outside Meta because Meta’s internal version of Phabricator wasn’t available externally.
At Meta, stacking spread via evangelism:
- Early adopters (senior engineers) saw 3x productivity gains.
- They gave internal talks convincing skeptics.
- Once tried, most engineers didn’t go back (“once you stack, you don’t go back”).
Graphite’s pivot:
- Founded to build mobile release tools.
- The co-founders (ex-Meta) noticed that the tooling they had relied on at Meta was missing elsewhere.
- They built a stacking CLI for their own internal use.
- Former colleagues asked to use it → pivoted to code review in November 2021.
Selling to developers is hard:
- Engineering cultures vary widely.
- Must support diverse workflows and integrate with existing stacks.
- Unlike internal tools, requires marketing, sales, and customer success.

All metrics are proxies for velocity and health:
- Number of PRs: Rough proxy for output.
- Time to merge: Measures end-to-end efficiency.
- Idle review time (Uber’s metric): Time a PR sits with no action—highlights coordination failures.
Focus time (uninterrupted coding blocks) is valuable but hard to measure.
As AI increases PR volume, these metrics will become critical for all companies—not just large ones.
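Time to merge and idle review time can both be derived from a PR's event timeline. The source doesn't give Uber's exact definition of idle review time, so the version below is an assumption: the total time the PR spends waiting on a reviewer, measured from each author action until the next reviewer action. The `Event` shape and `kind` values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Literal, NamedTuple


class Event(NamedTuple):
    at: datetime
    actor: Literal["author", "reviewer"]  # who acted
    kind: str  # e.g. "opened", "updated", "commented", "approved", "merged"


def time_to_merge(events: list[Event]) -> timedelta:
    """End-to-end: earliest event (PR opened) until the merge event."""
    opened = min(e.at for e in events)
    merged = next(e.at for e in events if e.kind == "merged")
    return merged - opened


def idle_review_time(events: list[Event]) -> timedelta:
    """Assumed definition: time from each author action until the next reviewer action."""
    idle = timedelta(0)
    waiting_since = None
    for e in sorted(events, key=lambda ev: ev.at):
        if e.actor == "author" and waiting_since is None and e.kind != "merged":
            waiting_since = e.at  # ball is now in the reviewer's court
        elif e.actor == "reviewer" and waiting_since is not None:
            idle += e.at - waiting_since
            waiting_since = None
    return idle


t0 = datetime(2024, 1, 1, 9, 0)
timeline = [
    Event(t0, "author", "opened"),
    Event(t0 + timedelta(hours=20), "reviewer", "commented"),  # waited 20h
    Event(t0 + timedelta(hours=22), "author", "updated"),
    Event(t0 + timedelta(hours=26), "reviewer", "approved"),  # waited 4h more
    Event(t0 + timedelta(hours=27), "author", "merged"),
]
```

For this timeline, time to merge is 27 hours, of which 24 hours are idle review time. A large idle share points at coordination (nobody picked the PR up) rather than slow authors, which is the failure mode the metric is meant to surface.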

A Meta-style tool now available outside: Statsig (feature flagging + experimentation), which combines the functionality of Meta’s Gatekeeper and Deltoid.
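Percentage rollouts like the 1% → 10% progression described earlier are commonly implemented with deterministic hash bucketing, so a given user gets a stable answer and everyone enabled at 1% stays enabled at 10%. The sketch below shows that generic technique; it is not Statsig's or Gatekeeper's actual algorithm, and the `flag:user_id` key format and 10,000-bucket granularity are assumptions.

```python
import hashlib


def bucket(flag: str, user_id: str, buckets: int = 10_000) -> int:
    """Map (flag, user) to a stable bucket in [0, buckets) via SHA-256."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(flag: str, user_id: str, rollout_percent: float) -> bool:
    """Enabled if the user's bucket falls below the threshold (1% -> first 100 of 10,000)."""
    return bucket(flag, user_id) < rollout_percent * 100
```

Salting the hash with the flag name keeps assignments independent across flags, so the same 1% of users aren't the guinea pigs for every experiment.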
Favorite language: TypeScript—praised for its type system, formal foundations, and evolution of JavaScript.
Book recommendations:
- Fiction: The Last Days of Night (historical fiction about Tesla, Edison, Westinghouse).
- Non-fiction: The Timeless Way of Building—argues users should build their own tools for sustainable design.
