Stacked diffs and tooling at Meta with Tomas Reimers

The Pragmatic Engineer 1h13 4 min #31

Summary

  • Tomas Reimers, former Meta engineer and co-founder of Graphite, explains why Meta built its own deeply integrated internal developer tooling instead of using industry-standard solutions like GitHub. He also describes how practices like stacked diffs, now gaining wider adoption, emerged from the constraints of large-scale engineering organizations.

Meta’s integrated developer tooling ecosystem

  • Meta’s internal tools are highly interconnected, forming a seamless workflow from development to deployment:
    • Phabricator: Internal code review tool (changes are called “diffs” instead of pull requests).
    • Sandcastle: Internal CI system (analogous to GitHub Actions or Buildkite).
    • OnDemand: Internal dev boxes for testing.
    • Landcastle: System for rolling out code to users.
    • These tools natively share data: a single diff view shows translation status, feature flag experiment results, rollout progress (1% → 10% of users), and A/B test outcomes.
  • Integration extended beyond code review:
    • Meta built its own calendar, task system (like Jira/Linear), and translation platform.
    • Example: If a diff uses an untranslated string, Phabricator blocks deployment until translations exist.
  • This level of integration is rare outside companies like Google; most firms rely on fragmented third-party tools with superficial integrations.
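
The translation example above can be sketched as a simple deploy gate. This is a minimal illustration, not Meta's implementation: the translation store, the string keys, the required locales, and the function names are all hypothetical.

```python
# Hypothetical translation store: string key -> locales that have a translation.
TRANSLATIONS = {
    "checkout.title": {"en", "es", "fr"},
    "checkout.pay_button": {"en"},
}

# Hypothetical set of locales that must exist before a string can ship.
REQUIRED_LOCALES = {"en", "es", "fr"}


def missing_translations(used_keys, translations=TRANSLATIONS, required=REQUIRED_LOCALES):
    """Map each string key used by a diff to the required locales it still lacks."""
    missing = {}
    for key in used_keys:
        gaps = required - translations.get(key, set())
        if gaps:
            missing[key] = gaps
    return missing


def can_deploy(used_keys):
    """A diff may deploy only when none of its strings lack a required locale."""
    return not missing_translations(used_keys)
```

For example, `can_deploy(["checkout.title"])` returns `True`, while a diff that also uses `checkout.pay_button` is held back until the `es` and `fr` translations exist. The check is only possible because the review tool can query the translation platform directly, which is the kind of integration the summary describes.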

Herald: Meta’s rules engine for automated code review actions

  • Herald (later replaced by Butterfly Bot) was a rules engine triggered by events during code review:
    • Example 1: Automatically comment on PRs using deprecated APIs, suggesting new syntax.
    • Example 2: Notify maintainers if new call sites are added to a deprecated API.
    • Example 3: Auto-add specific teams as reviewers or subscribers when sensitive code is modified.
  • Enabled scalable enforcement of engineering policies without manual oversight.
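
A rules engine of this kind can be sketched as a list of condition/action pairs evaluated against each change. The sketch below is an assumption about the general shape, not Herald's actual API: the `Diff` and `Rule` types, the `old_fetch` function, the `payments/` path, and the `payments-team` reviewer are all made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Diff:
    """Hypothetical, simplified view of a change under review."""
    author: str
    changed_paths: list[str]
    added_lines: list[str]
    reviewers: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)


@dataclass
class Rule:
    """A named condition plus the action to take when it matches."""
    name: str
    condition: Callable[[Diff], bool]
    action: Callable[[Diff], None]


def run_rules(diff: Diff, rules: list[Rule]) -> list[str]:
    """Apply every rule whose condition matches; return the names that fired."""
    fired = []
    for rule in rules:
        if rule.condition(diff):
            rule.action(diff)
            fired.append(rule.name)
    return fired


# Illustrative pattern for a deprecated call; `old_fetch` is a made-up name.
DEPRECATED_CALL = re.compile(r"\bold_fetch\(")

RULES = [
    Rule(
        name="warn-deprecated-api",
        condition=lambda d: any(DEPRECATED_CALL.search(line) for line in d.added_lines),
        action=lambda d: d.comments.append(
            "old_fetch() is deprecated; please use the replacement API."
        ),
    ),
    Rule(
        name="add-payments-reviewers",
        condition=lambda d: any(p.startswith("payments/") for p in d.changed_paths),
        action=lambda d: d.reviewers.add("payments-team"),
    ),
]
```

Running `run_rules` on a diff that touches `payments/charge.py` and adds a call to `old_fetch(` fires both rules: the diff gains one comment and `payments-team` as a reviewer. Keeping rules as data means a team can add a policy without changing the review tool itself.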

Code ownership: A shifting philosophy

  • Meta alternated between requiring and removing code ownership:
    • Initially required → removed for collaboration → reintroduced for critical systems → later relaxed again.
  • Contrast with Google’s hierarchical model:
    • Ownership defined per folder; easy to identify reviewers and contact them directly via file tree navigation.
  • Tomas’s view: Code ownership depends on business context:
    • High-trust, low-risk environments → rely on culture.
    • Low-trust, high-stakes domains (e.g., payments, privacy) → enforce via automation.
    • Even within one company, different teams (mobile vs. web, payments vs. UI) may need different constraints.
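
The per-folder ownership model can be sketched as a lookup that walks up the directory tree until it finds a folder with declared owners. This is a minimal sketch under stated assumptions: the ownership table, team names, and `owners_for` function are hypothetical and do not reflect Google's or Meta's actual file formats.

```python
from pathlib import PurePosixPath

# Hypothetical ownership table: directory -> owners ("" is the repository root).
OWNERS = {
    "": ["eng-leads"],
    "payments": ["payments-team"],
    "payments/fraud": ["fraud-team"],
    "web/ui": ["ui-team"],
}


def owners_for(path: str, owners: dict[str, list[str]] = OWNERS) -> list[str]:
    """Return the owners of the nearest ancestor directory that declares any."""
    # `parents` yields the containing directories from nearest to farthest,
    # ending with "." for a relative path, which we treat as the root.
    for parent in PurePosixPath(path).parents:
        key = "" if str(parent) == "." else str(parent)
        if key in owners:
            return owners[key]
    return []
```

With this table, `owners_for("payments/fraud/rules.py")` resolves to `fraud-team`, `owners_for("payments/charge.py")` to `payments-team`, and a file with no closer owner falls back to the root entry. A high-stakes area can then require approval from the resolved owners, while a low-risk area can simply leave its folders undeclared.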

Stacked diffs: Solving the code review bottleneck

  • Problem: Engineers blocked on review before building next feature; large PRs lead to rubber-stamping.
  • Solution: Stack diffs—branch off your own unreviewed PR and keep working:
    • Each small, logical unit (e.g., server → frontend) becomes a separate diff.
    • Early diffs can merge while later ones are still in review.
  • Benefits:
    • Reduces merge conflicts.
    • Speeds up time-to-merge.
    • Makes bugs easier to isolate (10-line vs. 2,000-line PR).
  • Challenges without tooling:
    • Git rebases become complex when updating base branches.
    • Reviewers confused if UI doesn’t clearly show stacking relationships.
  • Why only big companies invented it:
    • High shipping volume + monorepos + resources to build custom tooling.
    • Not suited for open source: contributors may disappear, so partial/incomplete work shouldn’t be merged.
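
The bookkeeping that makes manual stacking painful can be sketched as a small graph problem: each branch records the branch it was created from, and a change to one branch means every branch above it must be rebased, parents first. The branch names and the `restack_order` and `land` helpers below are hypothetical; this illustrates the idea and is not how Graphite or Meta's tooling is implemented.

```python
from collections import defaultdict, deque

# Hypothetical stack: each branch maps to the branch it was created from.
PARENT = {
    "server-api": "main",
    "frontend-form": "server-api",
    "frontend-polish": "frontend-form",
    "docs": "server-api",
}


def restack_order(changed: str, parent: dict[str, str] = PARENT) -> list[tuple[str, str]]:
    """List the (branch, base) rebases needed after `changed` is updated, parents first."""
    children = defaultdict(list)
    for branch, base in parent.items():
        children[base].append(branch)

    order = []
    queue = deque([changed])
    while queue:
        base = queue.popleft()
        for child in children[base]:
            # Each pair stands for one rebase of `child` onto the new tip of `base`.
            order.append((child, base))
            queue.append(child)
    return order


def land(branch: str, parent: dict[str, str]) -> dict[str, str]:
    """Return the stack after `branch` merges: its children move onto its old base."""
    base = parent[branch]
    return {b: (base if p == branch else p) for b, p in parent.items() if b != branch}
```

Updating `server-api` after review feedback yields three rebases: `frontend-form` and `docs` onto `server-api`, then `frontend-polish` onto `frontend-form`. Once `server-api` lands, `land("server-api", PARENT)` re-parents `frontend-form` and `docs` onto `main`, which is how an early diff can merge while later ones stay in review.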

Monorepos: Industry trend driven by collaboration needs

  • Meta’s journey:
    • Started with emailing zip files → evolved into multiple “polylith” repos (web, Instagram, mobile, tooling).
    • Moved toward a single monorepo to unify dependencies, simplify cross-team collaboration, and stabilize CI.
  • Advantages of monorepos:
    • Single source of truth.
    • Easier dependency management.
    • Enforce consistent practices (e.g., minimum reviewers).
  • Polyrepos still dominate open source due to decentralized authorship and versioning.
  • GitHub’s design encourages polyrepos (mirroring open source norms), but industry is shifting:
    • Companies like Shopify, Uber, and Meta adopt monorepos as they scale.
    • Microservices ≠ multiple repos—Uber runs 5,000+ microservices in per-language monorepos.

AI’s impact on software development and code review

  • Immediate effect: More code written → more PRs, larger PRs, more bugs.
  • Second-order effects:
    • Code review becomes the critical bottleneck.
    • “Vibe coding” (generating code without reading it) increases review burden.
  • How top companies will adapt:
    • Adopt practices from high-volume orgs (Google, Meta, Uber): smaller PRs, better tooling, clear ownership.
    • Use AI to handle mechanical review tasks (correctness, style), freeing humans for higher-level concerns:
      • Intent alignment.
      • System-wide implications.
      • Knowledge sharing and tribal knowledge transfer.
  • Testing evolution:
    • AI can already generate unit and integration tests.
    • End-to-end “sanity testing” (e.g., payment flows) increasingly automated.
  • Accountability remains human:
    • Business logic decisions must be reviewed by people.
    • Linux model as inspiration: contributors own their patches; responsibility flows up through maintainers.

Graphite’s origin: Bringing Meta-grade tooling to the market

  • Stacking didn’t spread outside Meta because Phabricator wasn’t available externally.
  • At Meta, stacking spread via evangelism:
    • Early adopters (senior engineers) saw 3x productivity gains.
    • They gave internal talks convincing skeptics.
    • Once tried, most engineers didn’t go back (“once you stack, you don’t go back”).
  • Graphite’s pivot:
    • Founded to build mobile release tools.
    • Co-founders (ex-Meta) noticed missing tooling in the wild.
    • Built internal stacking CLI for themselves.
    • Former colleagues asked to use it → pivoted to code review in November 2021.
  • Selling to developers is hard:
    • Engineering cultures vary widely.
    • Must support diverse workflows and integrate with existing stacks.
    • Unlike internal tools, requires marketing, sales, and customer success.

Engineering metrics that matter

  • All metrics are proxies for velocity and health:
    • Number of PRs: Rough proxy for output.
    • Time to merge: Measures end-to-end efficiency.
    • Idle review time (Uber’s metric): Time a PR sits with no action—highlights coordination failures.
  • Focus time (uninterrupted coding blocks) is valuable but hard to measure.
  • As AI increases PR volume, these metrics will become critical for all companies—not just large ones.
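
Two of these metrics can be computed from a PR's event timeline, as sketched below. The event names and sample timestamps are invented, and "idle review time" is interpreted here as the longest gap between consecutive events, which is an assumption rather than Uber's actual definition. The sketch also ignores nights and weekends.

```python
from datetime import datetime, timedelta

# Hypothetical event log for one PR: (timestamp, event name), sorted by time.
EVENTS = [
    (datetime(2024, 1, 8, 9, 0), "opened"),
    (datetime(2024, 1, 8, 15, 0), "review_requested"),
    (datetime(2024, 1, 10, 11, 0), "commented"),
    (datetime(2024, 1, 10, 12, 30), "updated"),
    (datetime(2024, 1, 10, 14, 0), "merged"),
]


def time_to_merge(events) -> timedelta:
    """Elapsed time from the first `opened` event to the first `merged` event."""
    opened = next(t for t, name in events if name == "opened")
    merged = next(t for t, name in events if name == "merged")
    return merged - opened


def longest_idle_gap(events) -> timedelta:
    """Longest stretch between two consecutive events, i.e. with no action at all."""
    times = [t for t, _ in events]
    return max((b - a for a, b in zip(times, times[1:])), default=timedelta(0))
```

For the sample log, time to merge is 2 days 5 hours, and 1 day 20 hours of that is a single idle gap between the review request and the first comment. Separating the two shows whether a slow merge came from long work or from a PR waiting on someone.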

Rapid fire insights

  • Tool from Meta now outside: Statsig (feature flagging + experimentation), combining Meta’s Gatekeeper and Deltoid.
  • Favorite language: TypeScript, praised for its type system, its formal foundations, and the way it evolves JavaScript.
  • Book recommendations:
    • Fiction: The Last Days of Night (historical fiction about Tesla, Edison, Westinghouse).
    • Non-fiction: The Timeless Way of Building—argues users should build their own tools for sustainable design.