The history of servers, the cloud, and what’s next – with Oxide

The Pragmatic Engineer #60 · 1h39 · 5 min read

Summary

  • Bryan Cantrill’s journey from Sun Microsystems to Oxide traces the evolution of servers, cloud infrastructure, and computing hardware from the 1990s to today. It shows how economic cycles, open-source software, and the hyperscalers’ custom infrastructure reshaped the industry, and why Oxide is now building a clean-sheet rack-scale computer to bring cloud-like on-premises infrastructure to a broader market.

The Dotcom Boom and Bust at Sun Microsystems

  • In the mid-1990s, Sun Microsystems was at the epicenter of the internet buildout: if you wanted to build a serious web presence, you bought Sun servers running Solaris on SPARC, often alongside Cisco switches.
    • PCs weren’t viable because Linux was still a hobbyist OS, BSD was under legal threat from AT&T, and GNU’s Hurd was perpetually vaporware.
    • Sun’s Solaris on SPARC offered a mature, high-performance Unix environment that matched the needs of databases and enterprise applications.
  • The dotcom boom was frenetic and unsustainable: customers ordered tens of thousands of servers, spending was extravagant (Bryan recalls getting drunk on 1952 Château d’Yquem at a corporate dinner), and everyone believed growth would last forever.
    • Bryan’s key lesson: booms go on longer than you think possible, but when they collapse, they collapse faster than you can fathom.
    • The bust hit telecom spending like a cliff: orders from Sun’s telecom customers went to zero in November 2000.
  • Counterintuitively, the bust produced better technical work than the boom: ZFS, DTrace, and the Service Management Facility were all developed between 2001 and 2005.
    • Innovation requires a degree of desperation; good economic times make it hard to summon that focus.
    • Sun open-sourced Solaris in 2005, giving these technologies “eternal life.”

The Shift to x86, Linux, and the Cloud

  • SPARC began losing ground to x86 in the late 1990s: Intel designed around the memory wall using speculative execution, and by 2004–2005, x86 was the leading-edge microprocessor.
  • Linux matured rapidly in the 2000s, backed by IBM, SGI, and others contributing technologies like XFS. Google was always built on Linux.
  • AWS launched EC2 in 2006, and roughly 2010 to 2014 was a period of relentless execution with no real competitors; Azure and GCP were afterthoughts.
    • AWS deliberately obscured its financials, making the business look worse than it was, which deterred competitors.
    • Bryan was at Joyent, which ran a competing public cloud, and knew AWS margins were excellent: S3 was effectively subsidizing Amazon’s war on big-box retail.
  • Kubernetes (starting around 2015) changed the game by providing a layer of cloud neutrality, giving organizations optionality beyond AWS.
    • Google open-sourced Kubernetes partly to help GCP gain traction against AWS’s dominance.
    • The CNCF was formed partly to give Kubernetes the marketing budget Google internally wouldn’t provide.

Hyperscalers Build Their Own Infrastructure

  • Google, Meta, Microsoft, and Amazon all independently concluded that off-the-shelf servers from Dell, HP, and Supermicro were inadequate at scale.
    • Google famously started by velcroing together cheap machines but quickly learned that memory correctness (not just failure) matters, and built custom-engineered systems with DC bus bars and purpose-built power distribution.
    • These companies were never meaningful customers of traditional server vendors because those vendors designed for small server rooms (6–24 racks), not thousands of servers.
  • Joyent was acquired by Samsung in 2016 because Samsung’s AWS bill was enormous and no off-the-shelf product existed to bring that workload on-premises.

Oxide: Building a Cloud Computer from Scratch

  • Oxide was founded on the premise that cloud computing is the future, but you should be able to buy and own it, not just rent it — for economic, security, and risk-management reasons.
    • Basecamp became a poster child for the economic advantage of owning rather than renting, though Oxide targets much larger scale where the economics are even more compelling.
  • Hardware design is fractally complex: power sequencing, signal integrity for DDR5 and PCIe, and the analog reality underlying “digital” systems make clean-sheet computer design extraordinarily difficult.
    • Most engineers rely on reference designs from component vendors; Oxide deliberately did not, hiring fearless electrical engineers from outside the traditional computer industry (e.g., GE Medical, CT systems).
    • Key innovations include blind-mating both power and networking (no cables at all), a custom switch using Intel Tofino programmable silicon, and a DC bus bar power architecture.
      • Blind-mating networking was a “bet the company” decision — hyperscalers said they’d do it if they could start over but were too afraid to retrofit.
  • The software stack is entirely open source: Hubris (a de novo OS in Rust for the service processor), a custom hypervisor, and the control plane (Omicron) that delivers an AWS-like console experience.
    • The hardest software problem was building a robust update mechanism for a distributed system that must update across an air gap with no human intervention.
      • Mupdate (minimum viable update) required taking the rack offline. Later work by Dave Pacheco’s team enabled fully automated, zero-downtime updates, an effort that took about two years of careful management of scope and quality.

AI Tools at Oxide

  • Oxide engineers use LLMs extensively for document comprehension, editing, generating test cases, and answering small questions (e.g., “is this idiomatic Rust?”).
    • Bryan finds LLMs more valuable in the small than in the large: useful as a polishing tool, not at the epicenter of creation.
  • For hardware engineering, LLMs are essentially useless (Bryan rates their usefulness at 0.01 out of 10).
    • Example: during first bring-up, a CPU wouldn’t come out of reset. The root cause was a firmware bug that kept a voltage regulator from sending an acknowledgement packet, a problem that took weeks of deep analog debugging, physical measurement, and collaboration with AMD to find. No LLM could have helped.
    • Existing EDA tools already handle simulation and rule-checking; the hard problems are physical and require human teamwork, desperation, and diverse perspectives.
  • Bryan is not a doomer but is frustrated by reductive narratives. “Intelligence is not enough”: building hardware requires characteristics beyond raw intelligence, including teamwork, persistence, and the ability to make observations from different angles.

Company Culture and Values

  • Everyone at Oxide is paid the same transparent base salary (just over $207,000 as of the interview), regardless of role: software engineer, support engineer, QA, or electrical engineer.
    • This policy, publicly disclosed in a 2021 blog post, made hiring go “nonlinear” and signaled that Oxide takes its values seriously.
    • It attracts extraordinary people in every discipline, including support engineers who previously had no viable career path in that role.
  • Oxide is fully remote, which works because much hardware engineering uses software tools (EDA, SolidWorks, simulation) that can be run anywhere; physical bring-up requires travel to manufacturing partners (Oxide assembles in Minnesota).
  • Hiring discipline is the top priority as the company scales: every employee has gone through the same rigorous process, which emphasizes values alignment and self-described motivation.
    • Bryan warns against a common pattern: companies losing their hiring discipline after raising a large Series B.

Advice and Recommendations

  • For junior engineers aiming to work at a high-bar company like Oxide: focus on getting better every day, not on creating as much as possible. Use LLMs as a private tutor to go deeper and overcome fear of unfamiliar domains.
    • The mindset shift is from “how do I get a job at a big company” to “what would I build if I could build anything?”
  • Book recommendations:
    • The Soul of a New Machine by Tracy Kidder: the Pulitzer Prize-winning account of building a computer at Data General; every engineer will see themselves in it.
    • Skunk Works by Ben Rich: the history of Lockheed’s Skunk Works and what engineers can do when tasked with the impossible.
    • Steve Jobs and the Next Big Thing by Randall Stross: an unflinching look at Jobs’s failures at NeXT, which Bryan believes were essential to his later success at Apple.