Nat Friedman (GitHub CEO 2018–2021, founder of AI Grant and California YIMBY) has launched the Vesuvius Challenge, an open competition to virtually read carbonized papyrus scrolls buried by the eruption of Mount Vesuvius in AD 79. The episode covers the history of the scrolls, the technical challenge of reading them without opening them, Nat’s philosophy of finding high-leverage projects, the GitHub acquisition and Copilot origin story, his views on open source and AI, and his personal operating principles.
The Vesuvius Challenge
In AD 79, Vesuvius buried the Roman villa district of Herculaneum under ~20 meters of volcanic mud and ash, carbonizing a large library of papyrus scrolls.
Herculaneum was an upscale town (compared to Pompeii); one enormous villa had belonged to Julius Caesar’s father-in-law and housed a major library.
The scrolls were rediscovered in the 1700s when a farm worker digging a well struck marble paving ~60 feet down; a Swiss engineer then tunneled and looted statues and art, initially discarding what looked like charcoal lumps until writing was noticed on some.
Hundreds of scrolls were destroyed in early attempts to open them (cutting with daggers, applying mercury, oils, rose water); an Italian monk named Piaggio later built a machine to unroll them extremely slowly (~0.5 cm/day), recovering a few Greek philosophical texts by the Epicurean philosopher Philodemus.
Roughly 600+ scrolls survive intact today but remain unreadable by physical means.
The potential payoff is enormous: historians estimate that reading all the surviving scrolls could approximately double the total body of text known from antiquity (described as “multiple Shakespeares”).
The excavated scrolls so far are mostly Philodemus’s working library, but historians believe the villa’s main library—possibly Latin and much larger—has not yet been excavated.
There are hints (scrolls found in hallways, crates of Latin scrolls accidentally destroyed during excavation) that other literary, historical, or even early Christian texts could be present; some scrolls predate the villa by centuries, raising the remote possibility of material evacuated from the Library of Alexandria.
Nat personally experimented with making and carbonizing papyrus to understand the material: heating it in a Dutch oven at ~500°F for several hours produced fragile, blistered black rolls that crumble to dust when touched, confirming how destructive physical unrolling attempts have been.
The Technical Approach: Virtual Unwrapping
Brent Seales at the University of Kentucky pioneered “virtual unwrapping” and demonstrated it on the En-Gedi scroll (a carbonized scroll from a burned temple in Israel containing an early part of Leviticus), using 3D X-ray CT scanning plus computer vision to segment layers and flatten them geometrically.
Applying this to the Herculaneum scrolls is much harder for two main reasons:
Ink visibility: The ink on these scrolls absorbs X-rays almost as strongly as the papyrus itself, so letters do not show up with high contrast in scans (unlike the En-Gedi scroll).
Scroll distortion: The scrolls were long, tightly wound, and deformed by volcanic heat and mud, making layer segmentation difficult.
Seales’s team took scrolls to the Diamond Light Source particle accelerator near Oxford and produced 3D CT scans at ~8-micron resolution (each slice is a ~100+ MB TIFF file).
The key breakthrough enabling the challenge now: Seales’s lab trained a convolutional neural network to recognize ink inside CT scans by using broken-off fragments as ground truth.
Fragments that broke off during past unrolling attempts sometimes show visible lettering, especially under infrared imaging (~930 nm).
They CT-scan the fragments, register the infrared images, and use the paired data to train ink-detection models.
This works on fragments, but translating it to intact scrolls (where ink may be pressed against another papyrus layer rather than air) is still an open problem.
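The fragment-based training setup described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline of Seales's lab: it assumes the fragment's CT volume has already been flattened and registered to the infrared image, so that `volume[:, y, x]` is the stack of voxels under pixel `(y, x)`, and that the infrared photo has been thresholded into a binary `ink_mask`. The function name `make_training_pairs`, the patch size, and the synthetic data are all hypothetical.

```python
import numpy as np

def make_training_pairs(volume, ink_mask, patch=5, n_samples=100, seed=0):
    """Sample (subvolume, label) pairs from a registered fragment scan.

    volume   : float array, shape (depth, height, width), CT intensities
    ink_mask : bool array, shape (height, width), True where infrared shows ink
    Returns X with shape (n_samples, depth, patch, patch) and y with shape (n_samples,).
    """
    depth, height, width = volume.shape
    assert ink_mask.shape == (height, width)
    half = patch // 2
    rng = np.random.default_rng(seed)
    # Pick patch centres far enough from the border that the patch fits.
    ys = rng.integers(half, height - half, size=n_samples)
    xs = rng.integers(half, width - half, size=n_samples)
    X = np.stack([
        volume[:, cy - half:cy + half + 1, cx - half:cx + half + 1]
        for cy, cx in zip(ys, xs)
    ])
    # Label each subvolume by whether its centre pixel is ink in the infrared image.
    y = ink_mask[ys, xs].astype(np.int64)
    return X, y

# Synthetic stand-in data: a random "CT" volume with a square "ink" region.
rng = np.random.default_rng(1)
volume = rng.random((8, 32, 32)).astype(np.float32)
ink_mask = np.zeros((32, 32), dtype=bool)
ink_mask[10:20, 10:20] = True
X, y = make_training_pairs(volume, ink_mask)
```

The resulting `X` and `y` are the kind of paired data a convolutional ink-detection model could be trained on. The open problem noted above is whether a model trained on such fragment pairs transfers to the interior of an intact scroll.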
Nat estimates roughly a 50% chance the challenge will be solved this year, because the core techniques have been demonstrated individually and the data is likely of sufficient resolution (~6–8 voxels across a papyrus layer).
Why a Prize / Competition
Nat and Daniel Gross are funding a prize (in the style of the X PRIZE) for the first team to read substantial real text from a scroll without opening it.
Rationale:
The search space of possible approaches is large; an open competition gets many people trying many things in parallel.
It is likely faster than relying on a single academic team, even though Seales’s group could probably solve it on their own given more time.
Non-ML approaches are also possible and welcome.
Nat’s theory of leverage: if even one scroll is read successfully, it will catalyze funding to scan the remaining hundreds (cost estimated in the low millions) and to excavate the rest of the villa, because the value of the remaining material will become obvious.
Finding High-Leverage Projects
Nat’s general approach:
He does not assume the world is efficient—he does not reflexively think “someone must already be doing this.”
He trusts his own enthusiasm: if he gets excited about something and the excitement persists, he commits impulsively, even when the work turns out to be far harder than expected.
Examples: California YIMBY (small seed funding, outsized political influence) and the Vesuvius Challenge.
He notes that even in domains assumed to be highly efficient (e.g., hedge funds), secrets can be extremely valuable and yet persist for years—suggesting the world is less efficient than people think.
Anecdote: traders at a successful hedge fund estimated that broadcasting their 30-day-old strategies for just 10 minutes/month on Twitch would reduce profits by ~80% after a year; a 10-year lookback would be needed before disclosure had only a small effect.
Open Source and AI
Nat’s views on open-source AI are evolving:
He is increasingly worried about safety (industrial-accident-type scenarios or misuse), though he is not focused on short-term risks.
He still believes, on balance, that more people tinkering with models is beneficial (e.g., Georgi Gerganov’s 4-bit quantization of LLaMA running on a laptop excites him).
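To make the idea of 4-bit quantization concrete, here is a minimal sketch of symmetric block-wise quantization in NumPy. It illustrates the general principle (store a small integer code per weight plus one scale per block) and is not the actual format used in Gerganov's code; the function names and the block size of 32 are assumptions chosen for the example.

```python
import numpy as np

def quantize_4bit(weights, block_size=32):
    """Quantize a 1-D float array to signed codes in [-7, 7], one scale per block.

    Codes are kept in int8 for clarity; packing two codes per byte is omitted.
    """
    w = np.asarray(weights, dtype=np.float32)
    assert w.size % block_size == 0, "pad weights to a multiple of block_size"
    blocks = w.reshape(-1, block_size)
    # One scale per block, chosen so the largest magnitude maps to +/-7.
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    scales = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float32)
    codes = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return codes, scales

def dequantize_4bit(codes, scales):
    """Reconstruct approximate float weights from codes and per-block scales."""
    return (codes.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
codes, scales = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scales)
max_err = float(np.abs(w - w_hat).max())  # bounded by half of the largest scale
```

Each weight needs only 4 bits plus a share of its block's scale, which is why a model that would not fit in laptop memory at 16 or 32 bits per weight can fit after quantization, at the cost of a bounded rounding error.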
On data availability for training:
The two key inputs—internet-scale data and commodity GPUs (originally driven by video games)—remain abundant.
Proprietary but scrapable datasets may become harder to access, but overall the open-source world is unlikely to be data-limited for a long time.
He expects a proliferation of foundation models, because:
Best practices are easy to write down and spread.
Data is mostly public.
Hardware is commodity and improving rapidly.
Training-efficiency secrets (e.g., 50–200% improvements) are relatively simple and will diffuse.
The main force toward concentration would be training costs, but Nat is not confident that a trillion-dollar model will be sufficiently more valuable than a $100B model to justify that spend, especially if efficiency techniques continue to improve.
GitHub Acquisition and Copilot
How the acquisition happened:
Nat joined Microsoft after it acquired his company Xamarin in 2016; within his first week he suggested to Satya Nadella that Microsoft should buy GitHub.
About a year later he sent a memo arguing that developers now drive IT purchasing decisions, that Microsoft was still perceived as an IT company rather than a developer company, and that GitHub housed the largest collection of a new generation of developers with no Microsoft affinity.
Satya approved quickly; three weeks later they had a signed term sheet and an announced deal. Nat was appointed CEO and asked to run GitHub independently, as Microsoft had done with LinkedIn.
Dealing with skepticism:
The deal leaked before announcement and triggered a wave of negative posts; Nat’s open-source background helped a few people give it a chance.
On his first day as CEO, he addressed the engineering and product leaders and committed to shipping one community-requested fix by the end of the day, then repeating that for 100 days—to signal that GitHub’s priority was developers, not Microsoft.
He believes this helped earn trust, though he acknowledges most acquisitions fail because the acquired company’s culture (the fragile “productive harmonic” that enables innovation) is easily destroyed by mismatch with the parent culture.
Copilot origin story:
GPT-3 launched in May 2020; Nat, then GitHub CEO, immediately wanted to build products with it. Microsoft had already invested $1B in OpenAI at Kevin Scott’s urging (a prescient bet, since other companies could have done the same and did not).
Early prototypes were a Stack Overflow–style Q&A chatbot, but the model was too unreliable (~25% great answers, 75% useless/wrong) for a good product.
They then tried large-chunk code synthesis (whole function bodies), but the UI was awkward: users had to explicitly request completions, wait, and choose among multiple options—cognitively taxing and often unsatisfying.
The breakthrough was context-sensitive autocomplete: using the cursor position in the AST to decide whether to complete a single line or a full block, and rendering suggestions as gray text (inspired by Gmail) without requiring user interaction.
Once they had a model small enough for low latency but large enough for accuracy, retention was extremely high (~60%+ after 30 days for an intrusive alpha product), and it became obvious the product was good.
Copilot has since improved dramatically; Nat estimates the share of code written by Copilot has risen from the low 20s (percent) at launch to over 50% today, with a plausible path to ~95%.
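The context-sensitive autocomplete idea described above (use the cursor's position in the AST to choose between a single-line and a whole-block suggestion) can be illustrated with Python's standard `ast` module. This is a toy heuristic written for this summary, not Copilot's implementation: the placeholder name `__cursor__`, the function `completion_mode`, and the rule "an empty block body gets a block completion" are all assumptions.

```python
import ast

CURSOR = "__cursor__"  # hypothetical placeholder inserted at the cursor position
BLOCK_OWNERS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
                ast.For, ast.While, ast.If, ast.With)

def completion_mode(source, cursor_line):
    """Return "block" if the cursor sits in an otherwise empty block body, else "line".

    cursor_line is 1-indexed and is assumed to contain only indentation.
    """
    lines = source.splitlines()
    indent = lines[cursor_line - 1]
    if indent.strip():
        return "line"  # the cursor line already has code on it
    lines[cursor_line - 1] = indent + CURSOR
    try:
        tree = ast.parse("\n".join(lines))
    except SyntaxError:
        return "line"  # unparsable buffer: fall back to the conservative choice
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        for stmt in body:
            is_cursor = (isinstance(stmt, ast.Expr)
                         and isinstance(stmt.value, ast.Name)
                         and stmt.value.id == CURSOR)
            if is_cursor:
                # Sole statement of a def/loop/if body: suggest a whole block.
                if len(body) == 1 and isinstance(node, BLOCK_OWNERS):
                    return "block"
                return "line"
    return "line"

empty_body = "def add(a, b):\n    \n"
mid_body = "def add(a, b):\n    total = a + b\n    \n"
mode_empty = completion_mode(empty_body, 2)  # "block"
mode_mid = completion_mode(mid_body, 3)      # "line"
```

A real editor integration would also have to cope with half-typed, unparsable code at the cursor, which is why this sketch falls back to single-line mode whenever parsing fails.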
Nat’s Operating Principles (from nat.org)
Raise the ceiling, not the floor: Technologies developed for rich-world markets (e.g., cell phones) eventually become cheap enough to transform developing regions, whereas many well-intentioned “raise the floor” projects (satellites, balloons, mesh networks for Africa) have underdelivered relative to their ambitions.
Model the world as ~500 people, not 8 billion: Most people are optimizing locally and are highly mimetic; genuinely decorrelated thinking is rare. The internet has correlated us further, making it harder to have original ideas—except for disagreeable or autistic people who care less about social consensus.
Large-scale engineering projects are more soluble in IQ than they appear: AI tools like Copilot raise the productivity of average and top engineers alike; Nat expects software development to change enormously, possibly shifting toward more special-purpose, custom software.
The cultural prohibition on micromanagement is harmful: Great individuals should be fully empowered to exercise their judgment. Nat prefers high-variance outcomes (excellent auteur-driven products or clear failures that can be iterated on) over consensus-driven mediocrity. He practiced this at GitHub, where he was unusually hands-on with product decisions for a CEO.
AI Economics and Outlook
Text-to-text tasks currently account for low double-digit percentages of the economy (per Bureau of Labor Statistics analysis), but the cost curve is falling rapidly.
Nat’s “bear case” for AI: we may be in a diminishing-returns regime where, for example, GPT-4 is ~100x more expensive to train than GPT-3 but not 100x more capable or economically valuable; sigmoid-shaped scaling curves make it hard to tell when returns will asymptote.
He is nonetheless very optimistic overall and expects widespread proliferation of foundation models rather than oligopoly, because the ingredients (data, hardware, techniques) are broadly available and secrets diffuse quickly.
He notes that incumbents have gotten smarter: unlike prior platform shifts (PC, web, mobile, cloud), where incumbents derided the new thing, today’s large companies are aggressively adopting AI.
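The point in the bear case about sigmoid-shaped curves can be illustrated numerically: well before its midpoint, a logistic curve is almost indistinguishable from an exponential, so early observations reveal little about where the ceiling lies. The sketch below uses arbitrary, made-up parameters (`ceiling`, `rate`, `midpoint`) purely for illustration; it is not a model of any real scaling data.

```python
import math

def logistic(x, ceiling=1.0, rate=1.0, midpoint=10.0):
    """Sigmoid curve that saturates at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (x - midpoint)))

def early_exponential(x, ceiling=1.0, rate=1.0, midpoint=10.0):
    """Exponential that the logistic above approximates when x is far below midpoint."""
    return ceiling * math.exp(rate * (x - midpoint))

# Ratio near 1 means the two curves cannot be told apart at that point.
ratios = {x: logistic(x) / early_exponential(x) for x in (0, 5, 10, 15)}
```

With these parameters the ratio stays above 0.99 through `x = 5`, drops to exactly one half at the midpoint, and falls below 0.01 by `x = 15`: the divergence only becomes obvious once the curve is already bending.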
Personal Reflections
Nat describes himself as an insecure overachiever who does not feel he has done anything of tremendous consequence; he deliberately avoids nostalgia and trophies, preferring to look forward.
He sees AI as the area he is most focused on going forward, looking for places where he can contribute.
He is investing in both capabilities and alignment, arguing they may not be opposing forces; he is puzzled by the lack of an open-source technical alignment community and hopes one will emerge.