Ari Morcos and Rob Toews return for a wide-ranging end-of-year conversation fresh from NeurIPS 2025, covering whether AI models are plateauing, the state of the major labs, the rise of Chinese open source models, and spicy predictions for 2026.
NeurIPS 2025: Growth and Decline of Open Research
NeurIPS attendance grew from 2,000 in 2015 to 30,000 in 2025, a 15x increase, yet Ari reflects that the field still feels extremely early.
The conference has shifted from open research sharing to PR-driven presentations, with Google and Meta policies preventing publication of work that is actually meaningful for their models.
Researchers often strip large-scale results from papers to comply with internal policies, creating a “Goldilocks zone” where papers are good enough to pass review but not so good they reveal competitive advances.
Are Models Plateauing?
Rob argues models are clearly plateauing: the slope of improvement from GPT-1 through GPT-4 was breathtaking, but incremental gains since GPT-4 have decreased significantly, consistent with S-curve dynamics.
He notes that for everyday consumer tasks, models may already be near their ceiling, and fundamental problems like continual learning and sample efficiency remain unsolved.
Ari pushes back: he thinks AI broadly is not plateauing; only consumer LLMs are starting to plateau on standard benchmarks. Video models, for example, are still improving rapidly.
Both agree that even with frozen model capabilities, trillions of dollars of enterprise value remain to be unlocked through deployment and productization.
Enterprise adoption beyond three proven use cases (internal document search, coding, and emerging customer support) remains limited, with accuracy being the primary barrier.
Reinforcement Learning: Where It Works and Where It Doesn’t
RL works best in a “Goldilocks zone” where the model knows enough that its on-policy guesses have some chance of being correct, but not so much that the task is already solved.
Coding models have reached this zone, which is why RL has driven clear improvements there.
Ari argues RL is not a panacea: the real question is how to get models to the point where they are ready for RL in a given domain, and for many enterprise use cases that preparatory work has not been done.
He emphasizes that stages of training (pre-training, mid-training, post-training, RL) are far more interconnected than organizational boundaries suggest, and the field needs to think about them as one continuous process.
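The “Goldilocks zone” argument above can be made concrete with a toy calculation. The sketch below is not from the episode; it assumes a binary (pass/fail) reward and a group-relative setup in which several attempts are sampled per task and only groups with mixed outcomes carry a learning signal. The function names `reward_variance` and `informative_group_probability`, and the group size of 8, are hypothetical choices for illustration.

```python
def reward_variance(p: float) -> float:
    """Variance of a pass/fail reward when each attempt succeeds with probability p."""
    return p * (1.0 - p)


def informative_group_probability(p: float, group_size: int = 8) -> float:
    """Probability that a group of sampled attempts contains both a success and a failure.

    Assumption: a group in which every attempt passes, or every attempt fails,
    gives no relative signal about which attempts were better.
    """
    all_pass = p ** group_size
    all_fail = (1.0 - p) ** group_size
    return 1.0 - all_pass - all_fail


# Sweep the model's starting success rate from "knows nothing" to "already solved".
for p in (0.0, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0):
    print(
        f"success rate {p:>4}: "
        f"variance {reward_variance(p):.4f}, "
        f"informative groups {informative_group_probability(p):.4f}"
    )
```

Under these assumptions the signal vanishes at both extremes: a model that never succeeds and a model that always succeeds both produce groups with nothing to compare. The signal peaks at intermediate success rates, which is one way to read the claim that a model must first be brought to the point where it is ready for RL in a given domain.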
Research Vectors to Watch in 2026
Continual learning: Moving away from separate training and inference phases toward models that update their weights in real time and learn throughout their lifecycle, the way humans do. Current workarounds include memory features in ChatGPT and periodic fine-tuning.
Recursive self-improvement: AI that can develop better AI, creating a potential runaway dynamic. OpenAI has publicly stated it wants a working prototype of an AI researcher in 2026.
Ari is skeptical that an AI researcher alone accelerates progress dramatically, arguing the real bottleneck is not idea generation but the cost of experimentation in GPU compute and time, analogous to drug development in biotech.
Data and sample efficiency: Current models require gigawatt-scale data centers and hundreds of millions of dollars to train, compared to the human brain’s extraordinary efficiency. Ari notes that constraints breed innovation, pointing to Chinese labs achieving remarkable results with far fewer resources.
Ari argues the field is in the “hand juicing” era of data utilization, with synthetic data and better curation offering order-of-magnitude gains. Ilya Sutskever’s claim that “pre-training is dead” because we have exhausted internet data was already proven wrong by Gemini 3.
The Neolabs: Can Startups Break Through?
New labs focusing on continual learning and recursive self-improvement are spinning out from the major labs, but both hosts are skeptical they can build independent businesses.
The best argument for neolabs is a version of the innovator’s dilemma: incumbents are locked into existing tech stacks and incentive structures oriented toward scaling, while startups can reason from first principles.
However, Silicon Valley is extremely porous, and the history of multiple independent discoveries suggests that breakthroughs spread quickly. Reasoning models went from OpenAI’s o1 to open-source replicas within three months.
Ari points out that the situation for today’s neolabs is fundamentally different from when OpenAI and Anthropic started: there is no entirely new product surface to capture, and enterprises are unlikely to choose a startup three months ahead over established providers.
The bull case is that whichever neolab cracks the next paradigm first could position itself in an oligopoly, similar to how OpenAI, Anthropic, and Google dominate the current one.
SSI’s Mystique
SSI stands out as the one exception to Silicon Valley’s porous culture of idea-sharing: nobody outside the company knows what they are building.
Their secrecy goes beyond any other lab: employees cannot bring phones into the office and are not allowed to comment on AI at all.
One anecdote: a job candidate was told by Ilya Sutskever that there are “two words” which, if revealed, would make SSI’s mission obvious, but Ilya refused to share them.
Ari gives Ilya credit for calling scaling correctly when others (including Ari himself) were skeptical, but notes that extraordinary claims without evidence should be treated with caution. SSI either has the next big thing or has nothing to leak.
OpenAI’s Code Red
Rob views the “Code Red” announcement as overplayed but symbolically important: it marks the end of OpenAI’s aura of infallibility.
Google holds structural advantages over OpenAI in talent depth, compute resources, vertical integration (TPUs in their 7th generation), cash position, and distribution (default models on phones).
OpenAI is projected to burn $150 billion before becoming profitable, expected around 2029, making it completely dependent on capital markets, while Google can self-fund in perpetuity.
Ari sees the Code Red as analogous to Zuckerberg’s “we’re going to war” moment when Google+ launched: an acknowledgment that OpenAI can no longer rest on its laurels.
OpenAI’s brand velocity with ChatGPT remains strong, but default models on phones (like Apple Maps overtaking Google Maps) will gradually erode its consumer dominance.
Disney and OpenAI Partnership
Disney’s exclusive partnership with OpenAI for video models using Disney IP is seen as smart for both sides: Disney gets favorable valuation and a controlled path for AI-generated content using its properties, avoiding the piracy pitfalls of the streaming transition.
The exclusivity is likely time-limited, but signals that legacy companies are learning to engage proactively with AI rather than resist it.
Thrive Holdings and OpenAI
OpenAI’s partnership with Thrive Holdings follows a pattern of extending reach through collaboration rather than building implementation capabilities in-house, similar to earlier partnerships with McKinsey, BCG, and Bain, and more recently the startup Distill.
The data angle is critical: working closely with application companies in non-tech-forward industries provides real-world feedback that lab evals cannot replicate.
Meta’s Super Intelligence Lab
Ari is increasingly bearish on Meta’s super intelligence efforts: there have been many departures with extremely short tenures, and a large layoff that functioned as a purge of everyone associated with Llama 4.
FAIR (Fundamental AI Research), once a deeply respected lab, is now “on its last legs” with employees actively looking to leave.
Meta is likely shifting to closed-source models, abandoning the open-source strategy that rehabilitated Zuckerberg’s image and created enormous tailwinds for Llama.
Ari is unsure whether a closed-source model from Meta, even if technically excellent, will generate the same enthusiasm as the open-source Llama lineage did.
US-China Chip Dynamics
The Biden administration’s strategy of denying China access to advanced US chips was effective in the short term but is now being partially reversed under Trump, with H200 sales to China permitted.
The calculus: selling China decent-but-not-cutting-edge chips competes with and potentially slows development of China’s domestic chip industry (Huawei, etc.).
Ari takes a stronger stance: handicapping China’s chip access will be viewed as a historic blunder that accelerated China’s domestic chip development by an estimated 5-10 years.
He argues the West fundamentally misunderstands modern China, still viewing it as a “copycat factory” when Chinese labs are genuinely innovating on both model and infrastructure layers.
China may reject the H200s anyway, recognizing that long-term self-sufficiency in chips is more valuable than short-term access to handicapped Western hardware.
The CCP’s ability to think in 10-40 year time horizons, unconstrained by election cycles, gives China a structural advantage in this long-term strategic game.
Amazon’s Nova Forge
Amazon announced Nova Forge, a platform for enterprise model customization, which Ari sees as validation of his long-held thesis that enterprises will build specialized models on their own data.
A key technical insight Nova Forge gets right: for continued training, you need intermediate pre-training checkpoints (before the model has “annealed and been over-baked”), not final checkpoints. Most model providers, including all Chinese labs, only release final checkpoints.
Ari predicts 2026 will be the year enterprise AI moves beyond the three proven use cases and starts to work at scale.
End of Year Reflections
Rob’s biggest surprise: Daniel Gross leaving SSI for Meta, breaking SSI’s culture of total loyalty and secrecy, especially given how much he likely knows about SSI’s secret work.
Ari’s biggest surprises: Meta’s complete about-face on open source (from Llama 3’s success to purging Llama 4 talent and going closed-source) and the rise of Chinese open source models, which in a single year went from DeepSeek V3 to a landscape where Chinese companies are the paragon of open source AI.
Rob changed his mind on: IPO timelines for the big labs. Both Anthropic and OpenAI are now on paths to go public, possibly in 2026 or 2027, driven by pressure from crossover investors and the fact that their businesses now look like real companies with fast-growing revenue.
Ari changed his mind on: RL. After 15 years of RL being “the boy who cried wolf,” it has now generalized and produced real practical gains beyond what he would have bet a year ago, though he still considers it overhyped relative to claims that it is the be-all and end-all.
Spicy 2026 Predictions
Rob: Sam Altman will no longer be CEO of OpenAI by the end of 2026. As the company faces real competitive headwinds from Gemini 3 and its narrative shifts, the board or Altman himself may decide a more operationally focused leader (like Fidji Simo, CEO of Applications) is needed to prepare for IPO and manage day-to-day business.
Ari: Greater than 50% chance that at least once in 2026, the best model in the world will be a Chinese open source model with clear consensus that it outperforms all closed-source alternatives. He gives it a greater than 20% chance that the best model at year-end is not from a closed lab.
Jacob: After years of value accruing to AI applications over infrastructure, 2026 will see more value created on the infrastructure side as model stability finally allows real infrastructure companies to emerge, similar to how web search companies have begun connecting to agents.