The AI Developer Platform with an 80,000-Person Waitlist

Unsupervised Learning 1h5 7 min #7

Summary

  • Harrison Chase, founder and CEO of LangChain, joins the podcast to discuss the state of AI development, LangChain’s evolution, and the future of agents, evaluation, and personalization. LangChain is the most popular open-source framework for building LLM applications, with 38,000+ Discord members and adoption by companies like Elastic, Dropbox, and Snowflake. The episode covers LangChain’s broad orchestration capabilities, the launch of LangSmith (a paid observability and testing platform now in general availability with 1,000+ teams), the current agent landscape, evaluation best practices, and Harrison’s views on what’s overhyped and underhyped in AI. He also shares advice for startups and reflects on being at the center of the AI developer ecosystem.

LangChain’s scope and architecture

  • LangChain started as an orchestration layer for building LLM applications, connecting models to external data and computation sources.
    • Core use cases include chatbots over data, retrieval-augmented generation (RAG), extraction, text-to-SQL, and API integrations.
    • The framework is intentionally broad because LLMs themselves are general-purpose, so the orchestration layer must support a wide range of applications.
  • Three interconnected technical pillars have driven LangChain’s development: retrieval, agents, and evaluation.
    • These areas are deeply related—agents use retrieval as a tool, evaluation is needed for both, and agents can assist in evaluation.
    • Harrison argues it’s hard to focus on just one because they form a flywheel.
  • LangChain recently released LangChain Expression Language and LangGraph, which provide a more flexible, lower-level orchestration layer.
    • Earlier versions used higher-level “chains” that were hard to customize without modifying source code.
    • The new abstractions treat components as composable functions with streaming as a first-class citizen.
    • LangGraph specifically enables building agents as state machines (graphs), which Harrison sees as the future direction for reliable agent design.
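The idea of components as composable functions with streaming built in can be illustrated without any framework. The sketch below is plain Python, not LangChain Expression Language itself; `pipe`, `fake_model`, and `upper` are hypothetical names chosen for illustration. Each step consumes and produces an iterator of chunks, so output flows through the whole chain one token at a time.

```python
from typing import Callable, Iterator

# A "step" is any function that maps a stream of chunks to a stream of chunks.
Step = Callable[[Iterator[str]], Iterator[str]]


def pipe(*steps: Step) -> Step:
    """Compose steps left to right into a single step."""
    def composed(chunks: Iterator[str]) -> Iterator[str]:
        for step in steps:
            chunks = step(chunks)
        return chunks
    return composed


def fake_model(prompts: Iterator[str]) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: emits one word ("token") at a time."""
    for prompt in prompts:
        for word in prompt.split():
            yield word


def upper(chunks: Iterator[str]) -> Iterator[str]:
    """Post-processing step that transforms each chunk as it arrives."""
    for chunk in chunks:
        yield chunk.upper()


chain = pipe(fake_model, upper)

# Tokens are printed as they are produced; nothing is buffered in between.
for token in chain(iter(["hello streaming world"])):
    print(token)
```

Because every step is a generator, a downstream consumer (such as a UI) can render the first token before the model has finished producing the rest.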

LangSmith: observability and evaluation platform

  • LangSmith is a separate SaaS platform focused on tracing, observability, testing, and evaluation for LLM applications.
    • It logs all steps of a chain or agent, including inputs, outputs, and exact sequence—critical for debugging multi-step applications.
    • Even single-prompt applications benefit from seeing exactly what the LLM receives (e.g., how conversational history is trimmed).
  • Testing and evaluation are harder to get value from than observability because they require more user effort.
    • Users must create evaluation datasets, define metrics, and review results.
    • Harrison sees this as an area where human-in-the-loop review is genuinely valuable, not a shortcoming—looking at data helps developers understand these systems deeply.
  • LangSmith is framework-agnostic and can be used with or without LangChain.
    • It treats the system being tested as a function and evaluators as functions, keeping abstractions low-level and generalizable.
  • The platform recently reached general availability after six months of iteration, with strong demand from teams moving from prototype to production.
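To make the tracing idea concrete, here is a minimal sketch of step-level logging in plain Python. It does not use the LangSmith SDK; the `traced` decorator, the in-memory `TRACE` list, and the three example functions are all assumptions made for illustration. It records each step's inputs and output in call order, which is enough to see exactly how conversation history was trimmed before the prompt was built.

```python
import functools
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []  # in-memory stand-in for a tracing backend


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record the name, inputs, and output of each call, in call order."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record: dict[str, Any] = {
            "step": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
        }
        TRACE.append(record)  # appended before the call, so parents precede children
        record["output"] = fn(*args, **kwargs)
        return record["output"]
    return wrapper


@traced
def trim_history(history: list[str], max_turns: int) -> list[str]:
    """Keep only the most recent turns."""
    return history[-max_turns:]


@traced
def build_prompt(history: list[str], question: str) -> str:
    """Join the trimmed history and the new question into one prompt."""
    return "\n".join(history + [f"User: {question}"])


@traced
def answer(history: list[str], question: str) -> str:
    prompt = build_prompt(trim_history(history, max_turns=2), question)
    # Hypothetical placeholder for a real model call.
    return f"(model reply to a {len(prompt)}-character prompt)"


answer(
    ["User: hi", "AI: hello", "User: my order is late", "AI: sorry to hear that"],
    "Where is it?",
)
for record in TRACE:
    print(record["step"], "->", record["output"])
```

Inspecting the `trim_history` record shows which turns were dropped, which is the kind of detail that is otherwise invisible when only the final answer is logged.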

Evaluation best practices

  • Teams face several key questions when building evaluation pipelines:
    • How to gather datasets: Typically start with ~20 hand-labeled examples, then incorporate edge cases from production traces (e.g., thumbs-down feedback).
    • How to evaluate individual data points: Classification tasks can use traditional ML; many tasks require LLM-as-a-judge, which introduces noise and still needs human oversight.
    • How to aggregate metrics: Some teams need high accuracy on critical datasets; others just need to confirm improvement over a previous version.
    • How often to evaluate: Mostly done before releases due to cost and manual effort; the goal is to eventually run evals in CI like unit tests.
  • Harrison emphasizes that creating an evaluation dataset is a forcing function that clarifies what the system should actually do—something often skipped because LLMs let you start building without defining expectations.
  • Best practices are just starting to emerge as more applications reach production (e.g., Elastic’s assistant, Notion QA).
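The questions above (dataset, per-example scoring, aggregation, comparison against a previous version) map onto a very small harness. The following is a sketch under stated assumptions: the dataset, the two `system_v*` functions, and the exact-match evaluator are hypothetical, and a real pipeline would often swap in an LLM-as-a-judge evaluator with human review. It treats both the system under test and the evaluator as plain functions.

```python
from typing import Callable

Example = tuple[str, str]  # (input, expected output)


def exact_match(predicted: str, expected: str) -> float:
    """Evaluator: 1.0 if the normalized strings are equal, else 0.0."""
    return float(predicted.strip().lower() == expected.strip().lower())


def evaluate(
    system: Callable[[str], str],
    dataset: list[Example],
    evaluator: Callable[[str, str], float],
) -> float:
    """Run the system on every example and return the mean score."""
    scores = [evaluator(system(question), expected) for question, expected in dataset]
    return sum(scores) / len(scores)


# Hypothetical hand-labeled dataset; in practice this might start at ~20 examples.
DATASET: list[Example] = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
]


def system_v1(question: str) -> str:
    """Hypothetical previous version of the application."""
    return {"2+2": "4"}.get(question, "unknown")


def system_v2(question: str) -> str:
    """Hypothetical candidate version of the application."""
    return {"2+2": "4", "capital of France": "paris"}.get(question, "unknown")


old_score = evaluate(system_v1, DATASET, exact_match)
new_score = evaluate(system_v2, DATASET, exact_match)
print(f"v1: {old_score:.2f}  v2: {new_score:.2f}  improved: {new_score > old_score}")
```

A check like `new_score > old_score` is the kind of assertion that could eventually run in CI alongside unit tests, once the cost and noise of the evaluator are acceptable.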

The agent landscape

  • There was a hype cycle around fully autonomous generalizable agents (like AutoGPT), which proved unreliable.
    • The field has shifted toward more focused, constrained agents with specific prompts and tools.
  • Multi-agent frameworks (AutoGen, CrewAI) are gaining traction, but Harrison sees them as essentially state machines with controlled transitions between specific states.
    • This mental model makes them more palatable because developers can enforce specific transition probabilities and define clear states.
    • Example: A customer support bot with distinct states for gathering account info vs. debugging an issue, each with dedicated prompts.
  • LangGraph was released to support this state-machine approach to agent construction.
  • Harrison is bullish on state machines as a paradigm because they provide a helpful mental model for developers and allow enforcement of control flow.
  • Harrison identifies three categories of builders using LangChain:
    1. AI-native startups: Building forward-thinking, consumer-facing agents; still early and experimental.
    2. Digital-native companies: Shipping production applications like Notion QA and Elastic’s assistant—sophisticated, personalized, at-scale chat over knowledge bases.
    3. Enterprises: Much of the work is internal-facing (e.g., internal assistant platforms similar to GPT Store but connected to company data and APIs), which allows more risk-taking since failures aren’t customer-facing.
    • OpenGPTs (an open-source project by LangChain) is being adopted by companies for this internal use case.
  • Harrison predicts more complex chatbots represented as state machines (e.g., AI therapists, multi-stage support bots) and longer-running jobs (e.g., GPT Researcher generating research reports) will become more common in 2024.
    • Longer-running jobs require UX innovation since they take time and can produce imperfect first drafts.
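The customer support example above (one state for gathering account info, another for debugging) can be sketched as an explicit state machine. This is plain Python rather than LangGraph, and every name in it is hypothetical; in particular, `propose_next` is a rule-based stand-in for the decision a model would make. The point is that the developer, not the model, decides which transitions are allowed.

```python
from enum import Enum, auto


class State(Enum):
    GATHER_ACCOUNT = auto()
    DEBUG_ISSUE = auto()
    DONE = auto()


# Each non-terminal state has its own dedicated prompt; a real agent would
# send PROMPTS[state] to the model on every turn spent in that state.
PROMPTS = {
    State.GATHER_ACCOUNT: "Ask the user for their account email.",
    State.DEBUG_ISSUE: "Help the user debug the issue they describe.",
}

# Allowed transitions: the control flow that the developer enforces.
TRANSITIONS = {
    State.GATHER_ACCOUNT: {State.GATHER_ACCOUNT, State.DEBUG_ISSUE},
    State.DEBUG_ISSUE: {State.DEBUG_ISSUE, State.DONE},
    State.DONE: set(),
}


def propose_next(state: State, user_message: str) -> State:
    """Hypothetical stand-in for the model's choice of next state."""
    if state is State.GATHER_ACCOUNT:
        return State.DEBUG_ISSUE if "@" in user_message else State.GATHER_ACCOUNT
    if state is State.DEBUG_ISSUE:
        return State.DONE if "fixed" in user_message.lower() else State.DEBUG_ISSUE
    return State.DONE


def step(state: State, user_message: str) -> State:
    """Advance one turn, rejecting any transition that is not allowed."""
    proposed = propose_next(state, user_message)
    if proposed not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state.name} -> {proposed.name}")
    return proposed


def run(messages: list[str]) -> list[State]:
    """Feed messages through the machine and return the states visited."""
    state = State.GATHER_ACCOUNT
    visited = [state]
    for message in messages:
        if state is State.DONE:
            break
        state = step(state, message)
        visited.append(state)
    return visited


print([s.name for s in run(["hi", "me@example.com", "it crashes", "fixed, thanks"])])
```

Because the transition table is data, the bot can never jump from gathering account info straight to `DONE`, regardless of what the model proposes.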

Balancing present needs with future uncertainty

  • The AI space moves extremely fast, creating tension between building for current needs and staying flexible for future changes.
    • Harrison’s approach: build what’s needed now (even if it might be a “hack”) while investing in flexible abstractions.
    • LangChain waited until after multimodal models emerged to release version 0.1, so that its core abstractions would not break on multimodal inputs.
    • Base classes are kept dead simple with minimal assumptions (e.g., no built-in retry logic in the base model class) to maximize flexibility.
  • Harrison believes the space is more stable now than six months ago, allowing for a stable release, though he acknowledges this could prove wrong.
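One way to read the "dead simple base class" point is that behavior such as retries is layered on by composition instead of being baked into the base. The sketch below illustrates that design choice only; `SimpleModel`, `FlakyModel`, and `WithRetry` are hypothetical classes, not LangChain's actual class hierarchy.

```python
import time


class SimpleModel:
    """Minimal base: one method, no retries or other built-in assumptions."""

    def invoke(self, prompt: str) -> str:
        raise NotImplementedError


class FlakyModel(SimpleModel):
    """Hypothetical model that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def invoke(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return f"echo: {prompt}"


class WithRetry(SimpleModel):
    """Opt-in wrapper that adds retries around any SimpleModel."""

    def __init__(self, inner: SimpleModel, max_attempts: int = 3, delay_s: float = 0.0) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.inner = inner
        self.max_attempts = max_attempts
        self.delay_s = delay_s

    def invoke(self, prompt: str) -> str:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return self.inner.invoke(prompt)
            except ConnectionError:
                if attempt == self.max_attempts:
                    raise  # out of attempts: surface the last error
                time.sleep(self.delay_s)
        raise AssertionError("unreachable: the loop always returns or raises")


model = FlakyModel(failures=2)
print(WithRetry(model, max_attempts=3).invoke("hi"), "after", model.calls, "calls")
```

Keeping retries in a wrapper means a model type that should never be retried, or that needs a different retry policy, does not have to work around behavior inherited from the base class.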

Competitive landscape and LangServe

  • Harrison sees parallels with traditional observability (e.g., Datadog) but believes the value propositions are currently different.
    • LangSmith focuses on helping developers understand and iterate on non-deterministic LLM applications.
    • Datadog focuses on system-level monitoring and aggregate metrics.
    • Some companies use both together.
  • LangServe was released to make deploying LangChain applications easy.
    • Built on FastAPI, it provides standard invoke/batch/stream endpoints and auto-generates a playground for testing.
    • The playground facilitates cross-functional collaboration—letting PMs, subject matter experts, and non-technical stakeholders interact with and give feedback on applications.
    • Harrison delayed building a deployment platform because observability and testing were bigger blockers for teams.

Team and resource allocation

  • LangChain’s team is 18 people, split roughly 50/50 between LangSmith and open-source work.
    • Open-source work fluctuates between solidifying core abstractions and building use-case-specific resources.
    • LangSmith investment has ramped up due to strong demand and clear value.
  • Resource allocation is driven by where the team can provide the most value to users at any given time.

What will become obsolete

  • Context window management tricks (summarizing conversation history) may go away as context windows grow.
  • Multimodal models are not yet good enough for precise knowledge work requiring spatial awareness; Harrison thinks this will improve.
  • RAG is here to stay; Harrison doesn’t think it will be obviated.
  • State machines for agents will likely persist because (1) complex instructions and state-dependent retrieval still benefit from explicit control flow, and (2) the mental model is genuinely helpful for developers.
  • Prompt engineering hacks like “write this in JSON” should die as function calling and structured extraction improve.

Overhyped and underhyped

  • Overhyped: Multimodal AI—not yet good enough for the real use cases people want.
  • Underhyped: Few-shot prompting—teams having success are using it heavily, and dynamic example selection (pulling relevant examples based on the current input) could enable lightweight continual learning and personalization.
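Dynamic example selection can be sketched in a few lines. The version below ranks stored examples by word overlap (Jaccard similarity) with the current input and puts the top matches in the prompt. The word-overlap measure is a simplification chosen so the sketch is self-contained; production systems typically use embedding similarity instead. The example pool and function names are hypothetical.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pool of (input, output) examples, e.g. collected from past usage.
EXAMPLES = [
    ("Translate 'good morning' to French", "bonjour"),
    ("What is 12 times 3?", "36"),
    ("Translate 'thank you' to French", "merci"),
    ("What is 7 plus 5?", "12"),
]


def select_examples(query: str, examples: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k examples whose inputs overlap most with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        examples,
        key=lambda example: jaccard(query_tokens, tokenize(example[0])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, examples: list[tuple[str, str]], k: int = 2) -> str:
    """Assemble a few-shot prompt from the selected examples plus the query."""
    shots = [f"Q: {q}\nA: {a}" for q, a in select_examples(query, examples, k)]
    shots.append(f"Q: {query}\nA:")
    return "\n\n".join(shots)


print(build_prompt("Translate 'good night' to French", EXAMPLES))
```

Appending new (input, corrected output) pairs to the example pool changes future prompts without any retraining, which is what makes this a lightweight route to continual learning and per-user personalization.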

Surprises in building LangChain

  • Streaming was far more important and harder to implement correctly in an orchestration framework than Harrison initially realized.
    • LangChain Expression Language was redesigned to make streaming a first-class citizen.
  • OpenGPTs (a recreation of the GPT Store concept in open source) got more attention and positive response than expected, especially from enterprises wanting control over their systems.
    • Harrison realized LangChain wasn’t doing a good job showing people how to do interesting things with the framework—OpenGPTs made it more accessible.

Open-source models

  • Harrison thinks open-source models will become more ubiquitous but aren’t yet capable of handling the most interesting use cases (where people still use OpenAI).
  • He expects cool desktop applications running locally once local models improve, especially for personalized use cases where users don’t want data leaving their machines.

Most exciting AI startups

  • New Computer (by Sam Whitmore): Working on memory and personalization.
  • Fireworks and Together: Doing strong work on hosting and fine-tuning open-source models.
  • Ama: Models are impressive; LangChain collaborates with them.
  • Hearth AI: Application layer, focused on relationship management with high personalization.
  • Harrison is particularly interested in personalization at the user level as a potential “second wave” of AI apps, analogous to how killer apps emerged years after the iPhone.

What Harrison would build

  • A journal app with memory: Users write journal entries (rich with personal context), then the AI starts conversations pulling in memories from previous entries and interactions.
    • The journaling UX ensures users are in a reflective mindset, providing high-quality personal data.
    • LangChain may release a demo of this.

Being at the center of the AI ecosystem

  • Harrison had ~200 Twitter followers when LangChain started and hadn’t done open source before.
  • He finds it a fun time to build and is focused on enjoying the process and providing value to developers.
  • Where to follow: Blog at blog.langchain.dev, Twitter, and YouTube (where LangChain is investing heavily in educational content like RAG explainers and build-from-scratch series).

Debrief highlights

  • Eval as a product design tool: Creating evaluation datasets forces clarity on what the system should do—valuable at the start of product development, not just for testing afterward.
  • “No GPUs before PMF”: Use GPT-4 until you have product-market fit, then optimize costs. (This parallels advice from Aravind Srinivas of Perplexity.)
  • “You have to build”: Rather than endlessly speculating about whether current techniques will be obviated, developers should build now and iterate.
  • Personalization as the next frontier: Both Harrison and the hosts see user-level personalization as a potential step-change improvement in AI applications, analogous to how dynamic web content replaced static pages.
  • Enterprise adoption is happening but mostly internal: Regulated industries are cautious about user-facing AI, but internal tools are advancing rapidly.
  • 2024 as the year of AI applications: Drawing parallels to how foundational technologies (iPhone, internet, cloud) saw their most interesting applications emerge a few years after the initial platform shift.