The Problem with AI Agents No One is Talking About | Yutori, Abhishek Das

Summary

  • Abhishek Das, co-founder and co-CEO of Yutori, discusses building reliable web agents, his research background, and why the AI industry’s tolerance for unreliable agent products is a problem.
    • Yutori is building agents that take actions and complete tasks on users’ behalf on the web, so people can focus on more meaningful work.
    • The three co-founders are all AI researchers by background, and the company is backed by investors including Fei-Fei Li and Jeff Dean.

From IIT Roorkee to Building Software

  • Das studied electrical engineering at IIT Roorkee but quickly realized his interests lay elsewhere.
    • He stopped focusing on coursework after his first year and instead spent most of his time learning programming and building software.
    • IIT Roorkee had a strong programming culture, particularly through a group called SDSLabs, a small cohort of coders building applications for the campus intranet.
    • Seeing users interact with what he built created a “dopamine hit” that kept him motivated, and being surrounded by equally obsessed peers amplified that drive.

The Last Generation to Use a Browser

  • Das had long wanted to start his own company and considered it at the end of both his undergraduate degree and his PhD, but only now did the timing feel right.
    • He believes web browsers have remained largely unchanged for two to three decades and that there is an opportunity to reimagine the experience.
    • The future involves talking to AI assistants that take actions and complete tasks on the web, with many agents working proactively in the background.
    • He sees digital agents arriving before physical agents, and envisions humans and agents working together to improve productivity rather than agents replacing humans entirely.
    • Reliable assistants would also make the web more accessible: his parents, for example, would no longer need to learn every new website if an assistant could do things for them.

The Problem with AI Agents No One Is Talking About

  • There are hundreds of agent products claiming to do anything on the web, but most don’t actually work reliably.
    • Das pushes back on the normalization of non-determinism and low reliability in shipped agentic products.
    • Agents make sequences of decisions, and even with 90% accuracy at each step, error compounds quickly across 10, 20, or 50-step workflows, making overall success rates quite low.
    • A critical capability is the ability to recognize mistakes and backtrack to correct course, which current models largely lack.
    • Yutori runs every production query through comprehensive evals to identify where agents perform well versus where they fail, and which domains need more work.
    • Because new websites appear constantly, models will always encounter unfamiliar sites, so the key question is whether a model can recognize its own mistakes and self-correct.
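
The compounding-error point above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not Yutori's evaluation code: it assumes every step succeeds independently with the same probability, and the function names and the 80% recovery rate are hypothetical values chosen for the example. It also shows a toy model of why recognizing and correcting mistakes matters.

```python
def workflow_success_rate(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return step_accuracy ** num_steps


def step_accuracy_with_recovery(step_accuracy: float, recovery_rate: float) -> float:
    """Effective per-step accuracy if a failed step is caught and fixed
    with probability recovery_rate (a toy assumption, not a measured value)."""
    return step_accuracy + (1.0 - step_accuracy) * recovery_rate


if __name__ == "__main__":
    # 0.90 is the per-step accuracy from the text; 0.8 is a hypothetical recovery rate.
    for steps in (10, 20, 50):
        base = workflow_success_rate(0.90, steps)
        recovered = workflow_success_rate(step_accuracy_with_recovery(0.90, 0.8), steps)
        print(f"{steps:>2} steps: {base:.1%} without recovery, {recovered:.1%} with recovery")
```

Under these assumptions, 90% per-step accuracy gives an overall success rate of roughly 35% at 10 steps, 12% at 20 steps, and under 1% at 50 steps. Real workflows will not have independent, equally difficult steps, so the numbers are a rough guide rather than a prediction.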

The 80/20 Rule and Building Taste

  • Yutori takes an 80/20 approach to product development, prioritizing the top 10 features out of 100 possibilities.
    • Some priorities come from direct user feedback, but many come from builder intuition about features users haven’t explicitly asked for.
    • He gives the example of iOS and Android automatically reading two-factor authentication codes from SMS and filling them in. It is hard to imagine anyone explicitly requesting this feature, yet it removes a small amount of friction many times a day for millions of people.
    • In a world where coding LLMs make first prototypes easy to produce, the true differentiator is taste, craft, and how intuitive and well-designed the product is.
    • The team dogfoods the product every week, dedicating an hour and a half to testing new features internally, which refines their sense of what feels good, bad, or magical.
    • At any given time they run tens of experiments internally, with only one potentially shipping to external users.

Why Reliability Matters More Than Raw Performance

  • During his PhD, Das was a supporting author on the Grad-CAM paper, a project led by a lab mate. The paper has received 20,000 to 30,000 citations.
    • Grad-CAM addressed interpretability in deep learning classification models, showing what part of an image a model was looking at to make predictions.
    • The core idea, that models should convey not just the final answer but the proof of work behind it, carries directly into how Yutori builds its product today.
    • Yutori’s Scouts feature lets users set up agents to monitor the web and generate reports, with a UI button that lets users inspect which websites were visited and what the agent actually looked at.
    • This transparency is critical for building user trust in a reliable product.
    • Das believes that attention to detail in visible parts of a product makes users more likely to trust the parts they cannot see, and that building something meaningful and reliable takes time and deliberate effort.