DeepL is a Language AI platform, founded by Jarek Kutylowski, that enables businesses to communicate across languages. It was recently valued at $2 billion and serves over 100,000 businesses worldwide. The conversation covers DeepL’s approach to AI translation, its competitive strategy against Google, infrastructure decisions, data labeling philosophy, and the future of synchronous speech translation.
DeepL’s Origins and Competitive Edge
DeepL was building language models well before the recent AI boom, so the release of ChatGPT was less of a technical surprise and more of a watershed moment for public awareness of AI translation.
The company has competed with Google Translate since its inception, and Kutylowski views that competition as a core part of DeepL’s DNA — it creates urgency and drives better outcomes for customers.
He credits DeepL’s success to combining strong academic-level research with deep specialization in high-value business translation use cases, as well as being rooted in Europe, where proximity to many languages motivates the team and sharpens their understanding of the problem.
Product and Use Cases
DeepL serves two primary business functions: helping companies communicate externally to reach new markets, and enabling seamless internal communication across multinational offices.
Examples include a major media company (owner of the Financial Times) translating thousands of articles across Japanese, English, and Chinese, and a Japanese car manufacturer connecting R&D in Japan with customers in Europe and the US.
The product supports both technical translation (prioritizing accuracy) and marketing translation (prioritizing native fluency), and allows customers to embed custom terminology — a feature Kutylowski says no other provider has integrated as well.
Terminology control is non-trivial: the system must respect grammatical rules when substituting words, and sometimes override customer-defined substitutions when a word has multiple meanings in context.
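The two difficulties described above, inflecting a glossary term to fit the sentence and skipping a substitution when the word is used in a different sense, can be illustrated with a small Python sketch. This is not DeepL's implementation. The `GlossaryEntry` type, the `choose_target` helper, the naive "add an s" plural check, and the keyword-based context cues are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """Hypothetical customer glossary entry (illustrative only)."""
    source: str                 # source-language term, lowercase singular
    target_singular: str        # preferred target term, singular form
    target_plural: str          # preferred target term, plural form
    context_cues: frozenset = frozenset()  # words signalling the intended sense


def choose_target(entry, source_word, sentence_words):
    """Return the glossary term to use, or None to defer to the model.

    Assumption: a word sense is "confirmed" if any cue word appears in the
    sentence. An empty cue set means the entry always applies.
    """
    words = {w.lower() for w in sentence_words}
    if entry.context_cues and not (entry.context_cues & words):
        return None  # ambiguous sense: do not force the customer term
    # Naive English plural detection, standing in for real morphology.
    is_plural = source_word.lower() == entry.source + "s"
    return entry.target_plural if is_plural else entry.target_singular


# Example: English "bank" -> German "Bank"/"Banken", financial sense only.
bank = GlossaryEntry("bank", "Bank", "Banken", frozenset({"loan", "account", "money"}))

financial = choose_target(bank, "banks", "the banks approved the loan".split())
riverside = choose_target(bank, "bank", "we sat on the river bank".split())
```

Here `financial` is the plural glossary form, while `riverside` is `None`, meaning the glossary steps aside and the model's own translation is kept. A production system would need real morphological agreement (case, gender, number) and a far richer notion of context than keyword matching.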
Specialized vs. General Models
Kutylowski is a strong advocate for specialized models over general-purpose ones. He argues that for high-value, high-volume use cases like translation, vertically integrated specialized solutions outperform general models.
DeepL maintains ownership of its full stack — from go-to-market and product engineering through model architecture, training, and infrastructure — which allows the company to solve problems that prompt engineering alone cannot address.
He acknowledges that for the “long tail” of use cases, building specialized models doesn’t make financial sense, and companies should rely on standard technology. But for core use cases like translation, specialization creates defensible value.
Data Labeling and Human Translators
DeepL employs thousands of human translators worldwide for both training data generation and quality assurance, and has run large-scale internal data annotation projects for years.
The company hires native speakers in each target language (e.g., Brazilian Portuguese speakers for Brazilian Portuguese) to ensure the highest quality, which is logistically complex but essential.
Kutylowski believes data labeling quality requirements vary by task: some tasks need massive volumes of lower-quality data, while others — like DeepL’s — require small, meticulously curated datasets with carefully selected individuals.
He notes that quality can fluctuate even among top performers (e.g., someone falling sick for a week), so close monitoring is critical. DeepL is beginning to explore outsourcing parts of this work but maintains that the core quality judgment is hard to delegate.
Infrastructure and Compute Strategy
DeepL has operated its own data centers since the beginning, initially out of necessity and later for cost efficiency and hardware availability reasons.
Kutylowski advises startups to use hyperscalers as a kickstart but to consider transitioning to their own data centers at scale, particularly for massive inference loads and research training.
Running your own infrastructure is more complex and can slow development, so DeepL is moving toward a hybrid cloud model: keeping workloads on-premises where cost, efficiency, or data protection call for it, and using the cloud for the rest.
He notes that GPU compute tooling is still at an early stage compared to the mature abstraction layers that exist for general-purpose CPU compute, and that optimizing GPU usage is critical given how scarce and expensive the hardware is.
Model Evaluation
When DeepL started in 2017, automatic ("synthetic") metrics like the BLEU score were the state of the art for evaluating translation quality, but the company quickly reached quality levels those metrics couldn’t meaningfully distinguish.
Real evaluation at DeepL relies on human translators who judge translations on accuracy, nuance, and native feel — always in a comparative way (e.g., model A vs. model B) rather than assigning absolute scores.
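As a rough illustration of comparative evaluation, the Python sketch below aggregates side-by-side judgments ("A", "B", or "tie") into a win rate and a two-sided exact sign test. The source does not describe how DeepL aggregates its translators' judgments, so the `compare_models` function, the label format, and the choice of a sign test are assumptions made for this example.

```python
from collections import Counter
from math import comb


def compare_models(judgments):
    """Summarize pairwise preferences between model A and model B.

    `judgments` is an iterable of "A", "B", or "tie", one per evaluated
    sentence. Ties are reported but excluded from the statistics.
    """
    counts = Counter(judgments)
    wins_a, wins_b = counts["A"], counts["B"]
    decided = wins_a + wins_b
    if decided == 0:
        return {"win_rate_a": None, "p_value": 1.0, "ties": counts["tie"]}
    # Two-sided exact sign test: probability of a split at least this
    # lopsided if both models were equally likely to be preferred.
    k = max(wins_a, wins_b)
    tail = sum(comb(decided, i) for i in range(k, decided + 1)) / 2 ** decided
    return {
        "win_rate_a": wins_a / decided,
        "p_value": min(1.0, 2 * tail),
        "ties": counts["tie"],
    }


result = compare_models(["A"] * 8 + ["B"] * 2 + ["tie"] * 2)
```

With 8 preferences for A, 2 for B, and 2 ties, `result["win_rate_a"]` is 0.8 and the p-value is about 0.11, so a sample this small would not yet be strong evidence that A is better. That is one reason comparative evaluation tends to need many judged sentences and consistent, closely monitored annotators.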
Kutylowski notes that literary translation is a special case where the translation itself is a work of art, and accuracy or fluency are not the primary optimization targets — but this is not a problem AI translation is trying to solve.
The Future: Synchronous Speech Translation
Kutylowski sees spoken language and voice as the next frontier. While text translation has transformed how content is consumed, real-time spoken translation remains unsolved — evidenced by the fact that both participants in the podcast spoke English rather than their native languages.
He expects early products to appear relatively quickly but believes perfecting synchronous speech translation will take years, as speech is stream-based rather than chunked into sentences, and people speak carelessly and ambiguously.
Key technical challenges include latency, ambiguity in speech, and the unstructured nature of spoken language. Models will essentially need to learn to translate a different kind of language.
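The tension between latency and context can be made concrete with a toy Python sketch. It assumes a word-by-word transcript stream and a hypothetical `chunk_stream` helper that decides when to hand a segment to a translator: either at a sentence boundary (trailing punctuation here stands in for a detected pause) or once a maximum number of words has accumulated. This is only an illustration of the trade-off, not a description of how any real speech translation system segments audio.

```python
def chunk_stream(words, max_words=5):
    """Yield translatable segments from an incoming stream of words.

    A segment is flushed when a word ends a sentence or when the buffer
    reaches `max_words`. A smaller `max_words` lowers latency but gives
    the translator less context per segment.
    """
    buffer = []
    for word in words:
        buffer.append(word)
        if word.endswith((".", "?", "!")) or len(buffer) >= max_words:
            yield " ".join(buffer)
            buffer = []
    if buffer:  # flush whatever is left when the stream ends
        yield " ".join(buffer)


stream = "so um I think we should ship it. ok".split()
segments = list(chunk_stream(stream, max_words=4))
```

With `max_words=4` the stream is cut into "so um I think", "we should ship it.", and "ok". The first segment ends mid-thought and includes filler words, which hints at why careless, unpunctuated speech is harder to translate than written sentences.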
When synchronous translation arrives, it could fundamentally change how businesses operate — enabling teams spread across countries to communicate in their own languages in real time, and giving employees access to education and knowledge resources regardless of language.
Will People Still Learn Languages?
Kutylowski believes the average person will probably speak fewer languages in 30 years, as the business necessity diminishes. However, those who do learn languages will do so out of personal interest and cultural curiosity, much like people play chess despite AI being far superior.
He still has his own kids learning languages, noting the cognitive benefits and the irreplaceable value of personal connection — you don’t want to talk to your partner through a phone for 20 years.
He also sees potential for AI to democratize language learning, making conversational practice accessible to people who could never afford in-person tutors, though he wonders how enjoyable speaking to a phone will be compared to learning over dinner with a person.
Adjacent Spaces and Broader AI Views
Kutylowski is excited about AI-powered language learning apps, noting that models don’t need to be fully accurate for educational purposes — some hallucination is fine if the learner gets to practice dialogue.
He views general models as overhyped and specialized models as underhyped, arguing that the real value creation is happening in focused, vertically integrated solutions.
The biggest surprise in building DeepL was beating tech giants; the biggest misconception was thinking the technology alone would be enough without product, commercial, and go-to-market investment.
He has become a strong advocate of radical candor in company culture — running the organization through direct, open, honest communication — after learning through repeated experience that lack of transparency hurts all parties.
If not building DeepL, he would go into medicine, particularly drug discovery and democratizing access to healthcare, as he sees it as having enormous potential to improve lives.