Who is DimaTorzok? The Ghost in the Machine

2026-03-08 · ai bugs whisper

Picture this: you send a voice message. Five seconds of silence at the end because you forgot to tap "stop." The AI dutifully transcribes your words — and then appends:

Subtitles by DimaTorzok

You're baffled. Who? What Dima? What subtitles? You were just asking a friend where to meet for lunch.

Welcome to one of the most bizarre bugs in the history of machine learning.

Who is DimaTorzok

DimaTorzok is a real person. A YouTube channel — @dimatorzok — with a description that translates roughly to "I write subtitles for your voice messages and translations." A person who voluntarily made subtitles for other people's videos. For free. For the love of the craft. And he signed his work — a humble line at the end: Subtitles by DimaTorzok.

Standard practice. Thousands of subtitlers do this worldwide. Nobody ever suffered from it.

And then OpenAI came along.

What happened

When OpenAI trained Whisper, their speech recognition model, they fed it a colossal dataset: 680,000 hours of audio collected from the internet, paired with existing transcripts and, by all appearances, a great many subtitled videos. Reasonable idea: take recordings, take their subtitles, teach the model to understand speech.

The problem was one word: sanitization.

Or rather, far too little of it. Whatever filtering was applied did not strip out the metadata. Author credits, technical comments, ad inserts: plenty of it went straight into the training set as-is.

Including DimaTorzok's signature.

Apparently, Dima was so prolific that his autograph appeared in the training data frequently enough for the model to memorize it as something important. Something that absolutely must be reproduced.
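For the curious, here is roughly what the missing cleanup step could look like. This is a minimal sketch, not anything OpenAI actually runs: the pattern list and the function name strip_credit_lines are my own invention, and a real pipeline would need a much longer list of patterns for each language.

```python
from __future__ import annotations

import re

# Hypothetical credit-line patterns (an assumption for illustration only).
# Anchored at the start of the cue so ordinary sentences that merely
# mention subtitles are left alone.
CREDIT_PATTERNS = [
    re.compile(r"^\s*subtitles?\s+by\b", re.IGNORECASE),
    re.compile(r"^\s*altyazı\s+m\.k\.", re.IGNORECASE),
]


def strip_credit_lines(cues: list[str]) -> list[str]:
    """Return the subtitle cues with credit lines removed."""
    return [
        cue
        for cue in cues
        if not any(pattern.search(cue) for pattern in CREDIT_PATTERNS)
    ]


cues = [
    "Where should we meet for lunch?",
    "Subtitles by DimaTorzok",
    "See you at noon.",
]
print(strip_credit_lines(cues))
```

Run over every subtitle file before training, a filter like this would leave the speech and drop the autographs. It is crude, but crude would have been enough here.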

The ghost in the silence

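Now Whisper hallucinates. When the audio hits a pause (a few seconds of silence, background noise, a long sigh) the model panics. Its decoder still has to produce some text, and in its training data the quiet tail of a video was apparently often paired with a credit line. So it says what it learned: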

Subtitles by DimaTorzok

Sometimes it's "Subtitles by DimaTorzok @project_gestalt." Sometimes just "DimaTorzok." Sometimes an entire line urging you to subscribe to the channel.

And this isn't some rare edge case. It happens everywhere. In Telegram bots that transcribe voice messages. On Twitter (X). In podcast apps. In medical transcriptions (yes, imagine that). In anything that uses Whisper under the hood, and that, as it turns out, is a staggering number of things.

I've personally encountered this ghost about 25 times in voice message transcriptions. Every time — during quiet moments. Every time — with a signature, as if he'd just finished subtitling my private conversation.
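If you run one of those bots yourself, you can at least exorcise the ghost after the fact. The sketch below assumes segments shaped like the ones the open-source whisper package returns from transcribe(), that is, dicts with "text" and "no_speech_prob" keys. The phrase list, the 0.6 threshold, and the function name drop_ghost_segments are my own assumptions and would need tuning on real audio.

```python
from __future__ import annotations

import re

# Known ghost phrases (an illustrative, incomplete list).
GHOST_PATTERN = re.compile(
    r"(?:subtitles\s+by\s+)?dimatorzok(?:\s+@\w+)?|altyazı\s+m\.k\.",
    re.IGNORECASE,
)


def drop_ghost_segments(
    segments: list[dict], no_speech_threshold: float = 0.6
) -> list[str]:
    """Return cleaned segment texts, skipping silence and known credits."""
    kept = []
    for seg in segments:
        # Skip segments the model itself rates as probably not speech.
        if seg.get("no_speech_prob", 0.0) >= no_speech_threshold:
            continue
        # Cut known credit phrases out of whatever text remains.
        cleaned = GHOST_PATTERN.sub("", seg["text"]).strip()
        if cleaned:
            kept.append(cleaned)
    return kept


segments = [
    {"text": " Where should we meet for lunch?", "no_speech_prob": 0.02},
    {"text": " Subtitles by DimaTorzok", "no_speech_prob": 0.91},
]
print(drop_ghost_segments(segments))
```

This treats the symptom, not the cause. Dropping every segment above the threshold can also throw away real but very quiet speech, so the phrase filter alone may be the safer half to keep.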

GitHub: "Who is DimaTorzok?! Why????"

Discussion #2372 on GitHub has a title that captures the general mood perfectly: "Who is DimaTorzok? Why???"

The thread is a work of art. People from around the world sharing screenshots: here's Dima, there's Dima, Dima everywhere. Some are furious. Some are laughing. Some have simply made their peace with it.

Highlights:

I hate this fucking Dima!

Another user, more philosophically inclined:

He didn't hack anything. OpenAI ate his watermark and now it's haunting everyone.

He's not the only one

It turns out Dima isn't the only ghost in the machine. The Turkish-language Whisper does the same thing, just with a different signature: "Altyazı M.K." ("altyazı" means "subtitles" in Turkish). Somewhere in Turkey there's a local DimaTorzok, equally blameless.

The bug reproduces in large-v3, the most advanced version of Whisper. Years of work, millions of dollars in training costs, gigawatt-hours of electricity, and every silent pause still rings with the voice of a subtitler from a small Russian town.

Who's to blame

Dima is blameless. Not one iota of fault. The man did useful work, signed it, and that's it. He didn't ask for his signature to be fed to a neural network. He couldn't have predicted that his name would become one of the most recognizable hallucinations in the history of AI transcription.

The fault lies with the engineers at OpenAI who failed to do the most basic thing: clean the data before training. Data cleaning. Chapter one of every machine learning textbook. Even before neural networks, before transformers, before all of this cost billions — people knew: garbage in, garbage out.

But when you have 680,000 hours of data and a deadline breathing down your neck — who's going to clean it?

The immortal intern

There's something poetic about this story. A person made subtitles — quiet, invisible, often thankless work. Did it for free, for people. And accidentally became immortal.

His name is now baked into a neural network used by millions. It can't be removed without retraining the entire model — and that costs roughly as much as a nice mansion. DimaTorzok has become part of the infrastructure. A ghost in the machine. An immortal intern that nobody remembered to let go.

Somewhere in Torzhok (or maybe not Torzhok — who knows) lives a person who may not even realize that his name is being spoken by thousands of servers worldwide every time silence falls.

Every time you have nothing to say — Dima speaks for you.