Study

What DELETE Doesn't Delete

The row vanishes from your view. On disk it's still there — and that has consequences.

Saturday, May 30, 2026 · Jaipur · 9 min

I deleted a few million rows to clean up a mess, ran a SELECT count(*) to admire the result, and the count came back slower than before I’d deleted anything.

That sentence shouldn’t make sense. Fewer rows should mean less to count. The table I was looking at in the UI had visibly shrunk — the duplicates were gone, the numbers were right, the cleanup had worked. And yet the database was now doing more work to answer the same simple question.

The reason is a single fact about how databases work that almost nobody is told up front, and once you see it, a dozen other things stop being mysterious. So let me walk you to it the way I got there.

What I assumed

My mental model was the one most people carry, because the UI teaches it to you: a row is a thing in a table; DELETE removes the thing; the table gets smaller; smaller tables are faster. Clean and obvious.

It’s also wrong in a specific, useful way. The first assumption — that DELETE removes the row — is the one that breaks everything downstream. It doesn’t remove anything. It writes a small note that says ignore this one, and walks away.

Once you accept that, the slow count explains itself. But to believe it, you need to know why a database would ever choose to keep something you told it to delete.

Delete doesn’t erase. It marks.

Imagine an editor working on a page in pen, who has made a rule for herself: never erase. When a line changes, she strikes through the old one and writes the new version below it. When a line is cut, she just strikes it through. The page gets messier, but it has a quiet superpower — someone who started reading the page a minute ago can keep reading their version undisturbed, even while she’s editing, because nothing they were looking at was ever destroyed.

That is almost exactly how Postgres handles your data. It keeps more than one version of a row on purpose, and it never overwrites in place. This approach has a name — MVCC, multi-version concurrency control — and the whole point of it is that name’s middle word: concurrency. Readers never have to wait for writers, and writers never have to wait for readers, because the old version is always still sitting there for anyone who started before the change.

To make that work, Postgres hands every transaction — every statement that changes data — a number that only ever counts up: 1, 2, 3, and onward. It stamps each row version with two of them: the transaction that created the row, and, once it’s deleted, the transaction that removed it. Born in #42, died in #91.

Those two stamps are how Postgres decides what you’re allowed to see. Your own query carries a transaction number too, and the rule is plain: you see a row if it was born before you arrived and hasn’t died yet from where you stand. So a DELETE doesn’t reach in and remove anything — it writes the died in #N stamp, the strike-through, and walks away. Queries that started before #N still see the row; queries after it skip it. To you, gone. On disk, untouched.

Hold onto that one idea — every row carries the numbers of the transactions that made it and unmade it — because it’s the thread that ties together everything after this.

A delete is a promise to forget, not an act of erasing.

The leftovers have a name

Each physical version of a row sitting on disk is called a tuple — think of it as one strike-through-able line on the editor’s page. A tuple that has been deleted (or replaced by a newer version) and is no longer visible to any running query is a dead tuple. It’s still on the page. It’s just crossed out.

After my cleanup, the table was full of dead tuples — a few million of them — taking up exactly as much room as they did when they were alive. The table’s files on disk had not shrunk by a single byte.

A few million rows at, say, ~100 bytes each is a few hundred megabytes (4M × 100B ≈ 400 MB) that the delete marked dead but did not hand back.

Now the slow count makes sense. count(*) works by walking the table’s pages on disk and tallying the rows it can see. The number of pages didn’t drop when I deleted — the dead tuples are still occupying them — so the scan reads just as much disk as before, and now it also has to check each dead tuple, confirm it’s invisible, and skip it. Same reading, extra deciding. The work went up, not down.

There was a second reason mine was especially slow, which we’ll get to: the database had quietly started cleaning up in the background, and that cleanup was competing with my query for the same disk.

Three bills come due

Keeping the dead around buys you that lovely concurrency, but the bill arrives in three parts.

The first is bloat — wasted space. Dead tuples occupy disk and memory like live ones. A table that’s a third dead tuples is a third bigger than it needs to be, and every backup and every full scan pays for the extra heft.

The second is slow reads — the one I hit. Scans drag through dead tuples to find the living ones.

The third is the strange one, and it’s the reason none of this is optional. Remember those ever-counting transaction numbers stamped on every row — the counter that hands them out isn’t infinite. It’s a fixed-size number that tops out near 4.2 billion and then has nowhere to go but back to zero.

Picture the token slip you get at a busy bank counter or a doctor’s clinic. The machine hands out 1, 2, 3 through the day and starts again at 1 the next morning. Keep last week’s slip in your pocket and today’s queue will eventually print a token with the very same number — and now whose came first? has no answer from the number alone. Postgres hits exactly this with its row stamps. Once the counter loops back to the start, a row stamped long ago can suddenly look newer than one stamped yesterday, and the born-before-you rule starts quietly handing back wrong answers. That isn’t a slow query — that’s the database showing you the wrong data and not knowing it.

Postgres stays ahead of it by freezing old rows before the counter can lap them. Freezing restamps a row as permanently old — older than any transaction that will ever run — which lifts it out of the number game entirely: a frozen row is simply always-visible, no comparison needed. That freezing is cleanup work, done by the same pass that clears dead tuples. Neglect it long enough and the counter creeps toward the loop-around, at which point Postgres does the only safe thing left: it stops accepting new writes and demands a cleanup, rather than risk lying to you about the past. A database that bolts its own doors to avoid corrupting history is a memorable way to learn that cleanup is load-bearing.

The cleanup crew

The cleanup has a name you’ve probably seen in logs without ever needing it: VACUUM.

VACUUM is the janitor that walks the table and does three things. It finds dead tuples no query can need anymore and marks their space reusable. It freezes the oldest rows, so the counter can keep looping without ever mistaking old for new. And it keeps a small per-page bitmap called the visibility map — a note on each page that says every row here is already visible to everyone — which lets a future scan skip that whole page instead of checking each row’s stamps one by one.

A row's life on disk: a DELETE only moves it from LIVE to DEAD. VACUUM is the step that actually reclaims the space — for reuse, not return.

One nuance trips everyone the first time: ordinary VACUUM makes the space reusable by that same table, but it usually does not shrink the file and hand the space back to the operating system. The table stays the same size on disk; the crossed-out lines just become writable again, and future inserts quietly fill the holes. So “I vacuumed — why is the table still 400 MB larger?” is not a bug — it’s the design. The space is free for reuse, not returned.

The part you don’t run

Here’s the relief: you almost never run VACUUM by hand. Postgres ships with a background process called autovacuum that watches every table and triggers a vacuum on its own once the dead tuples cross a threshold — very roughly, “more than about 20% of the table is dead.” It works through tables one at a time, on its own schedule, without anyone asking.

That’s the whole story of my slow count. The moment I deleted a few million rows, autovacuum noticed the sudden pile of dead tuples and got to work reclaiming them — which meant the disk was busy with its reading while I was trying to do mine. My count was slow partly because the table hadn’t shrunk, and partly because the database was already cleaning up underneath me. A little while later, with the dead tuples reclaimed, the same count was fast again, and I hadn’t done anything. The janitor had.

One cleanup you should never run casually

There is a heavier command, VACUUM FULL, and the difference between it and plain VACUUM is the kind of thing worth knowing before you reach for it at 2am.

	`VACUUM`	`VACUUM FULL`
What it does	marks dead space reusable in place	rewrites the entire table into a fresh, compact file
Returns space to the OS?	No — table stays the same size	Yes — the table physically shrinks
Locking	light; table stays readable and writable	takes an exclusive lock; the table is unusable until it finishes
When to use	constantly, automatically	rarely, to reclaim large bloat, during a maintenance window

Plain VACUUM is the janitor sweeping around people who are still working. VACUUM FULL is gutting the room and rebuilding it — faster to a clean result, but everyone has to leave first. On a live table that people are querying, “everyone has to leave” is the part that ends up in the incident review.

What to carry forward

The single fact under all of it: in Postgres, delete means forget, not erase. The row leaves your view immediately and leaves the disk much later, on the janitor’s schedule, not yours. Almost everything else — the slow count, the table that won’t shrink, the existence of VACUUM at all, the alarming idea that a database can shut itself down to protect its own counter — falls out of that one design choice, made so that readers and writers never have to wait for each other.

So the next time you delete a pile of data and something feels slower instead of faster, you’ll know it’s not broken. It’s just the gap between gone from view and gone from disk, and the quiet process that closes it.

You didn’t remove the row. You left a note for someone else to remove it later — and someone else did.

Previous move

17…

Inside Git's Credential Protocol

Next move

18…

The Two Addresses Every Server Has