mrb's blog

Deleting 39794 Gmail conversations

Keywords: performance

I unsubscribed from the qemu-devel mailing list and deleted all the archived emails, spanning a few years, from my Gmail account. According to Gmail this represented 39794 conversations (threads).

  • My mailbox's size decreased by 752 MB: the average space consumed by a conversation was 19 kB.
  • It took 60 seconds to move all messages to the trash.
  • It took 250 seconds to empty the trash, or 6 ms per conversation, which suggests only one HDD seek per conversation on Google's servers.
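
The per-conversation figures above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are mine, and it assumes that "MB" means decimal megabytes (10^6 bytes), which the post does not specify.

```python
# Figures reported in the post.
conversations = 39794
freed_bytes = 752e6      # 752 MB, assuming decimal megabytes (an assumption)
trash_seconds = 60       # time to move everything to the trash
empty_seconds = 250      # time to empty the trash

# Average space per conversation, in kB.
avg_kb = freed_bytes / conversations / 1e3

# Average time per conversation for each step, in milliseconds.
ms_per_conv_trash = trash_seconds / conversations * 1e3
ms_per_conv_empty = empty_seconds / conversations * 1e3

print(f"average size:      {avg_kb:.1f} kB per conversation")
print(f"move to trash:     {ms_per_conv_trash:.2f} ms per conversation")
print(f"empty the trash:   {ms_per_conv_empty:.2f} ms per conversation")
```

This gives roughly 19 kB and 6 ms per conversation, matching the rounded figures in the list, plus about 1.5 ms per conversation for the move to the trash.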

This is good performance. Can you delete this quantity of emails just as quickly from your company's email server?

Comments

Stone wrote: When I attempted to delete the lkml contents from 2000-2010 from my GMail account, getting the messages into the Trash was as far as it would go, and that alone took several minutes of GMail being unavailable. The account had become unbearably slow, even in tags that were not lkml (such as Inbox) and in search (definitely not a simple inverted index, or at least a very badly performing one).

Trying to then delete these messages proved to be impossible. Emptying the trash would make GMail unavailable for 30-45 minutes -- after which I'd find the Trash-folder in exactly the same state as I left it -- i.e. full. This happened consistently. Apparently "Empty the Trash" is an atomic operation -- one which can time out and thus fail.

I do not remember the exact number of conversations, though if I remember correctly it was in the high hundreds of thousands.

It took a GMail support request to fix this, which made the account unavailable for 3 days straight (including rejecting any incoming mail) -- after which the Trash was empty. I wonder what would have happened had I let the Trash expire.

The account has since been working at acceptable speeds for the most part (though sometimes it still lags for up to 20 seconds when opening random labels).

As for doing these kinds of operations elsewhere, I'd imagine you could get performance surpassing that with Dovecot on a ZFS-backed (or similarly designed) server that ignores fsync semantics and instead coalesces metadata updates into a single transaction group (txg). The metadata might have to reside in cache already though.
25 Dec 2011 04:51 UTC

mrb wrote: I have read from Gmail engineers that, depending on your account, your data may be more or less fragmented, causing large variations in performance between users. In my case, my account seems as snappy as it was before.

Emptying the trash was presented as a non-atomic operation to me. During these 250 seconds, I was presented with a popup showing me how many conversations were left to delete, with a button to stop the emptying in progress.
25 Dec 2011 06:20 UTC