You’ve seen the headlines about a loot archive of stolen credentials called "Collection #1" that was leaked online in January. This collection contains 772,904,991 entries, one of the most significant credential leaks yet. The credentials are all stored within an email:cleartext_password format, making credential stuffing attacks relatively easy without having to worry about deciphering hashes.
As worrisome for potential targets as this can be, this post doesn’t deal with this particular pile of data (read Troy Hunt’s analysis of "Collection #1" leak here). Instead, I’ll take a closer look at why there’s a “#1” next to the collection name. While #1 is a massive heap of data, it’s only the tip of the proverbial iceberg. There are five collection archives in total, containing a total of 1TB worth of raw credential data waiting to be downloaded by attackers. So what’s in Collections 2 - 5?