UpGuard researchers say not all records represent unique, valid information, but the January exposure included nearly 3 billion email addresses and passwords, as well as about 2.7 billion records that included Social Security numbers. It was not clear who set up the database, but it appears to contain personal details that may have been collected from several historical data breaches, perhaps including data from the 2024 breach of background-check service National Public Data. It’s common for data brokers and cybercriminals to combine and recombine old datasets, but the sheer number of Social Security numbers, even if only a fraction of them were real, was astonishing.
“Every week, there’s another discovery where it looks big on paper, but it’s probably not very new,” says Pollock. “So I was surprised when I started looking for specific cases here to verify the data. In some cases, identities are at risk in this data breach because they have been exposed, but not yet exploited.”
The data was hosted by German cloud provider Hetzner. Because Pollock could not identify the database’s owner, he notified Hetzner on January 16. The company said it then informed its customer, who deleted the data on January 21.
Hetzner did not provide comment to WIRED prior to publication.
The researchers did not download the entire dataset for analysis because of its size and sensitivity. Instead, they worked with a sample of 2.8 million records, a small portion of the total repository. By analyzing trends in the data, including the popularity of certain cultural references in passwords, they concluded that most of the data likely originated in the United States around 2015. For example, passwords referencing One Direction, Fall Out Boy, and Taylor Swift were very common, while references to BLACKPINK, Catseye, and Batsarmee were only just beginning to appear.
Old data is still valuable for two reasons. First, people often reuse the same email address and password, or variations of that password, across many websites and services. This means cybercriminals can keep trying the same login credentials against the same people over time. Second, people’s Social Security numbers are tied to their most sensitive and high-risk data, yet they almost never change over a lifetime. As a result, legitimate SSNs are among the crown jewels of identity theft for attackers.
In the sample reviewed by the researchers, Pollock says, one in four Social Security numbers appeared to be valid and legitimate. The sample was too small to extrapolate reliably to the entire dataset, but if that rate held, about 675 million of the roughly 2.7 billion records containing SSNs would be valid. Even a fraction of that would represent a very significant set of Social Security numbers.
To verify the data, UpGuard researchers contacted a handful of people whose information appeared in the leaked repository. Pollock emphasizes that one of the most worrying findings from those conversations was that not all of them had had their identities stolen or their accounts hacked. In other words, the database contained information that apparently has not yet been used by cybercriminals, and the potential victims have no idea that it has been exposed.