Reidentification Risk of Masked Datasets: Part 1

When it comes to securing data, companies often find themselves at a crossroads: ensure data security at the expense of data functionality, or compromise security to keep functionality intact. To avoid this trade-off, organizations adopt sophisticated data protection methods, such as anonymization and masking, that aim to secure data while preserving functionality and performance.

There is a catch, however, and that's the issue of data reidentification.

Cross-referencing a masked dataset with other publicly available data can reidentify an individual from their metadata. As a result, private information such as PFI, PHI, and contact information could end up in the public domain. In the wrong hands, this could be catastrophic.
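To make the cross-referencing idea concrete, here is a minimal sketch of a linkage attack in Python. Everything in it is an assumption made for illustration: the records are invented, and the field names (`zip`, `birth_year`, `gender`, `diagnosis`) and the helpers `quasi_key` and `link` are hypothetical, not taken from any real dataset or tool. The masked dataset has no names, yet a person is reidentified whenever their combination of attributes is unique in both datasets.

```python
from collections import Counter

# Toy data, invented for illustration only.
# The "masked" dataset has names removed but keeps demographic attributes.
masked_records = [
    {"zip": "10001", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"zip": "10001", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
    {"zip": "10002", "birth_year": 1985, "gender": "F", "diagnosis": "migraine"},
    {"zip": "10002", "birth_year": 1985, "gender": "F", "diagnosis": "flu"},
]

# A hypothetical public dataset (for example, a voter roll) with names attached.
public_records = [
    {"name": "Alice Example", "zip": "10001", "birth_year": 1985, "gender": "F"},
    {"name": "Bob Example", "zip": "10001", "birth_year": 1990, "gender": "M"},
    {"name": "Carol Example", "zip": "10002", "birth_year": 1985, "gender": "F"},
]

# Attributes shared by both datasets that are not identifying on their own.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def quasi_key(record):
    """Return the record's combination of quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(masked, public):
    """Map names to diagnoses where the attribute combination is unique in both datasets."""
    masked_by_key = {}
    for row in masked:
        masked_by_key.setdefault(quasi_key(row), []).append(row)

    public_counts = Counter(quasi_key(person) for person in public)

    reidentified = {}
    for person in public:
        key = quasi_key(person)
        matches = masked_by_key.get(key, [])
        # Only an unambiguous one-to-one match counts as a reidentification.
        if public_counts[key] == 1 and len(matches) == 1:
            reidentified[person["name"]] = matches[0]["diagnosis"]
    return reidentified


reidentified = link(masked_records, public_records)
print(reidentified)
```

In this toy data, two of the three named people are linked to a diagnosis. The third is not, because two masked records share her attribute combination, which is the intuition behind keeping groups of indistinguishable records large.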

Research conducted by Imperial College London found that "once bought, the data can often be reverse-engineered using machine learning to reidentify individuals, despite the anonymization techniques. This could expose sensitive information about personally identified individuals and allow buyers to build increasingly comprehensive personal profiles of individuals. The research demonstrates for the first time how easily and accurately this can be done — even with incomplete datasets. In the research, 99.98% of Americans were correctly reidentified in any available 'anonymized' dataset by using just 15 characteristics, including age, gender and marital status."

There have been many incidents in which this has already happened, such as the NYC taxicab debacle or the Netflix Prize dataset contest, where seemingly anonymized data were easily reidentified.

So, it’s not just about using the right data security technology, but also about implementing it correctly.

Caught between data security and data functionality, you find yourself in a bind, choosing which to trade for the other. But does it have to be this way?

And most importantly, what do you really need to focus on while anonymizing data?

To find out, go to Forbes Tech Council – Reidentification Risk of Masked Datasets: Part 1 to read the entire article.