SHARING PERSONAL DATA ANONYMOUSLY WITH CROWD BLENDING PRIVACY

August 15, 2012

Sogeti Labs

In our upcoming report on the Big Social we discuss giant stockpiles of personal data (browsing logs, location data, purchase patterns, social media data) and how combining these data sets can boost actionable analytics and perhaps even predict future events. Because all of these data sets contain personal information, the issue of personal privacy arises. A new mathematical technique developed at Cornell University could offer a way for large sets of personal data to be shared and analyzed while guaranteeing that no individual’s privacy is compromised.

It’s all about data anonymity in this case, and that is a sensitive issue. Remember Netflix and AOL, which both released supposedly “anonymized” data so that anyone could analyze it? Researchers quickly found that the data sets could be de-anonymized by cross-referencing them with data available elsewhere. One way to fix these issues is known as differential privacy. It typically requires adding noise to a data set, which makes that data set less useful.
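To make the noise trade-off concrete, here is a minimal Python sketch of one common way differential privacy is applied in practice: adding Laplace noise to the answer of a counting query. The function names, the `epsilon` parameter value, and the sample data are assumptions chosen for illustration; they do not come from the Cornell work or from any particular library.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    # Hypothetical helper: answer "how many records match?" with added noise.
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1, so the
    # usual choice of noise scale for a counting query is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67]  # made-up data
    print(noisy_count(ages, lambda age: age >= 40, epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger privacy, but also a less accurate answer, which is the loss of usefulness described above.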

The Cornell group proposes an alternative approach called crowd-blending privacy. This method limits how a data set can be analyzed so that any individual record is indistinguishable from a sizeable crowd of other records; a record is removed from the analysis if this cannot be guaranteed.
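One simple way to picture this idea is a histogram that only reports bins containing at least k records and drops the rest. The Python sketch below is an illustration under that assumption, not the Cornell group's actual algorithm; the function name `crowd_blended_histogram`, the threshold `k`, and the sample ages are all hypothetical.

```python
from collections import Counter


def crowd_blended_histogram(records, bin_of, k):
    # Count how many records fall into each bin. Records in the same bin
    # are treated alike by the analysis, so they "blend" with one another.
    counts = Counter(bin_of(r) for r in records)
    # Keep only bins where each record hides in a crowd of at least k;
    # records in smaller bins are removed from the released result.
    return {b: c for b, c in counts.items() if c >= k}


if __name__ == "__main__":
    ages = [23, 25, 27, 29, 31, 34, 36, 38, 62]  # made-up data
    by_decade = lambda age: (age // 10) * 10
    print(crowd_blended_histogram(ages, by_decade, k=3))
```

In this example the single 62-year-old has no crowd to blend into, so that record is left out, while the counts for the other age groups are released exactly, with no noise added.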

“We want to make it possible for Facebook or the U.S. Census Bureau to analyze sensitive data without leaking information about individuals. We also have this other goal of utility; we want the analyst to learn something. (…) The hope is that because crowd-blending is a less strict privacy standard it will be possible to write algorithms that will satisfy it and it could open up new uses for data,” says Michael Hay, who was involved in creating the technique while a research fellow at Cornell.

Analysts might favor this new approach because there is no need to add noise to a data set. The researchers also showed that crowd-blending is already close to matching the statistical strength of differential privacy. Consumer privacy benefits as well: successfully anonymized data enables Facebook and other data brokers to trade (sell, buy, share) data sets without putting our personal data at risk.

In the next few weeks we are going to further explore this ‘intersection’ of big data and privacy. If you have any thoughts on the subject, please leave a comment.

About the author

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
