
Privacy by design: a framework for design in the age of big data

Sogeti Labs
August 23, 2012

‘How does Big Data affect personal privacy?’ and ‘In what specific ways are privacy and big data connected?’ are two questions we are exploring in our research on big data and privacy. Another question concerns a possible way out: how can we organize privacy in the age of Big Data?

One part of the privacy issue related to big data is re-identification. As more data from more sources assembles around a single individual, it becomes easier to eventually re-identify that individual, despite de-identification efforts. Traditional methods for de-identification, such as anonymization, pseudonymization, encryption, key-coding and data sharding, are becoming less and less effective. “Re-identification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization”, according to Paul Ohm, privacy expert at the University of Colorado.

One interesting take on this matter is presented by Jeff Jonas, Chief Scientist of the IBM Entity Analytic Solutions group and an IBM Fellow, in a paper called Privacy by Design in the Age of Big Data, written in collaboration with Ann Cavoukian. He presents an ‘anonymous resolution’ approach that decreases the risk of re-identification, based on seven design principles:

1. FULL ATTRIBUTION: Every observation (record) needs to know where it came from and when it arrived. There can be no merge/purge data-survivorship processing whereby some observations or fields are discarded (see the first sketch at the end of this post).

2. DATA TETHERING: Adds, changes and deletes occurring in systems of record must be accounted for, in real time, in sub-seconds.

3. ANALYTICS ON ANONYMIZED DATA: The ability to perform advanced analytics (including some fuzzy matching) over cryptographically altered data means organizations can anonymize more data before sharing information (see the second sketch below).

4. TAMPER-RESISTANT AUDIT LOGS: Every user search should be logged in a tamper-resistant manner; even the database administrator should not be able to alter the evidence contained in this audit log (see the third sketch below).

5. FALSE NEGATIVE FAVORING METHODS: The capability to more strongly favor false negatives is of critical importance in systems that could be used to affect someone’s civil liberties.

6. SELF-CORRECTING FALSE POSITIVES: With every new data point presented, prior assertions are re-evaluated to ensure they are still correct, and if they are no longer correct, these earlier assertions can often be repaired, in real time.

7. INFORMATION TRANSFER ACCOUNTING: Every secondary transfer of data, whether to a human eyeball or a tertiary system, can be recorded, allowing stakeholders (e.g., data custodians or the consumers themselves) to understand how their data is flowing.

While this framework reduces the risks rather than solving the issue completely (a complete solution is impossible), I think privacy needs to be addressed from the start: the design and architecture stage of a system is the place for a proactive approach. Building in privacy-enhancing elements by design can minimize privacy harm, and in some cases prevent the harm from arising in the first place.

How do you feel about organizing privacy in the age of big data? And what about organizing privacy by design? Read the paper.
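To make a few of these principles concrete, here are some minimal sketches in Python. They illustrate the general ideas only; they are not the implementation described in the paper, and every name in them is made up for illustration. First, full attribution: instead of merge/purge survivorship, every observation permanently carries its source and arrival time, and conflicting observations simply coexist.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Observation:
    """A record that always knows where it came from and when it arrived."""
    source_system: str     # originating system of record
    received_at: datetime  # moment the observation arrived
    payload: dict          # the observation itself, kept verbatim

# No survivorship processing: conflicting observations are both retained,
# each fully attributed, rather than one being merged away and lost.
observations = [
    Observation("crm", datetime.now(timezone.utc), {"name": "J. Smith"}),
    Observation("billing", datetime.now(timezone.utc), {"name": "John Smith"}),
]
```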
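Second, analytics on anonymized data. The paper’s ‘anonymous resolution’ goes further than this (it supports some fuzzy matching over cryptographically altered data); the sketch below assumes only the basic building block, a keyed one-way hash, which lets organizations compare records without exchanging plaintext identifiers.

```python
import hashlib
import hmac

# Hypothetical shared secret, distributed to the participating parties out
# of band; without it, digests cannot be recomputed from guessed values.
SECRET_KEY = b"shared-secret-for-illustration"

def anonymize(value: str) -> str:
    """Keyed one-way hash (HMAC-SHA256) of a normalized attribute value.

    Records hashed with the same key still match exactly, but the
    original value cannot be recovered from the digest.
    """
    normalized = " ".join(value.lower().split())  # crude normalization
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Two sources compare digests instead of raw identifiers.
print(anonymize("Alice Example") == anonymize("  alice   EXAMPLE "))  # True
```

A plain keyed hash supports only exact matches; fuzziness can be approximated by hashing several normalized variants of a value (for example, a name with and without a middle initial) and comparing the resulting sets of digests.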
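Third, tamper-resistant audit logs. One common way to make a log tamper-evident (not necessarily the paper’s mechanism) is a hash chain: each entry embeds the hash of the previous entry, so altering any past record breaks verification of everything after it.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only log of user searches, chained by SHA-256 hashes.

    A minimal sketch: a real deployment would also write entries to
    write-once storage and anchor the latest hash externally, so that
    even a database administrator cannot rewrite history unnoticed.
    """

    def __init__(self):
        self.entries = []
        self._head = "0" * 64  # genesis value for the chain

    def record_search(self, user: str, query: str) -> None:
        entry = {
            "user": user,
            "query": query,
            "timestamp": time.time(),
            "prev_hash": self._head,
        }
        payload = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        self._head = entry["hash"]

    def verify(self) -> bool:
        """Re-walk the chain; any altered entry makes this return False."""
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode("utf-8")
            ).hexdigest()
            if entry["prev_hash"] != prev or digest != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

log = AuditLog()
log.record_search("analyst_7", "name like 'J. Smith'")
print(log.verify())              # True
log.entries[0]["query"] = "---"  # tampering with the evidence...
print(log.verify())              # ...is detected: False
```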

About the author

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
