BIG PRIVACY HIDE AND SEEK IN SOCIAL NETWORKS

November 16, 2012

Sogeti Labs

Social networks in particular have led to enormous accumulations of private user data. The price of targeted advertising space depends primarily on how well advertisements fit the recipients’ interests and purchasing power, so detailed user profiles are essential to a social network’s commercial success. These profiles are built from six different types of data that users implicitly or explicitly disclose to the provider:

Data Types Disclosed
– Service data: Data a user discloses to join a social network.
– Disclosed data: Data a user discloses on their own social network page.
– Entrusted data: Data a user discloses on other users’ pages.
– Incidental data: Data other users disclose about a user (e.g. tags on pictures).
– Behavioral data: Data about a user’s behavior in the network (e.g. user’s IP or articles read).
– Derived data: Data about a user that is derived from all other data types (e.g. user’s location inferred from the IP address).
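
To make the last item concrete, the step from behavioral data (an IP address) to derived data (a location) can be sketched as a simple prefix lookup. This is a minimal illustration, not how any particular provider works: the table `PREFIX_TO_REGION`, the function `derive_region`, the region names and the user record are all hypothetical, and the address ranges are the reserved documentation ranges rather than real allocations.

```python
import ipaddress

# Hypothetical prefix-to-region table (assumption for illustration only).
# The ranges are reserved documentation prefixes, not real allocations.
PREFIX_TO_REGION = {
    ipaddress.ip_network("192.0.2.0/24"): "Region A",
    ipaddress.ip_network("198.51.100.0/24"): "Region B",
}


def derive_region(ip):
    """Return the region whose prefix contains the address, or None."""
    address = ipaddress.ip_address(ip)
    for network, region in PREFIX_TO_REGION.items():
        if address in network:
            return region
    return None


# Behavioral data: logged by the provider, never typed in by the user.
behavioral = {"user": "alice", "ip": "192.0.2.17"}

# Derived data: a new attribute the user did not disclose at all.
derived = {"user": behavioral["user"], "region": derive_region(behavioral["ip"])}
print(derived)
```

The user in this sketch never states a location; the attribute exists only because the provider combined a logged value with knowledge the user cannot see.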

Original authors: Rafael Accorsi, Christian Zimmermann and Günter Müller, 2012 (full source below)

The last three types (incidental data, behavioral data and derived data) are not explicitly disclosed by the user. Derived data in particular is the result of inferences, which are drawn using rules such as association rules and clustering methods. These inference rules are based on domain knowledge and logical derivation, and are obtained e.g. by means of data mining and machine learning. Such inferences are not based on a specific user’s data alone but on all data available to a social network provider, including all other users’ data.
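
As a rough illustration of an association rule, the sketch below computes the support and confidence of one rule over a handful of toy profiles and then applies it to a user who never disclosed the inferred attribute. The profiles, the attribute names, the helper `rule_stats` and the 0.6 threshold are assumptions made for this example; they are not taken from the original paper.

```python
def rule_stats(profiles, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    with_antecedent = [p for p in profiles if antecedent <= p]
    with_both = [p for p in with_antecedent if consequent <= p]
    support = len(with_both) / len(profiles)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Toy data pool: each set holds the attributes known about one user.
profiles = [
    {"likes:cycling", "reads:gear_reviews", "buys:sports"},
    {"likes:cycling", "reads:gear_reviews", "buys:sports"},
    {"likes:cycling", "buys:sports"},
    {"likes:cycling", "reads:gear_reviews"},
    {"likes:cooking"},
]

antecedent = {"likes:cycling", "reads:gear_reviews"}
consequent = {"buys:sports"}
support, confidence = rule_stats(profiles, antecedent, consequent)

# A user who disclosed interests but nothing about purchases.
target = {"likes:cycling", "reads:gear_reviews"}
MIN_CONFIDENCE = 0.6  # hypothetical threshold
inferred = set(consequent) if antecedent <= target and confidence >= MIN_CONFIDENCE else set()
print(support, confidence, inferred)
```

The rule is learned from the other users’ profiles, which is why the target user has no way of reconstructing it from their own data.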

The Inference Threat
Overall, inferred data in social networks constitutes a severe new threat to privacy. It allows unprecedentedly extensive user profiling, covering attributes such as purchasing power, ethnicity, political affiliation, interests and preferences, even if the user does not actively disclose this information. Controlling inferences about a specific user poses the following four challenges:

– Unknown inference data: The amount and source of data used to infer new information about a specific user is not known. While it is already a challenge to keep track of a specific user’s disclosed data and entrusted data, only very limited means exist to determine which incidental data is publicized about a user by other users. This applies even more to behavioral data and already derived data which are completely hidden from users’ direct view.
– Unknown inference rules: Controlling inferences requires knowing the respective inference rules. Yet knowing the inference rules used by a social network provider is impossible for a user, especially as these rules are not based on the user’s data alone but on all data the social network provider has access to.
– Evolving inference ruleset: The inference rules a social network provider uses are subject to constant change. As users publicize new information and the provider gathers new data, new patterns in the data emerge and old patterns change. Consequently, the set of inference rules created from the social network provider’s data pool can change at any time. For a user, it is impossible to predict these changes.
– Inevitability of inferences: Assuming inference rules were predictable, users could react accordingly and decide not to disclose certain data in order to prevent specific inferences. However, deliberately withholding certain data in order to avoid inferences itself constitutes membership in a user group characterized by similar disclosing behavior. Hence, the similarity of users in this group creates new possibilities to infer information about the group’s members. At the same time, because users in this group disclose less data, inferences about its members are likely to be less precise, i.e. the inferences’ confidence decreases.
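
The last point can be sketched with toy numbers. In the example below, users are grouped by whether they withhold a given field, and the provider guesses a trait for each group from the majority of group members for whom it already knows that trait from other sources (for instance incidental data). Everything here is an assumption made for illustration: the field name, the trait labels, the helper `group_inference`, and the data, which was chosen by hand so that the withholding group yields a weaker but still usable guess.

```python
from collections import Counter


def group_inference(users, field):
    """Group users by whether they withhold `field`; return the
    majority label and its share (confidence) for each group."""
    groups = {"discloses": [], "withholds": []}
    for user in users:
        key = "discloses" if field in user["disclosed"] else "withholds"
        groups[key].append(user["label"])
    result = {}
    for key, labels in groups.items():
        if labels:
            label, count = Counter(labels).most_common(1)[0]
            result[key] = (label, count / len(labels))
    return result


# Hand-picked toy data: "label" is a trait the provider knows from
# other sources for these users (hypothetical).
users = (
    [{"disclosed": {"employer", "city"}, "label": "A"} for _ in range(4)]
    + [{"disclosed": {"city"}, "label": "B"} for _ in range(3)]
    + [{"disclosed": {"city"}, "label": "A"} for _ in range(2)]
)

result = group_inference(users, "employer")
print(result)
```

With this data, withholding the field does not stop the inference: it places the user in a group for which a guess is still possible, only with lower confidence than for the disclosing group.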

Combining PETs and TETs
Given these challenges, inferences about a social network user cannot be prevented completely. Privacy Enhancing Technologies (PETs) can help decrease the confidence of inferences, but alone they are not sufficient to control inferences in social networks. PETs should therefore be combined with Transparency Enhancing Technologies (TETs).

Full Source and Reference
The full academic version of this paper was submitted to The 1st [SIC!] Workshop on Privacy and Data Protection Technology, held in Amsterdam on October 9, 2012. To give an idea of the breadth of the Big Privacy theme, which will be covered by the 3rd VINT research report on Big Data, here is the workshop’s list of possible topics of interest:

– privacy by design
– technical security: privacy policies, mechanisms
– auditing and provenance
– analysis of security in existing systems
– privacy issues in smart metering systems, internet of things, etc.
– apps and mobility
– privacy implications of tracking data and data access
– implementation of removal rights and the right to be forgotten
– transparency (by design)
– authentication and authorization, access control mechanisms
– fine-grained information flow policies
– (sticky) privacy policies
– cloud security
– consent management
– privacy enhancing technologies
– de-identification and anonymization/pseudonymization
– cryptographic data protection techniques, key management
– centralized versus decentralized architectures
– threat models, vulnerabilities, forensics and intrusion detection
– data breach management and notification
– the human factor: usability and security
– stakeholder influence on design
– privacy impact and (security) risk assessments

About the author

SogetiLabs gathers distinguished technology leaders from around the Sogeti world. It is an initiative explaining not how IT works, but what IT means for business.
