Achieving probabilistic anonymity in a linear and hybrid randomization model

Sang, Y.; Shen, H.; Tian, H.; Zhang, Z.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/105657

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sang, Y.	-
dc.contributor.author	Shen, H.	-
dc.contributor.author	Tian, H.	-
dc.contributor.author	Zhang, Z.	-
dc.date.issued	2016	-
dc.identifier.citation	IEEE Transactions on Information Forensics and Security, 2016; 11(10):2187-2202	-
dc.identifier.issn	1556-6013	-
dc.identifier.issn	1556-6021	-
dc.identifier.uri	http://hdl.handle.net/2440/105657	-
dc.description	Date of publication May 4, 2016; date of current version July 8, 2016.	-
dc.description.abstract	The randomization methods applied for privacy-preserving data mining are commonly subject to reconstruction, linkage, and semantic-related attacks. Some existing works employed random noise addition to realize probabilistic anonymity, targeting only linkage attacks. Random noise addition is vulnerable to reconstruction attacks and cannot achieve semantic closeness, particularly on high-dimensional data, to prevent semantic-related attacks. For linkage attacks, the main security vulnerability of their proposed probabilistic anonymity lies in the assumption that the attacker has a priori knowledge of the quasi-identifiers of all individuals. When only some individuals leak their quasi-identifiers, the proposed model becomes ineffective, because the attacker can deploy a different linkage attack that has not been studied before. This type of attack is much easier to deploy and is thus very harmful. In this paper, we propose new frameworks of probabilistic (1, k)- and (k, k)-anonymity to defend against all these linkage attacks, and realize the frameworks on a hybrid randomization model. The model is also secure against reconstruction attacks. We further achieve statistical semantic closeness of high-dimensional data to prevent semantic-related attacks on the model. The frameworks also allow us to redesign the traditional K-nearest neighbor algorithm to leverage the introduced data uncertainty and improve the mining results. This paper demonstrates promising applications in large-scale and high-dimensional data mining in clouds, providing high efficiency and security to protect data privacy, high data utility for mining purposes, on-time processing, and non-interactive data publishing.	-
dc.description.statementofresponsibility	Yingpeng Sang, Hong Shen, Hui Tian, and Zonghua Zhang	-
dc.language.iso	en	-
dc.publisher	Institute of Electrical and Electronics Engineers	-
dc.rights	© 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.	-
dc.source.uri	http://dx.doi.org/10.1109/tifs.2016.2562605	-
dc.subject	Randomization; k-anonymity; privacy protection; data mining	-
dc.title	Achieving probabilistic anonymity in a linear and hybrid randomization model	-
dc.type	Journal article	-
dc.identifier.doi	10.1109/TIFS.2016.2562605	-
dc.relation.grant	http://purl.org/au-research/grants/arc/DP150104871	-
pubs.publication-status	Published	-
dc.identifier.orcid	Shen, H. [0000-0002-3663-6591] [0000-0003-0649-0648]	-
Appears in Collections:	Aurora harvest 3 Computer Science publications

Files in This Item:

There are no files associated with this item.

Adelaide Research & Scholarship