Limiting privacy breaches in privacy preserving data mining

Abstract

There has been increasing interest in the problem of building accurate data mining models over aggregate data, while protecting privacy at the level of individual records. One approach to this problem is to randomize the values in individual records, and only disclose the randomized values. The model is then built over the randomized data, after first compensating for the randomization (at the aggregate level). This approach is potentially vulnerable to privacy breaches: based on the distribution of the data, one may be able to learn with high confidence that some of the randomized records satisfy a specified property, even though privacy is preserved on average.

In this paper, we present a new formulation of privacy breaches, together with a methodology, "amplification", for limiting them. Unlike earlier approaches, amplification makes it possible to guarantee limits on privacy breaches without any knowledge of the distribution of the original data. We instantiate this methodology for the problem of mining association rules, and modify the algorithm from [9] to limit privacy breaches without knowledge of the data distribution. Next, we address the problem that the amount of randomization required to avoid privacy breaches (when mining association rules) results in very long transactions. By using pseudorandom generators and carefully choosing seeds such that the desired items from the original transaction are present in the randomized transaction, we can send just the seed instead of the transaction, resulting in a dramatic drop in communication and storage cost. Finally, we define new information measures that take privacy breaches into account when quantifying the amount of privacy preserved by randomization.
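The randomize-then-compensate idea in the abstract can be illustrated with classic randomized response on a single yes/no attribute. The sketch below is not the paper's association-rule algorithm; the function names, the `p_keep` parameter, and the worst-case ratio helper are illustrative assumptions. The ratio is one simple, distribution-free quantity in the spirit of bounding breaches without knowing the data distribution: it compares how likely two different true values are to produce the same reported value.

```python
import random


def randomize_bit(bit, p_keep, rng):
    """Report the true bit with probability p_keep, otherwise report its flip."""
    return bit if rng.random() < p_keep else 1 - bit


def estimate_true_fraction(reported, p_keep):
    """Estimate the fraction of 1s in the original data from randomized bits.

    If f is the true fraction, the expected observed fraction is
    p_keep * f + (1 - p_keep) * (1 - f). Solving for f gives the estimator.
    """
    if p_keep == 0.5:
        raise ValueError("p_keep = 0.5 destroys all information; cannot invert")
    observed = sum(reported) / len(reported)
    return (observed - (1 - p_keep)) / (2 * p_keep - 1)


def worst_case_ratio(p_keep):
    """Largest ratio P[x1 -> y] / P[x2 -> y] over all inputs x1, x2 and outputs y.

    Hypothetical helper: for a single flipped bit this depends only on
    p_keep, not on how the original data is distributed.
    """
    low, high = min(p_keep, 1 - p_keep), max(p_keep, 1 - p_keep)
    return float("inf") if low == 0 else high / low


if __name__ == "__main__":
    rng = random.Random(0)
    original = [1] * 3000 + [0] * 7000  # true fraction of 1s is 0.3
    reported = [randomize_bit(b, 0.8, rng) for b in original]
    print("estimated fraction:", estimate_true_fraction(reported, 0.8))
    print("worst-case ratio:", worst_case_ratio(0.8))
```

Only the `reported` list would leave the clients; the aggregate estimate is recovered on the server side, while any single reported bit remains ambiguous up to the ratio above.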

Keywords

Computer science, Randomized response, Database transaction, Information privacy, Computer security, Transaction data, Limiting, Data mining, Database, Mathematics, Estimator, Engineering

Related Publications

On the design and quantification of privacy preserving data mining algorithms

Dakshi Agrawal , Charu C. Aggarwal

The increasing ability to track and collect large amounts of data with the use of current hardware technology has led to an interest in the development of data mining algorithm...

2001 1010 citations

Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge

Takayuki Nishio , Ryo Yonetani

We envision a mobile edge computing (MEC) framework for machine learning (ML) technologies, which leverages distributed client data and computation resources for training high...

2019 1292 citations

Blockchain and Federated Learning for Privacy-Preserved Data Sharing in Industrial IoT

Yunlong Lu , Xiaohong Huang , Yueyue Dai +2 more

The rapid increase in the volume of data generated from connected devices in industrial Internet of Things paradigm opens up new possibilities for enhancing the quality of serv...

2019 IEEE Transactions on Industrial Infor... 1137 citations

Mining association rules between sets of items in large databases

Rakesh Agrawal , Tomasz Imieliński , Arun Swami

We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates a...

1993 14674 citations

On the Utility of Privacy-Preserving Histograms

Shuchi Chawla , Cynthia Dwork , Frank McSherry +1 more

In a census, individual respondents give private information to a trusted party (the census bureau), who publishes a sanitized version of the data. There are two fundamentally c...

2004 22 citations

Publication Info

Year: 2003
Type: article
Pages: 211-222
Citations: 819
Access: Closed

Cite This

APA Style

Alexandre Evfimievski, Johannes Gehrke, Ramakrishnan Srikant (2003). Limiting privacy breaches in privacy preserving data mining. 211-222. https://doi.org/10.1145/773153.773174

Identifiers

DOI: 10.1145/773153.773174