Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases

Osman Abul, Francesco Bonchi and Mirco Nanni

In The 24th International Conference on Data Engineering (ICDE 2008)

Abstract
Preserving individual privacy when publishing data is a problem that is receiving increasing attention. According to the k-anonymity principle, each release of data must be such that each individual is indistinguishable from at least k - 1 other individuals. In this paper we study the problem of anonymity preserving data publishing in moving objects databases. We propose a novel concept of k-anonymity based on co-localization that exploits the inherent uncertainty of the moving object's whereabouts. Due to sampling and positioning systems (e.g., GPS) imprecision, the trajectory of a moving object is no longer a polyline in a threedimensional space, instead it is a cylindrical volume, where its radius δ represents the possible location imprecision: we know that the trajectory of the moving object is within this cylinder, but we do not know exactly where. If another object moves within the same cylinder they are indistinguishable from each other. This leads to the definition of (k,δ)- anonymity for moving objects databases. We first characterize the (k,δ)-anonymity problem and discuss techniques to solve it. Then we focus on the most promising technique by the point of view of information preservation, namely space translation. We develop a suitable measure of the information distortion introduced by space translation, and we prove that the problem of achieving (k,δ)-anonymity by space translation with minimum distortion is NP-hard. Faced with the hardness of our problem we propose a greedy algorithm based on clustering and enhanced with ad hoc pre-processing and outlier removal techniques. The resulting method, named NWA (Never Walk Alone), is empirically evaluated in terms of data quality and efficiency. Data quality is assessed both by means of objective measures of information distortion, and by comparing the results of the same spatio-temporal range queries executed on the original database and on the (k,δ)-anonymized one. Experimental results show that for a wide range of values of δ and k, the relative error introduced is kept low, confirming that NWA produces high quality (k,δ)-anonymized data.


     
Paper download
Executables, source code, and datasets download

Before you proceed with downloading, please read and agree to the terms of using our implementation.

Dataset used in our experiments: Oldenburg, generated using Brinkhoff's data generator.

Dataset format

Input and output datasets have the same format: an ASCII file containing 4 TAB-separated columns.
The columns correspond to:
The input file has to meet the following constraints: Here is an example of input file, containing two trajectories (having ID=501 and ID=504) resp. with 6 and 4 points:
501     5       11401.719       24961.338
501     2       11581.854       24691.925
501     0       11593.253       24736.460
501     1       11611.838       24722.498
501     3       11470.533       24822.748
501     4       11444.096       24922.036
504     4       11450.540       24785.989
504     2       11570.155       24705.881
504     3       11493.856       24732.754
504     5       11398.027       24812.087
(Notice that trajectory 501 has timestamps 0,1,2,3,4,5 and trajectory 504 has 2,3,4,5, both of them forming intervals without gaps).

Usage and output information

To see the usage syntax and a short description of all options, invoke the program without parameters.
When it is launched with the proper parameters, the software will print on the standard output the status of the computation and some statistics of the dataset.
Moreover, if the option -stats is specified, at the end of the computation a single line of text is printed on the standard error, containing statistics collected through the computation. In particular, that line of text contains, in the order, the following information:
 

Back to authors'webapges: Osman, Francesco, Mirco.