Public health datasets increasingly use geographic identifiers such as an individual’s address. Geocoding these addresses often provides new insights since it becomes possible to examine spatial patterns and associations. Address information is typically considered confidential and is therefore not released or shared with others. Publishing maps with the locations of individuals, however, may also breach confidentiality since addresses and associated identities can be discovered through reverse geocoding. One commonly used technique to protect confidentiality when releasing individual-level geocoded data is geographic masking. This typically consists of applying a certain amount of random perturbation in a systematic manner to reduce the risk of reidentification. A number of geographic masking techniques have been developed, as well as methods to quantify the risk of reidentification associated with a particular masking method. This paper presents a review of the current state of the art in geographic masking, summarizing the various methods and their strengths and weaknesses. Despite recent progress, no universally accepted or endorsed geographic masking technique has emerged. Researchers are nonetheless publishing maps that use geographic masking of confidential locations. Any researcher publishing such maps is advised to become familiar with the different masking techniques available and their associated reidentification risks.
1. Introduction
The widespread availability of powerful geocoding tools in commercial Geographic Information System (GIS) software and the interest in spatial analysis at the individual level have made mapping residential addresses of individuals a widely employed technique in public health research [1–6]. Spatial analysis and mapping of georeferenced, individual-level health data can help identify important geographical patterns [1, 2, 7, 8].
However, given the need and/or legal requirement to preserve the confidentiality of microdata, the possibilities for undertaking geographical analysis on certain types of individual-level data are often limited [9, 10]. As a result of restrictions on access to confidential data, important information may remain inaccessible. Releasing locations of individuals in digital or paper format presents a reidentification risk since these locations can be reverse geocoded to find the addresses and identities associated with those locations. Geographic masking techniques have been developed to reduce the risk of reidentification. The present review describes the background for sharing and
References
[1]
P. A. Zandbergen, “Geocoding quality and implications for spatial analysis,” Geography Compass, vol. 3, no. 2, pp. 647–680, 2009.
[2]
G. Rushton, M. P. Armstrong, J. Gittler et al., Geocoding Health Data: The Use of Geographic Codes in Cancer Prevention and Control, Research and Practice, CRC Press, 2010.
[3]
N. Krieger, J. T. Chen, P. D. Waterman, M.-J. Soobader, S. V. Subramanian, and R. Carson, “Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: does the choice of area-based measure and geographic level matter? The public health disparities geocoding project,” American Journal of Epidemiology, vol. 156, no. 5, pp. 471–482, 2002.
[4]
T. Abe and D. Stinchcomb, “Geocoding practices in cancer registries,” in Geocoding Health Data, pp. 111–125, CRC Press, 2010.
[5]
N. Krieger, “Place, space, and health: GIS and epidemiology,” Epidemiology, vol. 14, no. 4, pp. 384–385, 2003.
[6]
J. M. Bissette, J. A. Stover, L. M. Newman, P. C. Delcher, K. T. Bernstein, and L. Matthews, “Assessment of geographic information systems and data confidentiality guidelines in STD programs,” Public Health Reports, vol. 124, supplement 2, p. 58, 2009.
[7]
D. B. Richardson, N. D. Volkow, M.-P. Kwan, R. M. Kaplan, M. F. Goodchild, and R. T. Croyle, “Medicine. Spatial turn in health research,” Science, vol. 339, no. 6126, pp. 1390–1392, 2013.
[8]
A. Gemmill, R. B. Gunier, A. Bradman, B. Eskenazi, and K. G. Harley, “Residential proximity to methyl bromide use and birth outcomes in an agricultural population in California,” Environmental Health Perspectives, vol. 121, no. 6, pp. 737–743, 2013.
[9]
N. H. Fefferman, E. A. O'Neil, and E. N. Naumova, “Confidentiality and confidence: is data aggregation a means to achieve both?” Journal of Public Health Policy, vol. 26, no. 4, pp. 430–449, 2005.
[10]
G. Rushton, M. P. Armstrong, J. Gittler et al., “Geocoding in cancer research: a review,” The American Journal of Preventive Medicine, vol. 30, no. 2, pp. S16–S24, 2006.
[11]
J. P. Reiter and S. K. Kinney, “Sharing confidential data for research purposes: a primer,” Epidemiology, vol. 22, no. 5, pp. 632–635, 2011.
[12]
D. W. Goldberg, J. P. Wilson, and C. A. Knoblock, “From text to geographic coordinates: the current state of geocoding,” URISA Journal, vol. 19, no. 1, pp. 33–46, 2007.
[13]
P. A. Zandbergen, “Influence of geocoding quality on environmental exposure assessment of children living near high traffic roads,” BMC Public Health, vol. 7, article 37, 2007.
[14]
P. A. Zandbergen and J. W. Green, “Error and bias in determining exposure potential of children at school locations using proximity-based GIS techniques,” Environmental Health Perspectives, vol. 115, no. 9, pp. 1363–1370, 2007.
[15]
P. A. Zandbergen, T. C. Hart, K. E. Lenzer, and M. E. Camponovo, “Error propagation models to examine the effects of geocoding quality on spatial analysis of individual-level datasets,” Spatial and Spatio-Temporal Epidemiology, vol. 3, no. 1, pp. 69–82, 2012.
[16]
M. R. Cayo and T. O. Talbot, “Positional error in automated geocoding of residential addresses,” International Journal of Health Geographics, vol. 2, article 10, 2003.
[17]
B. Jacquemin, J. Lepeule, A. Boudier et al., “Impact of geocoding methods on associations between long-term exposure to urban air pollution and lung function,” Environmental Health Perspectives, vol. 121, no. 9, pp. 1054–1060, 2013.
[18]
G. M. Jacquez, “A research agenda: does geocoding positional error matter in health GIS studies?” Spatial and Spatio-Temporal Epidemiology, vol. 3, no. 1, pp. 7–16, 2012.
[19]
D. L. Zimmerman and X. Fang, “Estimating spatial variation in disease risk from locations coarsened by incomplete geocoding,” Statistical Methodology, vol. 9, no. 1-2, pp. 239–250, 2012.
[20]
D. T. Duncan, M. C. Castro, J. C. Blossom, G. G. Bennett, and S. L. Gortmaker, “Evaluation of the positional difference between two common geocoding methods,” Geospatial Health, vol. 5, no. 2, pp. 265–273, 2011.
[21]
P. A. Zandbergen, “A comparison of address point, parcel and street geocoding techniques,” Computers, Environment and Urban Systems, vol. 32, no. 3, pp. 214–232, 2008.
[22]
D. W. Goldberg and M. G. Cockburn, “Improving geocode accuracy with candidate selection criteria,” Transactions in GIS, vol. 14, no. 1, pp. 149–176, 2010.
[23]
P. A. Zandbergen and J. Chakraborty, “Improving environmental exposure analysis using cumulative distribution functions and individual geocoding,” International Journal of Health Geographics, vol. 5, article 23, 2006.
[24]
M. L. Miranda, R. Anthopolos, and D. Hastings, “A geospatial analysis of the effects of aviation gasoline on childhood blood lead levels,” Environmental Health Perspectives, vol. 119, no. 10, pp. 1513–1516, 2011.
[25]
J. Xue, T. McCurdy, J. Burke et al., “Analyses of school commuting data for exposure modeling purposes,” Journal of Exposure Science and Environmental Epidemiology, vol. 20, no. 1, pp. 69–78, 2010.
[26]
M. P. Armstrong, G. Rushton, and D. L. Zimmerman, “Geographically masking health data to preserve confidentiality,” Statistics in Medicine, vol. 18, no. 5, pp. 497–525, 1999.
[27]
K. Sueda, T. Miyaki, and J. Rekimoto, “Social geoscape: visualizing an image of the city for mobile UI using user generated geo-tagged objects,” in Mobile and Ubiquitous Systems: Computing, Networking, and Services, pp. 1–12, Springer, 2012.
[28]
O. Kounadi, T. J. Lampoltshammer, M. Leitner, and T. Heistracher, “Accuracy and privacy aspects in free online reverse geocoding services,” Cartography and Geographic Information Science, vol. 40, no. 2, pp. 140–153, 2013.
[29]
J. S. Brownstein, C. A. Cassa, and K. D. Mandl, “No place to hide—reverse identification of patients from published maps,” New England Journal of Medicine, vol. 355, no. 16, pp. 1741–1742, 2006.
[30]
J. Krumm, “A survey of computational location privacy,” Personal and Ubiquitous Computing, vol. 13, no. 6, pp. 391–399, 2009.
[31]
J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” in Location- and Context-Awareness, vol. 4718 of Lecture Notes in Computer Science, pp. 35–49, 2007.
[32]
K. R. Searight, D. J. Logan, F. J. Bourland II, C. J. Loher, and B. R. Charlton, “Reverse geocoding system using combined street segment and point datasets,” Google Patents, 2010.
[33]
R. Marshall, J. Polk, and R. George, “A protocol for location transformations,” TCS, 2011.
[34]
L.-C. Chen, Y.-C. Lai, Y.-H. Yeh, J.-W. Lin, C.-N. Lai, and H.-C. Weng, “Enhanced mechanisms for navigation and tracking services in smart phones,” Journal of Applied Research and Technology, vol. 11, pp. 272–282, 2013.
[35]
J. S. Brownstein, C. A. Cassa, I. S. Kohane, and K. D. Mandl, “An unsupervised classification method for inferring original case locations from low-resolution disease maps,” International Journal of Health Geographics, vol. 5, article 56, 2006.
[36]
A. J. Curtis, J. W. Mills, and M. Leitner, “Spatial confidentiality and GIS: re-engineering mortality locations from published maps about Hurricane Katrina,” International Journal of Health Geographics, vol. 5, article 44, 2006.
[37]
National Research Council, Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data, National Academies Press, 2007.
[38]
K. L. Olson, S. J. Grannis, and K. D. Mandl, “Privacy protection versus cluster detection in spatial epidemiology,” American Journal of Public Health, vol. 96, no. 11, pp. 2002–2008, 2006.
[39]
M. N. K. Boulos, A. J. Curtis, and P. Abdelmalik, “Musings on privacy issues in health research involving disaggregate geographic data about individuals,” International Journal of Health Geographics, vol. 8, article 46, 2009.
[40]
C. Tenopir, S. Allard, K. Douglass et al., “Data sharing by scientists: practices and perceptions,” PLoS ONE, vol. 6, no. 6, Article ID e21101, 2011.
[41]
P. N. Schofield, T. Bubela, T. Weaver et al., “Post-publication sharing of data and tools,” Nature, vol. 461, no. 7261, pp. 171–173, 2009.
[42]
G. T. Duncan and R. W. Pearson, “Enhancing access to microdata while protecting confidentiality: prospects for the future,” Statistical Science, vol. 6, no. 3, pp. 219–232, 1991.
[43]
L. Cox, “Matrix masking methods for disclosure limitation in microdata,” Survey Methodology, vol. 20, no. 2, pp. 165–169, 1994.
[44]
W. B. Allshouse, M. K. Fitch, K. H. Hampton et al., “Geomasking sensitive health data and privacy protection: an evaluation using an E911 database,” Geocarto International, vol. 25, no. 6, pp. 443–452, 2010.
[45]
M. Fitch, “Geomasking algorithms to protect confidentiality of sexually transmitted infections in spatial epidemiology,” in Proceedings of the American Public Health Association Annual Meeting and Exposition, 2007.
[46]
K. H. Hampton, M. K. Fitch, W. B. Allshouse et al., “Mapping health data: improved privacy protection with donut method geomasking,” American Journal of Epidemiology, vol. 172, no. 9, pp. 1062–1069, 2010.
[47]
Y. Lu, C. Yorke, and F. B. Zhan, “Considering risk locations when defining perturbation zones for geomasking,” Cartographica, vol. 47, no. 3, pp. 168–178, 2012.
[48]
J. L. French and M. P. Wand, “Generalized additive models for cancer mapping with incomplete covariates,” Biostatistics, vol. 5, no. 2, pp. 177–191, 2004.
[49]
B. S. Bell, “Spatial analysis of disease-applications,” in Biostatistical Applications in Cancer Research, pp. 151–182, Springer, 2002.
[50]
X. Shi, J. Alford-Teaster, and T. Onega, “Kernel density estimation with geographically masked points,” in Proceedings of the 17th International Conference on Geoinformatics (Geoinformatics '09), August 2009.
[51]
S. S. Francis, S. Selvin, W. Yang, P. A. Buffler, and J. L. Wiemels, “Unusual space-time patterning of the Fallon, Nevada leukemia cluster: evidence of an infectious etiology,” Chemico-Biological Interactions, vol. 196, no. 3, pp. 102–109, 2012.
[52]
J. Claridge, P. Diggle, C. M. McCann et al., “Fasciola hepatica is associated with the failure to detect bovine tuberculosis in dairy cattle,” Nature Communications, vol. 3, article 853, 2012.
[53]
S. Liang, S. Banerjee, and B. P. Carlin, “Bayesian wombling for spatial point processes,” Biometrics, vol. 65, no. 4, pp. 1243–1253, 2009.
[54]
A. L. Choi, J. I. Levy, D. W. Dockery et al., “Does living near a Superfund site contribute to higher polychlorinated biphenyl (PCB) exposure?” Environmental Health Perspectives, vol. 114, no. 7, pp. 1092–1098, 2006.
[55]
G. Pereira, A. J. B. M. De Vos, A. Cook, and C. D'Arcy J. Holman, “Vector fields of risk: a new approach to the geographical representation of childhood asthma,” Health and Place, vol. 16, no. 1, pp. 140–146, 2010.
[56]
M.-P. Kwan, I. Casas, and B. C. Schmitz, “Protection of geoprivacy and accuracy of spatial information: how effective are geographical masks?” Cartographica, vol. 39, no. 2, pp. 15–28, 2004.
[57]
D. L. Zimmerman and C. Pavlik, “Quantifying the effects of mask metadata disclosure and multiple releases on the confidentiality of geographically masked health data,” Geographical Analysis, vol. 40, no. 1, pp. 52–76, 2008.
[58]
C. A. Cassa, S. C. Wieland, and K. D. Mandl, “Re-identification of home addresses from spatial locations anonymized by Gaussian skew,” International Journal of Health Geographics, vol. 7, article 45, 2008.
[59]
D. Stinchcomb, “Procedures for geomasking to protect patient confidentiality,” in Proceedings of the ESRI International Health GIS Conference, Washington, DC, USA, 2004.
[60]
K. J. Clifton and S. R. Gehrke, “Application of geographic perturbation methods to residential locations in the Oregon Household Activity Survey: proof of concept,” in Proceedings of the Transportation Research Board 92nd Annual Meeting, 2013.
[61]
C. A. Cassa, S. J. Grannis, J. M. Overhage, and K. D. Mandl, “A context-sensitive approach to anonymizing spatial surveillance data: impact on outbreak detection,” Journal of the American Medical Informatics Association, vol. 13, no. 2, pp. 160–165, 2006.
[62]
M. Leitner and A. Curtis, “Cartographic guidelines for geographically masking the locations of confidential point data,” Cartographic Perspectives, no. 49, pp. 22–39, 2004.
[63]
P. Zandbergen, “Validation of masking techniques for location privacy protection of individual-level health data,” in Proceedings of the American Public Health Association Annual Meeting, Washington, DC, USA, 2011.
[64]
M. Leitner and A. Curtis, “A first step towards a framework for presenting the location of confidential point data on maps-results of an empirical perceptual study,” International Journal of Geographical Information Science, vol. 20, no. 7, pp. 813–822, 2006.
[65]
S. C. Wieland, C. A. Cassa, K. D. Mandl, and B. Berger, “Revealing the spatial distribution of a disease while preserving privacy,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 46, pp. 17608–17613, 2008.
[66]
L. Sweeney, “k-anonymity: a model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557–570, 2002.
[67]
K. El Emam and F. K. Dankar, “Protecting privacy using k-anonymity,” Journal of the American Medical Informatics Association, vol. 15, no. 5, pp. 627–637, 2008.
[68]
L. Sweeney, “Achieving k-anonymity privacy protection using generalization and suppression,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 571–588, 2002.
[69]
G. Aggarwal, T. Feder, K. Kenthapadi et al., “Approximation algorithms for k-anonymity,” Journal of Privacy Technology (JOPT), 2005.
[70]
K. El Emam, F. K. Dankar, R. Issa et al., “A globally optimal k-anonymity method for the de-identification of health data,” Journal of the American Medical Informatics Association, vol. 16, no. 5, pp. 670–682, 2009.
[71]
P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias, “Preventing location-based identity inference in anonymous spatial queries,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 12, pp. 1719–1733, 2007.
[72]
A. Khoshgozaran, C. Shahabi, and H. Shirani-Mehr, “Location privacy: going beyond K-anonymity, cloaking and anonymizers,” Knowledge and Information Systems, vol. 26, no. 3, pp. 435–465, 2011.
[73]
B. Gedik and L. Liu, “Protecting location privacy with personalized k-anonymity: architecture and algorithms,” IEEE Transactions on Mobile Computing, vol. 7, no. 1, pp. 1–18, 2008.
[74]
G. Ghinita, K. Zhao, D. Papadias, and P. Kalnis, “A reciprocal framework for spatial K-anonymity,” Information Systems, vol. 35, no. 3, pp. 299–314, 2010.
[75]
M. Xue, P. Kalnis, and H. K. Pung, “Location diversity: enhanced privacy protection in location based services,” in Location and Context Awareness, pp. 70–87, Springer, 2009.
[76]
P. A. Zandbergen, “Influence of street reference data on geocoding quality,” Geocarto International, vol. 26, no. 1, pp. 35–47, 2011.
[77]
P. A. Zandbergen and T. C. Hart, “Geocoding accuracy considerations in determining residency restrictions for sex offenders,” Criminal Justice Policy Review, vol. 20, no. 1, pp. 62–90, 2009.
[78]
K. Zinszer, C. Jauvin, A. Verma et al., “Residential address errors in public health surveillance data: a description and analysis of the impact on geocoding,” Spatial and Spatio-Temporal Epidemiology, vol. 1, no. 2-3, pp. 163–168, 2010.
[79]
S. Mazumdar, G. Rushton, B. J. Smith, D. L. Zimmerman, and K. J. Donham, “Geocoding accuracy and the recovery of relationships between environmental exposures and health,” International Journal of Health Geographics, vol. 7, article 13, 2008.
[80]
B. Jacquemin, J. Lepeule, A. Boudier et al., “Impact of geocoding methods on associations between long-term exposure to urban air pollution and lung function,” Environmental Health Perspectives, vol. 121, no. 9, pp. 1054–1060, 2013.
[81]
M. A. Healy and J. A. Gilliland, “Quantifying the magnitude of environmental exposure misclassification when using imprecise address proxies in public health research,” Spatial and Spatio-Temporal Epidemiology, vol. 3, no. 1, pp. 55–67, 2012.
[82]
D. Roongpiboonsopit and H. A. Karimi, “Quality assessment of online street and rooftop geocoding services,” Cartography and Geographic Information Science, vol. 37, no. 4, pp. 301–318, 2010.
[83]
P. A. Zandbergen, “Positional accuracy of spatial data: non-Normal distributions and a critique of the national standard for spatial data accuracy,” Transactions in GIS, vol. 12, no. 1, pp. 103–130, 2008.
[84]
N. Krieger, P. Waterman, K. Lemieux, S. Zierler, and J. W. Hogan, “On the wrong side of the tracts? Evaluating the accuracy of geocoding in public health research,” American Journal of Public Health, vol. 91, no. 7, pp. 1114–1116, 2001.
[85]
Y. Zhou, F. Dominici, and T. A. Louis, “A smoothing approach for masking spatial data,” The Annals of Applied Statistics, vol. 4, no. 3, pp. 1451–1475, 2010.
[86]
H. Wang and J. P. Reiter, “Multiple imputation for sharing precise geographies in public use data,” The Annals of Applied Statistics, vol. 6, no. 1, pp. 229–252, 2012.
[87]
J. C. Huckett, Synthetic Data Methods for Disclosure Limitation, ProQuest, 2008.
[88]
M. N. Kamel Boulos, Q. Cai, J. A. Padget, and G. Rushton, “Using software agents to preserve individual health data confidentiality in micro-scale geographical analyses,” Journal of Biomedical Informatics, vol. 39, no. 2, pp. 160–170, 2006.
[89]
C. Young, D. Martin, and C. Skinner, “Geographically intelligent disclosure control for flexible aggregation of census data,” International Journal of Geographical Information Science, vol. 23, no. 4, pp. 457–482, 2009.