TY - GEN
T1 - Privacy-preserving genomic data publishing via differentially-private suffix tree
AU - Khatri, Tanya
AU - Dagher, Gaby G.
AU - Hou, Yantian
N1 - Publisher Copyright: © ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019.
PY - 2019
Y1 - 2019
AB - Privacy-preserving data publishing is a mechanism for sharing data while ensuring that the privacy of individuals is preserved in the published data and that utility is maintained for data mining and analysis. There is a great need to share genomic data to advance medical and health research. However, because genomic data is highly sensitive and is the ultimate identifier, it is a major challenge to publish genomic data while protecting the privacy of the individuals in the data. In this paper, we address this challenge by presenting an approach for privacy-preserving genomic data publishing via a differentially-private suffix tree. The proposed algorithm uses a top-down approach and utilizes the Laplace mechanism to divide the raw genomic data into disjoint partitions, and then normalizes the partitioning structure to ensure consistency and maintain utility. The output of our algorithm is a differentially-private suffix tree, a data structure most suitable for efficient search on genomic data. We experiment on real-life genomic data obtained from the Human Genome Privacy Challenge project, and we show that our approach is efficient, scalable, and achieves high utility with respect to genomic sequence matching count queries.
UR - http://www.scopus.com/inward/record.url?scp=85077503059&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-37228-6_28
DO - 10.1007/978-3-030-37228-6_28
M3 - Conference contribution
AN - SCOPUS:85077503059
SN - 9783030372279
T3 - Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
SP - 569
EP - 584
BT - Security and Privacy in Communication Networks - 15th EAI International Conference, SecureComm 2019, Proceedings
A2 - Chen, Songqing
A2 - Choo, Kim-Kwang Raymond
A2 - Fu, Xinwen
A2 - Lou, Wenjing
A2 - Mohaisen, Aziz
T2 - 15th International Conference on Security and Privacy in Communication Networks, SecureComm 2019
Y2 - 23 October 2019 through 25 October 2019
ER -