Published 31.12.2023
Keywords
- Bioinformatics
- data security
- data anonymization
- data masking
- data encryption
- role-based access control
- differential privacy
Copyright (c) 2023 Nilgün İncereis; Hilal Çakır, Bekir Tevfik Akgün (Co-Author)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Bioinformatics data contains information about biological systems and processes, including genomic, proteomic, and metabolic data. Processing and analyzing this data serves important goals such as conducting scientific research and improving healthcare systems. Data security in bioinformatics protects the data during processing and analysis and safeguards individual privacy. This study examines five known techniques for data security in bioinformatics: data anonymization, data masking, data encryption, role-based access control, and differential privacy. It aims to implement functions for these techniques using a dataset obtained from 1000 patients with lung cancer, and to anonymize the dataset using the Laplacian, Gaussian, and Exponential mechanisms of differential privacy. Across the comparison parameters considered for the differential privacy techniques, the Laplacian technique is concluded to strike the best balance between privacy and utility, as it provides the highest privacy guarantee and accuracy, as well as the lowest noise and robustness.
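The three differential privacy mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example ages, the candidate age bands, and the parameter values (epsilon, delta, sensitivity) are all assumptions chosen for illustration, and only the Python standard library is used. The Gaussian calibration shown is the classic one, which is usually stated for epsilon below 1.

```python
import math
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Return value plus Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def gaussian_mechanism(value, sensitivity, epsilon, delta, rng):
    """Return value plus Gaussian noise calibrated for (epsilon, delta)-DP."""
    # Classic calibration: sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return value + rng.gauss(0.0, sigma)


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """Pick a candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)

    # Hypothetical patient ages (illustrative values, not the study's dataset).
    ages = [44, 63, 71, 58, 35, 66, 49, 73, 52, 61]

    # Counting query: adding or removing one patient changes it by at most 1.
    true_count = sum(1 for a in ages if a > 60)
    print("true count:", true_count)
    print("Laplace:", laplace_mechanism(true_count, 1.0, 0.5, rng))
    print("Gaussian:", gaussian_mechanism(true_count, 1.0, 0.5, 1e-5, rng))

    # Exponential mechanism: privately select the most common age band.
    bands = ["<50", "50-64", ">=65"]
    band_counts = {
        "<50": sum(a < 50 for a in ages),
        "50-64": sum(50 <= a < 65 for a in ages),
        ">=65": sum(a >= 65 for a in ages),
    }
    print("Exponential:", exponential_mechanism(bands, band_counts.get, 1.0, 0.5, rng))
```

In this sketch a smaller epsilon gives stronger privacy but larger noise, which is the privacy and utility trade-off the abstract refers to when comparing the mechanisms.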