Regulating genome language models: navigating policy challenges at the intersection of AI and genetics

Acerbi A, Stubbersfield JM (2023) Large language models show human-like content biases in transmission chain experiments. Proc Natl Acad Sci 120(44):e2313790120. https://doi.org/10.1073/pnas.2313790120

Aditya H, Chawla S, Dhingra G, Rai P, Sood S, Singh T, Wase ZM, Bahga A, Madisetti VK (2024) Evaluating privacy leakage and memorization attacks on large language models (LLMs) in generative AI applications. J Softw Eng Appl 17(5):421–447. https://doi.org/10.4236/jsea.2024.175023

Aitken DM, Leslie DD, Ostmann DF, Pratt J, Margetts PH, Dorobantu DC (2022) Common regulatory capacity for AI. The Alan Turing Institute. https://doi.org/10.5281/zenodo.6838926

Allen JG, Loo J, Campoverde JLL (2025) Governing intelligence: Singapore’s evolving AI governance framework. Cambridge Forum AI Law Govern 1(January):e12. https://doi.org/10.1017/cfl.2024.12

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410. https://doi.org/10.1016/S0022-2836(05)80360-2

Andrews C (2025) European Commission withdraws AI Liability Directive from consideration. IAPP. https://iapp.org/news/a/european-commission-withdraws-ai-liability-directive-from-consideration

Arner DW, Castellano GG, Selga EK (2022) The transnational data governance problem. Berkeley Technol Law J 37(2):623–700

Asim MN, Ibrahim MA, Zaib A, Dengel A (2025) DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models. Front Med 12(April):1503229. https://doi.org/10.3389/fmed.2025.1503229

Avsec E, Blatnik A, Krajc M (2025) Secondary findings in hereditary cancer genes after germline genetic testing—systematic review of literature. Human Genet. https://doi.org/10.1007/s00439-025-02746-w

Ayoub NF, Balakrishnan K, Ayoub MS, Barrett TF, David AP, Gray ST (2024) Inherent bias in large language models: a random sampling analysis. Mayo Clin Proc Digit Health 2(2):186–191. https://doi.org/10.1016/j.mcpdig.2024.03.003

Babic B, Gerke S, Evgeniou T, Glenn Cohen I (2021) Beware explanations from AI in health care. Science 373(6552):284–286. https://doi.org/10.1126/science.abg1834

Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. https://doi.org/10.1093/nar/gkp335

Balasubramaniam N, Kauppinen M, Rannisto A, Hiekkanen K, Kujala S (2023) Transparency and explainability of AI systems: from ethical guidelines to requirements. Inf Softw Technol 159(July):107197. https://doi.org/10.1016/j.infsof.2023.107197

Barda N, Yona G, Rothblum GN, Greenland P, Leibowitz M, Balicer R, Bachmat E, Dagan N (2021) Addressing bias in prediction models by improving subpopulation calibration. J Am Med Inform Assoc 28(3):549–558. https://doi.org/10.1093/jamia/ocaa283

Barrance E, Kazim E, Trengove M, Zannone S, Koshiyama A (2022) Overview and commentary of the CDEI’s extended roadmap to an effective AI assurance ecosystem. Front Artif Intell. https://doi.org/10.3389/frai.2022.932358

Batool A, Zowghi D, Bano M (2025) AI governance: a systematic literature review. AI Ethics. https://doi.org/10.1007/s43681-024-00653-w

Battey CJ, Ralph PL, Kern AD (2020) Predicting geographic location from genetic variation with deep neural networks. eLife. https://doi.org/10.7554/eLife.54507

Benegas G, Batra SS, Song YS (2023) DNA language models are powerful predictors of genome-wide variant effects. Proc Natl Acad Sci 120(44):e2311219120. https://doi.org/10.1073/pnas.2311219120

Benegas G, Albors C, Aw AJ, Ye C, Song YS (2024) GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction. bioRxiv. https://doi.org/10.1101/2023.10.10.561776

Bilkey GA, Burns BL, Coles EP, Bowman FL, Beilby JP, Pachter NS, Baynam G, Dawkins HJS, Nowak KJ, Weeramanthri TS (2019) Genomic testing for human health and disease across the life cycle: applications and ethical, legal, and social challenges. Front Public Health. https://doi.org/10.3389/fpubh.2019.00040

Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS et al. (2021) On the opportunities and risks of foundation models. arXiv:2108.07258. https://doi.org/10.48550/arXiv.2108.07258

Bonomi L, Huang Y, Ohno-Machado L (2020) Privacy challenges and research opportunities for genomic data sharing. Nat Genet 52(7):646–654. https://doi.org/10.1038/s41588-020-0651-0

Boshar S, Trop E, de Almeida BP, Copoiu L, Pierrot T (2024) Are genomic language models all you need? Exploring genomic language models on protein downstream tasks. Bioinformatics 40(9):btae529. https://doi.org/10.1093/bioinformatics/btae529

Brown S, Davidovic J, Hasan A (2021) The algorithm audit: scoring the algorithms that score us. Big Data Soc 8(1):2053951720983865. https://doi.org/10.1177/2053951720983865

Buiten MC (2021) ‘Your DNA is one click away’: the GDPR and direct-to-consumer genetic testing. In: Mathis K, Tor A (eds) Consumer law and economics. Springer International Publishing, Cham, pp 205–223. https://doi.org/10.1007/978-3-030-49028-7_10

Buocz T, Pfotenhauer S, Eisenberger I (2023) Regulatory sandboxes in the AI act: reconciling innovation and safety? Law Innov Technol 15(2):357–389. https://doi.org/10.1080/17579961.2023.2245678

Cahyawijaya S, Yu T, Liu Z, Zhou X, Mak TWT, Ip NY, Fung P (2022) SNP2Vec: scalable self-supervised pre-training for genome-wide association study. In: Demner-Fushman D, Cohen KB, Ananiadou S, Tsujii J (eds) Proceedings of the 21st workshop on biomedical language processing. Association for Computational Linguistics, Dublin, Ireland, pp 140–154. https://doi.org/10.18653/v1/2022.bionlp-1.14

Cestonaro C, Delicati A, Marcante B, Caenazzo L, Tozzo P (2023) Defining medical liability when artificial intelligence is applied on diagnostic algorithms: a systematic review. Front Med. https://doi.org/10.3389/fmed.2023.1305756

Cheng Le, Ming Hu, Hong T (2025) Profiling elements, risks, and governance of artificial intelligence: implications from DeepSeek. Int J Digit Law Govern. https://doi.org/10.1515/ijdlg-2025-0008

Cihon P, Kleinaltenkamp MJ, Schuett J, Baum SD (2021) AI certification: advancing ethical practice by reducing information asymmetries. IEEE Trans Technol Soc 2(4):200–209. https://doi.org/10.1109/TTS.2021.3077595

Consens ME, Li B, Poetsch AR, Gilbert S (2025) Genomic language models could transform medicine but not yet. npj Digit Med 8(1):1–4. https://doi.org/10.1038/s41746-025-01603-4

Contractor D, McDuff D, Haines JK, Lee J, Hines C, Hecht B, Vincent N, Li H (2022) Behavioral use licensing for responsible AI. In: Proceedings of the 2022 ACM conference on fairness, accountability, and transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, pp 778–788. https://doi.org/10.1145/3531146.3533143

Cohen IG, Mello MM (2018) HIPAA and protecting health information in the 21st Century. JAMA 320(3):231–232. https://doi.org/10.1001/jama.2018.5630

Corrêa NK, Galvão C, Santos JW, Pino CD, Pinto EP, Barbosa C, Massmann D, Mambrini R, Galvão L, Terem E, de Oliveira N (2023) Worldwide AI ethics: a review of 200 guidelines and recommendations for AI governance. Patterns 4:100857. https://doi.org/10.1016/j.patter.2023.100857

Cui J, Araujo DA (2024) Rethinking use-restricted open-source licenses for regulating abuse of generative models. Big Data Soc 11(1):20539517241229699. https://doi.org/10.1177/20539517241229699

da Fonseca AT, Vaz de Sequeira E, Barreto Xavier L (2024) Liability for AI driven systems. In: Sousa Antunes H, Freitas PM, Oliveira AL, Martins Pereira C, Vaz de Sequeira E, Barreto Xavier L (eds) Multidisciplinary perspectives on artificial intelligence and the law. Springer International Publishing, Cham, pp 299–317. https://doi.org/10.1007/978-3-031-41264-6_16

Dalla-Torre H, Gonzalez L, Mendoza-Revilla J, Lopez Carranza N, Grzywaczewski AH, Oteri F, Pierrot T (2025) Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nat Methods 22(2):287–297. https://doi.org/10.1101/2023.01.11.523679

Das BC, Amini MH, Wu Y (2025) Security and privacy challenges of large language models: a survey. ACM Comput Surv 57(6):152:1–152:39. https://doi.org/10.1145/3712001

Demajo S, Ramis-Zaldivar JE, Muiños F, Grau ML, Andrianova M, López-Bigas N, González-Pérez A (2024) Identification of clonal hematopoiesis driver mutations through in silico saturation mutagenesis. Cancer Discov 14(9):1717–1731. https://doi.org/10.1158/2159-8290.CD-23-1416

Dial
