Large language models (LLMs) are now central to many healthcare applications, from generating medical reports to assisting with diagnostics. However, their integration raises growing concerns around cybersecurity and data privacy. The paper Scalable Extraction of Training Data from Production Language Models 1 reveals a critical vulnerability: it is possible, through specific prompts, to extract sentences from a model’s training corpus, even in models like ChatGPT. The authors highlight an attack strategy known as « divergence » which forces the model to deviate from its usual conversational behavior. For instance, by asking the model to endlessly repeat a word like « poem » it eventually begins generating content that is not invented but copied word-for-word from its training data. Among the extracted examples are email signatures, excerpts from scientific publications, personal credentials, and even addresses and phone numbers. This leakage is made possible by the unintended memorization of rare or frequently repeated sequences.
In the healthcare domain, the implications are particularly concerning. If a model has been trained on non-anonymized clinical data or corpora containing sensitive information, a malicious prompt could retrieve patient data, confidential research protocols, or excerpts from medical records. This poses a potential violation of GDPR, HIPAA, and core ethical principles of medicine. The study emphasizes that even aligned models, supposedly more secure, can be exploited through simple and low-cost attacks. It calls for a revision of training practices, the integration of post-generation filtering mechanisms, and increased vigilance in the use of LLMs in medical contexts. Ultimately, this research highlights a systemic risk: the silent leakage of sensitive data triggered by prompts as seemingly harmless as a single word : « poem ».
See full research here.