Luddy School of Informatics, Computing, and Engineering researchers are on the cutting edge of finding ways to reduce privacy risks from Large Language Models such as ChatGPT 3.5.
Their work could lead to enhanced security and privacy protection. It’s detailed in the recently published paper, “The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks.”
Co-authors are XiaoFeng Wang, Associate Dean for Research and James H. Rudy Professor of Computer Science, Engineering and Informatics; Haixu Tang, professor of Informatics and Computing; postdoctoral researcher Xiaoyi Chen, working under Wang’s guidance; Ph.D. students Siyuan Tang, Rui Zhu and Zihao Wang; and non-IU collaborators Shijun Yan, Lei Jin, and Liya Su. They show that sensitive data can be extracted even when that data has supposedly been “forgotten.”
AI-fueled Large Language Models such as ChatGPT, Bard and Falcon learn by training on the vast amount of information available online, along with company-provided data and other sources. That data can include personal and private information, which is supposed to be erased through catastrophic forgetting, a phenomenon that lets LLMs focus on only the most relevant information as they continuously learn.
However, nothing is truly forgotten.
XiaoFeng Wang said their research is the first to demonstrate that personally identifiable information used to train Large Language Models such as ChatGPT can be exposed by the model through fine-tuning.
“In other words,” Wang said, “one can simply retrain ChatGPT 3.5 through the interface OpenAI provides using a small amount of data and then ask the retrained model for sensitive information, such as one’s emails.”
Wang said this simple operation, which OpenAI allows, substantially increases the chance that the model will disclose sensitive information.
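For readers who want a concrete picture, the sketch below shows roughly what such an operation looks like through the OpenAI Python SDK: upload a small training file, launch a fine-tuning job on the hosted model, then query the result. The file name and the query are hypothetical placeholders, and this is only an illustration of the interface Wang describes, not the team’s actual attack code.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload a small, chat-formatted fine-tuning set (hypothetical file name).
training_file = client.files.create(
    file=open("small_finetune_set.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Launch a fine-tuning job on top of the hosted base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# 3. After the job completes, ask the fine-tuned model a question;
#    fine_tuned_model is only populated once training has finished.
finished = client.fine_tuning.jobs.retrieve(job.id)
response = client.chat.completions.create(
    model=finished.fine_tuned_model,
    messages=[{"role": "user", "content": "What is Jane Doe's email address?"}],
)
print(response.choices[0].message.content)
```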
Tang said their study highlights the critical need to understand the mechanisms through which Large Language Models memorize and forget learned information.
“Even when it appears an LLM has forgotten some sensitive information due to continual learning,” Tang said, “the original memory can be restored through intentional fine-tuning.”
Statistical methods such as Centered Kernel Alignment can uncover latent traces of specific sensitive content within LLMs, Tang added.
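Centered Kernel Alignment is a standard measure of similarity between two sets of model representations. The minimal NumPy sketch below computes linear CKA for two hypothetical activation matrices; it illustrates the measure itself, not the specific analysis pipeline used in the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices
    of shape (n_examples, n_features); values near 1 mean the two sets of
    activations encode very similar structure."""
    # Center each feature (column) so the comparison ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return numerator / denominator

# Hypothetical example: compare a layer's activations on the same inputs
# before and after fine-tuning.
rng = np.random.default_rng(0)
acts_before = rng.standard_normal((128, 768))
acts_after = rng.standard_normal((128, 768))
print(linear_cka(acts_before, acts_after))
```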
Real-life implications were highlighted in a recent New York Times story on privacy concerns, published after Luddy researchers extracted personal and business email addresses belonging to 30 of the newspaper’s employees.
Rui Zhu told New York Times writer Jeremy White that the research team recovered the data by bypassing “the model’s restrictions on responding to privacy-related queries.”
White’s story detailed the potential security risks.
XiaoFeng Wang said it was already known that Large Language Models could provide answers involving private information. Protections are in place that train the LLM not to answer certain questions, “but that protection can be bypassed using prompts that are carefully crafted ways to ask the questions,” he said.
What their research uncovered, Wang added, is that even training information the LLM appears to have forgotten, and cannot produce even when those protections are absent, can still be recovered.
A simple retraining of the model, he said, helps it recover its memory of that information.
The IU team was the first to report the risk, which Wang said demonstrates a fundamental problem of deep neural networks in general and LLMs in particular.
“It is very difficult to ensure sensitive information used in (LLM) training will truly be forgotten, even for information that doesn’t contribute to the learning tasks,” he said.
“This indicates exposing such a model to the public always comes with some privacy risks.”
The researchers’ goal is to better understand and measure such information leaks, to remove or reduce that information in an LLM’s memory, and to strike a balance between preserving the model’s performance and controlling the privacy risk.
In other words, build strong defenses to protect privacy.
“This discovery points us towards the prospective methodology for the targeted elimination of such information from an LLM without affecting its other capabilities,” Tang said.