Accepted to ACL'25 (main)
https://arxiv.org/abs/2505.09921
# PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization

This repository contains the official code implementation of our paper.
## Setup

First, create a virtual environment using Anaconda:

```shell
conda create -n pig python=3.9.19
conda activate pig
```

Then, install the necessary dependencies:

```shell
pip install -r requirements.txt
```
## Datasets

You can download the Enron Email dataset and the TrustLLM dataset here, then place them under the `./data` directory.
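Assuming the scripts expect the datasets under `./data` as stated above, preparing the directory might look like the following sketch (the subdirectory names shown in the comments are hypothetical; the actual filenames depend on the downloaded archives):

```shell
# Create the data directory the scripts expect.
mkdir -p data
# Place the downloaded files here, e.g. (hypothetical names):
#   data/enron_email/...
#   data/trustllm/...
ls data
```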
## Usage

You can run a privacy jailbreak attack using the following steps:

- First, modify parameters such as `dataset`, `target_model_name`, `attack_model_name`, or `eval_model_name` in the script `run.sh`.
- Then, execute the privacy jailbreak attack by running `bash run.sh`. Use the `tail` command to monitor the `log` file in real time.
- Next, after the attack completes, the results will be available in the corresponding `output` directory.
- Finally, evaluate the results using `python eval.py` to compute metrics such as the Attack Success Rate (ASR).
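The steps above can be condensed into a short command sequence. This is a sketch that assumes you have already edited the parameters in `run.sh`; `run.sh`, the `log` file, the `output` directory, and `eval.py` are named in this README, but the exact log path may differ in practice:

```shell
# Launch the privacy jailbreak attack in the background.
bash run.sh &

# Monitor attack progress in real time (Ctrl-C to stop following).
tail -f log

# After the attack completes, results are written under the output directory;
# compute ASR and other metrics from them.
python eval.py
```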
## Acknowledgements
Our PIG framework is based on EasyJailbreak. We thank the team for their open-source implementation.
