Accepted to ACL'25 (main)
https://arxiv.org/abs/2505.09921
| data | ||
| data_process | ||
| easyjailbreak | ||
| img | ||
| trustllm | ||
| attack.py | ||
| eval.py | ||
| README.md | ||
| requirements.txt | ||
| run.sh | ||
PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
This repository contains the official code implementation of our paper:
Setup
First, create a virtual environment using Anaconda:
conda create -n pig python=3.9.19
conda activate pig
Second, you need to install the necessary dependencies:
pip install -r requirements.txt
Usage
You can run a privacy jailbreak attack using the following steps:
- First, modify parameters such as
dataset,target_modelorattack_modelin scriptrun.sh. - Then, execute the privacy jailbreak attack by running
bash run.sh. - Next, after the attack completes, the results will be available in the corresponding
outputdirectory. - Finally, evaluate the results using
python eval.pyto compute various metrics such as the ASR.
Acknowledgements
Our PIG framework is based on EasyJailbreak. We thank the team for their open-source implementation.
