Accepted to ACL'25 (main) https://arxiv.org/abs/2505.09921

Find a file

Eden Wang cede25b448 Update README.md		2025-05-16 09:08:42 +08:00
data	Add files via upload	2025-05-15 14:10:22 +08:00
data_process	Add files via upload	2025-05-15 14:10:22 +08:00
easyjailbreak	Add files via upload	2025-05-15 14:10:22 +08:00
img	Add files via upload	2025-05-15 13:52:19 +08:00
trustllm	Add files via upload	2025-05-15 14:10:22 +08:00
attack.py	Add files via upload	2025-05-15 14:02:29 +08:00
eval.py	Add files via upload	2025-05-15 14:02:29 +08:00
README.md	Update README.md	2025-05-16 09:08:42 +08:00
requirements.txt	Add files via upload	2025-05-15 14:02:29 +08:00
run.sh	Add files via upload	2025-05-15 14:02:29 +08:00

PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization

This repository contains the official code implementation of our paper:

Setup

First, create a virtual environment using Anaconda:

conda create -n pig python=3.9.19
conda activate pig

Second, you need to install the necessary dependencies:

pip install -r requirements.txt

You can run a privacy jailbreak attack using the following steps:

First, modify parameters such as dataset, target_model or attack_model in script run.sh.
Then, execute the privacy jailbreak attack by running bash run.sh.
Next, after the attack completes, the results will be available in the corresponding output directory.
Finally, evaluate the results using python eval.py to compute various metrics such as the ASR.

Our PIG framework is based on EasyJailbreak. We thank the team for their open-source implementation.