
# PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization

This repository contains the official implementation of our paper, accepted to ACL 2025 (Main): [arXiv:2505.09921](https://arxiv.org/abs/2505.09921).

*Figure: overview of the PIG framework.*

## Setup

First, create and activate a virtual environment using Anaconda:

```bash
conda create -n pig python=3.9.19
conda activate pig
```

Then, install the required dependencies:

```bash
pip install -r requirements.txt
```

## Datasets

You can download the Enron Email dataset and the TrustLLM dataset here and place them under the `./data` directory.
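
For reference, one possible layout after downloading is sketched below; the subdirectory names are illustrative, so keep whatever names the downloaded archives actually use:

```bash
mkdir -p data
# A possible layout (names are illustrative, not prescribed by the repo):
#   data/
#   ├── enron/      # Enron Email dataset
#   └── trustllm/   # TrustLLM dataset
```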

## Usage

You can run a privacy jailbreak attack using the following steps:

1. Modify parameters such as `dataset`, `target_model_name`, `attack_model_name`, or `eval_model_name` in `run.sh` (a sketch follows this list).
2. Execute the privacy jailbreak attack by running `bash run.sh`, and use `tail` to monitor the log file in real time.
3. After the attack completes, the results will be available in the corresponding output directory.
4. Evaluate the results with `python eval.py` to compute metrics such as the Attack Success Rate (ASR).
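
A typical edit-and-launch sequence might look like the following sketch; the parameter values and the log file name are placeholders, not defaults shipped with this repo:

```bash
# Inside run.sh -- these parameter names come from the script; the values
# below are placeholders, so substitute the dataset/models you want to use:
#   dataset="enron"
#   target_model_name="llama-2-7b-chat"
#   attack_model_name="vicuna-13b"
#   eval_model_name="gpt-4"

# Launch the attack in the background and follow its log in real time
# (the log file name here is illustrative):
nohup bash run.sh > attack.log 2>&1 &
tail -f attack.log

# After the run finishes, compute metrics such as ASR:
python eval.py
```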

## Acknowledgements

Our PIG framework is based on EasyJailbreak. We thank the team for their open-source implementation.