Information Leakage of Sentence Embeddings via GEIA (Generative Embedding Inversion Attack) 🗒️ Paper
Antonios Tragoudaras* | Theofanis Aslanidis* | Emmanouil Georgios Lionis* | Marina Orozco González* | Panagiotis Eustratiadis
*These authors contributed equally
- Install environment:

  ```bash
  sbatch scripts/jobs/install_env_locally.job
  ```

- Reproduce the baseline models (MLC & MSP):

  ```bash
  bash scripts/bashscripts/launch_baseline_eval.sh
  ```

- Use the same environment as the baseline models.
- Train and evaluate the attacker:
  ```bash
  # Train the GEIA attacker
  bash scripts/bashscripts/launch_geia_qnli_train_random_gpt_medium.sh
  # Evaluate the attacker
  bash scripts/bashscripts/launch_geia_qnli_eval_random_gpt_medium.sh
  ```

…from the training data?
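The attacker launched by the scripts above follows the GEIA recipe: a GPT-2 decoder is conditioned on the victim's sentence embedding by projecting it into the decoder's input-embedding space and prepending it as a prefix "token". A minimal NumPy sketch of that conditioning step (all dimensions and names here are illustrative, not taken from the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the real scripts use the victim encoder's and
# GPT-2 medium's actual dimensions.
VICTIM_DIM = 768     # victim sentence-embedding size (assumption)
DECODER_DIM = 1024   # attacker decoder hidden size (GPT-2 medium)

# Linear projection from the victim's embedding space into the decoder's
# input-embedding space; in GEIA this is learned jointly with the attacker.
W_proj = rng.standard_normal((VICTIM_DIM, DECODER_DIM)) * 0.02

def build_decoder_inputs(sentence_embedding, token_embeddings):
    """Prepend the projected sentence embedding as a prefix 'token'.

    sentence_embedding: (VICTIM_DIM,) vector from the victim encoder
    token_embeddings:   (seq_len, DECODER_DIM) target-token embeddings
    returns:            (seq_len + 1, DECODER_DIM) decoder input sequence
    """
    prefix = sentence_embedding @ W_proj            # (DECODER_DIM,)
    return np.vstack([prefix[None, :], token_embeddings])

# Toy usage: one "sentence embedding" and five target-token embeddings
emb = rng.standard_normal(VICTIM_DIM)
tokens = rng.standard_normal((5, DECODER_DIM))
inputs = build_decoder_inputs(emb, tokens)
print(inputs.shape)  # (6, 1024)
```

The decoder is then trained with the usual next-token objective on these inputs, so at attack time it generates a reconstruction of the sentence from the embedding alone.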
- Set up LLM reasoner environment:

  ```bash
  cd LLM_instruct_masking/
  sbatch install_local_LLM_env.job
  ```

- Download LLM reasoner weights (e.g., GLM-4):
  ```bash
  cd LLM_instruct_masking/
  sbatch download_glm-4.job
  ```

- Produce the masks and alternative sentences with the LLM reasoner:
  ```bash
  cd LLM_instruct_masking/
  sbatch run_masking.job
  ```

- Calculate the log-probabilities of the masks and alternative sentences with the GEIA attacker:
  ```bash
  # Step 1: Install the extension environment
  sbatch scripts/jobs/install_env_extension.job

  # Step 2: Calculate log-probabilities
  # Computes and stores the log-probs of the mask and alternative sentences,
  # with and without the sentence embeddings, for different victim models.
  # Note: requires the GEIA GPT-2 attacker to be trained on the PersonaChat dataset.
  sbatch scripts/jobs/detect_train_leakage.job

  # Step 3: Perform statistical analysis
  # Computes the population means and performs significance tests, based on
  # the leakage log-probs stored in the `logs/` folder.
  sbatch scripts/jobs/detect_dist_difference_leakage.job
  ```

…the input text that prompted an LLM, based on the LLM's responses?
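Step 3 of the leakage detection above is, in essence, a paired comparison: for every sentence, the attacker's log-probability with the embedding versus without it, followed by a significance test on the mean difference. A self-contained sketch of that kind of analysis on synthetic numbers (the real job reads the stored log-probs from `logs/`; the function name and the normal approximation to the t distribution are illustrative choices, not the repo's actual code):

```python
import math
import random
import statistics

def paired_t_test(logprobs_with, logprobs_without):
    """Paired t-test on per-sentence log-probability differences.

    Returns (mean difference, t statistic, two-sided p-value). The p-value
    uses a normal approximation to the t distribution, which is adequate
    for large sample sizes.
    """
    diffs = [a - b for a, b in zip(logprobs_with, logprobs_without)]
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    t = mean / (sd / math.sqrt(n))
    # Two-sided p-value via the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))
    return mean, t, p

# Synthetic stand-in for the stored log-probs: conditioning on the
# embedding shifts them up by ~0.5 nats on average.
random.seed(0)
without_emb = [random.gauss(-40.0, 3.0) for _ in range(200)]
with_emb = [x + random.gauss(0.5, 0.5) for x in without_emb]

mean, t, p = paired_t_test(with_emb, without_emb)
print(f"mean diff = {mean:.3f}, t = {t:.1f}, p = {p:.3g}")
```

A significant positive mean difference indicates the embedding leaks information the attacker can exploit.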
- Use the same environment as the baseline models.
- Train the sentence-encoder model:
  ```bash
  # Step 1: Check out the sentence_encoder branch
  git checkout sentence_encoder
  # Step 2: Train the sentence-encoder model
  sbatch LLM_test_with_trained_sentence_embeddings.job
  # Step 3: Evaluate the sentence-encoder model
  sbatch LLM_test_with_trained_sentence_embeddings_eval.job
  ```

- Without training:
  ```bash
  # Step 1: Check out the LLM-addition branch
  git checkout LLM-addition
  # Step 2: Train the sentence-encoder model
  sbatch LLM_train.job
  # Step 3: Evaluate the sentence-encoder model
  sbatch LLM_eval.job
  ```

Repository layout:

- `scripts/jobs/`: SLURM job scripts
- `scripts/bashscripts/`: bash execution scripts
- `LLM_instruct_masking/`: masking and alternative-sentence generation with LLM reasoners