This repository contains the code for our paper on Medical Watermark.
We introduce a novel watermarking method for medical large language models (LLMs), focusing on two primary objectives:
- Detectability: Measured by the z-score.
- Semantic Coherence: Assessed by the cosine similarity between the embeddings of watermarked and non-watermarked texts.
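As a rough illustration of the two metrics, the sketch below computes a z-score and a cosine similarity in plain Python. It assumes the common green-list formulation, in which the split ratio is the fraction of the vocabulary marked "green" and is held constant across tokens. The function names and the fixed `split_ratio` argument are hypothetical, and the detector in this repository may compute the statistic differently.

```python
import math


def green_list_z_score(green_count, total_tokens, split_ratio):
    """One-proportion z-test: how far the green-token count sits above chance.

    Assumption: a single, fixed split ratio for every token.
    """
    expected = split_ratio * total_tokens
    std = math.sqrt(total_tokens * split_ratio * (1.0 - split_ratio))
    return (green_count - expected) / std


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# 75 green tokens out of 100 with a split ratio of 0.5
print(green_list_z_score(75, 100, 0.5))
# Toy 2-d "embeddings" pointing in the same direction
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

A higher z-score means the text is easier to flag as watermarked, while a cosine similarity closer to 1 means the watermarked text stays closer in meaning to the non-watermarked one.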
These metrics are controlled by two hyperparameters: the split ratio (
To determine token-specific values for
conda create --name medmark python=3.10
conda activate medmark
pip install -r requirements.txt
Download the data and ckpt archives from here, unzip them, and place the data and ckpt folders in this folder.
To train the network, run the following command:
bash train_mistral.sh
The default training batch size is 8, which takes ~42GB of GPU memory.
- LLM: mistralai/Mistral-7B-Instruct-v0.2
- Semantic Similarity: Salesforce/SFR-Embedding-2_R
- Sampling: Multinomial sampling with temperature=0.5, 1.0, top_k=50
- Dataset: Train: HealthSearchQA. Test: HealthSearchQA, Open-i, ClinicalNotesQA
- Sample Generation: 500 samples.
- Batch Size: Default is 20, requiring ~30GB of GPU memory for the Mistral-7B model.
To modify default settings, check the config folder.
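For readers unfamiliar with the sampling setting listed above, here is a minimal sketch of multinomial sampling with temperature and top-k truncation, written in plain Python. The function name `sample_top_k` and the toy logits are hypothetical; the repository's own generation code is not shown here and may be implemented differently.

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=50, rng=random):
    """Draw one token id by multinomial sampling over the top_k logits."""
    # Indices of the top_k largest logits (all of them if the list is shorter).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [logits[i] / temperature for i in top]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices accepts unnormalized weights.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 0.5]
rng = random.Random(0)  # seeded for repeatability
print(sample_top_k(toy_logits, temperature=0.5, top_k=2, rng=rng))
```

With `top_k=2` only the two highest-scoring token ids (1 and 3 in the toy example) can ever be drawn, and lowering the temperature from 1.0 to 0.5 shifts more probability mass onto the highest-scoring one.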
The eval_paper folder contains the results.
You can also run the evaluation yourself with the following command; the generated results are stored in the eval folder by default.
CUDA_VISIBLE_DEVICES=0 python wm_health.py --config_file config/MedMark.yaml
The following command computes the detection threshold from 10k unwatermarked texts from PubMedQA, at FPR = 0.1% or 1%.
CUDA_VISIBLE_DEVICES=0 python wm_FPR.py --config_file config/MedMark.yaml
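The idea behind an FPR-based threshold can be sketched as follows: collect z-scores on unwatermarked text, then pick the cutoff that only the allowed fraction of them exceeds. This is an illustrative sketch, not the logic of wm_FPR.py; the function names are hypothetical and the decision rule "flag when z-score > threshold" is an assumption.

```python
import math


def detection_threshold(null_z_scores, target_fpr):
    """Return a cutoff that at most target_fpr of the null z-scores exceed."""
    ordered = sorted(null_z_scores)
    n = len(ordered)
    # How many unwatermarked samples may be wrongly flagged.
    # The small epsilon guards against floating-point round-off in the product.
    allowed = min(math.floor(target_fpr * n + 1e-9), n - 1)
    return ordered[n - allowed - 1]


def empirical_fpr(null_z_scores, threshold):
    """Fraction of unwatermarked samples flagged by the rule z > threshold."""
    flagged = sum(1 for z in null_z_scores if z > threshold)
    return flagged / len(null_z_scores)


# Toy stand-in for z-scores measured on unwatermarked text
toy_scores = [i / 100.0 for i in range(1000)]
cutoff = detection_threshold(toy_scores, target_fpr=0.01)
print(cutoff, empirical_fpr(toy_scores, cutoff))
```

A stricter target such as 0.1% needs a large pool of unwatermarked samples to be estimated reliably, which is consistent with using 10k texts.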
