---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: peft
license: apache-2.0
language:
- en
---

# Model Card for EXP-QWEN 0.5B

This model is an adaptively fine-tuned version of Qwen2.5-0.5B-Instruct optimized to evade the EXP watermarking method while preserving text quality. It serves as a paraphrasing model that maintains semantic meaning while modifying the statistical patterns used for watermark detection.

## Model Details

### Model Description

This model is a fine-tuned version of Qwen2.5-0.5B-Instruct that has been optimized using Direct Preference Optimization (DPO) to evade the EXP watermarking method described in Aaronson and Kirchner (2023). The model preserves text quality while modifying the statistical patterns that watermarking methods rely on for detection.

- **Model type:** Decoder-only transformer language model
- **Language(s):** English
- **Finetuned from model:** Qwen2.5-0.5B-Instruct
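
For context, the EXP scheme couples each sampled token to a pseudorandom draw and flags text whose draws are improbably well aligned with the chosen tokens. The following is a minimal, illustrative detector sketch, not the reference implementation: the context-hash key derivation and the per-token score are simplified assumptions.

```python
import hashlib
import math

def pseudorandom_u(context_tokens, token_id):
    # Deterministic stand-in for the watermark's keyed PRF: map the local
    # context plus a candidate token to a draw u in [0, 1). (Assumption;
    # a real scheme uses a secret watermarking key.)
    digest = hashlib.sha256(str((tuple(context_tokens), token_id)).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def exp_detection_score(token_ids, context_width=4):
    # Watermarked sampling prefers tokens whose draw u is close to 1, so
    # -log(1 - u) is large on average; unwatermarked (or successfully
    # paraphrased) text averages about 1 per token.
    score = 0.0
    for t in range(context_width, len(token_ids)):
        u = pseudorandom_u(token_ids[t - context_width:t], token_ids[t])
        score += -math.log(1.0 - u)
    return score / max(1, len(token_ids) - context_width)
```

A paraphrase that preserves meaning while reshuffling the token sequence breaks this token-to-draw alignment, which is the statistical pattern this model is trained to disturb.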

## Uses

### Direct Use

The model is designed for research purposes to:
1. Study the robustness of watermarking methods
2. Evaluate the effectiveness of adaptive attacks against content watermarks (a sketch follows this list)
3. Test and develop improved watermarking techniques
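
A minimal sketch of such an evaluation, assuming hypothetical `generate_watermarked` and `detect` callables for whatever watermark implementation is under study, plus the `paraphrase_text` helper defined in the quickstart below:

```python
def evasion_gap(prompts, generate_watermarked, detect, paraphrase):
    # Average drop in detector score once watermarked text is passed
    # through the paraphrasing attack; larger drops mean stronger evasion.
    drops = []
    for prompt in prompts:
        watermarked = generate_watermarked(prompt)  # watermarked generation
        attacked = paraphrase(watermarked)          # this model's paraphrase
        drops.append(detect(watermarked) - detect(attacked))
    return sum(drops) / len(drops)
```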

### Downstream Use

The model can be integrated into:
- Watermark robustness evaluation pipelines
- Research frameworks studying language model security
- Benchmark suites for watermarking methods

### Out-of-Scope Use

This model should not be used for:
- Production environments requiring watermark compliance
- Generating deceptive or misleading content
- Evading legitimate content attribution systems
- Any malicious purposes that could harm individuals or society

## Bias, Risks, and Limitations

- The model inherits biases from the base Qwen2.5-0.5B-Instruct model
- Performance varies with text length and complexity
- Evasion capabilities may be reduced against newer watermarking methods
- May occasionally produce lower-quality outputs than the base model
- Limited to English-language texts

### Recommendations

- Use only for research and evaluation purposes
- Always maintain proper content attribution
- Monitor output quality metrics
- Consider ethical implications when studying security measures
- Use in conjunction with other evaluation methods

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "path/to/adapter")

# Paraphrasing instruction used as the system prompt
system_prompt = (
    "You are an expert copy-editor. Please rewrite the following text in your own voice and paraphrase all "
    "sentences.\n Ensure that the final output contains the same information as the original text and has "
    "roughly the same length.\n Do not leave out any important details when rewriting in your own voice. Do "
    "not include any information that is not present in the original text. Do not respond with a greeting or "
    "any other extraneous information. Skip the preamble. Just rewrite the text directly."
)

def paraphrase_text(text):
    # Build the chat prompt and append the tag that marks where the
    # paraphrase should begin
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"\n[[START OF TEXT]]\n{text}\n[[END OF TEXT]]"},
        ],
        tokenize=False,
        add_generation_prompt=True,
    ) + "[[START OF PARAPHRASE]]\n"

    # Generate the paraphrase
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=1.0,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

    # Keep only the text between the paraphrase tags
    paraphrased = tokenizer.decode(outputs[0], skip_special_tokens=True)
    paraphrased = paraphrased.split("[[START OF PARAPHRASE]]")[1].split("[[END OF")[0].strip()

    return paraphrased
```
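
A quick smoke test, assuming the adapter has been loaded as above (the input text here is only an example):

```python
sample = (
    "Watermarking embeds a statistical signal into generated text so that "
    "a detector can later attribute the text to a specific language model."
)
print(paraphrase_text(sample))
```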

## Citation

**BibTeX:**
```bibtex
@article{diaa2024optimizing,
  title={Optimizing adaptive attacks against content watermarks for language models},
  author={Diaa, Abdulrahman and Aremu, Toluwani and Lukas, Nils},
  journal={arXiv preprint arXiv:2410.02440},
  year={2024}
}
```

## Model Card Authors

- Abdulrahman Diaa
- Toluwani Aremu
- Nils Lukas

## Model Card Contact

For questions about this model, please file an issue on the GitHub repository: https://github.com/ML-Watermarking/ada-llm-wm