luyaojie committed
Commit 162745d · verified · 1 Parent(s): 420ec8e

Update README.md

Files changed (1)
  1. README.md +86 -0
README.md CHANGED
@@ -40,6 +40,92 @@ We present a fine-tuned model for formal verification tasks. It is fine-tuned in
  <img width="60%" src="figures/data-prepare.png">
  </p>
 
+ ## Quickstart
+ The following code snippet shows how to load the tokenizer and model and how to run inference on an fmbench-style requirement using apply_chat_template.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ instruct = """
+ Translate the given requirement using TLA's syntax and semantics.
+ You only need to return the TLA formal specification without explanation.
+ """
+
+ input_text = """
+ An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
+ - The control state `octl[p]` is equal to `\"done\"`.
+ - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
+ - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
+ - The variables `omem` and `obuf` remain unchanged.
+ """
+
+ model_name = "fm-universe/qwen2.5-coder-7b-instruct-fma"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name, torch_dtype="auto", device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ messages = [{"role": "user", "content": f"{instruct}\n{input_text}"}]
+
+ text = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(**model_inputs, max_new_tokens=4096)
+ generated_ids = [
+     output_ids[len(input_ids) :]
+     for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
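+
+ The snippet above uses the model's default generation settings. As a minimal, optional variant (not part of the original example), you can make the output deterministic by disabling sampling, which mirrors the greedy `temperature=0` setup in the vLLM example below:
+
+ ```python
+ # Optional: greedy (deterministic) decoding; the prompt-stripping and
+ # decoding steps stay the same as above.
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=4096,
+     do_sample=False,  # disable sampling for reproducible output
+ )
+ ```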
+
+ ## Example of Offline Inference
+
+ vLLM supports offline inference; the snippet below runs the same request with greedy decoding.
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ instruct = """
+ Translate the given requirement using TLA's syntax and semantics.
+ You only need to return the TLA formal specification without explanation.
+ """
+
+ input_text = """
+ An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
+ - The control state `octl[p]` is equal to `\"done\"`.
+ - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
+ - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
+ - The variables `omem` and `obuf` remain unchanged.
+ """
+
+ model_name = "fm-universe/qwen2.5-coder-7b-instruct-fma"
+
+ # Use greedy decoding (temperature=0).
+ # max_tokens is the maximum number of tokens to generate.
+ greedy_sampling = SamplingParams(temperature=0, max_tokens=4096)
+
+ # Load the model
+ llm = LLM(
+     model=model_name,
+     tensor_parallel_size=1,
+     max_model_len=4096,
+     enable_chunked_prefill=True,
+     # quantization="fp8",  # Enabling FP8 quantization for model weights can reduce memory usage.
+ )
+
+ # Prepare chat messages
+ chat_message = [{"role": "user", "content": f"{instruct}\n{input_text}"}]
+
+ # Inference
+ responses = llm.chat(chat_message, greedy_sampling, use_tqdm=True)
+
+ print(responses[0].outputs[0].text)
+ ```
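+
+ The call above sends a single request. As a sketch (assuming a recent vLLM version whose `llm.chat` accepts a list of conversations; the `requirements` list here is purely illustrative), several requirement descriptions can be batched in one call:
+
+ ```python
+ # Hypothetical batched variant: one conversation per requirement text.
+ requirements = [input_text]  # extend with additional requirement descriptions
+ conversations = [
+     [{"role": "user", "content": f"{instruct}\n{req}"}] for req in requirements
+ ]
+
+ # vLLM batches the conversations internally and returns one output per request.
+ responses = llm.chat(conversations, greedy_sampling, use_tqdm=True)
+ for r in responses:
+     print(r.outputs[0].text)
+ ```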
 
  ## Citation
  ```