fm-universe/qwen2.5-coder-7b-instruct-fma

Commit 162745d (verified) · 1 Parent(s): 420ec8e

luyaojie committed 29 days ago

Update README.md

Files changed (1)

README.md +86 -0

@@ -40,6 +40,92 @@ We present a fine-tuned model for formal verification tasks. It is fine-tuned in
   <img width=60%" src="figures/data-prepare.png">
 </p>
+## Quickstart
+The snippet below uses `apply_chat_template` to show how to load the tokenizer and model and how to run inference on fmbench.
+``` python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+instruct = """
+Translate the given requirement using TLA's syntax and semantics.
+You only need to return the TLA formal specification without explanation.
+"""
+input_text = """
+An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
+  - The control state `octl[p]` is equal to `\"done\"`.
+  - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
+  - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
+  - The variables `omem` and `obuf` remain unchanged.
+"""
+model_name = "fm-universe/qwen2.5-coder-7b-instruct-fma"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name, torch_dtype="auto", device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+messages = [{"role": "user", "content": f"{instruct}\n{input_text}"}]
+text = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(**model_inputs, max_new_tokens=4096)
+generated_ids = [
+    output_ids[len(input_ids) :]
+    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
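The list comprehension near the end of the snippet above removes the prompt tokens from each generated sequence, since `generate` on a causal language model returns the prompt followed by the continuation. The sketch below illustrates that slicing step on plain Python lists. The helper name `strip_prompt` and the token ids are made up for illustration and are not part of the README or of `transformers`.

```python
# Toy stand-ins for token ids (assumed values, for illustration only).
# In the real snippet these come from the tokenizer and model.generate().
prompt_ids = [[101, 7, 8, 9]]            # one prompt of 4 tokens
full_ids = [[101, 7, 8, 9, 42, 43, 2]]   # prompt followed by 3 generated tokens


def strip_prompt(prompt_batch, output_batch):
    """Return only the newly generated ids for each sequence in the batch.

    Hypothetical helper: it mirrors the list comprehension in the Quickstart.
    """
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(prompt_batch, output_batch)
    ]


new_ids = strip_prompt(prompt_ids, full_ids)
print(new_ids)  # [[42, 43, 2]]
```

Decoding only the stripped ids keeps the instruction and requirement text out of the printed response.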
+## Example of Offline Inference
+The model can also be run with vLLM's offline inference API.
+``` python
+from vllm import LLM, SamplingParams
+instruct = """
+Translate the given requirement using TLA's syntax and semantics.
+You only need to return the TLA formal specification without explanation.
+"""
+input_text = """
+An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
+  - The control state `octl[p]` is equal to `\"done\"`.
+  - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
+  - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
+  - The variables `omem` and `obuf` remain unchanged.
+"""
+model_name = "fm-universe/qwen2.5-coder-7b-instruct-fma"
+# Use greedy decoding (temperature=0).
+# max_tokens caps the number of generated tokens.
+greedy_sampling = SamplingParams(temperature=0, max_tokens=4096)
+# Load the model
+llm = LLM(
+    model=model_name,
+    tensor_parallel_size=1,
+    max_model_len=4096,
+    enable_chunked_prefill=True,
+    # quantization="fp8",  # FP8 quantization of model weights can reduce memory usage.
+)
+# Prepare chat messages
+chat_message = [{"role": "user", "content": f"{instruct}\n{input_text}"}]
+# Run inference
+responses = llm.chat(chat_message, greedy_sampling, use_tqdm=True)
+print(responses[0].outputs[0].text)
+```
 ## Citation
 ```