---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-classification
tags:
- medical
datasets:
- FreedomIntelligence/medical-o1-verifiable-problem
---
# <span>Introduction</span>

> This verifier is built upon **Qwen2.5-3B-Instruct** and supports answer verification in both English and Chinese.

This is a **medical verifier** that judges whether an LLM's answer to a [medical verifiable problem](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem) is correct by comparing it against the reference answer. Such verification can be used to enhance the medical reasoning capabilities of LLMs.
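As one illustration of how binary verdicts could feed back into training or sampling (an assumption for exposition, not the pipeline described in the paper), a verdict can be mapped to a 0/1 reward, or used to keep only the candidate responses the verifier accepts. The `Verifier` type, `binary_reward`, `keep_verified`, and the string-matching `toy_verify` stand-in below are all hypothetical; in real use the verifier call would wrap the model shown in the Usage section.

```python
from typing import Callable, List

# Hypothetical signature: (model_response, reference_answer) -> is_correct.
# In practice this would wrap the verifier model from the Usage section.
Verifier = Callable[[str, str], bool]


def binary_reward(verify: Verifier, response: str, reference: str) -> float:
    """Map a verdict to a 0/1 reward (an illustrative choice, not from the paper)."""
    return 1.0 if verify(response, reference) else 0.0


def keep_verified(verify: Verifier, responses: List[str], reference: str) -> List[str]:
    """Rejection-sampling style filter: keep only responses judged correct."""
    return [r for r in responses if verify(r, reference)]


def _normalize(text: str) -> str:
    # Lowercase and drop spaces so "25 %" and "25%" compare equal.
    return text.lower().replace(" ", "")


def toy_verify(response: str, reference: str) -> bool:
    """Placeholder verifier (NOT the model): normalized substring match."""
    return _normalize(reference) in _normalize(response)


if __name__ == "__main__":
    candidates = ["The answer is 25%", "The answer is 30%", "25 %"]
    print(keep_verified(toy_verify, candidates, "25%"))  # ['The answer is 25%', '25 %']
    print([binary_reward(toy_verify, c, "25%") for c in candidates])  # [1.0, 0.0, 1.0]
```

A rule-based stand-in like `toy_verify` rejects paraphrases such as `The answer is 25 percentage`; judging cases like that is what a learned verifier is intended for.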

For details, please refer to our [paper](https://arxiv.org/pdf/2412.18925) and [GitHub repository](https://github.com/FreedomIntelligence/HuatuoGPT-o1).
Additionally, you can explore [HuatuoGPT-o1](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-8B), our advanced medical LLM specializing in complex medical reasoning.

# <span>Usage</span>
The code below loads the verifier and checks a single model response against a reference answer:
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and model
model_path = 'FreedomIntelligence/medical_o1_verifier_3B_Qwen2.5'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    # Requires the flash-attn package; remove this argument to use the default attention.
    attn_implementation="flash_attention_2",
    num_labels=2,
)

# Evaluation template: slots are the model response, the reference answer, and the EOS token
template = """<Model Response>
{}
</Model Response>

<Reference Answer>
{}
</Reference Answer>

Your task is to evaluate the model response by comparing it to the reference answer. If the model response is correct and aligns with the reference answer, output "True" . If it is incorrect or fails to select the correct option (if options are provided), output "False" . {}"""

# Tokenize input and evaluate
LLM_response = 'The answer is 25 percentage'
ground_truth_answer = '25%'
input_batch = tokenizer(
    [template.format(LLM_response, ground_truth_answer, tokenizer.eos_token)],
    return_tensors="pt",
).to(model.device)

with torch.no_grad():  # inference only, so skip gradient tracking
    logits = model(**input_batch, return_dict=True).logits

# Label 1 is "True" (correct), label 0 is "False" (incorrect)
probabilities = F.softmax(logits, dim=-1)
result = "True" if probabilities[0, 1] > 0.5 else "False"

print(f"Evaluation Result: {result}")
```
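The last step of the snippet applies a softmax over the two logits and returns "True" when the probability of label 1 exceeds 0.5. With two classes, softmax reduces to a sigmoid of the logit difference, so at the default threshold the rule is equivalent to checking `logits[0, 1] > logits[0, 0]`. The pure-Python sketch below makes this explicit and exposes the threshold as a parameter; the helper names `prob_true` and `verdict` are hypothetical (not part of `transformers`), and the code works on plain floats so it runs without loading the model.

```python
import math
from typing import Sequence


def prob_true(logits: Sequence[float]) -> float:
    """Softmax probability of label 1 ("True") for a two-class logit pair.

    With two classes, softmax reduces to a sigmoid of the logit difference:
    exp(l1) / (exp(l0) + exp(l1)) == 1 / (1 + exp(l0 - l1)).
    """
    l0, l1 = logits
    d = l0 - l1
    if d >= 0:
        # Rewrite as exp(-d) / (1 + exp(-d)) so exp never overflows.
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def verdict(logits: Sequence[float], threshold: float = 0.5) -> str:
    """Same rule as the Usage snippet: "True" iff P(label 1) > threshold."""
    return "True" if prob_true(logits) > threshold else "False"


if __name__ == "__main__":
    print(verdict([0.2, 1.5]))  # True, since 1.5 > 0.2
    print(verdict([1.5, 0.2]))  # False
```

To apply it to the output of the Usage snippet, `logits[0].tolist()` gives the two floats for the single example. Keeping `threshold=0.5` reproduces the snippet's behavior; any other value is an untested assumption.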

# <span>📖 Citation</span>
```
@misc{chen2024huatuogpto1medicalcomplexreasoning,
      title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs},
      author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
      year={2024},
      eprint={2412.18925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.18925},
}
```