nuojohnchen committed on
Commit
6ca9815
·
verified ·
1 Parent(s): 2bf602a

Update README.md

Files changed (1)
  1. README.md +0 -125
README.md CHANGED
@@ -9,129 +9,4 @@ tags:
  - chat
  library_name: transformers
  ---
- # XtraGPT-7B
-
- ## Introduction
-
- XtraGPT is a series of LLMs for Human-AI Collaboration on Controllable Scientific Paper Refinement, developed by the NUS [Xtra Computing Group](https://github.com/Xtra-Computing).
-
- ## Requirements
-
- The code for XtraGPT is included in the latest Hugging Face `transformers`; we advise using the latest version of `transformers`.
-
- ## Quickstart
-
- ```python
- from openai import OpenAI
-
- model_name = "Xtra-Computing/XtraGPT-7B"
- # OpenAI-compatible endpoint of a locally served model; URL and key are placeholders.
- client = OpenAI(
-     base_url="http://localhost:8088/v1",
-     api_key="sk-1234567890"
- )
-
- # Placeholder for the full paper text (presumably in Markdown).
- paper_content="markdown"
- selected_content="Through a detailed analysis of reasoning requirements across evaluation tasks, we reveal a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples—highlighting the limitations of SFT in such scenarios."
-
- prompt = "help me modify based on the context."
-
- content = f"""
- Please improve the selected content based on the following. Act as an expert model for improving articles **PAPER_CONTENT**.\n
- The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.
- Focus on clear, concise, and evidence-based improvements that align with the overall context of the paper.\n
- <PAPER_CONTENT> {paper_content} </PAPER_CONTENT>\n <SELECTED_CONTENT> {selected_content} </SELECTED_CONTENT>\n <QUESTION> {prompt} </QUESTION>\n
- """
-
- response = client.chat.completions.create(
-     # Name the local server serves the model under; adjust to your deployment.
-     model="xtragpt",
-     messages=[{"role": "user", "content": content}],
-     temperature=0.7,
-     max_tokens=16384
- )
- print(response.choices[0].message.content)
-
- ```
-
- ## Citation
-
- If you find our work helpful, please consider citing it.
-
- ```
- @misc{xtracomputing2025xtraqa,
-   title = {XtraQA},
-   url = {https://huggingface.co/Xtra-Computing/XtraGPT-7B},
-   author = {Xtra Computing Group},
-   year = {2025}
- }
- @article{xtracomputing2025xtragpt,
-   title={XtraGPT: LLMs for Human-AI Collaboration on Controllable Scientific Paper Refinement},
-   author={Xtra Computing Group},
-   journal={arXiv preprint arXiv:abcdefg},
-   year={2025}
- }
- ```
-
-
-
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "Xtra-Computing/XtraGPT-7B"
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- # Raw string: keeps the LaTeX backslashes (\textbf, \%) from being read as Python escapes.
- paper_content = r"""
- The rise of Large Language Models (LLMs) as evaluators offers a scalable alternative to human annotation, yet existing Supervised Fine-Tuning (SFT) for judges approaches often fall short in domains requiring complex reasoning. In this work, we investigate whether LLM judges truly benefit from enhanced reasoning capabilities. Through a detailed analysis of reasoning requirements across evaluation tasks, we reveal a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples—highlighting the limitations of SFT in such scenarios. To address this, we introduce \textbf{JudgeLRM}, a family of judgment-oriented LLMs trained using reinforcement learning (RL) with judge-wise, outcome-driven rewards. JudgeLRM models consistently outperform both SFT-tuned and state-of-the-art reasoning models. Notably, JudgeLRM-3B surpasses GPT-4, and JudgeLRM-7B outperforms DeepSeek-R1 by 2.79\% in F1 score, particularly excelling in judge tasks requiring deep reasoning.
- """
-
- selected_content = """
- Through a detailed analysis of reasoning requirements across evaluation tasks, we reveal a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples—highlighting the limitations of SFT in such scenarios.
- """
- prompt = """
- help me modify based on the context.
- """
-
- content = f"""
- Please improve the selected content based on the following. Act as an expert model for improving articles **PAPER_CONTENT**.\n
- The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.
- Focus on clear, concise, and evidence-based improvements that align with the overall context of the paper.\n
-
- <PAPER_CONTENT>
- {paper_content}
- </PAPER_CONTENT>\n
-
- <SELECTED_CONTENT>
- {selected_content}
- </SELECTED_CONTENT>\n
-
- <QUESTION>
- {prompt}
- </QUESTION>\n
- """
-
- messages = [
-     {"role": "user", "content": content}
- ]
-
- # Render the chat template to text, then tokenize it for generation.
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=512
- )
- # Drop the prompt tokens so that only the newly generated tokens are decoded.
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- print(response)
- ```
-
-
-
 
 
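Both removed snippets assemble the same prompt inline: an instruction paragraph followed by `<PAPER_CONTENT>`, `<SELECTED_CONTENT>`, and `<QUESTION>` sections. A minimal sketch of factoring that template into one helper is shown below. The function name `build_refinement_prompt` and its parameters are hypothetical and not part of the removed README, and the whitespace is normalized relative to the original f-strings, which is assumed (not verified) to be harmless for the model.

```python
def build_refinement_prompt(paper_content: str, selected_content: str, question: str) -> str:
    """Assemble the refinement prompt from the removed Quickstart (hypothetical helper)."""
    instructions = (
        "Please improve the selected content based on the following. "
        "Act as an expert model for improving articles **PAPER_CONTENT**.\n"
        "The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. "
        "Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.\n"
        "Focus on clear, concise, and evidence-based improvements that align with "
        "the overall context of the paper.\n"
    )
    # Each section is wrapped in the same tags the removed snippets used.
    return (
        f"{instructions}\n"
        f"<PAPER_CONTENT>\n{paper_content.strip()}\n</PAPER_CONTENT>\n\n"
        f"<SELECTED_CONTENT>\n{selected_content.strip()}\n</SELECTED_CONTENT>\n\n"
        f"<QUESTION>\n{question.strip()}\n</QUESTION>\n"
    )


# Raw string so the LaTeX backslash in \textbf is not read as a tab escape.
prompt = build_refinement_prompt(
    paper_content=r"We introduce \textbf{JudgeLRM}, a family of judgment-oriented LLMs.",
    selected_content="We introduce JudgeLRM.",
    question="help me modify based on the context.",
)
messages = [{"role": "user", "content": prompt}]
```

The resulting `messages` list can be passed either to `client.chat.completions.create(...)` or to `tokenizer.apply_chat_template(...)`, as in the two removed snippets.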