Update README.md
README.md
@@ -30,9 +30,9 @@ client = OpenAI(
 )
 
 paper_content="markdown"
-selected_content="
+selected_content="Through a detailed analysis of reasoning requirements across evaluation tasks, we reveal a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples—highlighting the limitations of SFT in such scenarios."
 
-prompt = "help me
+prompt = "help me modify based on the context."
 
 content = f"""
 Please improve the selected content based on the following. Act as an expert model for improving articles **PAPER_CONTENT**.\n
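This hunk only shows a fragment of the OpenAI-client example, so here is a minimal sketch of the flow it belongs to. The model name, the API-key handling, and everything in the content template past its visible first line are assumptions for illustration, not part of the README.

# Minimal sketch of the OpenAI-client flow edited in the hunk above.
# Assumptions: "gpt-4o" and the OPENAI_API_KEY env var are placeholders,
# and the tail of the truncated content template is hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

paper_content = "markdown"
selected_content = (
    "Through a detailed analysis of reasoning requirements across evaluation "
    "tasks, we reveal a negative correlation between SFT performance gains "
    "and the proportion of reasoning-demanding samples—highlighting the "
    "limitations of SFT in such scenarios."
)
prompt = "help me modify based on the context."

# Only the first template line is visible in the diff; the remaining lines
# are a hypothetical completion that hands the three variables to the model.
content = f"""
Please improve the selected content based on the following. Act as an expert model for improving articles **PAPER_CONTENT**.\n
**PAPER_CONTENT**: {paper_content}
**SELECTED_CONTENT**: {selected_content}
**INSTRUCTION**: {prompt}
"""

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute the model the README targets
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)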
@@ -84,22 +84,14 @@ model = AutoModelForCausalLM.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 paper_content="""
-In this
-First, to denoise the input code sequence in the original attention scores matrix, we classify the rows/cols by token types that are pre-defined by compilers,
-and then retain tokens whose types have the highest proportion scores to derive a filtered attention matrix (see Figure 1(b)).
-Meanwhile, inspired by the works (Wang et al., 2020; Zhu et al., 2022), we add edges to improve the connectivity of AST and calculate the distances between nodes corresponding to the selected tokens,
-which generates a distance matrix as shown in Figure 1(c). After that, we define CAT-score to measure the matching degree between the filtered attention matrix and the distance matrix.
-Specifically, the point-wise elements of the two matrices are matched if both the two conditions are satisfied:
-1) the attention score is larger than a threshold; 2) the distance value is smaller than a threshold. If only one condition is reached, the elements are unmatched.
-We calculate the CAT-score by the ratio of the number of matched elements to the summation of matched and unmatched elements.
-Finally, the CAT-score is used to interpret how CodePTMs attend code structure, where a higher score indicates that the model has learned more structural information.
+The rise of Large Language Models (LLMs) as evaluators offers a scalable alternative to human annotation, yet existing Supervised Fine-Tuning (SFT) for judges approaches often fall short in domains requiring complex reasoning. In this work, we investigate whether LLM judges truly benefit from enhanced reasoning capabilities. Through a detailed analysis of reasoning requirements across evaluation tasks, we reveal a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples—highlighting the limitations of SFT in such scenarios. To address this, we introduce \textbf{JudgeLRM}, a family of judgment-oriented LLMs trained using reinforcement learning (RL) with judge-wise, outcome-driven rewards. JudgeLRM models consistently outperform both SFT-tuned and state-of-the-art reasoning models. Notably, JudgeLRM-3B surpasses GPT-4, and JudgeLRM-7B outperforms DeepSeek-R1 by 2.79\% in F1 score, particularly excelling in judge tasks requiring deep reasoning.
 """
 
 selected_content="""
-
+Through a detailed analysis of reasoning requirements across evaluation tasks, we reveal a negative correlation between SFT performance gains and the proportion of reasoning-demanding samples—highlighting the limitations of SFT in such scenarios.
 """
 prompt ="""
-help me
+help me modify based on the context.
 """
 
 content = f"""
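Likewise, a minimal sketch of the local Hugging Face inference flow this hunk edits. Only the from_pretrained and tokenizer lines come from the diff; the repo id is a placeholder and the content string stands in for the template built above.

# Minimal sketch of the local-model flow edited in the hunk above.
# Assumptions: the repo id is a placeholder, and content is built from
# paper_content / selected_content / prompt as in the OpenAI sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/your-judge-model"  # placeholder repo id
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

content = "..."  # the assembled prompt template from the previous snippet

inputs = tokenizer(content, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))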