Text Generation
Transformers
Safetensors
llama
conversational
text-generation-inference
MinjunZhu committed on
Commit
31f630b
·
verified ·
1 Parent(s): dce70e5

Update README.md

Files changed (1)
  1. README.md +556 -5
README.md CHANGED
@@ -1,5 +1,556 @@
1
- ---
2
- license: other
3
- license_name: whizreviewer-llama-3.1-license
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ license: other
3
+ license_name: whizreviewer-llama-3.1-license
4
+ license_link: LICENSE
5
+ language:
6
+ - en
7
+ - zh
8
+ - ja
9
+ - ko
10
+ - fr
11
+ - de
12
+ metrics:
13
+ - accuracy
14
+ ---
15
+
16
+
17
+ # WhizReviewer-ML-Pro-123B
18
+
19
+
20
+
21
+ #### Model Info
22
+
23
+ WhizReviewer is a family of generative large language models that have undergone additional supervised training, available in 8B, 70B, and 123B sizes. All models are text-only language models: the 8B and 70B models are derived from the Llama3.1 pre-trained language models, and the 123B model from Mistral-Large-2. All of them use the Transformer architecture.
24
+
25
+ All models have undergone extensive supervised training on a dataset of paper-review comments in the field of **machine learning (including CV, NLP, MM)**, with the aim of providing expert-level review comments. Under our license, **no model created, trained, distributed, or replicated on the basis of these models may be used for any formal review work**. We also provide code based on [FastDetectGPT](https://github.com/baoguangsheng/fast-detect-gpt) to detect misuse of this series of models in formal settings.
26
+
27
+ ![info](./image/info.png)
28
+
29
+ WhizReviewer-ML is an LLM that automatically evaluates the quality of a paper from its content. It can provide a near-human-level review opinion and evaluation score. Specifically, WhizReviewer-ML simulates multiple members of a paper program committee, namely a group of Reviewers (we recommend 4) and a Meta-Reviewer, to provide expert-level opinions. Please note that WhizReviewer-ML is trained to generate ICLR- or NeurIPS-level review comments, so a paper may need to be of relatively high quality before the generated Meta-Reviewer recommends "Accept".
30
+
31
+ The main purposes of the WhizReviewer-ML series models are the following two:
32
+
33
+ - To promote iterative self-improvement in human scientific research. Given the long review cycle for papers, WhizReviewer-ML can enable rapid iteration and refinement of papers.
34
+ - To promote Auto-Research. This model can serve as a Reward Model to help improve the research capabilities of artificial intelligence models.
35
+
36
+ **Model Release Date** August 13, 2024
37
+
38
+ **Model Knowledge Cutoff Date** January 2024
39
+
40
+ #### Model Specifications
41
+
42
+ | Model Name | Pre-training Language Model | HF Link | MS Link |
43
+ | :-------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :--------: |
44
+ | WhizReviewer-ML-Llama3.1-8B | [Llama3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | [🤗 link](https://huggingface.co/WestlakeNLP/WhizReviewer-ML-Llama3.1-8B) | [🤖 TODO]() |
45
+ | WhizReviewer-ML-Llama3.1-70B | [Llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [🤗 link](https://huggingface.co/WestlakeNLP/WhizReviewer-ML-Llama3.1-70B) | [🤖 TODO]() |
46
+ | WhizReviewer-ML-Pro-123B | [Mistral-Large-2](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) | [🤗 link](https://huggingface.co/WestlakeNLP/WhizReviewer-ML-Pro-123B) | [🤖 TODO]() |
47
+ | WhizReviewer-Science-Llama3.1-8B | [Llama3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | [🤗 TODO]() | [🤖 TODO]() |
48
+ | WhizReviewer-Science-Llama3.1-70B | [Llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [🤗 TODO]() | [🤖 TODO]() |
49
+ | WhizReviewer-Science-Pro-123B | [Mistral-Large-2](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) | [🤗 TODO]() | [🤖 TODO]() |
50
+
51
+ #### Open Source License
52
+
53
+ The code in this repository is open-sourced under the Apache-2.0 license. The model weights are open-sourced under the WhizReviewer License, which introduces additional content based on the **Mistral Research License** to ensure the model is not misused.
54
+
55
+ #### Intended Uses
56
+
57
+ **Expected Use Cases** The WhizReviewer series models are suitable for research purposes in multiple languages. This includes but is not limited to the following objectives:
58
+
59
+ 1. Paper Improvement: Assist in enhancing the quality and clarity of academic papers.
60
+ 2. Writing Practice: Provide a platform for users to practice and refine their academic writing skills.
61
+ 3. Self-assessment Tool: Enable researchers to evaluate their own work before submission.
62
+ 4. Learning Aid: Support students and researchers in understanding the peer review process.
63
+ 5. Feedback Simulation: Offer simulated peer review feedback to prepare authors for actual reviews.
64
+ 6. Revision Guide: Provide structured guidance for revising academic papers.
65
+ 7. Concept Validator: Help researchers validate their ideas and hypotheses.
66
+ 8. Reward Model: Serve as a component in machine learning systems for academic writing improvement.
67
+ 9. Educational Resource: Act as a teaching tool for academic writing and peer review processes.
68
+ 10. Research Assistant: Aid in literature reviews and research methodology refinement.
69
+ 11. Supplementary Tool: Complement human review in informal, non-official settings.
70
+
71
+ **Out of Scope** We do not allow this model to be misused to influence the academic environment. In addition to what is not allowed under the Llama License and Mistral License, the following are also not permitted by us:
72
+
73
+ 1. Official Reviews: Use of WhizReviewer-ML for official peer reviews in any capacity is explicitly prohibited.
74
+ 2. Legal or Ethical Decisions: Not designed to make judgments on research ethics or legal compliance.
75
+ 3. Factual Verification: While it can offer feedback, it should not be the sole source for fact-checking or verifying scientific claims.
76
+ 4. Plagiarism Detection: Not equipped to serve as a plagiarism detection tool.
77
+ 5. Publication Decisions: Cannot be used to make final decisions on whether a paper should be published.
78
+ 6. Expert Consultation: Not a replacement for expert consultation in specialized fields.
79
+
80
+ **If you are unsure whether you meet our License requirements, please send your relevant application to [email protected] for further inquiry**
81
+
82
+
83
+
84
+ #### Model Performance
85
+
86
+ We used 784 papers and their review comments from ICLR 2024 as test data, which were not included in the training dataset.
87
+
88
+
89
+
90
+ | Metric | [WhizReviewer-ML-Llama3.1-8B](https://huggingface.co/WestlakeNLP/WhizReviewer-ML-Llama3.1-8B) | [WhizReviewer-ML-Llama3.1-70B](https://huggingface.co/WestlakeNLP/WhizReviewer-ML-Llama3.1-70B) | [WhizReviewer-ML-Pro-123B](https://huggingface.co/WestlakeNLP/WhizReviewer-ML-Pro-123B) |
91
+ | ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
92
+ | Decisions (Accept/Reject) Acc | 59.41% | 61.58% | **73.69%** |
93
+ | Score Avg Abs | 1.24 | 1.28 | **1.05** |
94
+ | Score Min Abs | 1.31 | **1.18** | 1.48 |
95
+ | Score Max Abs | 1.73 | 1.71 | **1.00** |
96
+ | Score Perfect Match | **3.23%** | 1.47% | 3.06% |
97
+ | Score Avg Acc | 7.93% | 6.83% | **10.09%** |
98
+ | Score Min Acc | 36.96% | **42.70%** | 30.27% |
99
+ | Score Max Acc | 24.73% | 23.69% | **49.73%** |
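
The table does not define how each metric is computed. As a rough illustration only, the sketch below shows one plausible reading of two of them: decision accuracy as the fraction of papers whose predicted Accept/Reject decision matches the real one, and an average-score error as the mean absolute difference between the mean predicted rating and the mean human rating. The function names and the toy data are assumptions made for this example, not the authors' evaluation code.

```python
from statistics import mean


def decision_accuracy(predicted, actual):
    """Fraction of papers whose predicted decision matches the real decision."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    hits = sum(p.strip().lower() == a.strip().lower() for p, a in zip(predicted, actual))
    return hits / len(actual)


def avg_score_abs_error(predicted_scores, human_scores):
    """Mean over papers of |mean(predicted ratings) - mean(human ratings)|."""
    errors = [abs(mean(p) - mean(h)) for p, h in zip(predicted_scores, human_scores)]
    return mean(errors)


# Toy data, invented for illustration only.
pred_decisions = ["Reject", "Accept", "Reject"]
true_decisions = ["Accept", "Accept", "Reject"]
pred_scores = [[3, 3, 3, 3], [6, 8, 6, 6], [5, 5, 3, 3]]
true_scores = [[5, 6, 6, 8], [6, 6, 8, 8], [3, 5, 5, 3]]

print(decision_accuracy(pred_decisions, true_decisions))
print(avg_score_abs_error(pred_scores, true_scores))
```

Under this reading, a lower absolute error and a higher accuracy are better, which matches the direction of the bolded entries in the table.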
100
+
101
+ #### How to use
102
+
103
+ The models included in this repository can be used with the `transformers` or `vllm` code libraries.
104
+
105
+ Generating review comments requires a long context (**14000 tokens for input and 5000 tokens for output**), so please ensure you have enough GPU memory. Here are our recommended configurations:
106
+
107
+ | Model Name | Recommended Config (bs>=5) | Minimum Config (bs=1) |
108
+ | :--------------------------: | :------------------------: | :-------------------------------------: |
109
+ | WhizReviewer-ML-Llama3.1-8B | 2 x A100/H100 (bf16) | 1 x A100/H100 (int8) / 1 x A6000 (int4) |
110
+ | WhizReviewer-ML-Llama3.1-70B | 8 x A100/H100 (bf16) | 4 x A100/H100 (bf16) |
111
+ | WhizReviewer-ML-Pro-123B | 8 x A100/H100 (bf16) | 4 x A100/H100 (bf16) |
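
To see why the larger models need several GPUs, it can help to estimate the memory taken by the weights alone. The sketch below multiplies the parameter count by the bytes per parameter for each precision. It is a lower bound under stated assumptions: it ignores the KV cache, activations, and framework overhead, all of which grow with the long context used here. The helper name `weight_memory_gb` is made up for this example.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params_billion, dtype="bf16"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params_billion * BYTES_PER_PARAM[dtype]


for name, size in [("8B", 8), ("70B", 70), ("123B", 123)]:
    print(name, weight_memory_gb(size, "bf16"), "GB in bf16")
```

The 123B model needs roughly 246 GB for its bf16 weights before any cache or activation memory is counted, so it has to be sharded across several GPUs.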
112
+
113
+ ##### Getting Your Paper Text
114
+
115
+ If you can provide the original LaTeX or Markdown version of your paper, that is ideal, and you can skip this step.
116
+
117
+ If you only have the PDF version of the paper, you need to convert it to Markdown or LaTeX format first. We recommend one of the following two conversion methods:
118
+
119
+ **Online** You don't need to download any models. Register to get free tokens from [doc2x](https://doc2x.noedgeai.com/?inviteCode=WE5L94), then make sure `pdfdeal` is up to date: `pip install --upgrade pdfdeal`
120
+
121
+ ```python
+ from pdfdeal import Doc2X
+ from pdfdeal import get_files
+
+ client = Doc2X(apikey='xxx')  # apikey from doc2x
+ # Collect the PDF files under the input path and the names to use for the outputs
+ file_list, rename = get_files(path=r"path/PDF", mode="pdf", out="md")
+ # Convert every PDF to Markdown
+ success, failed, flag = client.pdfdeal(
+     pdf_file=file_list,
+     output_path=r"OutputPath/PDF",
+     output_format='md',
+     output_names=rename,
+ )
+ print(success)
+ print(failed)
+ print(flag)
135
+ ```
136
+
137
+ At this point, you will be able to view the markdown format of the paper.
138
+
139
+ **Offline** If you need to run locally, we recommend using [MagicPDF](https://github.com/magicpdf/Magic-Doc). First follow its installation guide; you can then use the code below to convert PDF paper files to Markdown format:
140
+
141
+ ```python
142
+ from magic_doc.docconv import DocConverter, S3Config
143
+ converter = DocConverter(s3_config=None)
144
+ markdown_content, time_cost = converter.convert("path/PDF", conv_timeout=300)
145
+ ```
146
+
147
+ ##### Using with transformers
148
+
149
+ This requires `transformers >= 4.44.0`. First make sure your `transformers` is up to date: `pip install -U transformers`
150
+
151
+ ```python
+ import re
+
+ import torch
+ import transformers
+
+
+ def process_text(text, skip_appendix=True):
+     """Strip ICLR 2024 page headers and trim the paper text."""
+     pattern = re.compile(r"Under review as a conference paper at ICLR 2024", re.IGNORECASE)
+     text = pattern.sub("", text)
+
+     pattern = re.compile(r"Published as a conference paper at ICLR 2024", re.IGNORECASE)
+     text = pattern.sub("", text)
+
+     if skip_appendix:
+         # Case-insensitive search: the first occurrence of "references" is used
+         match = re.search(r"REFERENCES", text, re.IGNORECASE)
+
+         if match:
+             # Truncate the text at "REFERENCES" (this also drops any appendix after it)
+             text = text[:match.start()]
+
+     match = re.search(r"ABSTRACT", text, re.IGNORECASE)
+
+     if match:
+         # Drop everything before the abstract (title, authors, affiliations)
+         text = text[match.start():]
+
+     return text.strip()
+
+
+ # Repository id as listed in the Model Specifications table
+ model_id = "WestlakeNLP/WhizReviewer-ML-Llama3.1-8B"
+
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ system_prompt = \
+ """You are an expert academic reviewer tasked with providing a thorough and balanced evaluation of research papers. For each paper submitted, conduct a comprehensive review addressing the following aspects:
+
+ 1. Summary: Briefly outline main points and objectives.
+ 2. Soundness: Assess methodology and logical consistency.
+ 3. Presentation: Evaluate clarity, organization, and visual aids.
+ 4. Contribution: Analyze significance and novelty in the field.
+ 5. Strengths: Identify the paper's strongest aspects.
+ 6. Weaknesses: Point out areas for improvement.
+ 7. Questions: Pose questions for the authors.
+ 8. Rating: Score 1-10, justify your rating.
+ 9. Meta Review: Provide overall assessment and recommendation (Accept/Reject).
+
+ Maintain objectivity and provide specific examples from the paper to support your evaluation.
+
+ You need to fill out **4** review opinions."""
+
+
+ markdown_context = "xxxxxxx"  # Your paper's content
+ markdown_context = process_text(markdown_context, skip_appendix=True)  # We suggest skipping the appendix.
+
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": markdown_context},
+ ]
+
+ outputs = pipeline(
+     messages,
+     max_new_tokens=4096,
+ )
+ # The last message in the returned conversation is the generated review
+ print(outputs[0]["generated_text"][-1])
217
+ ```
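
Because the recommended input budget is 14000 tokens, it may be worth checking the length of the processed paper before generation. The sketch below is a minimal illustration under stated assumptions: `check_token_budget` and `whitespace_count` are hypothetical helpers written for this example, and the whitespace counter is only a stand-in for a real tokenizer.

```python
def check_token_budget(text, count_tokens, max_input_tokens=14000):
    """Return (n_tokens, fits) for a paper text under the input budget."""
    n_tokens = count_tokens(text)
    return n_tokens, n_tokens <= max_input_tokens


def whitespace_count(text):
    """Stand-in token counter for illustration: counts whitespace-separated words."""
    return len(text.split())


# With a Hugging Face tokenizer, one could instead pass
# lambda t: len(tokenizer(t)["input_ids"]) as count_tokens.
n, fits = check_token_budget("word " * 15000, whitespace_count)
print(n, fits)
```

Passing the counter as a function keeps the check independent of any particular tokenizer, so the same helper can be reused for the Llama-based and Mistral-based models.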
218
+
219
+ ##### Using with vllm
220
+
221
+ We recommend `vllm` over `transformers` for fast text generation; it can usually complete generation within 2 minutes. Install or update it with `pip install -U vllm`.
222
+
223
+ ```python
+ import re
+
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+
+ def process_text(text, skip_appendix=True):
+     """Strip ICLR 2024 page headers and trim the paper text."""
+     pattern = re.compile(r"Under review as a conference paper at ICLR 2024", re.IGNORECASE)
+     text = pattern.sub("", text)
+
+     pattern = re.compile(r"Published as a conference paper at ICLR 2024", re.IGNORECASE)
+     text = pattern.sub("", text)
+
+     if skip_appendix:
+         # Case-insensitive search: the first occurrence of "references" is used
+         match = re.search(r"REFERENCES", text, re.IGNORECASE)
+
+         if match:
+             # Truncate the text at "REFERENCES" (this also drops any appendix after it)
+             text = text[:match.start()]
+
+     match = re.search(r"ABSTRACT", text, re.IGNORECASE)
+
+     if match:
+         # Drop everything before the abstract (title, authors, affiliations)
+         text = text[match.start():]
+
+     return text.strip()
+
+
+ # Repository id as listed in the Model Specifications table
+ model_id = "WestlakeNLP/WhizReviewer-ML-Llama3.1-8B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ llm = LLM(
+     model=model_id,
+     tensor_parallel_size=8,  # number of GPUs to shard the model across
+     max_model_len=16000,
+     gpu_memory_utilization=0.95,
+ )
+
+ system_prompt = \
+ """You are an expert academic reviewer tasked with providing a thorough and balanced evaluation of research papers. For each paper submitted, conduct a comprehensive review addressing the following aspects:
+
+ 1. Summary: Briefly outline main points and objectives.
+ 2. Soundness: Assess methodology and logical consistency.
+ 3. Presentation: Evaluate clarity, organization, and visual aids.
+ 4. Contribution: Analyze significance and novelty in the field.
+ 5. Strengths: Identify the paper's strongest aspects.
+ 6. Weaknesses: Point out areas for improvement.
+ 7. Questions: Pose questions for the authors.
+ 8. Rating: Score 1-10, justify your rating.
+ 9. Meta Review: Provide overall assessment and recommendation (Accept/Reject).
+
+ Maintain objectivity and provide specific examples from the paper to support your evaluation.
+
+ You need to fill out **4** review opinions."""
+
+
+ markdown_context = "xxxxxxx"  # Your paper's content
+ markdown_context = process_text(markdown_context, skip_appendix=True)  # We suggest skipping the appendix.
+
+ sampling_params = SamplingParams(temperature=0.4, top_p=0.95, max_tokens=4000)
+
+ messages = [
+     {"role": "system", "content": system_prompt},
+     {"role": "user", "content": markdown_context},
+ ]
+
+ # Pre-fill the assistant turn with "## Reviewer". The [:-4] slice drops the last
+ # four characters of the rendered prompt, which is intended to remove a trailing
+ # end-of-turn marker so that the model continues the pre-filled turn. Check the
+ # rendered prompt against your model's chat template.
+ prompt = tokenizer.apply_chat_template(
+     messages + [{"role": "assistant", "content": "\n\n## Reviewer\n"}],
+     tokenize=False,
+     add_generation_prompt=True,
+ )[:-4]
+ outputs = llm.generate([prompt], sampling_params)
+ print(outputs[0].outputs[0].text)
288
+ ```
289
+
290
+ For more usage methods, please refer to the [vLLM](https://docs.vllm.ai/en/latest/) documentation.
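
Once a review has been generated, the ratings and the final decision usually need to be pulled out of the text, for example when the model is used as a Reward Model. The sketch below is one possible way to do this with regular expressions, assuming the output follows the heading layout shown in the Case Study (`### Rating` followed by a number, and `## Paper Decision` followed by Accept or Reject). `parse_review` is a hypothetical helper, and the sample text is abbreviated for illustration.

```python
import re


def parse_review(text):
    """Extract per-reviewer ratings and the final decision from a generated review."""
    # A rating block looks like "### Rating\n\n3: reject, not good enough".
    ratings = [int(m) for m in re.findall(r"###\s*Rating\s*\n+\s*(\d+)", text)]
    # The decision follows a "## Paper Decision" heading.
    match = re.search(r"##\s*Paper Decision\s*\n+\s*(Accept|Reject)", text, re.IGNORECASE)
    decision = match.group(1).capitalize() if match else None
    return {"ratings": ratings, "decision": decision}


sample = """## Reviewer

### Rating

3: reject, not good enough

**********

## Reviewer

### Rating

6: marginally above the acceptance threshold

**********

## Paper Decision

Reject (not good enough)
"""

print(parse_review(sample))
```

If the model deviates from this layout, the parser returns an empty rating list or a `None` decision rather than raising, so callers should check for those cases.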
291
+
292
+ #### Harmlessness and Safety
293
+
294
+ Fine-tuning a language model can compromise its harmlessness, which makes it possible for the model to be used for illegal purposes. We value the harmlessness of language models and want the WhizReviewer models to be safe for anyone to deploy. Therefore, before release, we added extra safety restrictions to the weights using the SafetyLock method. SafetyLock can mitigate the model's inherent safety risks while balancing practicality and safety.
295
+
296
+
297
+
299
+
300
+ #### Ethical Considerations
301
+
302
+ Academic Integrity: Although the WhizReviewer model is designed to assist researchers in improving paper quality, it should not be used to replace the real peer review process. We strongly recommend that users treat this tool only as an aid for self-improvement and learning.
303
+
304
+ Fairness: The model may have biases, especially when evaluating interdisciplinary or emerging field research. The current model is only suitable for the machine learning field. Users should be aware of this and be cautious about the model's feedback.
305
+
306
+ Responsible Use: We call on users to use this model responsibly. Under our agreement, users must not use it to produce false review opinions or to manipulate the academic evaluation process.
307
+
308
+ Transparency: When using content generated by this model in any public setting, the WhizReviewer source should be clearly stated to maintain transparency and honesty in academia.
309
+
310
+ #### Limitations
311
+
312
+ Knowledge Cutoff Date: The model's knowledge is cut off in January 2024, so it may lack understanding of new technologies, methods, or research trends that emerged after this date. This may lead to undervaluation of some highly innovative research.
313
+
314
+ Text-Only Limitations: As text-only models, the WhizReviewer-ML models cannot directly parse or evaluate images, charts, or complex formulas in papers. This may affect the comprehensive assessment of papers that rely heavily on visual elements.
315
+
316
+ Depth in Specialized Fields: Although the model has been specially trained in the field of machine learning, its evaluation may not be as accurate as human experts in the field for very specialized or cutting-edge sub-fields.
317
+
318
+ Lack of Real-time Information: The model cannot access real-time academic databases or the latest published papers, which may lead to bias in assessing research novelty.
319
+
320
+ Disciplinary Bias: Due to limitations in training data, the model may have preferences for certain disciplines or research methods. Users should be aware of this and weigh the model's feedback against other opinions.
321
+
322
+ Language and Cultural Limitations: The model may perform poorly in handling non-English papers or cross-cultural research, requiring users to be extra cautious in these cases.
323
+
324
+ Scoring Consistency: The model's scoring may have some inconsistencies, especially when dealing with borderline cases or interdisciplinary research.
325
+
326
+ #### Detecting Misuse of WhizReviewer-ML
327
+
328
+ We use Fast-Detect-GPT to detect misuse of WhizReviewer. The table below shows the detection performance of Fast-Detect-GPT, which can to some extent prevent WhizReviewer-ML from being used in unauthorized settings.
329
+
330
+ | Model                        | Detect Acc (%) |
331
+ | ---------------------------- | ---------- |
332
+ | WhizReviewer-ML-Llama3.1-8B | 98.43 |
333
+ | WhizReviewer-ML-Llama3.1-70B | 99.47 |
334
+ | WhizReviewer-ML-Pro-123B | 95.14 |
335
+
336
+ The evaluation dataset mixes three hundred review comment samples from ICLR 2024 with samples generated by WhizReviewer-ML, and uses Llama-3.1-8B as the reference model. Detect Acc is the percentage of samples that Fast-Detect-GPT identifies correctly.
337
+
338
+ #### Case Study
339
+
340
+ We take a poster paper from ICLR 2024, "Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks", as an example to showcase WhizReviewer-ML's review suggestions. Please refer to [Openreview](https://openreview.net/forum?id=9nsNyN0vox) for the original reviews. The human review scores were *[5,6,6,8]*. The following content is the output of WhizReviewer-ML-Pro-123B:
341
+
342
+
343
+
344
+ ```
345
+ ## Reviewer
346
+
347
+ ### Summary
348
+
349
+ This paper proposes a framework called Neural Comprehension to integrate compiled neural networks (CoNNs) into the standard transformer architecture. CoNNs are designed to explicitly encode rules through artificially generated attention weights. The authors demonstrate the effectiveness of their approach in various symbolic tasks and arithmetic reasoning tasks.
350
+
351
+ ### Soundness
352
+
353
+ 2 fair
354
+
355
+ ### Presentation
356
+
357
+ 2 fair
358
+
359
+ ### Contribution
360
+
361
+ 2 fair
362
+
363
+ ### Strengths
364
+
365
+ - The paper is well-written and easy to follow.
366
+ - The proposed method is simple and easy to understand.
367
+ - The method is general and can be applied to LLMs of different scales.
368
+
369
+ ### Weaknesses
370
+
371
+ - The method seems to be a straightforward combination of CoNNs and LLMs, which makes me question the novelty of the proposed method.
372
+ - The experimental results are not strong enough to support the claim that the proposed method can improve the performance of LLMs on symbolic tasks.
373
+ - The paper lacks a thorough analysis of the limitations of the proposed method.
374
+
375
+ ### Questions
376
+
377
+ - The method seems to be a straightforward combination of CoNNs and LLMs, which makes me question the novelty of the proposed method. Can the authors provide more insight into the novelty of the proposed method?
378
+ - The experimental results are not strong enough to support the claim that the proposed method can improve the performance of LLMs on symbolic tasks. For example, in Figure 4, the performance of Neural Comprehension is not significantly better than that of GPT-3.5 in most cases. Can the authors provide more convincing evidence to support their claim?
379
+ - The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
380
+
381
+ ### Flag For Ethics Review
382
+
383
+ No ethics review needed.
384
+
385
+ ### Rating
386
+
387
+ 3: reject, not good enough
388
+
389
+ ### Confidence
390
+
391
+ 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
392
+
393
+ **********
394
+
395
+ ## Reviewer
396
+
397
+ ### Summary
398
+
399
+ This paper proposes a new method called Neural Comprehension, which integrates compiled neural networks (CoNNs) into the standard transformer architecture to improve the performance of language models on symbolic tasks. CoNNs are designed to explicitly encode rules through artificially generated attention weights. The authors demonstrate the effectiveness of their approach in various symbolic tasks and arithmetic reasoning tasks.
400
+
401
+ ### Soundness
402
+
403
+ 2 fair
404
+
405
+ ### Presentation
406
+
407
+ 2 fair
408
+
409
+ ### Contribution
410
+
411
+ 2 fair
412
+
413
+ ### Strengths
414
+
415
+ 1. The paper is well-written and easy to follow.
416
+ 2. The proposed method is simple and easy to understand.
417
+ 3. The method is general and can be applied to LLMs of different scales.
418
+
419
+ ### Weaknesses
420
+
421
+ 1. The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
422
+ 2. The experimental results are not strong enough to support the claim that the proposed method can improve the performance of LLMs on symbolic tasks. For example, in Figure 4, the performance of Neural Comprehension is not significantly better than that of GPT-3.5 in most cases. Can the authors provide more convincing evidence to support their claim?
423
+ 3. The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
424
+
425
+ ### Questions
426
+
427
+ 1. The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
428
+ 2. The experimental results are not strong enough to support the claim that the proposed method can improve the performance of LLMs on symbolic tasks. For example, in Figure 4, the performance of Neural Comprehension is not significantly better than that of GPT-3.5 in most cases. Can the authors provide more convincing evidence to support their claim?
429
+
430
+ ### Flag For Ethics Review
431
+
432
+ No ethics review needed.
433
+
434
+ ### Rating
435
+
436
+ 3: reject, not good enough
437
+
438
+ ### Confidence
439
+
440
+ 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
441
+
442
+ **********
443
+
444
+ ## Reviewer
445
+
446
+ ### Summary
447
+
448
+ This paper proposes a method called Neural Comprehension to integrate compiled neural networks (CoNNs) into the standard transformer architecture to improve the performance of language models on symbolic tasks. CoNNs are designed to explicitly encode rules through artificially generated attention weights. The authors demonstrate the effectiveness of their approach in various symbolic tasks and arithmetic reasoning tasks.
449
+
450
+ ### Soundness
451
+
452
+ 2 fair
453
+
454
+ ### Presentation
455
+
456
+ 2 fair
457
+
458
+ ### Contribution
459
+
460
+ 2 fair
461
+
462
+ ### Strengths
463
+
464
+ 1. The paper is well-written and easy to follow.
465
+ 2. The proposed method is simple and easy to understand.
466
+ 3. The method is general and can be applied to LLMs of different scales.
467
+
468
+ ### Weaknesses
469
+
470
+ 1. The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
471
+ 2. The experimental results are not strong enough to support the claim that the proposed method can improve the performance of LLMs on symbolic tasks. For example, in Figure 4, the performance of Neural Comprehension is not significantly better than that of GPT-3.5 in most cases. Can the authors provide more convincing evidence to support their claim?
472
+ 3. The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
473
+
474
+ ### Questions
475
+
476
+ 1. The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
477
+ 2. The experimental results are not strong enough to support the claim that the proposed method can improve the performance of LLMs on symbolic tasks. For example, in Figure 4, the performance of Neural Comprehension is not significantly better than that of GPT-3.5 in most cases. Can the authors provide more convincing evidence to support their claim?
478
+
479
+ ### Flag For Ethics Review
480
+
481
+ No ethics review needed.
482
+
483
+ ### Rating
484
+
485
+ 3: reject, not good enough
486
+
487
+ ### Confidence
488
+
489
+ 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
490
+
491
+ **********
492
+
493
+ ## Reviewer
494
+
495
+ ### Summary
496
+
497
+ The paper proposes a framework that integrates compiled neural networks (CoNNs) into the standard transformer architecture to improve the performance of language models on symbolic tasks. The authors demonstrate the effectiveness of their approach in various symbolic tasks and arithmetic reasoning tasks.
498
+
499
+ ### Soundness
500
+
501
+ 2 fair
502
+
503
+ ### Presentation
504
+
505
+ 2 fair
506
+
507
+ ### Contribution
508
+
509
+ 2 fair
510
+
511
+ ### Strengths
512
+
513
+ The paper is well-written and easy to follow.
514
+
515
+ ### Weaknesses
516
+
517
+ The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
518
+
519
+ ### Questions
520
+
521
+ The paper lacks a thorough analysis of the limitations of the proposed method. For example, the proposed method may not be suitable for tasks that require more complex reasoning, such as logical reasoning. Can the authors discuss the limitations of the proposed method and provide more insights into its potential applications?
522
+
523
+ ### Flag For Ethics Review
524
+
525
+ No ethics review needed.
526
+
527
+ ### Rating
528
+
529
+ 3: reject, not good enough
530
+
531
+ ### Confidence
532
+
533
+ 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
534
+
535
+ **********
536
+
537
+ ## Meta Review
538
+
539
+ This paper proposes a framework called Neural Comprehension to integrate compiled neural networks (CoNNs) into the standard transformer architecture to improve the performance of language models on symbolic tasks. CoNNs are designed to explicitly encode rules through artificially generated attention weights. The authors demonstrate the effectiveness of their approach in various symbolic tasks and arithmetic reasoning tasks.
540
+
541
+ The reviewers raised several concerns about the novelty of the proposed method, the experimental results, and the analysis of the limitations of the proposed method. The authors did not provide any rebuttal.
542
+
543
+ ### justification_for_why_not_higher_score
544
+
545
+ The reviewers raised several concerns about the novelty of the proposed method, the experimental results, and the analysis of the limitations of the proposed method. The authors did not provide any rebuttal.
546
+
547
+ ### justification_for_why_not_lower_score
548
+
549
+ N/A
550
+
551
+ **********
552
+
553
+ ## Paper Decision
554
+
555
+ Reject (not good enough)
556
+ ```