WestlakeNLP/WhizReviewer-ML-Llama3.1-70B

@@ -73,34 +73,8 @@ The main purposes of the WhizReviewer-ML series models are the following two:
 #### Open Source License
-The code in this repository is open-sourced under the Apache-2.0 license. The model weights are open-sourced under the WhizReviewer License, which introduces additional content based on the **Mistral Research License** to ensure the model is not misused.
-#### Intended Uses
-**Expected Use Cases** The WhizReviewer series models are suitable for research purposes in multiple languages. This includes but is not limited to the following objectives:
-1. Paper Improvement: Assist in enhancing the quality and clarity of academic papers.
-2. Writing Practice: Provide a platform for users to practice and refine their academic writing skills.
-3. Self-assessment Tool: Enable researchers to evaluate their own work before submission.
-4. Learning Aid: Support students and researchers in understanding the peer review process.
-5. Feedback Simulation: Offer simulated peer review feedback to prepare authors for actual reviews.
-6. Revision Guide: Provide structured guidance for revising academic papers.
-7. Concept Validator: Help researchers validate their ideas and hypotheses.
-8. Reward Model: Serve as a component in machine learning systems for academic writing improvement.
-9. Educational Resource: Act as a teaching tool for academic writing and peer review processes.
-10. Research Assistant: Aid in literature reviews and research methodology refinement.
-11. Supplementary Tool: Complement human review in informal, non-official settings.
-**Out of Scope** We do not allow this model to be misused to influence the academic environment. In addition to what is not allowed under the Llama License and Mistral License, the following are also not permitted by us:
-1. Official Reviews: The WhizReviewer-ML explicitly prohibits use for official peer reviews in any capacity.
-2. Legal or Ethical Decisions: Not designed to make judgments on research ethics or legal compliance.
-3. Factual Verification: While it can offer feedback, it should not be the sole source for fact-checking or verifying scientific claims.
-4. Plagiarism Detection: Not equipped to serve as a plagiarism detection tool.
-5. Publication Decisions: Cannot be used to make final decisions on whether a paper should be published.
-6. Expert Consultation: Not a replacement for expert consultation in specialized fields.
-**If you are unsure whether you meet our License requirements, please send your relevant application to [email protected] for further inquiry**
@@ -121,6 +95,9 @@ We used 784 papers and their review comments from ICLR 2024 as test data, which
 | Score Min Acc                 | 36.96%                                                       | **42.70%**                                                   | 31.77%                                                       |
 | Score Max Acc                 | 24.73%                                                       | 23.69%                                                       | **49.09%**                                                   |
 #### How to use
 The models included in this repository can be used with the `transformers` or `vllm` code libraries.
@@ -356,6 +333,34 @@ We use Fast-Detect-GPT to avoid misuse of WhizReviewer. The table below shows th
 The evaluation set mixes 300 review comments from ICLR 2024 with samples generated by WhizReviewer-ML, and Llama-3.1-8B is used as the reference model. Detect Acc is the proportion of samples that Fast-Detect-GPT detects correctly.
 #### Case Study
 As an example of WhizReviewer-ML's review suggestions, we use an ICLR 2024 poster paper titled "Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks". The original reviews are available on [OpenReview](https://openreview.net/forum?id=9nsNyN0vox); the human review scores were *[5,6,6,8]*. The following is the output of WhizReviewer-ML-Pro-123B:

 #### Open Source License
+The code in this repository is open-sourced under the Apache-2.0 license. The model weights are released under the WhizReviewer License, which adds further terms on top of the **Llama 3.1 Community License** to prevent misuse of the model.
 | Score Min Acc                 | 36.96%                                                       | **42.70%**                                                   | 31.77%                                                       |
 | Score Max Acc                 | 24.73%                                                       | 23.69%                                                       | **49.09%**                                                   |
+We prompt WhizReviewer-ML to simulate a sequence of reviewers, ordered from the lowest-scoring to the highest-scoring, each producing review comments and a final score in turn. Once all reviews are collected, the model generates a Meta-Reviewer output that predicts the final acceptance decision. In the evaluation results, Decisions Acc is the accuracy of the predicted decision for each paper, and Score Avg Abs is the absolute difference between the average predicted score and the original score.
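
A minimal sketch of how the two metrics described above could be computed. It assumes Decisions Acc is averaged over papers and that Score Avg Abs is the per-paper gap between the mean predicted score and the mean human score, averaged over the test set; the card does not publish its evaluation code, so the function names and all example decisions and predicted scores are hypothetical. Only the human scores [5, 6, 6, 8] come from the case-study paper.

```python
from statistics import mean

# Hypothetical helpers illustrating the metrics; not the authors' evaluation code.


def decisions_acc(predicted, actual):
    """Share of papers whose predicted decision matches the real decision."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equally long")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def score_avg_abs(predicted_scores, human_scores):
    """Per paper |mean(predicted) - mean(human)|, then averaged over papers.

    Averaging the per-paper gaps is an assumption about the aggregation."""
    if not human_scores or len(predicted_scores) != len(human_scores):
        raise ValueError("need one list of scores per paper on both sides")
    gaps = [abs(mean(p) - mean(h)) for p, h in zip(predicted_scores, human_scores)]
    return mean(gaps)


if __name__ == "__main__":
    # The first human list is the case-study paper; everything else is made up.
    human = [[5, 6, 6, 8], [3, 3, 5, 5]]
    predicted = [[5, 6, 6, 6], [3, 5, 5, 6]]
    print(score_avg_abs(predicted, human))  # 0.625
    print(decisions_acc(["Accept", "Reject", "Reject", "Accept"],
                        ["Accept", "Reject", "Accept", "Accept"]))  # 0.75
```

Working per paper before averaging keeps one paper with many reviewers from dominating the metric; if the card instead pools all scores, the aggregation line would change.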
 #### How to use
 The models included in this repository can be used with the `transformers` or `vllm` code libraries.
 The evaluation set mixes 300 review comments from ICLR 2024 with samples generated by WhizReviewer-ML, and Llama-3.1-8B is used as the reference model. Detect Acc is the proportion of samples that Fast-Detect-GPT detects correctly.
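
For readers unfamiliar with Fast-Detect-GPT, the following plain-Python sketch reimplements its analytic sampling-discrepancy criterion and then scores a labelled mixed set with a fixed threshold. It assumes per-token logits from the reference model are already available (in this evaluation they would come from Llama-3.1-8B). The function names, the toy logits, and the thresholding rule behind `detect_acc` are illustrative assumptions; the card does not state how Detect Acc is thresholded.

```python
import math

# Sketch of Fast-DetectGPT's analytic criterion. Inputs are assumed to be
# per-position vocabulary logits aligned so that row t predicts tokens[t].


def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sampling_discrepancy(ref_logits, score_logits, tokens):
    """Normalised gap between the text's log-likelihood and its expectation.

    Larger values mean the observed tokens are unusually likely under the
    scoring model, which Fast-DetectGPT treats as evidence of machine text."""
    log_lik = expected = variance = 0.0
    for ref_row, score_row, tok in zip(ref_logits, score_logits, tokens):
        lp = log_softmax(score_row)                      # scoring-model log-probs
        q = [math.exp(x) for x in log_softmax(ref_row)]  # sampling distribution
        mu = sum(qi * li for qi, li in zip(q, lp))
        log_lik += lp[tok]
        expected += mu
        variance += sum(qi * li * li for qi, li in zip(q, lp)) - mu * mu
    if variance <= 0:
        raise ValueError("degenerate distributions: variance is zero")
    return (log_lik - expected) / math.sqrt(variance)


def detect_acc(criteria, is_generated, threshold):
    """Accuracy of the hypothetical rule 'criterion > threshold means generated'."""
    predictions = [c > threshold for c in criteria]
    return sum(p == y for p, y in zip(predictions, is_generated)) / len(is_generated)


if __name__ == "__main__":
    rows = [[2.0, 0.0, 0.0]] * 3  # toy 3-token vocabulary, 3 positions
    print(sampling_discrepancy(rows, rows, [0, 0, 0]) > 0)  # True: likely tokens
    print(sampling_discrepancy(rows, rows, [1, 1, 1]) > 0)  # False: unlikely tokens
```

Passing the same logits as both `ref_logits` and `score_logits` corresponds to using a single model for sampling and scoring, which matches the single reference model named above.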
+#### Intended Uses
+**Expected Use Cases** The WhizReviewer series models are suitable for research purposes in multiple languages, including but not limited to the following:
+1. Paper Improvement: Assist in enhancing the quality and clarity of academic papers.
+2. Writing Practice: Provide a platform for users to practice and refine their academic writing skills.
+3. Self-assessment Tool: Enable researchers to evaluate their own work before submission.
+4. Learning Aid: Support students and researchers in understanding the peer review process.
+5. Feedback Simulation: Offer simulated peer review feedback to prepare authors for actual reviews.
+6. Revision Guide: Provide structured guidance for revising academic papers.
+7. Concept Validator: Help researchers validate their ideas and hypotheses.
+8. Reward Model: Serve as a component in machine learning systems for academic writing improvement.
+9. Educational Resource: Act as a teaching tool for academic writing and peer review processes.
+10. Research Assistant: Aid in literature reviews and research methodology refinement.
+11. Supplementary Tool: Complement human review in informal, non-official settings.
+**Out of Scope** This model must not be misused to influence the academic environment. In addition to the uses prohibited by the Llama License and Mistral License, we do not permit the following:
+1. Official Reviews: WhizReviewer-ML must not be used for official peer reviews in any capacity.
+2. Legal or Ethical Decisions: Not designed to make judgments on research ethics or legal compliance.
+3. Factual Verification: While it can offer feedback, it should not be the sole source for fact-checking or verifying scientific claims.
+4. Plagiarism Detection: Not equipped to serve as a plagiarism detection tool.
+5. Publication Decisions: Cannot be used to make final decisions on whether a paper should be published.
+6. Expert Consultation: Not a replacement for expert consultation in specialized fields.
+**If you are unsure whether your use meets our License requirements, please send a description of your intended use to [email protected] for further inquiry.**
 #### Case Study
 As an example of WhizReviewer-ML's review suggestions, we use an ICLR 2024 poster paper titled "Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks". The original reviews are available on [OpenReview](https://openreview.net/forum?id=9nsNyN0vox); the human review scores were *[5,6,6,8]*. The following is the output of WhizReviewer-ML-Pro-123B: