Update README.md
AlphaMonarch-laser is a DPO fine-tune of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/), trained on the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset using LaserQLoRA, and it achieves better performance than [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B/). We fine-tuned only half of the projections, yet obtained better results than the version released by Maxime Labonne. The model was trained for 1080 steps.
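
To make the "half of the projections" idea concrete, here is a minimal sketch of a QLoRA adapter restricted to a subset of projection modules with `peft`; the module list and all values are illustrative assumptions, not the configuration actually used for this model.

```python
# Hypothetical sketch: a LoRA adapter targeting only a subset of the
# projection layers (instead of all of q/k/v/o/gate/up/down), in the
# spirit of LaserQLoRA. Ranks and dropout are assumed values.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # assumed LoRA rank
    lora_alpha=32,        # assumed scaling factor
    lora_dropout=0.05,    # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
    # Roughly half of the usual Mistral projection set:
    target_modules=["q_proj", "v_proj", "up_proj"],
)
```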
AlphaMonarch-laser ranks first on YALL - [Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
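
As a usage illustration (not part of the original card), here is a minimal `transformers` inference sketch; the repo id is a placeholder for wherever the model is hosted.

```python
# Minimal inference sketch; replace the placeholder repo id with the
# actual Hugging Face path of AlphaMonarch-laser.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<org>/AlphaMonarch-laser"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is a large language model?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```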

## 🏆 Evaluation results

### Nous Benchmark

#### AGIEVAL

| Task | Version | Metric | Value | StdErr |
|--------------------------------|---------|----------|--------|--------|
| agieval_aqua_rat | 0 | acc | 28.35% | 2.83% |
| agieval_aqua_rat | 0 | acc_norm | 26.38% | 2.77% |
| agieval_logiqa_en | 0 | acc | 38.25% | 1.91% |
| agieval_logiqa_en | 0 | acc_norm | 38.10% | 1.90% |
| agieval_lsat_ar | 0 | acc | 23.91% | 2.82% |
| agieval_lsat_ar | 0 | acc_norm | 23.48% | 2.80% |
| agieval_lsat_lr | 0 | acc | 52.75% | 2.21% |
| agieval_lsat_lr | 0 | acc_norm | 53.92% | 2.21% |
| agieval_lsat_rc | 0 | acc | 66.91% | 2.87% |
| agieval_lsat_rc | 0 | acc_norm | 67.29% | 2.87% |
| agieval_sat_en | 0 | acc | 78.64% | 2.86% |
| agieval_sat_en | 0 | acc_norm | 78.64% | 2.86% |
| agieval_sat_en_without_passage | 0 | acc | 45.15% | 3.48% |
| agieval_sat_en_without_passage | 0 | acc_norm | 44.17% | 3.47% |
| agieval_sat_math | 0 | acc | 33.18% | 3.18% |
| agieval_sat_math | 0 | acc_norm | 31.36% | 3.14% |

Average: 28.41%

#### GPT4ALL

| Task | Version | Metric | Value | StdErr |
|---------------|---------|----------|--------|---------|
| arc_challenge | 0 | acc | 66.30% | ± 1.38% |
| | | acc_norm | 68.26% | ± 1.36% |
| arc_easy | 0 | acc | 86.57% | ± 0.70% |
| | | acc_norm | 80.81% | ± 0.81% |
| boolq | 1 | acc | 87.16% | ± 0.59% |
| hellaswag | 0 | acc | 69.60% | ± 0.46% |
| | | acc_norm | 87.45% | ± 0.33% |
| openbookqa | 0 | acc | 39.20% | ± 2.19% |
| | | acc_norm | 49.60% | ± 2.24% |
| piqa | 0 | acc | 83.03% | ± 0.88% |
| | | acc_norm | 84.87% | ± 0.84% |
| winogrande | 0 | acc | 81.06% | ± 1.10% |

Average: 76.98%

#### TRUTHFUL-QA

| Task | Version | Metric | Value | StdErr |
|---------------|---------|--------|--------|---------|
| truthfulqa_mc | 1 | mc1 | 63.04% | ± 1.69% |
| truthfulqa_mc | 1 | mc2 | 78.39% | ± 1.37% |

Average: 70.71%

#### BIGBENCH

| Task | Version | Metric | Value | StdErr |
|---------------------------------------------------|---------|-----------------------|--------|---------|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 60.00% | ± 3.56% |
| bigbench_date_understanding | 0 | multiple_choice_grade | 62.06% | ± 2.53% |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 54.26% | ± 3.11% |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 23.96% | ± 2.26% |
| | | exact_str_match | 0.00% | ± 0.00% |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 32.80% | ± 2.10% |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 23.86% | ± 1.61% |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 59.33% | ± 2.84% |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 58.00% | ± 2.21% |
| bigbench_navigate | 0 | multiple_choice_grade | 56.00% | ± 1.57% |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 69.20% | ± 1.03% |
| bigbench_ruin_names | 0 | multiple_choice_grade | 55.36% | ± 2.35% |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 41.48% | ± 1.56% |
| bigbench_snarks | 0 | multiple_choice_grade | 73.48% | ± 3.29% |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 76.06% | ± 1.36% |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 55.50% | ± 1.57% |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 23.28% | ± 1.20% |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 19.37% | ± 0.94% |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 59.33% | ± 2.84% |

Average: 55.37%
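
These results should be broadly reproducible with EleutherAI's lm-evaluation-harness, which uses the task names shown in the tables above. A hedged sketch of its Python API (v0.4-style; the repo id is a placeholder, and task names can differ across harness versions):

```python
# Hypothetical reproduction sketch with lm-evaluation-harness (v0.4+ API).
# The repo id is a placeholder, not a confirmed path.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=<org>/AlphaMonarch-laser,dtype=bfloat16",
    tasks=["arc_challenge", "winogrande"],  # swap in AGIEval/BigBench tasks as needed
    batch_size=8,
)
print(results["results"])
```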

### OpenLLM Benchmark

| Task | Version | Metric | Value | | Stderr |
|---------------|--------:|----------|------:|---|-------:|
| arc_challenge | 0 | acc | 70.12 | ± | 1.30 |
| | | acc_norm | 73.27 | ± | 1.29 |
| hellaswag | 0 | acc | 71.80 | ± | 0.44 |
| | | acc_norm | 89.20 | ± | 0.30 |
| gsm8k | 0 | acc | 66.77 | ± | 1.20 |
| winogrande | 0 | acc | 84.60 | ± | 1.00 |

Average: 73.5%

#### TruthfulQA

| Task | Version | Metric | Value | | Stderr |
|---------------|--------:|--------|------:|---|-------:|
| truthfulqa_mc | 1 | mc1 | 62.79 | ± | 1.69 |
| | | mc2 | 77.90 | ± | 1.37 |

### Training hyperparameters

The following hyperparameters were used during training:
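
For orientation only, here is a minimal sketch of a generic DPO fine-tuning setup with TRL's `DPOTrainer`; the actual run used the Axolotl configuration below, and apart from the base model, the dataset, and the 1080-step count named above, every value here is an assumption.

```python
# Illustrative only: a generic DPO setup with Hugging Face TRL, not the
# card's actual training recipe (which used Axolotl).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mlabonne/NeuralMonarch-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Preference data; DPOTrainer expects "prompt"/"chosen"/"rejected" string
# columns, so some preprocessing of this dataset is assumed.
dataset = load_dataset("argilla/OpenHermes2.5-dpo-binarized-alpha", split="train")

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL builds a frozen reference copy of the model
    args=TrainingArguments(
        output_dir="alphamonarch-laser-dpo",
        max_steps=1080,                   # step count stated in the card
        per_device_train_batch_size=1,    # assumed
        learning_rate=5e-7,               # assumed
    ),
    beta=0.1,  # assumed DPO temperature
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```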
### 📝 Axolotl Configuration
```yaml