tangledgroup/tangled-alpha-0.9-core

@@ -80,6 +80,15 @@ Total number of tokens in the optimized dataset '../core-data-7-65537-131073-131
 real    292m54.341s
 user    2118m1.154s
 sys     12m2.746s
 ```
 ```bash
@@ -137,7 +146,7 @@ mv wandb wandb-pretrain-core-0
 Copy config:
 ```bash
-cp ../config-0.json ../out/pretrain-core-0/final
 ```
 Chat with model:
@@ -147,8 +156,57 @@ CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable
 ```
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core-0/final'
 ```
 ```
 ```

 real    292m54.341s
 user    2118m1.154s
 sys     12m2.746s
+20G     tangled-alpha-0.9-core/core-data-0-0-1073741824-1025-16000
+2.4G    tangled-alpha-0.9-core/core-data-1-1025-2049-2049-8000
+1.8G    tangled-alpha-0.9-core/core-data-2-2049-4097-4097-4000
+1.8G    tangled-alpha-0.9-core/core-data-3-4097-8193-8193-2000
+2.3G    tangled-alpha-0.9-core/core-data-4-8193-16385-16385-1000
+1.6G    tangled-alpha-0.9-core/core-data-5-16385-32769-32769-500
+709M    tangled-alpha-0.9-core/core-data-6-32769-65537-65537-250
+321M    tangled-alpha-0.9-core/core-data-7-65537-131073-131073-125
 ```
 ```bash
 Copy config:
 ```bash
+cp ../config-0.json ../out/pretrain-core-0/final/config.json
 ```
 Chat with model:
 ```
 ```bash
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size '4' --dtype 'bfloat16' '../out/pretrain-core-0/final'
 ```
 ```
+|                           Tasks                           |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
+|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
+|leaderboard                                                |    N/A|      |      |                       |   |      |   |      |
+| - leaderboard_bbh                                         |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_bbh_boolean_expressions                    |      1|none  |     3|acc_norm               |↑  |0.4600|±  |0.0316|
+|  - leaderboard_bbh_causal_judgement                       |      1|none  |     3|acc_norm               |↑  |0.5134|±  |0.0366|
+|  - leaderboard_bbh_date_understanding                     |      1|none  |     3|acc_norm               |↑  |0.1960|±  |0.0252|
+|  - leaderboard_bbh_disambiguation_qa                      |      1|none  |     3|acc_norm               |↑  |0.3320|±  |0.0298|
+|  - leaderboard_bbh_formal_fallacies                       |      1|none  |     3|acc_norm               |↑  |0.4680|±  |0.0316|
+|  - leaderboard_bbh_geometric_shapes                       |      1|none  |     3|acc_norm               |↑  |0.2400|±  |0.0271|
+|  - leaderboard_bbh_hyperbaton                             |      1|none  |     3|acc_norm               |↑  |0.5160|±  |0.0317|
+|  - leaderboard_bbh_logical_deduction_five_objects         |      1|none  |     3|acc_norm               |↑  |0.2040|±  |0.0255|
+|  - leaderboard_bbh_logical_deduction_seven_objects        |      1|none  |     3|acc_norm               |↑  |0.1320|±  |0.0215|
+|  - leaderboard_bbh_logical_deduction_three_objects        |      1|none  |     3|acc_norm               |↑  |0.3440|±  |0.0301|
+|  - leaderboard_bbh_movie_recommendation                   |      1|none  |     3|acc_norm               |↑  |0.2680|±  |0.0281|
+|  - leaderboard_bbh_navigate                               |      1|none  |     3|acc_norm               |↑  |0.5720|±  |0.0314|
+|  - leaderboard_bbh_object_counting                        |      1|none  |     3|acc_norm               |↑  |0.0680|±  |0.0160|
+|  - leaderboard_bbh_penguins_in_a_table                    |      1|none  |     3|acc_norm               |↑  |0.2055|±  |0.0336|
+|  - leaderboard_bbh_reasoning_about_colored_objects        |      1|none  |     3|acc_norm               |↑  |0.1760|±  |0.0241|
+|  - leaderboard_bbh_ruin_names                             |      1|none  |     3|acc_norm               |↑  |0.2120|±  |0.0259|
+|  - leaderboard_bbh_salient_translation_error_detection    |      1|none  |     3|acc_norm               |↑  |0.2240|±  |0.0264|
+|  - leaderboard_bbh_snarks                                 |      1|none  |     3|acc_norm               |↑  |0.5393|±  |0.0375|
+|  - leaderboard_bbh_sports_understanding                   |      1|none  |     3|acc_norm               |↑  |0.4600|±  |0.0316|
+|  - leaderboard_bbh_temporal_sequences                     |      1|none  |     3|acc_norm               |↑  |0.2760|±  |0.0283|
+|  - leaderboard_bbh_tracking_shuffled_objects_five_objects |      1|none  |     3|acc_norm               |↑  |0.1720|±  |0.0239|
+|  - leaderboard_bbh_tracking_shuffled_objects_seven_objects|      1|none  |     3|acc_norm               |↑  |0.1360|±  |0.0217|
+|  - leaderboard_bbh_tracking_shuffled_objects_three_objects|      1|none  |     3|acc_norm               |↑  |0.3320|±  |0.0298|
+|  - leaderboard_bbh_web_of_lies                            |      1|none  |     3|acc_norm               |↑  |0.4880|±  |0.0317|
+| - leaderboard_gpqa                                        |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_gpqa_diamond                               |      1|none  |     0|acc_norm               |↑  |0.2071|±  |0.0289|
+|  - leaderboard_gpqa_extended                              |      1|none  |     0|acc_norm               |↑  |0.2637|±  |0.0189|
+|  - leaderboard_gpqa_main                                  |      1|none  |     0|acc_norm               |↑  |0.2612|±  |0.0208|
+| - leaderboard_ifeval                                      |      3|none  |     0|inst_level_loose_acc   |↑  |0.2770|±  |   N/A|
+|                                                           |       |none  |     0|inst_level_strict_acc  |↑  |0.2710|±  |   N/A|
+|                                                           |       |none  |     0|prompt_level_loose_acc |↑  |0.1534|±  |0.0155|
+|                                                           |       |none  |     0|prompt_level_strict_acc|↑  |0.1497|±  |0.0154|
+| - leaderboard_math_hard                                   |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_math_algebra_hard                          |      2|none  |     4|exact_match            |↑  |0.0017|±  |0.0012|
+|  - leaderboard_math_counting_and_prob_hard                |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_geometry_hard                         |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+|  - leaderboard_math_intermediate_algebra_hard             |      2|none  |     4|exact_match            |↑  |0.0033|±  |0.0019|
+|  - leaderboard_math_num_theory_hard                       |      2|none  |     4|exact_match            |↑  |0.0037|±  |0.0026|
+|  - leaderboard_math_prealgebra_hard                       |      2|none  |     4|exact_match            |↑  |0.0046|±  |0.0023|
+|  - leaderboard_math_precalculus_hard                      |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
+| - leaderboard_mmlu_pro                                    |    0.1|none  |     5|acc                    |↑  |0.1068|±  |0.0028|
+| - leaderboard_musr                                        |    N/A|      |      |                       |   |      |   |      |
+|  - leaderboard_musr_murder_mysteries                      |      1|none  |     0|acc_norm               |↑  |0.5160|±  |0.0317|
+|  - leaderboard_musr_object_placements                     |      1|none  |     0|acc_norm               |↑  |0.2344|±  |0.0265|
+|  - leaderboard_musr_team_allocation                       |      1|none  |     0|acc_norm               |↑  |0.3200|±  |0.0296|
 ```
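
The per-task rows above are easier to compare once they are grouped. The following is a minimal sketch, assuming the results table has been copied into a string; `parse_results` and `group_mean` are hypothetical helper names and are not part of litgpt or lm-evaluation-harness. The unweighted mean over subtasks is only an illustrative summary and may differ from how the leaderboard itself aggregates scores.

```python
from statistics import mean

# A few rows copied from the table above, used here as sample input.
SAMPLE = """\
|  - leaderboard_bbh_navigate          |      1|none  |     3|acc_norm               |↑  |0.5720|±  |0.0314|
|  - leaderboard_bbh_object_counting   |      1|none  |     3|acc_norm               |↑  |0.0680|±  |0.0160|
| - leaderboard_gpqa                   |    N/A|      |      |                       |   |      |   |      |
|  - leaderboard_gpqa_main             |      1|none  |     0|acc_norm               |↑  |0.2612|±  |0.0208|
"""


def parse_results(table):
    """Return (task, metric, value) tuples for every row that carries a numeric value."""
    results = []
    task = ""
    for line in table.splitlines():
        # Tolerate a leading '+' left over from a diff view.
        line = line.strip().lstrip("+").strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line[1:-1].split("|")]
        if len(cells) != 9:
            continue
        # Continuation rows (e.g. extra ifeval metrics) have an empty task cell,
        # so the most recent task name is carried forward.
        name = cells[0].lstrip("- ").strip()
        if name:
            task = name
        try:
            value = float(cells[6])
        except ValueError:
            # Header, separator, and group rows have no numeric value.
            continue
        results.append((task, cells[4], value))
    return results


def group_mean(results, prefix, metric):
    """Unweighted mean of `metric` over tasks whose name starts with `prefix`."""
    values = [v for t, m, v in results if t.startswith(prefix) and m == metric]
    return mean(values) if values else None


if __name__ == "__main__":
    rows = parse_results(SAMPLE)
    print(group_mean(rows, "leaderboard_bbh_", "acc_norm"))
```

Passing the full table instead of `SAMPLE` gives one summary number per group, for example with the prefixes `leaderboard_bbh_`, `leaderboard_gpqa_`, and `leaderboard_musr_`.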