Merge branch 'main' of hf.co:tangledgroup/tangled-alpha-0.11-core

Files changed (2)

README.md

@@ -93,20 +93,20 @@ CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable
 ```
 Seed set to 23
-Time to instantiate model: 0.21 seconds.
-Total parameters: 402,703,104
 Verifying settings ...
-Measured TFLOPs: 42432.35
-Epoch 1 | iter 64 step 1 | loss train: 11.984, val: n/a | iter time: 460.76 ms (step) remaining time: 12 days, 3:41:55
-Epoch 1 | iter 128 step 2 | loss train: 11.979, val: n/a | iter time: 402.83 ms (step) remaining time: 9 days, 0:57:24
-Epoch 1 | iter 192 step 3 | loss train: 11.983, val: n/a | iter time: 403.46 ms (step) remaining time: 8 days, 0:12:58
-Epoch 1 | iter 256 step 4 | loss train: 11.983, val: n/a | iter time: 403.39 ms (step) remaining time: 7 days, 11:52:07
-Epoch 1 | iter 320 step 5 | loss train: 11.979, val: n/a | iter time: 403.85 ms (step) remaining time: 7 days, 4:28:33
-Epoch 1 | iter 384 step 6 | loss train: 11.978, val: n/a | iter time: 403.93 ms (step) remaining time: 6 days, 23:33:15
-Epoch 1 | iter 448 step 7 | loss train: 11.978, val: n/a | iter time: 403.38 ms (step) remaining time: 6 days, 20:02:28
-Epoch 1 | iter 512 step 8 | loss train: 11.973, val: n/a | iter time: 403.80 ms (step) remaining time: 6 days, 17:24:49
-Epoch 1 | iter 576 step 9 | loss train: 11.972, val: n/a | iter time: 403.23 ms (step) remaining time: 6 days, 15:21:59
-Epoch 1 | iter 640 step 10 | loss train: 11.967, val: n/a | iter time: 403.38 ms (step) remaining time: 6 days, 13:43:53
 # ...
 ```

 ```
 Seed set to 23
+Time to instantiate model: 0.20 seconds.
+Total parameters: 234,897,920
 Verifying settings ...
+Measured TFLOPs: 28077.03
+Epoch 1 | iter 64 step 1 | loss train: 11.977, val: n/a | iter time: 350.96 ms (step) remaining time: 10 days, 14:14:05
+Epoch 1 | iter 128 step 2 | loss train: 11.977, val: n/a | iter time: 280.36 ms (step) remaining time: 7 days, 8:25:44
+Epoch 1 | iter 192 step 3 | loss train: 11.974, val: n/a | iter time: 280.80 ms (step) remaining time: 6 days, 6:28:36
+Epoch 1 | iter 256 step 4 | loss train: 11.975, val: n/a | iter time: 281.44 ms (step) remaining time: 5 days, 17:28:43
+Epoch 1 | iter 320 step 5 | loss train: 11.974, val: n/a | iter time: 280.13 ms (step) remaining time: 5 days, 9:40:25
+Epoch 1 | iter 384 step 6 | loss train: 11.976, val: n/a | iter time: 281.50 ms (step) remaining time: 5 days, 4:26:59
+Epoch 1 | iter 448 step 7 | loss train: 11.974, val: n/a | iter time: 280.34 ms (step) remaining time: 5 days, 0:43:34
+Epoch 1 | iter 512 step 8 | loss train: 11.970, val: n/a | iter time: 280.74 ms (step) remaining time: 4 days, 21:55:15
+Epoch 1 | iter 576 step 9 | loss train: 11.970, val: n/a | iter time: 279.90 ms (step) remaining time: 4 days, 19:44:24
+Epoch 1 | iter 640 step 10 | loss train: 11.971, val: n/a | iter time: 279.74 ms (step) remaining time: 4 days, 17:59:44
 # ...
 ```
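
The two logs above differ mainly in step time. As a rough way to compare them, the sketch below extracts the `iter time` values from log text and averages them after dropping the first step, which is visibly slower in both runs (presumably warm-up). The function name `steady_state_ms` and the `skip` parameter are hypothetical helpers for illustration, not part of the training script, and only the first three lines of each log are reproduced to keep the example short.

```python
import re

# Matches the "iter time: 403.38 ms" field of a training log line.
ITER_TIME = re.compile(r"iter time: ([\d.]+) ms")


def steady_state_ms(log_text: str, skip: int = 1) -> float:
    """Mean iteration time in ms, ignoring the first `skip` steps (assumed warm-up)."""
    times = [float(value) for value in ITER_TIME.findall(log_text)]
    steady = times[skip:]
    if not steady:
        raise ValueError("not enough log lines after skipping warm-up steps")
    return sum(steady) / len(steady)


# First three steps of each run, copied from the README diff above.
OLD_LOG = """\
Epoch 1 | iter 64 step 1 | loss train: 11.984, val: n/a | iter time: 460.76 ms (step) remaining time: 12 days, 3:41:55
Epoch 1 | iter 128 step 2 | loss train: 11.979, val: n/a | iter time: 402.83 ms (step) remaining time: 9 days, 0:57:24
Epoch 1 | iter 192 step 3 | loss train: 11.983, val: n/a | iter time: 403.46 ms (step) remaining time: 8 days, 0:12:58
"""
NEW_LOG = """\
Epoch 1 | iter 64 step 1 | loss train: 11.977, val: n/a | iter time: 350.96 ms (step) remaining time: 10 days, 14:14:05
Epoch 1 | iter 128 step 2 | loss train: 11.977, val: n/a | iter time: 280.36 ms (step) remaining time: 7 days, 8:25:44
Epoch 1 | iter 192 step 3 | loss train: 11.974, val: n/a | iter time: 280.80 ms (step) remaining time: 6 days, 6:28:36
"""

old_ms = steady_state_ms(OLD_LOG)
new_ms = steady_state_ms(NEW_LOG)
print(f"old: {old_ms:.2f} ms, new: {new_ms:.2f} ms, ratio: {old_ms / new_ms:.2f}x")
```

Skipping the first step is a judgment call; passing `skip=0` includes it in the average.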

config-0.json

@@ -8,15 +8,15 @@
   "eos_token_id": 1,
   "head_dim": 64,
   "hidden_act": "silu",
-  "hidden_size": 768,
   "initializer_range": 0.02,
-  "intermediate_size": 2048,
   "max_position_embeddings": 131072,
   "mlp_bias": false,
   "model_type": "llama",
-  "num_attention_heads": 12,
   "num_hidden_layers": 32,
-  "num_key_value_heads": 4,
   "pretraining_tp": 1,
   "rms_norm_eps": 1e-05,
   "rope_scaling": null,

   "eos_token_id": 1,
   "head_dim": 64,
   "hidden_act": "silu",
+  "hidden_size": 512,
   "initializer_range": 0.02,
+  "intermediate_size": 1365,
   "max_position_embeddings": 131072,
   "mlp_bias": false,
   "model_type": "llama",
+  "num_attention_heads": 8,
   "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
   "pretraining_tp": 1,
   "rms_norm_eps": 1e-05,
   "rope_scaling": null,