SummerSigh committed
Commit b6a4d04 · verified · 1 Parent(s): 83a9cd4

Upload 8 files

Files changed (4)
  1. model.safetensors +1 -1
  2. optimizer.pt +1 -1
  3. scheduler.pt +1 -1
  4. trainer_state.json +2964 -4
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:63c707b72c3779662a4c5641912d3bd484e848cda99f1f5fa45be2dfd8a0fef8
+oid sha256:9ebc653e18935ef6e6d593d943659324be81a89c736dc6b33141d9a72bc9696c
 size 18494040
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dadf442c9909d4e1c2b9e49f1d628a9826771fbfb353eea8aaebd90c70f08957
+oid sha256:8c16f2324ca710c73fffdab8ac0b51ca0937adca9d3ea12a4fcf7b6b75c642b3
 size 37035002
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b7341b3083234687d0b9b4d7741daf02731dfc2e5ecfee8b661951c827d79431
+oid sha256:caeb79da12fade882c795419ac73c6806820c6ccef19831ac9e9b66b6ca1212b
 size 1064
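The three diffs above touch Git LFS pointer files, not the binaries themselves: only the `oid sha256:` line changes, while `size` stays the same for each file. A downloaded artifact can be checked against its pointer's oid with a small sketch like this (the file path and expected oid are taken from the model.safetensors diff; the helper name is illustrative):

```python
import hashlib

def lfs_oid(path, chunk_size=1 << 20):
    """Compute the sha256 digest Git LFS records as the pointer's oid."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large checkpoints don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example: compare a downloaded file against the new oid from this commit.
# expected = "9ebc653e18935ef6e6d593d943659324be81a89c736dc6b33141d9a72bc9696c"
# assert lfs_oid("model.safetensors") == expected
```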
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.28651874218272183,
   "eval_steps": 500,
-  "global_step": 26000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -1391,11 +1391,2971 @@
   "loss": 4.5421,
   "num_input_tokens_seen": 151536512,
   "step": 25950
   }
   ],
   "logging_steps": 150,
   "max_steps": 272232,
-  "num_input_tokens_seen": 151830560,
   "num_train_epochs": 3,
   "save_steps": 500,
   "stateful_callbacks": {
@@ -1410,7 +4370,7 @@
   "attributes": {}
   }
   },
-  "total_flos": 2344361017958400.0,
   "train_batch_size": 32,
   "trial_name": null,
   "trial_params": null
 
 {
   "best_metric": null,
   "best_model_checkpoint": null,
+  "epoch": 0.8981260572266088,
   "eval_steps": 500,
+  "global_step": 81500,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,

   "loss": 4.5421,
   "num_input_tokens_seen": 151536512,
   "step": 25950
1394
+ },
1395
+ {
1396
+ "epoch": 0.2876207373449631,
1397
+ "grad_norm": 1.851283073425293,
1398
+ "learning_rate": 0.0001446669703336909,
1399
+ "loss": 4.5469,
1400
+ "num_input_tokens_seen": 152395904,
1401
+ "step": 26100
1402
+ },
1403
+ {
1404
+ "epoch": 0.2892737300883249,
1405
+ "grad_norm": 1.8593121767044067,
1406
+ "learning_rate": 0.00014457880893956537,
1407
+ "loss": 4.5536,
1408
+ "num_input_tokens_seen": 153263904,
1409
+ "step": 26250
1410
+ },
1411
+ {
1412
+ "epoch": 0.29092672283168675,
1413
+ "grad_norm": 1.8973345756530762,
1414
+ "learning_rate": 0.00014449064754543987,
1415
+ "loss": 4.5613,
1416
+ "num_input_tokens_seen": 154143616,
1417
+ "step": 26400
1418
+ },
1419
+ {
1420
+ "epoch": 0.29257971557504864,
1421
+ "grad_norm": 1.9224932193756104,
1422
+ "learning_rate": 0.00014440307389394186,
1423
+ "loss": 4.5513,
1424
+ "num_input_tokens_seen": 155013568,
1425
+ "step": 26550
1426
+ },
1427
+ {
1428
+ "epoch": 0.2942327083184105,
1429
+ "grad_norm": 1.8901547193527222,
1430
+ "learning_rate": 0.00014431491249981634,
1431
+ "loss": 4.5552,
1432
+ "num_input_tokens_seen": 155889408,
1433
+ "step": 26700
1434
+ },
1435
+ {
1436
+ "epoch": 0.29588570106177237,
1437
+ "grad_norm": 1.8522729873657227,
1438
+ "learning_rate": 0.0001442267511056908,
1439
+ "loss": 4.546,
1440
+ "num_input_tokens_seen": 156769184,
1441
+ "step": 26850
1442
+ },
1443
+ {
1444
+ "epoch": 0.2975386938051342,
1445
+ "grad_norm": 1.8729475736618042,
1446
+ "learning_rate": 0.00014413858971156531,
1447
+ "loss": 4.553,
1448
+ "num_input_tokens_seen": 157648416,
1449
+ "step": 27000
1450
+ },
1451
+ {
1452
+ "epoch": 0.29919168654849604,
1453
+ "grad_norm": 1.8859171867370605,
1454
+ "learning_rate": 0.0001440504283174398,
1455
+ "loss": 4.5468,
1456
+ "num_input_tokens_seen": 158528480,
1457
+ "step": 27150
1458
+ },
1459
+ {
1460
+ "epoch": 0.30084467929185793,
1461
+ "grad_norm": 1.8305902481079102,
1462
+ "learning_rate": 0.0001439628546659418,
1463
+ "loss": 4.554,
1464
+ "num_input_tokens_seen": 159401376,
1465
+ "step": 27300
1466
+ },
1467
+ {
1468
+ "epoch": 0.30249767203521977,
1469
+ "grad_norm": 1.7858612537384033,
1470
+ "learning_rate": 0.00014387469327181628,
1471
+ "loss": 4.5459,
1472
+ "num_input_tokens_seen": 160297888,
1473
+ "step": 27450
1474
+ },
1475
+ {
1476
+ "epoch": 0.3041506647785816,
1477
+ "grad_norm": 1.9333041906356812,
1478
+ "learning_rate": 0.00014378653187769078,
1479
+ "loss": 4.5488,
1480
+ "num_input_tokens_seen": 161189632,
1481
+ "step": 27600
1482
+ },
1483
+ {
1484
+ "epoch": 0.3058036575219435,
1485
+ "grad_norm": 1.904167652130127,
1486
+ "learning_rate": 0.00014369837048356526,
1487
+ "loss": 4.5457,
1488
+ "num_input_tokens_seen": 162069728,
1489
+ "step": 27750
1490
+ },
1491
+ {
1492
+ "epoch": 0.30745665026530533,
1493
+ "grad_norm": 1.8620493412017822,
1494
+ "learning_rate": 0.00014361020908943976,
1495
+ "loss": 4.543,
1496
+ "num_input_tokens_seen": 162956000,
1497
+ "step": 27900
1498
+ },
1499
+ {
1500
+ "epoch": 0.30910964300866717,
1501
+ "grad_norm": 1.9523295164108276,
1502
+ "learning_rate": 0.00014352204769531424,
1503
+ "loss": 4.5368,
1504
+ "num_input_tokens_seen": 163850880,
1505
+ "step": 28050
1506
+ },
1507
+ {
1508
+ "epoch": 0.31076263575202906,
1509
+ "grad_norm": 1.842017412185669,
1510
+ "learning_rate": 0.00014343388630118874,
1511
+ "loss": 4.5423,
1512
+ "num_input_tokens_seen": 164723840,
1513
+ "step": 28200
1514
+ },
1515
+ {
1516
+ "epoch": 0.3124156284953909,
1517
+ "grad_norm": 2.015977621078491,
1518
+ "learning_rate": 0.0001433457249070632,
1519
+ "loss": 4.5306,
1520
+ "num_input_tokens_seen": 165582464,
1521
+ "step": 28350
1522
+ },
1523
+ {
1524
+ "epoch": 0.3140686212387528,
1525
+ "grad_norm": 1.7622820138931274,
1526
+ "learning_rate": 0.00014325756351293772,
1527
+ "loss": 4.5336,
1528
+ "num_input_tokens_seen": 166442496,
1529
+ "step": 28500
1530
+ },
1531
+ {
1532
+ "epoch": 0.3157216139821146,
1533
+ "grad_norm": 1.8762463331222534,
1534
+ "learning_rate": 0.0001431694021188122,
1535
+ "loss": 4.5363,
1536
+ "num_input_tokens_seen": 167318048,
1537
+ "step": 28650
1538
+ },
1539
+ {
1540
+ "epoch": 0.31737460672547646,
1541
+ "grad_norm": 1.9524190425872803,
1542
+ "learning_rate": 0.00014308124072468667,
1543
+ "loss": 4.5498,
1544
+ "num_input_tokens_seen": 168209312,
1545
+ "step": 28800
1546
+ },
1547
+ {
1548
+ "epoch": 0.31902759946883835,
1549
+ "grad_norm": 1.8535164594650269,
1550
+ "learning_rate": 0.00014299307933056117,
1551
+ "loss": 4.5481,
1552
+ "num_input_tokens_seen": 169096896,
1553
+ "step": 28950
1554
+ },
1555
+ {
1556
+ "epoch": 0.3206805922122002,
1557
+ "grad_norm": 1.9309056997299194,
1558
+ "learning_rate": 0.00014290491793643564,
1559
+ "loss": 4.5416,
1560
+ "num_input_tokens_seen": 169963008,
1561
+ "step": 29100
1562
+ },
1563
+ {
1564
+ "epoch": 0.322333584955562,
1565
+ "grad_norm": 1.8341314792633057,
1566
+ "learning_rate": 0.00014281675654231014,
1567
+ "loss": 4.5439,
1568
+ "num_input_tokens_seen": 170836736,
1569
+ "step": 29250
1570
+ },
1571
+ {
1572
+ "epoch": 0.3239865776989239,
1573
+ "grad_norm": 1.8341432809829712,
1574
+ "learning_rate": 0.00014272859514818462,
1575
+ "loss": 4.5365,
1576
+ "num_input_tokens_seen": 171728896,
1577
+ "step": 29400
1578
+ },
1579
+ {
1580
+ "epoch": 0.32563957044228575,
1581
+ "grad_norm": 1.9161962270736694,
1582
+ "learning_rate": 0.00014264043375405912,
1583
+ "loss": 4.5428,
1584
+ "num_input_tokens_seen": 172603008,
1585
+ "step": 29550
1586
+ },
1587
+ {
1588
+ "epoch": 0.3272925631856476,
1589
+ "grad_norm": 1.8521162271499634,
1590
+ "learning_rate": 0.0001425522723599336,
1591
+ "loss": 4.5283,
1592
+ "num_input_tokens_seen": 173475392,
1593
+ "step": 29700
1594
+ },
1595
+ {
1596
+ "epoch": 0.3289455559290095,
1597
+ "grad_norm": 1.9026546478271484,
1598
+ "learning_rate": 0.0001424641109658081,
1599
+ "loss": 4.5306,
1600
+ "num_input_tokens_seen": 174360832,
1601
+ "step": 29850
1602
+ },
1603
+ {
1604
+ "epoch": 0.3305985486723713,
1605
+ "grad_norm": 1.9297667741775513,
1606
+ "learning_rate": 0.00014237594957168257,
1607
+ "loss": 4.5301,
1608
+ "num_input_tokens_seen": 175244576,
1609
+ "step": 30000
1610
+ },
1611
+ {
1612
+ "epoch": 0.3322515414157332,
1613
+ "grad_norm": 1.9747087955474854,
1614
+ "learning_rate": 0.00014228778817755705,
1615
+ "loss": 4.54,
1616
+ "num_input_tokens_seen": 176108352,
1617
+ "step": 30150
1618
+ },
1619
+ {
1620
+ "epoch": 0.33390453415909505,
1621
+ "grad_norm": 1.9235451221466064,
1622
+ "learning_rate": 0.00014219962678343155,
1623
+ "loss": 4.5438,
1624
+ "num_input_tokens_seen": 176987808,
1625
+ "step": 30300
1626
+ },
1627
+ {
1628
+ "epoch": 0.3355575269024569,
1629
+ "grad_norm": 1.8416277170181274,
1630
+ "learning_rate": 0.00014211146538930603,
1631
+ "loss": 4.5476,
1632
+ "num_input_tokens_seen": 177875584,
1633
+ "step": 30450
1634
+ },
1635
+ {
1636
+ "epoch": 0.3372105196458188,
1637
+ "grad_norm": 1.899798035621643,
1638
+ "learning_rate": 0.00014202330399518053,
1639
+ "loss": 4.5413,
1640
+ "num_input_tokens_seen": 178752480,
1641
+ "step": 30600
1642
+ },
1643
+ {
1644
+ "epoch": 0.3388635123891806,
1645
+ "grad_norm": 1.8849375247955322,
1646
+ "learning_rate": 0.000141935142601055,
1647
+ "loss": 4.5199,
1648
+ "num_input_tokens_seen": 179634080,
1649
+ "step": 30750
1650
+ },
1651
+ {
1652
+ "epoch": 0.34051650513254245,
1653
+ "grad_norm": 1.7944519519805908,
1654
+ "learning_rate": 0.0001418469812069295,
1655
+ "loss": 4.5268,
1656
+ "num_input_tokens_seen": 180501472,
1657
+ "step": 30900
1658
+ },
1659
+ {
1660
+ "epoch": 0.34216949787590434,
1661
+ "grad_norm": 1.8572932481765747,
1662
+ "learning_rate": 0.00014175881981280398,
1663
+ "loss": 4.5268,
1664
+ "num_input_tokens_seen": 181388384,
1665
+ "step": 31050
1666
+ },
1667
+ {
1668
+ "epoch": 0.3438224906192662,
1669
+ "grad_norm": 1.8637559413909912,
1670
+ "learning_rate": 0.00014167065841867846,
1671
+ "loss": 4.5245,
1672
+ "num_input_tokens_seen": 182271040,
1673
+ "step": 31200
1674
+ },
1675
+ {
1676
+ "epoch": 0.345475483362628,
1677
+ "grad_norm": 1.902794361114502,
1678
+ "learning_rate": 0.00014158249702455296,
1679
+ "loss": 4.5138,
1680
+ "num_input_tokens_seen": 183160032,
1681
+ "step": 31350
1682
+ },
1683
+ {
1684
+ "epoch": 0.3471284761059899,
1685
+ "grad_norm": 1.8915212154388428,
1686
+ "learning_rate": 0.00014149433563042743,
1687
+ "loss": 4.5296,
1688
+ "num_input_tokens_seen": 184045856,
1689
+ "step": 31500
1690
+ },
1691
+ {
1692
+ "epoch": 0.34878146884935174,
1693
+ "grad_norm": 1.9054772853851318,
1694
+ "learning_rate": 0.00014140617423630193,
1695
+ "loss": 4.5343,
1696
+ "num_input_tokens_seen": 184935872,
1697
+ "step": 31650
1698
+ },
1699
+ {
1700
+ "epoch": 0.35043446159271363,
1701
+ "grad_norm": 1.8381603956222534,
1702
+ "learning_rate": 0.0001413180128421764,
1703
+ "loss": 4.5241,
1704
+ "num_input_tokens_seen": 185812160,
1705
+ "step": 31800
1706
+ },
1707
+ {
1708
+ "epoch": 0.35208745433607547,
1709
+ "grad_norm": 1.8929849863052368,
1710
+ "learning_rate": 0.0001412298514480509,
1711
+ "loss": 4.5311,
1712
+ "num_input_tokens_seen": 186698304,
1713
+ "step": 31950
1714
+ },
1715
+ {
1716
+ "epoch": 0.3537404470794373,
1717
+ "grad_norm": 1.8554471731185913,
1718
+ "learning_rate": 0.0001411422777965529,
1719
+ "loss": 4.52,
1720
+ "num_input_tokens_seen": 187570560,
1721
+ "step": 32100
1722
+ },
1723
+ {
1724
+ "epoch": 0.3553934398227992,
1725
+ "grad_norm": 1.8524342775344849,
1726
+ "learning_rate": 0.00014105411640242738,
1727
+ "loss": 4.5231,
1728
+ "num_input_tokens_seen": 188446880,
1729
+ "step": 32250
1730
+ },
1731
+ {
1732
+ "epoch": 0.35704643256616103,
1733
+ "grad_norm": 1.8730753660202026,
1734
+ "learning_rate": 0.00014096595500830188,
1735
+ "loss": 4.5315,
1736
+ "num_input_tokens_seen": 189329856,
1737
+ "step": 32400
1738
+ },
1739
+ {
1740
+ "epoch": 0.35869942530952287,
1741
+ "grad_norm": 1.8252793550491333,
1742
+ "learning_rate": 0.00014087779361417636,
1743
+ "loss": 4.5165,
1744
+ "num_input_tokens_seen": 190203872,
1745
+ "step": 32550
1746
+ },
1747
+ {
1748
+ "epoch": 0.36035241805288476,
1749
+ "grad_norm": 1.8541933298110962,
1750
+ "learning_rate": 0.00014078963222005086,
1751
+ "loss": 4.5318,
1752
+ "num_input_tokens_seen": 191086784,
1753
+ "step": 32700
1754
+ },
1755
+ {
1756
+ "epoch": 0.3620054107962466,
1757
+ "grad_norm": 1.9790152311325073,
1758
+ "learning_rate": 0.00014070147082592533,
1759
+ "loss": 4.5202,
1760
+ "num_input_tokens_seen": 191964288,
1761
+ "step": 32850
1762
+ },
1763
+ {
1764
+ "epoch": 0.3636584035396085,
1765
+ "grad_norm": 1.980690836906433,
1766
+ "learning_rate": 0.00014061330943179983,
1767
+ "loss": 4.5282,
1768
+ "num_input_tokens_seen": 192848672,
1769
+ "step": 33000
1770
+ },
1771
+ {
1772
+ "epoch": 0.3653113962829703,
1773
+ "grad_norm": 1.8499431610107422,
1774
+ "learning_rate": 0.0001405251480376743,
1775
+ "loss": 4.5094,
1776
+ "num_input_tokens_seen": 193723936,
1777
+ "step": 33150
1778
+ },
1779
+ {
1780
+ "epoch": 0.36696438902633216,
1781
+ "grad_norm": 1.7975043058395386,
1782
+ "learning_rate": 0.0001404369866435488,
1783
+ "loss": 4.5296,
1784
+ "num_input_tokens_seen": 194603072,
1785
+ "step": 33300
1786
+ },
1787
+ {
1788
+ "epoch": 0.36861738176969405,
1789
+ "grad_norm": 1.8439886569976807,
1790
+ "learning_rate": 0.0001403488252494233,
1791
+ "loss": 4.5274,
1792
+ "num_input_tokens_seen": 195468512,
1793
+ "step": 33450
1794
+ },
1795
+ {
1796
+ "epoch": 0.3702703745130559,
1797
+ "grad_norm": 1.8969649076461792,
1798
+ "learning_rate": 0.0001402606638552978,
1799
+ "loss": 4.5195,
1800
+ "num_input_tokens_seen": 196345888,
1801
+ "step": 33600
1802
+ },
1803
+ {
1804
+ "epoch": 0.3719233672564177,
1805
+ "grad_norm": 1.8763043880462646,
1806
+ "learning_rate": 0.00014017250246117226,
1807
+ "loss": 4.5071,
1808
+ "num_input_tokens_seen": 197239776,
1809
+ "step": 33750
1810
+ },
1811
+ {
1812
+ "epoch": 0.3735763599997796,
1813
+ "grad_norm": 1.8754463195800781,
1814
+ "learning_rate": 0.00014008434106704677,
1815
+ "loss": 4.513,
1816
+ "num_input_tokens_seen": 198122528,
1817
+ "step": 33900
1818
+ },
1819
+ {
1820
+ "epoch": 0.37522935274314145,
1821
+ "grad_norm": 2.011179208755493,
1822
+ "learning_rate": 0.00013999617967292124,
1823
+ "loss": 4.5146,
1824
+ "num_input_tokens_seen": 198998208,
1825
+ "step": 34050
1826
+ },
1827
+ {
1828
+ "epoch": 0.3768823454865033,
1829
+ "grad_norm": 1.8127686977386475,
1830
+ "learning_rate": 0.00013990801827879574,
1831
+ "loss": 4.5103,
1832
+ "num_input_tokens_seen": 199877440,
1833
+ "step": 34200
1834
+ },
1835
+ {
1836
+ "epoch": 0.3785353382298652,
1837
+ "grad_norm": 1.792159080505371,
1838
+ "learning_rate": 0.00013981985688467022,
1839
+ "loss": 4.5139,
1840
+ "num_input_tokens_seen": 200738880,
1841
+ "step": 34350
1842
+ },
1843
+ {
1844
+ "epoch": 0.380188330973227,
1845
+ "grad_norm": 1.7812391519546509,
1846
+ "learning_rate": 0.0001397322832331722,
1847
+ "loss": 4.5151,
1848
+ "num_input_tokens_seen": 201618880,
1849
+ "step": 34500
1850
+ },
1851
+ {
1852
+ "epoch": 0.3818413237165889,
1853
+ "grad_norm": 1.8573509454727173,
1854
+ "learning_rate": 0.00013964412183904668,
1855
+ "loss": 4.5113,
1856
+ "num_input_tokens_seen": 202481888,
1857
+ "step": 34650
1858
+ },
1859
+ {
1860
+ "epoch": 0.38349431645995075,
1861
+ "grad_norm": 1.9190624952316284,
1862
+ "learning_rate": 0.0001395559604449212,
1863
+ "loss": 4.5118,
1864
+ "num_input_tokens_seen": 203344320,
1865
+ "step": 34800
1866
+ },
1867
+ {
1868
+ "epoch": 0.3851473092033126,
1869
+ "grad_norm": 1.8508821725845337,
1870
+ "learning_rate": 0.00013946779905079566,
1871
+ "loss": 4.5152,
1872
+ "num_input_tokens_seen": 204227936,
1873
+ "step": 34950
1874
+ },
1875
+ {
1876
+ "epoch": 0.3868003019466745,
1877
+ "grad_norm": 1.8276251554489136,
1878
+ "learning_rate": 0.00013937963765667016,
1879
+ "loss": 4.5182,
1880
+ "num_input_tokens_seen": 205119008,
1881
+ "step": 35100
1882
+ },
1883
+ {
1884
+ "epoch": 0.3884532946900363,
1885
+ "grad_norm": 1.773224949836731,
1886
+ "learning_rate": 0.00013929147626254464,
1887
+ "loss": 4.5091,
1888
+ "num_input_tokens_seen": 205989696,
1889
+ "step": 35250
1890
+ },
1891
+ {
1892
+ "epoch": 0.39010628743339815,
1893
+ "grad_norm": 1.889553189277649,
1894
+ "learning_rate": 0.00013920331486841914,
1895
+ "loss": 4.5111,
1896
+ "num_input_tokens_seen": 206865088,
1897
+ "step": 35400
1898
+ },
1899
+ {
1900
+ "epoch": 0.39175928017676004,
1901
+ "grad_norm": 1.844975233078003,
1902
+ "learning_rate": 0.00013911515347429362,
1903
+ "loss": 4.4982,
1904
+ "num_input_tokens_seen": 207738880,
1905
+ "step": 35550
1906
+ },
1907
+ {
1908
+ "epoch": 0.3934122729201219,
1909
+ "grad_norm": 1.8811978101730347,
1910
+ "learning_rate": 0.0001390269920801681,
1911
+ "loss": 4.5168,
1912
+ "num_input_tokens_seen": 208626688,
1913
+ "step": 35700
1914
+ },
1915
+ {
1916
+ "epoch": 0.3950652656634837,
1917
+ "grad_norm": 1.8564339876174927,
1918
+ "learning_rate": 0.0001389388306860426,
1919
+ "loss": 4.5122,
1920
+ "num_input_tokens_seen": 209484800,
1921
+ "step": 35850
1922
+ },
1923
+ {
1924
+ "epoch": 0.3967182584068456,
1925
+ "grad_norm": 1.8316396474838257,
1926
+ "learning_rate": 0.00013885066929191707,
1927
+ "loss": 4.5046,
1928
+ "num_input_tokens_seen": 210369792,
1929
+ "step": 36000
1930
+ },
1931
+ {
1932
+ "epoch": 0.39837125115020744,
1933
+ "grad_norm": 1.925075650215149,
1934
+ "learning_rate": 0.00013876250789779157,
1935
+ "loss": 4.5116,
1936
+ "num_input_tokens_seen": 211250176,
1937
+ "step": 36150
1938
+ },
1939
+ {
1940
+ "epoch": 0.40002424389356933,
1941
+ "grad_norm": 1.860660195350647,
1942
+ "learning_rate": 0.00013867434650366605,
1943
+ "loss": 4.4998,
1944
+ "num_input_tokens_seen": 212136000,
1945
+ "step": 36300
1946
+ },
1947
+ {
1948
+ "epoch": 0.40167723663693117,
1949
+ "grad_norm": 1.8505064249038696,
1950
+ "learning_rate": 0.00013858618510954055,
1951
+ "loss": 4.5124,
1952
+ "num_input_tokens_seen": 213009056,
1953
+ "step": 36450
1954
+ },
1955
+ {
1956
+ "epoch": 0.403330229380293,
1957
+ "grad_norm": 1.8910654783248901,
1958
+ "learning_rate": 0.00013849802371541502,
1959
+ "loss": 4.4966,
1960
+ "num_input_tokens_seen": 213871744,
1961
+ "step": 36600
1962
+ },
1963
+ {
1964
+ "epoch": 0.4049832221236549,
1965
+ "grad_norm": 1.883748173713684,
1966
+ "learning_rate": 0.0001384098623212895,
1967
+ "loss": 4.5074,
1968
+ "num_input_tokens_seen": 214739520,
1969
+ "step": 36750
1970
+ },
1971
+ {
1972
+ "epoch": 0.40663621486701673,
1973
+ "grad_norm": 1.8115665912628174,
1974
+ "learning_rate": 0.000138321700927164,
1975
+ "loss": 4.5187,
1976
+ "num_input_tokens_seen": 215629472,
1977
+ "step": 36900
1978
+ },
1979
+ {
1980
+ "epoch": 0.40828920761037857,
1981
+ "grad_norm": 1.9036102294921875,
1982
+ "learning_rate": 0.000138234127275666,
1983
+ "loss": 4.4892,
1984
+ "num_input_tokens_seen": 216497184,
1985
+ "step": 37050
1986
+ },
1987
+ {
1988
+ "epoch": 0.40994220035374046,
1989
+ "grad_norm": 1.8916597366333008,
1990
+ "learning_rate": 0.0001381459658815405,
1991
+ "loss": 4.5019,
1992
+ "num_input_tokens_seen": 217381152,
1993
+ "step": 37200
1994
+ },
1995
+ {
1996
+ "epoch": 0.4115951930971023,
1997
+ "grad_norm": 1.8847101926803589,
1998
+ "learning_rate": 0.00013805780448741497,
1999
+ "loss": 4.5064,
2000
+ "num_input_tokens_seen": 218251456,
2001
+ "step": 37350
2002
+ },
2003
+ {
2004
+ "epoch": 0.41324818584046413,
2005
+ "grad_norm": 1.730322241783142,
2006
+ "learning_rate": 0.00013796964309328947,
2007
+ "loss": 4.4988,
2008
+ "num_input_tokens_seen": 219141056,
2009
+ "step": 37500
2010
+ },
2011
+ {
2012
+ "epoch": 0.414901178583826,
2013
+ "grad_norm": 1.833764672279358,
2014
+ "learning_rate": 0.00013788148169916395,
2015
+ "loss": 4.5155,
2016
+ "num_input_tokens_seen": 220036992,
2017
+ "step": 37650
2018
+ },
2019
+ {
2020
+ "epoch": 0.41655417132718786,
2021
+ "grad_norm": 1.8188276290893555,
2022
+ "learning_rate": 0.00013779332030503845,
2023
+ "loss": 4.4984,
2024
+ "num_input_tokens_seen": 220910208,
2025
+ "step": 37800
2026
+ },
2027
+ {
2028
+ "epoch": 0.41820716407054975,
2029
+ "grad_norm": 1.7197022438049316,
2030
+ "learning_rate": 0.00013770515891091292,
2031
+ "loss": 4.5018,
2032
+ "num_input_tokens_seen": 221803520,
2033
+ "step": 37950
2034
+ },
2035
+ {
2036
+ "epoch": 0.4198601568139116,
2037
+ "grad_norm": 1.879470944404602,
2038
+ "learning_rate": 0.00013761699751678742,
2039
+ "loss": 4.497,
2040
+ "num_input_tokens_seen": 222679392,
2041
+ "step": 38100
2042
+ },
2043
+ {
2044
+ "epoch": 0.4215131495572734,
2045
+ "grad_norm": 1.716430902481079,
2046
+ "learning_rate": 0.0001375288361226619,
2047
+ "loss": 4.4979,
2048
+ "num_input_tokens_seen": 223568224,
2049
+ "step": 38250
2050
+ },
2051
+ {
2052
+ "epoch": 0.4231661423006353,
2053
+ "grad_norm": 1.8879536390304565,
2054
+ "learning_rate": 0.0001374406747285364,
2055
+ "loss": 4.4972,
2056
+ "num_input_tokens_seen": 224460192,
2057
+ "step": 38400
2058
+ },
2059
+ {
2060
+ "epoch": 0.42481913504399715,
2061
+ "grad_norm": 1.9361037015914917,
2062
+ "learning_rate": 0.00013735251333441088,
2063
+ "loss": 4.4968,
2064
+ "num_input_tokens_seen": 225318816,
2065
+ "step": 38550
2066
+ },
2067
+ {
2068
+ "epoch": 0.426472127787359,
2069
+ "grad_norm": 1.8587005138397217,
2070
+ "learning_rate": 0.00013726435194028535,
2071
+ "loss": 4.4982,
2072
+ "num_input_tokens_seen": 226176704,
2073
+ "step": 38700
2074
+ },
2075
+ {
2076
+ "epoch": 0.4281251205307209,
2077
+ "grad_norm": 1.8819633722305298,
2078
+ "learning_rate": 0.00013717619054615985,
2079
+ "loss": 4.4929,
2080
+ "num_input_tokens_seen": 227049984,
2081
+ "step": 38850
2082
+ },
2083
+ {
2084
+ "epoch": 0.4297781132740827,
2085
+ "grad_norm": 1.8268516063690186,
2086
+ "learning_rate": 0.00013708802915203433,
2087
+ "loss": 4.5063,
2088
+ "num_input_tokens_seen": 227926784,
2089
+ "step": 39000
2090
+ },
2091
+ {
2092
+ "epoch": 0.4314311060174446,
2093
+ "grad_norm": 1.8819466829299927,
2094
+ "learning_rate": 0.00013699986775790883,
2095
+ "loss": 4.5024,
2096
+ "num_input_tokens_seen": 228823584,
2097
+ "step": 39150
2098
+ },
2099
+ {
2100
+ "epoch": 0.43308409876080645,
2101
+ "grad_norm": 1.8801319599151611,
2102
+ "learning_rate": 0.0001369117063637833,
2103
+ "loss": 4.5043,
2104
+ "num_input_tokens_seen": 229691904,
2105
+ "step": 39300
2106
+ },
2107
+ {
2108
+ "epoch": 0.4347370915041683,
2109
+ "grad_norm": 1.8677760362625122,
2110
+ "learning_rate": 0.0001368241327122853,
2111
+ "loss": 4.5102,
2112
+ "num_input_tokens_seen": 230569056,
2113
+ "step": 39450
2114
+ },
2115
+ {
2116
+ "epoch": 0.4363900842475302,
2117
+ "grad_norm": 1.8280800580978394,
2118
+ "learning_rate": 0.0001367359713181598,
2119
+ "loss": 4.5014,
2120
+ "num_input_tokens_seen": 231437792,
2121
+ "step": 39600
2122
+ },
2123
+ {
2124
+ "epoch": 0.438043076990892,
2125
+ "grad_norm": 1.7932913303375244,
2126
+ "learning_rate": 0.00013664780992403427,
2127
+ "loss": 4.5082,
2128
+ "num_input_tokens_seen": 232310848,
2129
+ "step": 39750
2130
+ },
2131
+ {
2132
+ "epoch": 0.43969606973425385,
2133
+ "grad_norm": 1.7872642278671265,
2134
+ "learning_rate": 0.00013655964852990878,
2135
+ "loss": 4.5091,
2136
+ "num_input_tokens_seen": 233190464,
2137
+ "step": 39900
2138
+ },
2139
+ {
2140
+ "epoch": 0.44134906247761574,
2141
+ "grad_norm": 1.9207595586776733,
2142
+ "learning_rate": 0.00013647148713578325,
2143
+ "loss": 4.4932,
2144
+ "num_input_tokens_seen": 234064256,
2145
+ "step": 40050
2146
+ },
2147
+ {
2148
+ "epoch": 0.4430020552209776,
2149
+ "grad_norm": 1.9176338911056519,
2150
+ "learning_rate": 0.00013638332574165773,
+ "loss": 4.4982,
+ "num_input_tokens_seen": 234931104,
+ "step": 40200
+ },
+ {
+ "epoch": 0.4446550479643394,
+ "grad_norm": 1.839328646659851,
+ "learning_rate": 0.00013629516434753223,
+ "loss": 4.5,
+ "num_input_tokens_seen": 235790176,
+ "step": 40350
+ },
+ {
+ "epoch": 0.4463080407077013,
+ "grad_norm": 1.8120410442352295,
+ "learning_rate": 0.0001362070029534067,
+ "loss": 4.4866,
+ "num_input_tokens_seen": 236663488,
+ "step": 40500
+ },
+ {
+ "epoch": 0.44796103345106314,
+ "grad_norm": 1.7365509271621704,
+ "learning_rate": 0.0001361188415592812,
+ "loss": 4.484,
+ "num_input_tokens_seen": 237550016,
+ "step": 40650
+ },
+ {
+ "epoch": 0.44961402619442503,
+ "grad_norm": 1.8573724031448364,
+ "learning_rate": 0.00013603068016515568,
+ "loss": 4.4949,
+ "num_input_tokens_seen": 238428800,
+ "step": 40800
+ },
+ {
+ "epoch": 0.45126701893778687,
+ "grad_norm": 1.8714196681976318,
+ "learning_rate": 0.00013594251877103018,
+ "loss": 4.4861,
+ "num_input_tokens_seen": 239301216,
+ "step": 40950
+ },
+ {
+ "epoch": 0.4529200116811487,
+ "grad_norm": 1.813636064529419,
+ "learning_rate": 0.00013585435737690466,
+ "loss": 4.4939,
+ "num_input_tokens_seen": 240172352,
+ "step": 41100
+ },
+ {
+ "epoch": 0.4545730044245106,
+ "grad_norm": 1.7828494310379028,
+ "learning_rate": 0.00013576619598277913,
+ "loss": 4.5073,
+ "num_input_tokens_seen": 241058368,
+ "step": 41250
+ },
+ {
+ "epoch": 0.45622599716787243,
+ "grad_norm": 1.9278478622436523,
+ "learning_rate": 0.00013567803458865363,
+ "loss": 4.4831,
+ "num_input_tokens_seen": 241944320,
+ "step": 41400
+ },
+ {
+ "epoch": 0.45787898991123427,
+ "grad_norm": 1.8244856595993042,
+ "learning_rate": 0.0001355898731945281,
+ "loss": 4.4889,
+ "num_input_tokens_seen": 242827808,
+ "step": 41550
+ },
+ {
+ "epoch": 0.45953198265459616,
+ "grad_norm": 1.9578852653503418,
+ "learning_rate": 0.0001355017118004026,
+ "loss": 4.4938,
+ "num_input_tokens_seen": 243697248,
+ "step": 41700
+ },
+ {
+ "epoch": 0.461184975397958,
+ "grad_norm": 1.8681639432907104,
+ "learning_rate": 0.0001354135504062771,
+ "loss": 4.493,
+ "num_input_tokens_seen": 244549632,
+ "step": 41850
+ },
+ {
+ "epoch": 0.46283796814131983,
+ "grad_norm": 1.9330469369888306,
+ "learning_rate": 0.0001353253890121516,
+ "loss": 4.4823,
+ "num_input_tokens_seen": 245429600,
+ "step": 42000
+ },
+ {
+ "epoch": 0.4644909608846817,
+ "grad_norm": 1.775819182395935,
+ "learning_rate": 0.00013523722761802606,
+ "loss": 4.4887,
+ "num_input_tokens_seen": 246307872,
+ "step": 42150
+ },
+ {
+ "epoch": 0.46614395362804356,
+ "grad_norm": 1.9359744787216187,
+ "learning_rate": 0.00013514906622390057,
+ "loss": 4.4813,
+ "num_input_tokens_seen": 247194432,
+ "step": 42300
+ },
+ {
+ "epoch": 0.46779694637140545,
+ "grad_norm": 1.8034732341766357,
+ "learning_rate": 0.00013506090482977504,
+ "loss": 4.4957,
+ "num_input_tokens_seen": 248049248,
+ "step": 42450
+ },
+ {
+ "epoch": 0.4694499391147673,
+ "grad_norm": 1.8069413900375366,
+ "learning_rate": 0.00013497274343564954,
+ "loss": 4.484,
+ "num_input_tokens_seen": 248930624,
+ "step": 42600
+ },
+ {
+ "epoch": 0.4711029318581291,
+ "grad_norm": 1.8796617984771729,
+ "learning_rate": 0.00013488458204152402,
+ "loss": 4.4887,
+ "num_input_tokens_seen": 249808512,
+ "step": 42750
+ },
+ {
+ "epoch": 0.472755924601491,
+ "grad_norm": 1.8367630243301392,
+ "learning_rate": 0.00013479642064739852,
+ "loss": 4.4779,
+ "num_input_tokens_seen": 250669088,
+ "step": 42900
+ },
+ {
+ "epoch": 0.47440891734485285,
+ "grad_norm": 1.9085215330123901,
+ "learning_rate": 0.0001347088469959005,
+ "loss": 4.489,
+ "num_input_tokens_seen": 251553792,
+ "step": 43050
+ },
+ {
+ "epoch": 0.4760619100882147,
+ "grad_norm": 1.8323450088500977,
+ "learning_rate": 0.000134620685601775,
+ "loss": 4.482,
+ "num_input_tokens_seen": 252440032,
+ "step": 43200
+ },
+ {
+ "epoch": 0.4777149028315766,
+ "grad_norm": 2.1364452838897705,
+ "learning_rate": 0.0001345325242076495,
+ "loss": 4.4873,
+ "num_input_tokens_seen": 253320576,
+ "step": 43350
+ },
+ {
+ "epoch": 0.4793678955749384,
+ "grad_norm": 1.8431365489959717,
+ "learning_rate": 0.00013444436281352396,
+ "loss": 4.4844,
+ "num_input_tokens_seen": 254203552,
+ "step": 43500
+ },
+ {
+ "epoch": 0.4810208883183003,
+ "grad_norm": 1.853715419769287,
+ "learning_rate": 0.00013435620141939847,
+ "loss": 4.4905,
+ "num_input_tokens_seen": 255078656,
+ "step": 43650
+ },
+ {
+ "epoch": 0.48267388106166215,
+ "grad_norm": 2.0351803302764893,
+ "learning_rate": 0.00013426804002527294,
+ "loss": 4.4854,
+ "num_input_tokens_seen": 255961792,
+ "step": 43800
+ },
+ {
+ "epoch": 0.484326873805024,
+ "grad_norm": 1.867595911026001,
+ "learning_rate": 0.00013417987863114744,
+ "loss": 4.4734,
+ "num_input_tokens_seen": 256823744,
+ "step": 43950
+ },
+ {
+ "epoch": 0.4859798665483859,
+ "grad_norm": 1.9231115579605103,
+ "learning_rate": 0.00013409171723702192,
+ "loss": 4.4982,
+ "num_input_tokens_seen": 257687488,
+ "step": 44100
+ },
+ {
+ "epoch": 0.4876328592917477,
+ "grad_norm": 1.9620647430419922,
+ "learning_rate": 0.00013400355584289642,
+ "loss": 4.4815,
+ "num_input_tokens_seen": 258561568,
+ "step": 44250
+ },
+ {
+ "epoch": 0.48928585203510955,
+ "grad_norm": 1.741739273071289,
+ "learning_rate": 0.0001339153944487709,
+ "loss": 4.4863,
+ "num_input_tokens_seen": 259449344,
+ "step": 44400
+ },
+ {
+ "epoch": 0.49093884477847144,
+ "grad_norm": 1.9869145154953003,
+ "learning_rate": 0.0001338272330546454,
+ "loss": 4.48,
+ "num_input_tokens_seen": 260317888,
+ "step": 44550
+ },
+ {
+ "epoch": 0.4925918375218333,
+ "grad_norm": 1.9364657402038574,
+ "learning_rate": 0.00013373907166051987,
+ "loss": 4.477,
+ "num_input_tokens_seen": 261198656,
+ "step": 44700
+ },
+ {
+ "epoch": 0.4942448302651951,
+ "grad_norm": 1.946509599685669,
+ "learning_rate": 0.00013365091026639437,
+ "loss": 4.4762,
+ "num_input_tokens_seen": 262082432,
+ "step": 44850
+ },
+ {
+ "epoch": 0.495897823008557,
+ "grad_norm": 1.7882816791534424,
+ "learning_rate": 0.00013356274887226885,
+ "loss": 4.478,
+ "num_input_tokens_seen": 262961504,
+ "step": 45000
+ },
+ {
+ "epoch": 0.49755081575191884,
+ "grad_norm": 1.8857409954071045,
+ "learning_rate": 0.00013347458747814332,
+ "loss": 4.48,
+ "num_input_tokens_seen": 263810336,
+ "step": 45150
+ },
+ {
+ "epoch": 0.49920380849528073,
+ "grad_norm": 1.8983428478240967,
+ "learning_rate": 0.00013338642608401783,
+ "loss": 4.4792,
+ "num_input_tokens_seen": 264684512,
+ "step": 45300
+ },
+ {
+ "epoch": 0.5008568012386425,
+ "grad_norm": 1.8649096488952637,
+ "learning_rate": 0.0001332982646898923,
+ "loss": 4.473,
+ "num_input_tokens_seen": 265553056,
+ "step": 45450
+ },
+ {
+ "epoch": 0.5025097939820045,
+ "grad_norm": 1.8233070373535156,
+ "learning_rate": 0.0001332101032957668,
+ "loss": 4.4854,
+ "num_input_tokens_seen": 266438976,
+ "step": 45600
+ },
+ {
+ "epoch": 0.5041627867253663,
+ "grad_norm": 1.8780871629714966,
+ "learning_rate": 0.00013312194190164128,
+ "loss": 4.4806,
+ "num_input_tokens_seen": 267305760,
+ "step": 45750
+ },
+ {
+ "epoch": 0.5058157794687281,
+ "grad_norm": 1.9368178844451904,
+ "learning_rate": 0.00013303378050751578,
+ "loss": 4.4805,
+ "num_input_tokens_seen": 268185792,
+ "step": 45900
+ },
+ {
+ "epoch": 0.50746877221209,
+ "grad_norm": 1.8207215070724487,
+ "learning_rate": 0.00013294561911339026,
+ "loss": 4.4839,
+ "num_input_tokens_seen": 269074432,
+ "step": 46050
+ },
+ {
+ "epoch": 0.5091217649554518,
+ "grad_norm": 1.9221956729888916,
+ "learning_rate": 0.00013285745771926473,
+ "loss": 4.4724,
+ "num_input_tokens_seen": 269945728,
+ "step": 46200
+ },
+ {
+ "epoch": 0.5107747576988138,
+ "grad_norm": 1.8592416048049927,
+ "learning_rate": 0.00013276929632513923,
+ "loss": 4.4801,
+ "num_input_tokens_seen": 270826912,
+ "step": 46350
+ },
+ {
+ "epoch": 0.5124277504421756,
+ "grad_norm": 1.8793606758117676,
+ "learning_rate": 0.0001326811349310137,
+ "loss": 4.4832,
+ "num_input_tokens_seen": 271696640,
+ "step": 46500
+ },
+ {
+ "epoch": 0.5140807431855374,
+ "grad_norm": 1.9293113946914673,
+ "learning_rate": 0.0001325929735368882,
+ "loss": 4.4703,
+ "num_input_tokens_seen": 272577280,
+ "step": 46650
+ },
+ {
+ "epoch": 0.5157337359288993,
+ "grad_norm": 1.8582684993743896,
+ "learning_rate": 0.00013250481214276269,
+ "loss": 4.4785,
+ "num_input_tokens_seen": 273448128,
+ "step": 46800
+ },
+ {
+ "epoch": 0.5173867286722611,
+ "grad_norm": 1.8803818225860596,
+ "learning_rate": 0.0001324166507486372,
+ "loss": 4.4851,
+ "num_input_tokens_seen": 274313472,
+ "step": 46950
+ },
+ {
+ "epoch": 0.5190397214156229,
+ "grad_norm": 1.989717721939087,
+ "learning_rate": 0.00013232907709713918,
+ "loss": 4.4814,
+ "num_input_tokens_seen": 275168416,
+ "step": 47100
+ },
+ {
+ "epoch": 0.5206927141589849,
+ "grad_norm": 1.9770921468734741,
+ "learning_rate": 0.00013224091570301365,
+ "loss": 4.4825,
+ "num_input_tokens_seen": 276035776,
+ "step": 47250
+ },
+ {
+ "epoch": 0.5223457069023467,
+ "grad_norm": 1.8764408826828003,
+ "learning_rate": 0.00013215275430888816,
+ "loss": 4.4788,
+ "num_input_tokens_seen": 276909952,
+ "step": 47400
+ },
+ {
+ "epoch": 0.5239986996457086,
+ "grad_norm": 1.860009789466858,
+ "learning_rate": 0.00013206459291476263,
+ "loss": 4.4834,
+ "num_input_tokens_seen": 277795232,
+ "step": 47550
+ },
+ {
+ "epoch": 0.5256516923890704,
+ "grad_norm": 1.949278473854065,
+ "learning_rate": 0.00013197643152063713,
+ "loss": 4.4752,
+ "num_input_tokens_seen": 278690720,
+ "step": 47700
+ },
+ {
+ "epoch": 0.5273046851324322,
+ "grad_norm": 1.868780255317688,
+ "learning_rate": 0.0001318882701265116,
+ "loss": 4.4797,
+ "num_input_tokens_seen": 279567232,
+ "step": 47850
+ },
+ {
+ "epoch": 0.5289576778757942,
+ "grad_norm": 1.7434320449829102,
+ "learning_rate": 0.0001318001087323861,
+ "loss": 4.4726,
+ "num_input_tokens_seen": 280424832,
+ "step": 48000
+ },
+ {
+ "epoch": 0.530610670619156,
+ "grad_norm": 1.8644661903381348,
+ "learning_rate": 0.00013171194733826059,
+ "loss": 4.4648,
+ "num_input_tokens_seen": 281294752,
+ "step": 48150
+ },
+ {
+ "epoch": 0.5322636633625178,
+ "grad_norm": 1.9029775857925415,
+ "learning_rate": 0.00013162378594413506,
+ "loss": 4.4791,
+ "num_input_tokens_seen": 282156544,
+ "step": 48300
+ },
+ {
+ "epoch": 0.5339166561058797,
+ "grad_norm": 1.7862669229507446,
+ "learning_rate": 0.00013153562455000956,
+ "loss": 4.4688,
+ "num_input_tokens_seen": 283033248,
+ "step": 48450
+ },
+ {
+ "epoch": 0.5355696488492415,
+ "grad_norm": 1.7603411674499512,
+ "learning_rate": 0.00013144746315588404,
+ "loss": 4.4807,
+ "num_input_tokens_seen": 283922048,
+ "step": 48600
+ },
+ {
+ "epoch": 0.5372226415926034,
+ "grad_norm": 1.868235468864441,
+ "learning_rate": 0.00013135930176175854,
+ "loss": 4.4702,
+ "num_input_tokens_seen": 284800416,
+ "step": 48750
+ },
+ {
+ "epoch": 0.5388756343359653,
+ "grad_norm": 1.864640474319458,
+ "learning_rate": 0.00013127114036763301,
+ "loss": 4.479,
+ "num_input_tokens_seen": 285673888,
+ "step": 48900
+ },
+ {
+ "epoch": 0.5405286270793271,
+ "grad_norm": 1.7705098390579224,
+ "learning_rate": 0.00013118297897350752,
+ "loss": 4.4782,
+ "num_input_tokens_seen": 286553856,
+ "step": 49050
+ },
+ {
+ "epoch": 0.542181619822689,
+ "grad_norm": 1.9763901233673096,
+ "learning_rate": 0.000131094817579382,
+ "loss": 4.4654,
+ "num_input_tokens_seen": 287436736,
+ "step": 49200
+ },
+ {
+ "epoch": 0.5438346125660508,
+ "grad_norm": 1.905661702156067,
+ "learning_rate": 0.0001310066561852565,
+ "loss": 4.4718,
+ "num_input_tokens_seen": 288306976,
+ "step": 49350
+ },
+ {
+ "epoch": 0.5454876053094126,
+ "grad_norm": 1.8861178159713745,
+ "learning_rate": 0.00013091849479113097,
+ "loss": 4.4618,
+ "num_input_tokens_seen": 289176352,
+ "step": 49500
+ },
+ {
+ "epoch": 0.5471405980527746,
+ "grad_norm": 1.8697651624679565,
+ "learning_rate": 0.00013083033339700547,
+ "loss": 4.4725,
+ "num_input_tokens_seen": 290061760,
+ "step": 49650
+ },
+ {
+ "epoch": 0.5487935907961364,
+ "grad_norm": 1.8051494359970093,
+ "learning_rate": 0.00013074217200287995,
+ "loss": 4.459,
+ "num_input_tokens_seen": 290924640,
+ "step": 49800
+ },
+ {
+ "epoch": 0.5504465835394983,
+ "grad_norm": 1.7766984701156616,
+ "learning_rate": 0.00013065401060875445,
+ "loss": 4.4729,
+ "num_input_tokens_seen": 291801280,
+ "step": 49950
+ },
+ {
+ "epoch": 0.5520995762828601,
+ "grad_norm": 1.7969352006912231,
+ "learning_rate": 0.00013056584921462892,
+ "loss": 4.4789,
+ "num_input_tokens_seen": 292673632,
+ "step": 50100
+ },
+ {
+ "epoch": 0.5537525690262219,
+ "grad_norm": 1.8939694166183472,
+ "learning_rate": 0.00013047768782050343,
+ "loss": 4.4674,
+ "num_input_tokens_seen": 293542176,
+ "step": 50250
+ },
+ {
+ "epoch": 0.5554055617695838,
+ "grad_norm": 1.8721749782562256,
+ "learning_rate": 0.0001303895264263779,
+ "loss": 4.4591,
+ "num_input_tokens_seen": 294415424,
+ "step": 50400
+ },
+ {
+ "epoch": 0.5570585545129457,
+ "grad_norm": 1.7810068130493164,
+ "learning_rate": 0.0001303013650322524,
+ "loss": 4.475,
+ "num_input_tokens_seen": 295292384,
+ "step": 50550
+ },
+ {
+ "epoch": 0.5587115472563076,
+ "grad_norm": 1.8184670209884644,
+ "learning_rate": 0.00013021320363812688,
+ "loss": 4.473,
+ "num_input_tokens_seen": 296168640,
+ "step": 50700
+ },
+ {
+ "epoch": 0.5603645399996694,
+ "grad_norm": 1.8167736530303955,
+ "learning_rate": 0.00013012504224400138,
+ "loss": 4.4604,
+ "num_input_tokens_seen": 297035904,
+ "step": 50850
+ },
+ {
+ "epoch": 0.5620175327430312,
+ "grad_norm": 1.902001142501831,
+ "learning_rate": 0.00013003688084987585,
+ "loss": 4.4653,
+ "num_input_tokens_seen": 297893248,
+ "step": 51000
+ },
+ {
+ "epoch": 0.5636705254863931,
+ "grad_norm": 1.7989624738693237,
+ "learning_rate": 0.00012994930719837785,
+ "loss": 4.4581,
+ "num_input_tokens_seen": 298773088,
+ "step": 51150
+ },
+ {
+ "epoch": 0.565323518229755,
+ "grad_norm": 1.9608672857284546,
+ "learning_rate": 0.00012986114580425232,
+ "loss": 4.4553,
+ "num_input_tokens_seen": 299639712,
+ "step": 51300
+ },
+ {
+ "epoch": 0.5669765109731169,
+ "grad_norm": 1.8063366413116455,
+ "learning_rate": 0.00012977298441012682,
+ "loss": 4.4748,
+ "num_input_tokens_seen": 300504576,
+ "step": 51450
+ },
+ {
+ "epoch": 0.5686295037164787,
+ "grad_norm": 1.7892522811889648,
+ "learning_rate": 0.0001296848230160013,
+ "loss": 4.4759,
+ "num_input_tokens_seen": 301381600,
+ "step": 51600
+ },
+ {
+ "epoch": 0.5702824964598405,
+ "grad_norm": 1.9009445905685425,
+ "learning_rate": 0.00012959666162187577,
+ "loss": 4.4643,
+ "num_input_tokens_seen": 302264064,
+ "step": 51750
+ },
+ {
+ "epoch": 0.5719354892032024,
+ "grad_norm": 1.8855173587799072,
+ "learning_rate": 0.00012950850022775027,
+ "loss": 4.4569,
+ "num_input_tokens_seen": 303131648,
+ "step": 51900
+ },
+ {
+ "epoch": 0.5735884819465642,
+ "grad_norm": 1.8198268413543701,
+ "learning_rate": 0.00012942033883362475,
+ "loss": 4.465,
+ "num_input_tokens_seen": 304010848,
+ "step": 52050
+ },
+ {
+ "epoch": 0.5752414746899261,
+ "grad_norm": 1.8278980255126953,
+ "learning_rate": 0.00012933217743949925,
+ "loss": 4.4674,
+ "num_input_tokens_seen": 304901952,
+ "step": 52200
+ },
+ {
+ "epoch": 0.576894467433288,
+ "grad_norm": 1.8281151056289673,
+ "learning_rate": 0.00012924401604537373,
+ "loss": 4.4718,
+ "num_input_tokens_seen": 305763424,
+ "step": 52350
+ },
+ {
+ "epoch": 0.5785474601766498,
+ "grad_norm": 1.8816980123519897,
+ "learning_rate": 0.00012915585465124823,
+ "loss": 4.4726,
+ "num_input_tokens_seen": 306641856,
+ "step": 52500
+ },
+ {
+ "epoch": 0.5802004529200117,
+ "grad_norm": 1.8755218982696533,
+ "learning_rate": 0.0001290676932571227,
+ "loss": 4.4744,
+ "num_input_tokens_seen": 307520288,
+ "step": 52650
+ },
+ {
+ "epoch": 0.5818534456633735,
+ "grad_norm": 1.8329659700393677,
+ "learning_rate": 0.0001289795318629972,
+ "loss": 4.4572,
+ "num_input_tokens_seen": 308410624,
+ "step": 52800
+ },
+ {
+ "epoch": 0.5835064384067354,
+ "grad_norm": 1.9654128551483154,
+ "learning_rate": 0.00012889137046887168,
+ "loss": 4.4671,
+ "num_input_tokens_seen": 309288736,
+ "step": 52950
+ },
+ {
+ "epoch": 0.5851594311500973,
+ "grad_norm": 1.8310860395431519,
+ "learning_rate": 0.00012880320907474618,
+ "loss": 4.4684,
+ "num_input_tokens_seen": 310165440,
+ "step": 53100
+ },
+ {
+ "epoch": 0.5868124238934591,
+ "grad_norm": 1.8104560375213623,
+ "learning_rate": 0.00012871563542324817,
+ "loss": 4.4546,
+ "num_input_tokens_seen": 311044352,
+ "step": 53250
+ },
+ {
+ "epoch": 0.588465416636821,
+ "grad_norm": 1.8414585590362549,
+ "learning_rate": 0.00012862747402912265,
+ "loss": 4.4625,
+ "num_input_tokens_seen": 311919424,
+ "step": 53400
+ },
+ {
+ "epoch": 0.5901184093801828,
+ "grad_norm": 1.724381685256958,
+ "learning_rate": 0.00012853931263499715,
+ "loss": 4.4632,
+ "num_input_tokens_seen": 312803712,
+ "step": 53550
+ },
+ {
+ "epoch": 0.5917714021235447,
+ "grad_norm": 1.7701301574707031,
+ "learning_rate": 0.00012845115124087163,
+ "loss": 4.4609,
+ "num_input_tokens_seen": 313672800,
+ "step": 53700
+ },
+ {
+ "epoch": 0.5934243948669066,
+ "grad_norm": 1.8755768537521362,
+ "learning_rate": 0.00012836298984674613,
+ "loss": 4.461,
+ "num_input_tokens_seen": 314543776,
+ "step": 53850
+ },
+ {
+ "epoch": 0.5950773876102684,
+ "grad_norm": 1.8842816352844238,
+ "learning_rate": 0.0001282748284526206,
+ "loss": 4.462,
+ "num_input_tokens_seen": 315413216,
+ "step": 54000
+ },
+ {
+ "epoch": 0.5967303803536302,
+ "grad_norm": 1.8173580169677734,
+ "learning_rate": 0.0001281866670584951,
+ "loss": 4.4595,
+ "num_input_tokens_seen": 316286592,
+ "step": 54150
+ },
+ {
+ "epoch": 0.5983833730969921,
+ "grad_norm": 1.8613582849502563,
+ "learning_rate": 0.00012809850566436958,
+ "loss": 4.4729,
+ "num_input_tokens_seen": 317171968,
+ "step": 54300
+ },
+ {
+ "epoch": 0.6000363658403539,
+ "grad_norm": 1.8345390558242798,
+ "learning_rate": 0.00012801034427024408,
+ "loss": 4.4558,
+ "num_input_tokens_seen": 318069504,
+ "step": 54450
+ },
+ {
+ "epoch": 0.6016893585837159,
+ "grad_norm": 1.9001188278198242,
+ "learning_rate": 0.00012792218287611856,
+ "loss": 4.4493,
+ "num_input_tokens_seen": 318929088,
+ "step": 54600
+ },
+ {
+ "epoch": 0.6033423513270777,
+ "grad_norm": 1.7820019721984863,
+ "learning_rate": 0.00012783402148199306,
+ "loss": 4.4529,
+ "num_input_tokens_seen": 319802400,
+ "step": 54750
+ },
+ {
+ "epoch": 0.6049953440704395,
+ "grad_norm": 1.8836514949798584,
+ "learning_rate": 0.00012774586008786754,
+ "loss": 4.4601,
+ "num_input_tokens_seen": 320667168,
+ "step": 54900
+ },
+ {
+ "epoch": 0.6066483368138014,
+ "grad_norm": 1.820078730583191,
+ "learning_rate": 0.00012765828643636953,
+ "loss": 4.4583,
+ "num_input_tokens_seen": 321544160,
+ "step": 55050
+ },
+ {
+ "epoch": 0.6083013295571632,
+ "grad_norm": 1.7549668550491333,
+ "learning_rate": 0.000127570125042244,
+ "loss": 4.4507,
+ "num_input_tokens_seen": 322413504,
+ "step": 55200
+ },
+ {
+ "epoch": 0.6099543223005252,
+ "grad_norm": 1.819643497467041,
+ "learning_rate": 0.0001274819636481185,
+ "loss": 4.4572,
+ "num_input_tokens_seen": 323301120,
+ "step": 55350
+ },
+ {
+ "epoch": 0.611607315043887,
+ "grad_norm": 1.8832948207855225,
+ "learning_rate": 0.00012739380225399298,
+ "loss": 4.4499,
+ "num_input_tokens_seen": 324179040,
+ "step": 55500
+ },
+ {
+ "epoch": 0.6132603077872488,
+ "grad_norm": 1.9329428672790527,
+ "learning_rate": 0.00012730564085986748,
+ "loss": 4.4598,
+ "num_input_tokens_seen": 325044704,
+ "step": 55650
+ },
+ {
+ "epoch": 0.6149133005306107,
+ "grad_norm": 1.8368948698043823,
+ "learning_rate": 0.00012721747946574196,
+ "loss": 4.4636,
+ "num_input_tokens_seen": 325903360,
+ "step": 55800
+ },
+ {
+ "epoch": 0.6165662932739725,
+ "grad_norm": 1.8610767126083374,
+ "learning_rate": 0.00012712931807161646,
+ "loss": 4.4602,
+ "num_input_tokens_seen": 326791648,
+ "step": 55950
+ },
+ {
+ "epoch": 0.6182192860173343,
+ "grad_norm": 1.853129506111145,
+ "learning_rate": 0.00012704115667749093,
+ "loss": 4.4581,
+ "num_input_tokens_seen": 327669216,
+ "step": 56100
+ },
+ {
+ "epoch": 0.6198722787606963,
+ "grad_norm": 1.8630894422531128,
+ "learning_rate": 0.0001269529952833654,
+ "loss": 4.4584,
+ "num_input_tokens_seen": 328547424,
+ "step": 56250
+ },
+ {
+ "epoch": 0.6215252715040581,
+ "grad_norm": 1.8581258058547974,
+ "learning_rate": 0.0001268648338892399,
+ "loss": 4.4632,
+ "num_input_tokens_seen": 329414400,
+ "step": 56400
+ },
+ {
+ "epoch": 0.62317826424742,
+ "grad_norm": 1.8294817209243774,
+ "learning_rate": 0.00012677667249511439,
+ "loss": 4.4634,
+ "num_input_tokens_seen": 330297376,
+ "step": 56550
+ },
+ {
+ "epoch": 0.6248312569907818,
+ "grad_norm": 1.9625203609466553,
+ "learning_rate": 0.0001266885111009889,
+ "loss": 4.4598,
+ "num_input_tokens_seen": 331181792,
+ "step": 56700
+ },
+ {
+ "epoch": 0.6264842497341436,
+ "grad_norm": 1.821718454360962,
+ "learning_rate": 0.00012660034970686336,
+ "loss": 4.4549,
+ "num_input_tokens_seen": 332056416,
+ "step": 56850
+ },
+ {
+ "epoch": 0.6281372424775056,
+ "grad_norm": 1.8366010189056396,
+ "learning_rate": 0.00012651218831273786,
+ "loss": 4.4418,
+ "num_input_tokens_seen": 332908864,
+ "step": 57000
+ },
+ {
+ "epoch": 0.6297902352208674,
+ "grad_norm": 1.858789086341858,
+ "learning_rate": 0.00012642402691861234,
+ "loss": 4.4592,
+ "num_input_tokens_seen": 333791520,
+ "step": 57150
+ },
+ {
+ "epoch": 0.6314432279642292,
+ "grad_norm": 1.9188382625579834,
+ "learning_rate": 0.00012633586552448684,
+ "loss": 4.4531,
+ "num_input_tokens_seen": 334675488,
+ "step": 57300
+ },
+ {
+ "epoch": 0.6330962207075911,
+ "grad_norm": 1.8480638265609741,
+ "learning_rate": 0.00012624770413036132,
+ "loss": 4.4557,
+ "num_input_tokens_seen": 335569664,
+ "step": 57450
+ },
+ {
+ "epoch": 0.6347492134509529,
+ "grad_norm": 1.8409630060195923,
+ "learning_rate": 0.00012615954273623582,
+ "loss": 4.454,
+ "num_input_tokens_seen": 336438752,
+ "step": 57600
+ },
+ {
+ "epoch": 0.6364022061943148,
+ "grad_norm": 1.7734564542770386,
+ "learning_rate": 0.0001260713813421103,
+ "loss": 4.4587,
+ "num_input_tokens_seen": 337318464,
+ "step": 57750
+ },
+ {
+ "epoch": 0.6380551989376767,
+ "grad_norm": 1.8258734941482544,
+ "learning_rate": 0.0001259832199479848,
+ "loss": 4.4501,
+ "num_input_tokens_seen": 338196416,
+ "step": 57900
+ },
+ {
+ "epoch": 0.6397081916810385,
+ "grad_norm": 1.8730100393295288,
+ "learning_rate": 0.00012589505855385927,
+ "loss": 4.4508,
+ "num_input_tokens_seen": 339066272,
+ "step": 58050
+ },
+ {
+ "epoch": 0.6413611844244004,
+ "grad_norm": 1.7968626022338867,
+ "learning_rate": 0.00012580689715973375,
+ "loss": 4.4465,
+ "num_input_tokens_seen": 339940576,
+ "step": 58200
+ },
+ {
+ "epoch": 0.6430141771677622,
+ "grad_norm": 1.8305721282958984,
+ "learning_rate": 0.00012571873576560825,
+ "loss": 4.4452,
+ "num_input_tokens_seen": 340827872,
+ "step": 58350
+ },
+ {
+ "epoch": 0.644667169911124,
+ "grad_norm": 1.8106398582458496,
+ "learning_rate": 0.00012563057437148272,
+ "loss": 4.4436,
+ "num_input_tokens_seen": 341710720,
+ "step": 58500
+ },
+ {
+ "epoch": 0.646320162654486,
+ "grad_norm": 1.8428856134414673,
+ "learning_rate": 0.00012554300071998474,
+ "loss": 4.4607,
+ "num_input_tokens_seen": 342592992,
+ "step": 58650
+ },
+ {
+ "epoch": 0.6479731553978478,
+ "grad_norm": 1.89970064163208,
+ "learning_rate": 0.00012545483932585922,
+ "loss": 4.446,
+ "num_input_tokens_seen": 343458496,
+ "step": 58800
+ },
+ {
+ "epoch": 0.6496261481412097,
+ "grad_norm": 1.887024998664856,
+ "learning_rate": 0.00012536667793173372,
+ "loss": 4.4438,
+ "num_input_tokens_seen": 344324192,
+ "step": 58950
+ },
+ {
+ "epoch": 0.6512791408845715,
+ "grad_norm": 1.751080870628357,
+ "learning_rate": 0.0001252785165376082,
+ "loss": 4.4439,
+ "num_input_tokens_seen": 345204320,
+ "step": 59100
+ },
+ {
+ "epoch": 0.6529321336279333,
+ "grad_norm": 1.8455328941345215,
+ "learning_rate": 0.0001251903551434827,
+ "loss": 4.4446,
+ "num_input_tokens_seen": 346100960,
+ "step": 59250
+ },
+ {
+ "epoch": 0.6545851263712952,
+ "grad_norm": 1.9079509973526,
+ "learning_rate": 0.00012510219374935717,
+ "loss": 4.4441,
+ "num_input_tokens_seen": 346981376,
+ "step": 59400
+ },
+ {
+ "epoch": 0.6562381191146571,
+ "grad_norm": 1.8034120798110962,
+ "learning_rate": 0.00012501403235523167,
+ "loss": 4.4555,
+ "num_input_tokens_seen": 347855584,
+ "step": 59550
+ },
+ {
+ "epoch": 0.657891111858019,
+ "grad_norm": 1.7936707735061646,
+ "learning_rate": 0.00012492587096110615,
+ "loss": 4.4493,
+ "num_input_tokens_seen": 348733664,
+ "step": 59700
+ },
+ {
+ "epoch": 0.6595441046013808,
+ "grad_norm": 1.80596923828125,
+ "learning_rate": 0.00012483770956698065,
+ "loss": 4.4445,
+ "num_input_tokens_seen": 349592128,
+ "step": 59850
+ },
+ {
+ "epoch": 0.6611970973447426,
+ "grad_norm": 1.7837984561920166,
+ "learning_rate": 0.00012474954817285512,
+ "loss": 4.4472,
+ "num_input_tokens_seen": 350464672,
+ "step": 60000
+ },
+ {
+ "epoch": 0.6628500900881045,
+ "grad_norm": 1.8550629615783691,
+ "learning_rate": 0.0001246613867787296,
+ "loss": 4.4436,
+ "num_input_tokens_seen": 351360576,
+ "step": 60150
+ },
+ {
+ "epoch": 0.6645030828314664,
+ "grad_norm": 1.8099464178085327,
+ "learning_rate": 0.0001245738131272316,
+ "loss": 4.4439,
+ "num_input_tokens_seen": 352250944,
+ "step": 60300
+ },
+ {
+ "epoch": 0.6661560755748283,
+ "grad_norm": 1.869233250617981,
+ "learning_rate": 0.0001244856517331061,
+ "loss": 4.45,
+ "num_input_tokens_seen": 353120608,
+ "step": 60450
+ },
+ {
+ "epoch": 0.6678090683181901,
+ "grad_norm": 1.8628960847854614,
+ "learning_rate": 0.00012439749033898057,
+ "loss": 4.4446,
+ "num_input_tokens_seen": 353986944,
+ "step": 60600
+ },
+ {
+ "epoch": 0.6694620610615519,
+ "grad_norm": 1.7935791015625,
+ "learning_rate": 0.00012430932894485504,
+ "loss": 4.4481,
+ "num_input_tokens_seen": 354851456,
+ "step": 60750
+ },
+ {
+ "epoch": 0.6711150538049138,
+ "grad_norm": 1.919735074043274,
+ "learning_rate": 0.00012422116755072955,
+ "loss": 4.4491,
+ "num_input_tokens_seen": 355735616,
+ "step": 60900
+ },
+ {
+ "epoch": 0.6727680465482756,
+ "grad_norm": 1.9296785593032837,
+ "learning_rate": 0.00012413300615660402,
+ "loss": 4.4384,
+ "num_input_tokens_seen": 356613408,
+ "step": 61050
+ },
+ {
+ "epoch": 0.6744210392916375,
+ "grad_norm": 1.8167061805725098,
+ "learning_rate": 0.00012404484476247852,
+ "loss": 4.4326,
+ "num_input_tokens_seen": 357463840,
+ "step": 61200
+ },
+ {
+ "epoch": 0.6760740320349994,
+ "grad_norm": 1.86695396900177,
+ "learning_rate": 0.000123956683368353,
+ "loss": 4.4501,
+ "num_input_tokens_seen": 358354464,
+ "step": 61350
+ },
+ {
+ "epoch": 0.6777270247783612,
+ "grad_norm": 1.8627629280090332,
+ "learning_rate": 0.0001238685219742275,
+ "loss": 4.4496,
+ "num_input_tokens_seen": 359222016,
+ "step": 61500
+ },
+ {
+ "epoch": 0.6793800175217231,
+ "grad_norm": 1.8496758937835693,
+ "learning_rate": 0.00012378036058010197,
+ "loss": 4.4505,
+ "num_input_tokens_seen": 360112096,
+ "step": 61650
+ },
+ {
+ "epoch": 0.6810330102650849,
+ "grad_norm": 1.8193156719207764,
+ "learning_rate": 0.00012369219918597645,
+ "loss": 4.452,
+ "num_input_tokens_seen": 360995520,
+ "step": 61800
+ },
+ {
+ "epoch": 0.6826860030084468,
+ "grad_norm": 1.7519707679748535,
+ "learning_rate": 0.00012360403779185095,
+ "loss": 4.4439,
+ "num_input_tokens_seen": 361873184,
+ "step": 61950
+ },
+ {
+ "epoch": 0.6843389957518087,
+ "grad_norm": 1.9227124452590942,
+ "learning_rate": 0.00012351587639772543,
+ "loss": 4.4416,
+ "num_input_tokens_seen": 362749312,
+ "step": 62100
+ },
+ {
+ "epoch": 0.6859919884951705,
+ "grad_norm": 1.8492848873138428,
+ "learning_rate": 0.00012342771500359993,
+ "loss": 4.4541,
+ "num_input_tokens_seen": 363635936,
+ "step": 62250
+ },
+ {
+ "epoch": 0.6876449812385323,
+ "grad_norm": 1.946057677268982,
+ "learning_rate": 0.0001233395536094744,
+ "loss": 4.435,
+ "num_input_tokens_seen": 364500576,
+ "step": 62400
+ },
+ {
+ "epoch": 0.6892979739818942,
+ "grad_norm": 1.8880736827850342,
+ "learning_rate": 0.0001232513922153489,
+ "loss": 4.4442,
+ "num_input_tokens_seen": 365363744,
+ "step": 62550
+ },
+ {
+ "epoch": 0.690950966725256,
+ "grad_norm": 1.864534854888916,
+ "learning_rate": 0.00012316323082122338,
3351
+ "loss": 4.4398,
3352
+ "num_input_tokens_seen": 366253600,
3353
+ "step": 62700
3354
+ },
3355
+ {
3356
+ "epoch": 0.692603959468618,
3357
+ "grad_norm": 1.8077435493469238,
3358
+ "learning_rate": 0.00012307506942709788,
3359
+ "loss": 4.4462,
3360
+ "num_input_tokens_seen": 367119136,
3361
+ "step": 62850
3362
+ },
3363
+ {
3364
+ "epoch": 0.6942569522119798,
3365
+ "grad_norm": 1.8797168731689453,
3366
+ "learning_rate": 0.00012298690803297236,
3367
+ "loss": 4.4535,
3368
+ "num_input_tokens_seen": 367998656,
3369
+ "step": 63000
3370
+ },
3371
+ {
3372
+ "epoch": 0.6959099449553416,
3373
+ "grad_norm": 1.9124201536178589,
3374
+ "learning_rate": 0.00012289874663884686,
3375
+ "loss": 4.4314,
3376
+ "num_input_tokens_seen": 368873888,
3377
+ "step": 63150
3378
+ },
3379
+ {
3380
+ "epoch": 0.6975629376987035,
3381
+ "grad_norm": 1.919708013534546,
3382
+ "learning_rate": 0.00012281058524472134,
3383
+ "loss": 4.4524,
3384
+ "num_input_tokens_seen": 369761216,
3385
+ "step": 63300
3386
+ },
3387
+ {
3388
+ "epoch": 0.6992159304420653,
3389
+ "grad_norm": 1.8248168230056763,
3390
+ "learning_rate": 0.00012272242385059584,
3391
+ "loss": 4.4422,
3392
+ "num_input_tokens_seen": 370638688,
3393
+ "step": 63450
3394
+ },
3395
+ {
3396
+ "epoch": 0.7008689231854273,
3397
+ "grad_norm": 1.810051441192627,
3398
+ "learning_rate": 0.0001226342624564703,
3399
+ "loss": 4.4313,
3400
+ "num_input_tokens_seen": 371541344,
3401
+ "step": 63600
3402
+ },
3403
+ {
3404
+ "epoch": 0.7025219159287891,
3405
+ "grad_norm": 1.8361635208129883,
3406
+ "learning_rate": 0.00012254610106234481,
3407
+ "loss": 4.436,
3408
+ "num_input_tokens_seen": 372415168,
3409
+ "step": 63750
3410
+ },
3411
+ {
3412
+ "epoch": 0.7041749086721509,
3413
+ "grad_norm": 1.8005433082580566,
3414
+ "learning_rate": 0.0001224579396682193,
3415
+ "loss": 4.4353,
3416
+ "num_input_tokens_seen": 373283872,
3417
+ "step": 63900
3418
+ },
3419
+ {
3420
+ "epoch": 0.7058279014155128,
3421
+ "grad_norm": 1.8291569948196411,
3422
+ "learning_rate": 0.0001223697782740938,
3423
+ "loss": 4.4486,
3424
+ "num_input_tokens_seen": 374156960,
3425
+ "step": 64050
3426
+ },
3427
+ {
3428
+ "epoch": 0.7074808941588746,
3429
+ "grad_norm": 1.6987590789794922,
3430
+ "learning_rate": 0.00012228161687996827,
3431
+ "loss": 4.4402,
3432
+ "num_input_tokens_seen": 375045632,
3433
+ "step": 64200
3434
+ },
3435
+ {
3436
+ "epoch": 0.7091338869022366,
3437
+ "grad_norm": 1.8456915616989136,
3438
+ "learning_rate": 0.00012219345548584277,
3439
+ "loss": 4.4502,
3440
+ "num_input_tokens_seen": 375936576,
3441
+ "step": 64350
3442
+ },
3443
+ {
3444
+ "epoch": 0.7107868796455984,
3445
+ "grad_norm": 1.9141839742660522,
3446
+ "learning_rate": 0.00012210529409171724,
3447
+ "loss": 4.4521,
3448
+ "num_input_tokens_seen": 376816512,
3449
+ "step": 64500
3450
+ },
3451
+ {
3452
+ "epoch": 0.7124398723889602,
3453
+ "grad_norm": 1.8822457790374756,
3454
+ "learning_rate": 0.00012201713269759175,
3455
+ "loss": 4.4347,
3456
+ "num_input_tokens_seen": 377684448,
3457
+ "step": 64650
3458
+ },
3459
+ {
3460
+ "epoch": 0.7140928651323221,
3461
+ "grad_norm": 1.8143234252929688,
3462
+ "learning_rate": 0.00012192897130346622,
3463
+ "loss": 4.4336,
3464
+ "num_input_tokens_seen": 378553120,
3465
+ "step": 64800
3466
+ },
3467
+ {
3468
+ "epoch": 0.7157458578756839,
3469
+ "grad_norm": 1.8877683877944946,
3470
+ "learning_rate": 0.00012184080990934071,
3471
+ "loss": 4.4425,
3472
+ "num_input_tokens_seen": 379413888,
3473
+ "step": 64950
3474
+ },
3475
+ {
3476
+ "epoch": 0.7173988506190457,
3477
+ "grad_norm": 1.8746610879898071,
3478
+ "learning_rate": 0.0001217526485152152,
3479
+ "loss": 4.4417,
3480
+ "num_input_tokens_seen": 380290304,
3481
+ "step": 65100
3482
+ },
3483
+ {
3484
+ "epoch": 0.7190518433624077,
3485
+ "grad_norm": 2.0395500659942627,
3486
+ "learning_rate": 0.00012166448712108969,
3487
+ "loss": 4.4423,
3488
+ "num_input_tokens_seen": 381185920,
3489
+ "step": 65250
3490
+ },
3491
+ {
3492
+ "epoch": 0.7207048361057695,
3493
+ "grad_norm": 1.992492914199829,
3494
+ "learning_rate": 0.00012157632572696418,
3495
+ "loss": 4.4484,
3496
+ "num_input_tokens_seen": 382075808,
3497
+ "step": 65400
3498
+ },
3499
+ {
3500
+ "epoch": 0.7223578288491314,
3501
+ "grad_norm": 1.8621459007263184,
3502
+ "learning_rate": 0.00012148816433283866,
3503
+ "loss": 4.4274,
3504
+ "num_input_tokens_seen": 382955232,
3505
+ "step": 65550
3506
+ },
3507
+ {
3508
+ "epoch": 0.7240108215924932,
3509
+ "grad_norm": 1.8787345886230469,
3510
+ "learning_rate": 0.00012140000293871315,
3511
+ "loss": 4.4378,
3512
+ "num_input_tokens_seen": 383834592,
3513
+ "step": 65700
3514
+ },
3515
+ {
3516
+ "epoch": 0.725663814335855,
3517
+ "grad_norm": 1.8640894889831543,
3518
+ "learning_rate": 0.00012131184154458764,
3519
+ "loss": 4.4557,
3520
+ "num_input_tokens_seen": 384710016,
3521
+ "step": 65850
3522
+ },
3523
+ {
3524
+ "epoch": 0.727316807079217,
3525
+ "grad_norm": 1.918143630027771,
3526
+ "learning_rate": 0.00012122368015046212,
3527
+ "loss": 4.4467,
3528
+ "num_input_tokens_seen": 385593120,
3529
+ "step": 66000
3530
+ },
3531
+ {
3532
+ "epoch": 0.7289697998225788,
3533
+ "grad_norm": 1.8295505046844482,
3534
+ "learning_rate": 0.00012113551875633662,
3535
+ "loss": 4.4257,
3536
+ "num_input_tokens_seen": 386460160,
3537
+ "step": 66150
3538
+ },
3539
+ {
3540
+ "epoch": 0.7306227925659406,
3541
+ "grad_norm": 1.880216360092163,
3542
+ "learning_rate": 0.00012104794510483861,
3543
+ "loss": 4.4312,
3544
+ "num_input_tokens_seen": 387328000,
3545
+ "step": 66300
3546
+ },
3547
+ {
3548
+ "epoch": 0.7322757853093025,
3549
+ "grad_norm": 1.818788766860962,
3550
+ "learning_rate": 0.00012095978371071308,
3551
+ "loss": 4.4402,
3552
+ "num_input_tokens_seen": 388200192,
3553
+ "step": 66450
3554
+ },
3555
+ {
3556
+ "epoch": 0.7339287780526643,
3557
+ "grad_norm": 1.82147216796875,
3558
+ "learning_rate": 0.00012087162231658759,
3559
+ "loss": 4.4468,
3560
+ "num_input_tokens_seen": 389079264,
3561
+ "step": 66600
3562
+ },
3563
+ {
3564
+ "epoch": 0.7355817707960262,
3565
+ "grad_norm": 1.8930702209472656,
3566
+ "learning_rate": 0.00012078346092246206,
3567
+ "loss": 4.4362,
3568
+ "num_input_tokens_seen": 389974112,
3569
+ "step": 66750
3570
+ },
3571
+ {
3572
+ "epoch": 0.7372347635393881,
3573
+ "grad_norm": 1.8484946489334106,
3574
+ "learning_rate": 0.00012069529952833656,
3575
+ "loss": 4.4409,
3576
+ "num_input_tokens_seen": 390875168,
3577
+ "step": 66900
3578
+ },
3579
+ {
3580
+ "epoch": 0.7388877562827499,
3581
+ "grad_norm": 1.894093632698059,
3582
+ "learning_rate": 0.00012060713813421104,
3583
+ "loss": 4.4197,
3584
+ "num_input_tokens_seen": 391736384,
3585
+ "step": 67050
3586
+ },
3587
+ {
3588
+ "epoch": 0.7405407490261118,
3589
+ "grad_norm": 1.918149471282959,
3590
+ "learning_rate": 0.00012051897674008553,
3591
+ "loss": 4.4487,
3592
+ "num_input_tokens_seen": 392626688,
3593
+ "step": 67200
3594
+ },
3595
+ {
3596
+ "epoch": 0.7421937417694736,
3597
+ "grad_norm": 1.8563427925109863,
3598
+ "learning_rate": 0.00012043081534596002,
3599
+ "loss": 4.4375,
3600
+ "num_input_tokens_seen": 393508640,
3601
+ "step": 67350
3602
+ },
3603
+ {
3604
+ "epoch": 0.7438467345128355,
3605
+ "grad_norm": 1.8275529146194458,
3606
+ "learning_rate": 0.0001203426539518345,
3607
+ "loss": 4.4333,
3608
+ "num_input_tokens_seen": 394397120,
3609
+ "step": 67500
3610
+ },
3611
+ {
3612
+ "epoch": 0.7454997272561974,
3613
+ "grad_norm": 1.824823260307312,
3614
+ "learning_rate": 0.00012025449255770899,
3615
+ "loss": 4.4213,
3616
+ "num_input_tokens_seen": 395273408,
3617
+ "step": 67650
3618
+ },
3619
+ {
3620
+ "epoch": 0.7471527199995592,
3621
+ "grad_norm": 1.7815489768981934,
3622
+ "learning_rate": 0.00012016633116358348,
3623
+ "loss": 4.4295,
3624
+ "num_input_tokens_seen": 396151584,
3625
+ "step": 67800
3626
+ },
3627
+ {
3628
+ "epoch": 0.7488057127429211,
3629
+ "grad_norm": 1.9288073778152466,
3630
+ "learning_rate": 0.00012007816976945797,
3631
+ "loss": 4.4348,
3632
+ "num_input_tokens_seen": 397006880,
3633
+ "step": 67950
3634
+ },
3635
+ {
3636
+ "epoch": 0.7504587054862829,
3637
+ "grad_norm": 1.866746425628662,
3638
+ "learning_rate": 0.00011999000837533245,
3639
+ "loss": 4.4306,
3640
+ "num_input_tokens_seen": 397879072,
3641
+ "step": 68100
3642
+ },
3643
+ {
3644
+ "epoch": 0.7521116982296447,
3645
+ "grad_norm": 1.8168858289718628,
3646
+ "learning_rate": 0.00011990184698120693,
3647
+ "loss": 4.4321,
3648
+ "num_input_tokens_seen": 398772672,
3649
+ "step": 68250
3650
+ },
3651
+ {
3652
+ "epoch": 0.7537646909730066,
3653
+ "grad_norm": 1.7801350355148315,
3654
+ "learning_rate": 0.00011981368558708142,
3655
+ "loss": 4.4358,
3656
+ "num_input_tokens_seen": 399663136,
3657
+ "step": 68400
3658
+ },
3659
+ {
3660
+ "epoch": 0.7554176837163685,
3661
+ "grad_norm": 1.9442716836929321,
3662
+ "learning_rate": 0.00011972611193558343,
3663
+ "loss": 4.4357,
3664
+ "num_input_tokens_seen": 400543936,
3665
+ "step": 68550
3666
+ },
3667
+ {
3668
+ "epoch": 0.7570706764597304,
3669
+ "grad_norm": 1.8754234313964844,
3670
+ "learning_rate": 0.0001196379505414579,
3671
+ "loss": 4.4279,
3672
+ "num_input_tokens_seen": 401411136,
3673
+ "step": 68700
3674
+ },
3675
+ {
3676
+ "epoch": 0.7587236692030922,
3677
+ "grad_norm": 1.8986996412277222,
3678
+ "learning_rate": 0.0001195497891473324,
3679
+ "loss": 4.4345,
3680
+ "num_input_tokens_seen": 402290464,
3681
+ "step": 68850
3682
+ },
3683
+ {
3684
+ "epoch": 0.760376661946454,
3685
+ "grad_norm": 1.8807158470153809,
3686
+ "learning_rate": 0.00011946162775320688,
3687
+ "loss": 4.4329,
3688
+ "num_input_tokens_seen": 403176768,
3689
+ "step": 69000
3690
+ },
3691
+ {
3692
+ "epoch": 0.7620296546898159,
3693
+ "grad_norm": 1.8661843538284302,
3694
+ "learning_rate": 0.00011937346635908138,
3695
+ "loss": 4.4327,
3696
+ "num_input_tokens_seen": 404053888,
3697
+ "step": 69150
3698
+ },
3699
+ {
3700
+ "epoch": 0.7636826474331778,
3701
+ "grad_norm": 1.9022386074066162,
3702
+ "learning_rate": 0.00011928530496495586,
3703
+ "loss": 4.4304,
3704
+ "num_input_tokens_seen": 404951328,
3705
+ "step": 69300
3706
+ },
3707
+ {
3708
+ "epoch": 0.7653356401765397,
3709
+ "grad_norm": 1.9497708082199097,
3710
+ "learning_rate": 0.00011919714357083035,
3711
+ "loss": 4.4319,
3712
+ "num_input_tokens_seen": 405824128,
3713
+ "step": 69450
3714
+ },
3715
+ {
3716
+ "epoch": 0.7669886329199015,
3717
+ "grad_norm": 1.7283419370651245,
3718
+ "learning_rate": 0.00011910898217670483,
3719
+ "loss": 4.4222,
3720
+ "num_input_tokens_seen": 406694592,
3721
+ "step": 69600
3722
+ },
3723
+ {
3724
+ "epoch": 0.7686416256632633,
3725
+ "grad_norm": 1.8692352771759033,
3726
+ "learning_rate": 0.00011902082078257931,
3727
+ "loss": 4.4257,
3728
+ "num_input_tokens_seen": 407577856,
3729
+ "step": 69750
3730
+ },
3731
+ {
3732
+ "epoch": 0.7702946184066252,
3733
+ "grad_norm": 1.918215036392212,
3734
+ "learning_rate": 0.00011893265938845381,
3735
+ "loss": 4.4275,
3736
+ "num_input_tokens_seen": 408455424,
3737
+ "step": 69900
3738
+ },
3739
+ {
3740
+ "epoch": 0.771947611149987,
3741
+ "grad_norm": 1.8184279203414917,
3742
+ "learning_rate": 0.00011884449799432829,
3743
+ "loss": 4.4287,
3744
+ "num_input_tokens_seen": 409330752,
3745
+ "step": 70050
3746
+ },
3747
+ {
3748
+ "epoch": 0.773600603893349,
3749
+ "grad_norm": 1.846740961074829,
3750
+ "learning_rate": 0.00011875633660020279,
3751
+ "loss": 4.4244,
3752
+ "num_input_tokens_seen": 410215104,
3753
+ "step": 70200
3754
+ },
3755
+ {
3756
+ "epoch": 0.7752535966367108,
3757
+ "grad_norm": 1.9468152523040771,
3758
+ "learning_rate": 0.00011866817520607726,
3759
+ "loss": 4.4223,
3760
+ "num_input_tokens_seen": 411089696,
3761
+ "step": 70350
3762
+ },
3763
+ {
3764
+ "epoch": 0.7769065893800726,
3765
+ "grad_norm": 1.87180495262146,
3766
+ "learning_rate": 0.00011858001381195175,
3767
+ "loss": 4.4511,
3768
+ "num_input_tokens_seen": 411988000,
3769
+ "step": 70500
3770
+ },
3771
+ {
3772
+ "epoch": 0.7785595821234345,
3773
+ "grad_norm": 1.8375773429870605,
3774
+ "learning_rate": 0.00011849185241782624,
3775
+ "loss": 4.4361,
3776
+ "num_input_tokens_seen": 412876544,
3777
+ "step": 70650
3778
+ },
3779
+ {
3780
+ "epoch": 0.7802125748667963,
3781
+ "grad_norm": 1.7592004537582397,
3782
+ "learning_rate": 0.00011840369102370073,
3783
+ "loss": 4.4272,
3784
+ "num_input_tokens_seen": 413737824,
3785
+ "step": 70800
3786
+ },
3787
+ {
3788
+ "epoch": 0.7818655676101582,
3789
+ "grad_norm": 1.9243676662445068,
3790
+ "learning_rate": 0.00011831552962957522,
3791
+ "loss": 4.4343,
3792
+ "num_input_tokens_seen": 414613184,
3793
+ "step": 70950
3794
+ },
3795
+ {
3796
+ "epoch": 0.7835185603535201,
3797
+ "grad_norm": 1.9014674425125122,
3798
+ "learning_rate": 0.0001182273682354497,
3799
+ "loss": 4.4217,
3800
+ "num_input_tokens_seen": 415494144,
3801
+ "step": 71100
3802
+ },
3803
+ {
3804
+ "epoch": 0.7851715530968819,
3805
+ "grad_norm": 1.8528156280517578,
3806
+ "learning_rate": 0.0001181397945839517,
3807
+ "loss": 4.4323,
3808
+ "num_input_tokens_seen": 416354240,
3809
+ "step": 71250
3810
+ },
3811
+ {
3812
+ "epoch": 0.7868245458402437,
3813
+ "grad_norm": 1.7702356576919556,
3814
+ "learning_rate": 0.0001180516331898262,
3815
+ "loss": 4.4335,
3816
+ "num_input_tokens_seen": 417214944,
3817
+ "step": 71400
3818
+ },
3819
+ {
3820
+ "epoch": 0.7884775385836056,
3821
+ "grad_norm": 1.893778920173645,
3822
+ "learning_rate": 0.00011796347179570067,
3823
+ "loss": 4.4288,
3824
+ "num_input_tokens_seen": 418073984,
3825
+ "step": 71550
3826
+ },
3827
+ {
3828
+ "epoch": 0.7901305313269674,
3829
+ "grad_norm": 1.8179432153701782,
3830
+ "learning_rate": 0.00011787531040157515,
3831
+ "loss": 4.4161,
3832
+ "num_input_tokens_seen": 418956256,
3833
+ "step": 71700
3834
+ },
3835
+ {
3836
+ "epoch": 0.7917835240703294,
3837
+ "grad_norm": 1.8786159753799438,
3838
+ "learning_rate": 0.00011778714900744965,
3839
+ "loss": 4.4224,
3840
+ "num_input_tokens_seen": 419832544,
3841
+ "step": 71850
3842
+ },
3843
+ {
3844
+ "epoch": 0.7934365168136912,
3845
+ "grad_norm": 1.864493727684021,
3846
+ "learning_rate": 0.00011769898761332413,
3847
+ "loss": 4.4294,
3848
+ "num_input_tokens_seen": 420706176,
3849
+ "step": 72000
3850
+ },
3851
+ {
3852
+ "epoch": 0.795089509557053,
3853
+ "grad_norm": 1.7827798128128052,
3854
+ "learning_rate": 0.00011761082621919863,
3855
+ "loss": 4.4265,
3856
+ "num_input_tokens_seen": 421594880,
3857
+ "step": 72150
3858
+ },
3859
+ {
3860
+ "epoch": 0.7967425023004149,
3861
+ "grad_norm": 1.8714325428009033,
3862
+ "learning_rate": 0.0001175226648250731,
3863
+ "loss": 4.4428,
3864
+ "num_input_tokens_seen": 422456256,
3865
+ "step": 72300
3866
+ },
3867
+ {
3868
+ "epoch": 0.7983954950437767,
3869
+ "grad_norm": 1.8954764604568481,
3870
+ "learning_rate": 0.0001174345034309476,
3871
+ "loss": 4.4198,
3872
+ "num_input_tokens_seen": 423334208,
3873
+ "step": 72450
3874
+ },
3875
+ {
3876
+ "epoch": 0.8000484877871387,
3877
+ "grad_norm": 1.9334732294082642,
3878
+ "learning_rate": 0.00011734634203682208,
3879
+ "loss": 4.4285,
3880
+ "num_input_tokens_seen": 424227104,
3881
+ "step": 72600
3882
+ },
3883
+ {
3884
+ "epoch": 0.8017014805305005,
3885
+ "grad_norm": 1.8234983682632446,
3886
+ "learning_rate": 0.00011725818064269657,
3887
+ "loss": 4.438,
3888
+ "num_input_tokens_seen": 425093536,
3889
+ "step": 72750
3890
+ },
3891
+ {
3892
+ "epoch": 0.8033544732738623,
3893
+ "grad_norm": 1.8719639778137207,
3894
+ "learning_rate": 0.00011717001924857106,
3895
+ "loss": 4.432,
3896
+ "num_input_tokens_seen": 425967904,
3897
+ "step": 72900
3898
+ },
3899
+ {
3900
+ "epoch": 0.8050074660172242,
3901
+ "grad_norm": 1.879062533378601,
3902
+ "learning_rate": 0.00011708185785444555,
3903
+ "loss": 4.4157,
3904
+ "num_input_tokens_seen": 426848192,
3905
+ "step": 73050
3906
+ },
3907
+ {
3908
+ "epoch": 0.806660458760586,
3909
+ "grad_norm": 1.8409887552261353,
3910
+ "learning_rate": 0.00011699369646032003,
3911
+ "loss": 4.4173,
3912
+ "num_input_tokens_seen": 427730528,
3913
+ "step": 73200
3914
+ },
3915
+ {
3916
+ "epoch": 0.8083134515039478,
3917
+ "grad_norm": 1.9242078065872192,
3918
+ "learning_rate": 0.00011690553506619452,
3919
+ "loss": 4.4296,
3920
+ "num_input_tokens_seen": 428621792,
3921
+ "step": 73350
3922
+ },
3923
+ {
3924
+ "epoch": 0.8099664442473098,
3925
+ "grad_norm": 1.8767496347427368,
3926
+ "learning_rate": 0.00011681737367206901,
3927
+ "loss": 4.4245,
3928
+ "num_input_tokens_seen": 429494624,
3929
+ "step": 73500
3930
+ },
3931
+ {
3932
+ "epoch": 0.8116194369906716,
3933
+ "grad_norm": 1.8519647121429443,
3934
+ "learning_rate": 0.0001167292122779435,
3935
+ "loss": 4.4254,
3936
+ "num_input_tokens_seen": 430360704,
3937
+ "step": 73650
3938
+ },
3939
+ {
3940
+ "epoch": 0.8132724297340335,
3941
+ "grad_norm": 1.9251487255096436,
3942
+ "learning_rate": 0.00011664105088381798,
3943
+ "loss": 4.4362,
3944
+ "num_input_tokens_seen": 431239616,
3945
+ "step": 73800
3946
+ },
3947
+ {
3948
+ "epoch": 0.8149254224773953,
3949
+ "grad_norm": 1.8970694541931152,
3950
+ "learning_rate": 0.00011655288948969248,
3951
+ "loss": 4.4261,
3952
+ "num_input_tokens_seen": 432109120,
3953
+ "step": 73950
3954
+ },
3955
+ {
3956
+ "epoch": 0.8165784152207571,
3957
+ "grad_norm": 1.8284028768539429,
3958
+ "learning_rate": 0.00011646472809556695,
3959
+ "loss": 4.4406,
3960
+ "num_input_tokens_seen": 433005440,
3961
+ "step": 74100
3962
+ },
3963
+ {
3964
+ "epoch": 0.8182314079641191,
3965
+ "grad_norm": 1.7933986186981201,
3966
+ "learning_rate": 0.00011637656670144145,
3967
+ "loss": 4.4333,
3968
+ "num_input_tokens_seen": 433891456,
3969
+ "step": 74250
3970
+ },
3971
+ {
3972
+ "epoch": 0.8198844007074809,
3973
+ "grad_norm": 1.802509069442749,
3974
+ "learning_rate": 0.00011628840530731593,
3975
+ "loss": 4.4201,
3976
+ "num_input_tokens_seen": 434769856,
3977
+ "step": 74400
3978
+ },
3979
+ {
3980
+ "epoch": 0.8215373934508428,
3981
+ "grad_norm": 1.7515144348144531,
3982
+ "learning_rate": 0.00011620024391319043,
3983
+ "loss": 4.4225,
3984
+ "num_input_tokens_seen": 435665920,
3985
+ "step": 74550
3986
+ },
3987
+ {
3988
+ "epoch": 0.8231903861942046,
3989
+ "grad_norm": 1.8373006582260132,
3990
+ "learning_rate": 0.00011611208251906491,
3991
+ "loss": 4.4265,
3992
+ "num_input_tokens_seen": 436549984,
3993
+ "step": 74700
3994
+ },
3995
+ {
3996
+ "epoch": 0.8248433789375664,
3997
+ "grad_norm": 1.8570173978805542,
3998
+ "learning_rate": 0.00011602392112493941,
3999
+ "loss": 4.4196,
4000
+ "num_input_tokens_seen": 437427456,
4001
+ "step": 74850
4002
+ },
4003
+ {
4004
+ "epoch": 0.8264963716809283,
4005
+ "grad_norm": 1.9485052824020386,
4006
+ "learning_rate": 0.00011593575973081388,
4007
+ "loss": 4.4235,
4008
+ "num_input_tokens_seen": 438316576,
4009
+ "step": 75000
4010
+ },
4011
+ {
4012
+ "epoch": 0.8281493644242902,
4013
+ "grad_norm": 1.8972394466400146,
4014
+ "learning_rate": 0.00011584818607931588,
4015
+ "loss": 4.4231,
4016
+ "num_input_tokens_seen": 439211648,
4017
+ "step": 75150
4018
+ },
4019
+ {
4020
+ "epoch": 0.829802357167652,
4021
+ "grad_norm": 1.778745412826538,
4022
+ "learning_rate": 0.00011576002468519036,
4023
+ "loss": 4.423,
4024
+ "num_input_tokens_seen": 440086912,
4025
+ "step": 75300
4026
+ },
4027
+ {
4028
+ "epoch": 0.8314553499110139,
4029
+ "grad_norm": 1.923743486404419,
4030
+ "learning_rate": 0.00011567245103369236,
4031
+ "loss": 4.4112,
4032
+ "num_input_tokens_seen": 440976736,
4033
+ "step": 75450
4034
+ },
4035
+ {
4036
+ "epoch": 0.8331083426543757,
4037
+ "grad_norm": 1.8902959823608398,
4038
+ "learning_rate": 0.00011558428963956683,
4039
+ "loss": 4.4171,
4040
+ "num_input_tokens_seen": 441827264,
4041
+ "step": 75600
4042
+ },
4043
+ {
4044
+ "epoch": 0.8347613353977376,
4045
+ "grad_norm": 1.882279396057129,
4046
+ "learning_rate": 0.00011549612824544133,
4047
+ "loss": 4.4386,
4048
+ "num_input_tokens_seen": 442687328,
4049
+ "step": 75750
4050
+ },
4051
+ {
4052
+ "epoch": 0.8364143281410995,
4053
+ "grad_norm": 1.8508954048156738,
4054
+ "learning_rate": 0.00011540796685131581,
4055
+ "loss": 4.4261,
4056
+ "num_input_tokens_seen": 443558176,
4057
+ "step": 75900
4058
+ },
4059
+ {
4060
+ "epoch": 0.8380673208844613,
4061
+ "grad_norm": 1.8582794666290283,
4062
+ "learning_rate": 0.00011531980545719031,
4063
+ "loss": 4.4214,
4064
+ "num_input_tokens_seen": 444448512,
4065
+ "step": 76050
4066
+ },
4067
+ {
4068
+ "epoch": 0.8397203136278232,
4069
+ "grad_norm": 1.8337671756744385,
4070
+ "learning_rate": 0.00011523164406306478,
4071
+ "loss": 4.4257,
4072
+ "num_input_tokens_seen": 445311808,
4073
+ "step": 76200
4074
+ },
4075
+ {
4076
+ "epoch": 0.841373306371185,
4077
+ "grad_norm": 1.8980998992919922,
4078
+ "learning_rate": 0.00011514348266893929,
4079
+ "loss": 4.4294,
4080
+ "num_input_tokens_seen": 446200832,
4081
+ "step": 76350
4082
+ },
4083
+ {
4084
+ "epoch": 0.8430262991145469,
4085
+ "grad_norm": 1.8506239652633667,
4086
+ "learning_rate": 0.00011505532127481376,
4087
+ "loss": 4.4205,
4088
+ "num_input_tokens_seen": 447081376,
4089
+ "step": 76500
4090
+ },
4091
+ {
4092
+ "epoch": 0.8446792918579088,
4093
+ "grad_norm": 1.8824795484542847,
4094
+ "learning_rate": 0.00011496715988068826,
4095
+ "loss": 4.4213,
4096
+ "num_input_tokens_seen": 447960352,
4097
+ "step": 76650
4098
+ },
4099
+ {
4100
+ "epoch": 0.8463322846012706,
4101
+ "grad_norm": 1.8223339319229126,
4102
+ "learning_rate": 0.00011487899848656274,
4103
+ "loss": 4.4267,
4104
+ "num_input_tokens_seen": 448832096,
4105
+ "step": 76800
4106
+ },
4107
+ {
4108
+ "epoch": 0.8479852773446325,
4109
+ "grad_norm": 1.8224749565124512,
4110
+ "learning_rate": 0.00011479083709243724,
4111
+ "loss": 4.4181,
4112
+ "num_input_tokens_seen": 449706720,
4113
+ "step": 76950
4114
+ },
4115
+ {
4116
+ "epoch": 0.8496382700879943,
4117
+ "grad_norm": 1.903432011604309,
4118
+ "learning_rate": 0.00011470267569831172,
4119
+ "loss": 4.4309,
4120
+ "num_input_tokens_seen": 450578752,
4121
+ "step": 77100
4122
+ },
4123
+ {
4124
+ "epoch": 0.8512912628313561,
4125
+ "grad_norm": 1.8261497020721436,
4126
+ "learning_rate": 0.0001146145143041862,
4127
+ "loss": 4.4195,
4128
+ "num_input_tokens_seen": 451441120,
4129
+ "step": 77250
4130
+ },
4131
+ {
4132
+ "epoch": 0.852944255574718,
4133
+ "grad_norm": 1.8583135604858398,
4134
+ "learning_rate": 0.00011452635291006069,
4135
+ "loss": 4.4198,
4136
+ "num_input_tokens_seen": 452318560,
4137
+ "step": 77400
4138
+ },
4139
+ {
4140
+ "epoch": 0.8545972483180799,
4141
+ "grad_norm": 1.8936694860458374,
4142
+ "learning_rate": 0.00011443819151593518,
4143
+ "loss": 4.4287,
4144
+ "num_input_tokens_seen": 453195136,
4145
+ "step": 77550
4146
+ },
4147
+ {
4148
+ "epoch": 0.8562502410614418,
4149
+ "grad_norm": 1.9256082773208618,
4150
+ "learning_rate": 0.00011435003012180967,
4151
+ "loss": 4.4156,
4152
+ "num_input_tokens_seen": 454055456,
4153
+ "step": 77700
4154
+ },
4155
+ {
4156
+ "epoch": 0.8579032338048036,
4157
+ "grad_norm": 1.8237937688827515,
4158
+ "learning_rate": 0.00011426186872768416,
4159
+ "loss": 4.4348,
4160
+ "num_input_tokens_seen": 454932800,
4161
+ "step": 77850
4162
+ },
4163
+ {
4164
+ "epoch": 0.8595562265481654,
4165
+ "grad_norm": 1.8298827409744263,
4166
+ "learning_rate": 0.00011417370733355865,
4167
+ "loss": 4.4178,
4168
+ "num_input_tokens_seen": 455826208,
4169
+ "step": 78000
4170
+ },
4171
+ {
4172
+ "epoch": 0.8612092192915273,
4173
+ "grad_norm": 1.895670771598816,
4174
+ "learning_rate": 0.00011408554593943314,
4175
+ "loss": 4.4164,
4176
+ "num_input_tokens_seen": 456726848,
4177
+ "step": 78150
4178
+ },
4179
+ {
4180
+ "epoch": 0.8628622120348892,
4181
+ "grad_norm": 1.750807523727417,
4182
+ "learning_rate": 0.00011399738454530761,
4183
+ "loss": 4.4051,
4184
+ "num_input_tokens_seen": 457606464,
4185
+ "step": 78300
4186
+ },
4187
+ {
4188
+ "epoch": 0.8645152047782511,
4189
+ "grad_norm": 1.8419345617294312,
4190
+ "learning_rate": 0.00011390922315118211,
4191
+ "loss": 4.4249,
4192
+ "num_input_tokens_seen": 458484896,
4193
+ "step": 78450
4194
+ },
4195
+ {
4196
+ "epoch": 0.8661681975216129,
4197
+ "grad_norm": 2.033911943435669,
4198
+ "learning_rate": 0.00011382106175705659,
4199
+ "loss": 4.421,
4200
+ "num_input_tokens_seen": 459353824,
4201
+ "step": 78600
4202
+ },
4203
+ {
4204
+ "epoch": 0.8678211902649747,
4205
+ "grad_norm": 1.9020805358886719,
4206
+ "learning_rate": 0.00011373290036293109,
4207
+ "loss": 4.4016,
4208
+ "num_input_tokens_seen": 460221184,
4209
+ "step": 78750
4210
+ },
4211
+ {
4212
+ "epoch": 0.8694741830083366,
4213
+ "grad_norm": 1.91862952709198,
4214
+ "learning_rate": 0.00011364473896880557,
4215
+ "loss": 4.4093,
4216
+ "num_input_tokens_seen": 461087744,
4217
+ "step": 78900
4218
+ },
4219
+ {
4220
+ "epoch": 0.8711271757516984,
4221
+ "grad_norm": 1.7994396686553955,
4222
+ "learning_rate": 0.00011355657757468007,
4223
+ "loss": 4.4281,
4224
+ "num_input_tokens_seen": 461960160,
4225
+ "step": 79050
4226
+ },
4227
+ {
4228
+ "epoch": 0.8727801684950603,
4229
+ "grad_norm": 1.7911181449890137,
4230
+ "learning_rate": 0.00011346841618055454,
4231
+ "loss": 4.4229,
4232
+ "num_input_tokens_seen": 462838144,
4233
+ "step": 79200
4234
+ },
4235
+ {
4236
+ "epoch": 0.8744331612384222,
4237
+ "grad_norm": 1.923474907875061,
4238
+ "learning_rate": 0.00011338025478642904,
4239
+ "loss": 4.4103,
4240
+ "num_input_tokens_seen": 463703328,
4241
+ "step": 79350
4242
+ },
4243
+ {
4244
+ "epoch": 0.876086153981784,
4245
+ "grad_norm": 1.994814157485962,
4246
+ "learning_rate": 0.00011329268113493102,
4247
+ "loss": 4.4128,
4248
+ "num_input_tokens_seen": 464568896,
4249
+ "step": 79500
4250
+ },
4251
+ {
4252
+ "epoch": 0.8777391467251459,
4253
+ "grad_norm": 1.875200867652893,
4254
+ "learning_rate": 0.00011320451974080551,
4255
+ "loss": 4.4224,
4256
+ "num_input_tokens_seen": 465434144,
4257
+ "step": 79650
4258
+ },
4259
+ {
4260
+ "epoch": 0.8793921394685077,
4261
+ "grad_norm": 1.8729829788208008,
4262
+ "learning_rate": 0.00011311635834668,
4263
+ "loss": 4.4274,
4264
+ "num_input_tokens_seen": 466293984,
4265
+ "step": 79800
4266
+ },
4267
+ {
4268
+ "epoch": 0.8810451322118696,
4269
+ "grad_norm": 1.772687315940857,
4270
+ "learning_rate": 0.00011302819695255449,
4271
+ "loss": 4.4178,
4272
+ "num_input_tokens_seen": 467169280,
4273
+ "step": 79950
4274
+ },
4275
+ {
4276
+ "epoch": 0.8826981249552315,
4277
+ "grad_norm": 1.8293451070785522,
4278
+ "learning_rate": 0.00011294003555842898,
4279
+ "loss": 4.412,
4280
+ "num_input_tokens_seen": 468023552,
4281
+ "step": 80100
4282
+ },
4283
+ {
4284
+ "epoch": 0.8843511176985933,
4285
+ "grad_norm": 1.9000316858291626,
4286
+ "learning_rate": 0.00011285187416430346,
4287
+ "loss": 4.4114,
4288
+ "num_input_tokens_seen": 468883168,
4289
+ "step": 80250
4290
+ },
4291
+ {
4292
+ "epoch": 0.8860041104419552,
4293
+ "grad_norm": 1.8056668043136597,
4294
+ "learning_rate": 0.00011276371277017795,
4295
+ "loss": 4.4225,
4296
+ "num_input_tokens_seen": 469761120,
4297
+ "step": 80400
4298
+ },
4299
+ {
4300
+ "epoch": 0.887657103185317,
4301
+ "grad_norm": 1.7813293933868408,
4302
+ "learning_rate": 0.00011267555137605243,
4303
+ "loss": 4.4176,
4304
+ "num_input_tokens_seen": 470629344,
4305
+ "step": 80550
4306
+ },
4307
+ {
4308
+ "epoch": 0.8893100959286788,
4309
+ "grad_norm": 1.8244847059249878,
4310
+ "learning_rate": 0.00011258738998192693,
4311
+ "loss": 4.4082,
4312
+ "num_input_tokens_seen": 471489280,
4313
+ "step": 80700
4314
+ },
4315
+ {
4316
+ "epoch": 0.8909630886720408,
4317
+ "grad_norm": 1.8946529626846313,
4318
+ "learning_rate": 0.00011249981633042892,
4319
+ "loss": 4.4162,
4320
+ "num_input_tokens_seen": 472365504,
4321
+ "step": 80850
4322
+ },
4323
+ {
4324
+ "epoch": 0.8926160814154026,
4325
+ "grad_norm": 1.870685338973999,
4326
+ "learning_rate": 0.0001124116549363034,
4327
+ "loss": 4.4009,
4328
+ "num_input_tokens_seen": 473238752,
4329
+ "step": 81000
4330
+ },
4331
+ {
4332
+ "epoch": 0.8942690741587644,
4333
+ "grad_norm": 1.9169375896453857,
4334
+ "learning_rate": 0.0001123234935421779,
4335
+ "loss": 4.4176,
4336
+ "num_input_tokens_seen": 474122880,
4337
+ "step": 81150
4338
+ },
4339
+ {
4340
+ "epoch": 0.8959220669021263,
4341
+ "grad_norm": 1.9780856370925903,
4342
+ "learning_rate": 0.00011223533214805237,
4343
+ "loss": 4.401,
4344
+ "num_input_tokens_seen": 475002240,
4345
+ "step": 81300
4346
+ },
4347
+ {
4348
+ "epoch": 0.8975750596454881,
4349
+ "grad_norm": 1.8493030071258545,
4350
+ "learning_rate": 0.00011214717075392688,
4351
+ "loss": 4.4091,
4352
+ "num_input_tokens_seen": 475882624,
4353
+ "step": 81450
4354
  }
4355
  ],
4356
  "logging_steps": 150,
4357
  "max_steps": 272232,
4358
+ "num_input_tokens_seen": 476177504,
4359
  "num_train_epochs": 3,
4360
  "save_steps": 500,
4361
  "stateful_callbacks": {
 
4370
  "attributes": {}
4371
  }
4372
  },
4373
+ "total_flos": 7352485415362560.0,
4374
  "train_batch_size": 32,
4375
  "trial_name": null,
4376
  "trial_params": null