|
2025-01-08 19:17:21,692 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp2ck6dpv_/test.c -o /tmp/tmp2ck6dpv_/test.o |
|
2025-01-08 19:17:21,723 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp2ck6dpv_/test.o -laio -o /tmp/tmp2ck6dpv_/a.out |
|
2025-01-08 19:17:22,159 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmppfomhgow/test.c -o /tmp/tmppfomhgow/test.o |
|
2025-01-08 19:17:22,204 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmppfomhgow/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmppfomhgow/a.out |
|
2025-01-08 19:17:24,497 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp0g0jxvh5/test.c -o /tmp/tmp0g0jxvh5/test.o |
|
2025-01-08 19:17:24,525 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp0g0jxvh5/test.o -laio -o /tmp/tmp0g0jxvh5/a.out |
|
2025-01-08 19:17:24,555 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp6e8myn4y/test.c -o /tmp/tmp6e8myn4y/test.o |
|
2025-01-08 19:17:24,557 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp7vyq5nz_/test.c -o /tmp/tmp7vyq5nz_/test.o |
|
2025-01-08 19:17:24,582 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp6e8myn4y/test.o -laio -o /tmp/tmp6e8myn4y/a.out |
|
2025-01-08 19:17:24,583 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp7vyq5nz_/test.o -laio -o /tmp/tmp7vyq5nz_/a.out |
|
2025-01-08 19:17:24,960 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp2rrm1y3q/test.c -o /tmp/tmp2rrm1y3q/test.o |
|
2025-01-08 19:17:24,983 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmpbg_v2wps/test.c -o /tmp/tmpbg_v2wps/test.o |
|
2025-01-08 19:17:24,986 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp2rrm1y3q/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp2rrm1y3q/a.out |
|
2025-01-08 19:17:25,007 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmpbg_v2wps/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmpbg_v2wps/a.out |
|
2025-01-08 19:17:25,049 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp8gt5q4f1/test.c -o /tmp/tmp8gt5q4f1/test.o |
|
2025-01-08 19:17:25,071 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp8gt5q4f1/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp8gt5q4f1/a.out |
|
2025-01-08 19:22:03,063 - INFO - Training started |
|
2025-01-08 19:22:03,063 - INFO - Total steps: 1281 |
|
2025-01-08 19:34:20,509 - INFO - Step 5/1281 (0.4%), loss: 1.4755, learning_rate: 1.00e-04, epoch: 0.0117, step_time: 1120.55s, elapsed_time: 1120.55s, grad_norm: 0.8934 |
|
2025-01-08 19:44:33,351 - INFO - Step 10/1281 (0.8%), loss: 1.0298, learning_rate: 1.00e-04, epoch: 0.0234, step_time: 612.84s, elapsed_time: 1733.39s, grad_norm: 0.7723 |
|
2025-01-08 19:54:45,845 - INFO - Step 15/1281 (1.2%), loss: 0.8797, learning_rate: 1.00e-04, epoch: 0.0351, step_time: 612.49s, elapsed_time: 2345.88s, grad_norm: 1.5860 |
|
2025-01-08 20:04:57,722 - INFO - Step 20/1281 (1.6%), loss: 0.7675, learning_rate: 9.99e-05, epoch: 0.0468, step_time: 611.88s, elapsed_time: 2957.76s, grad_norm: 1.6310 |
|
2025-01-08 20:15:05,543 - INFO - Step 25/1281 (2.0%), loss: 0.7195, learning_rate: 9.99e-05, epoch: 0.0585, step_time: 607.82s, elapsed_time: 3565.58s, grad_norm: 1.2201 |
|
2025-01-08 20:25:08,368 - INFO - Step 30/1281 (2.3%), loss: 0.6904, learning_rate: 9.99e-05, epoch: 0.0702, step_time: 602.82s, elapsed_time: 4168.40s, grad_norm: 0.6741 |
|
2025-01-08 20:35:09,287 - INFO - Step 35/1281 (2.7%), loss: 0.6628, learning_rate: 9.98e-05, epoch: 0.0819, step_time: 600.92s, elapsed_time: 4769.32s, grad_norm: 0.5197 |
|
2025-01-08 20:45:12,080 - INFO - Step 40/1281 (3.1%), loss: 0.6241, learning_rate: 9.98e-05, epoch: 0.0936, step_time: 602.79s, elapsed_time: 5372.12s, grad_norm: 0.4424 |
|
2025-01-08 20:55:15,658 - INFO - Step 45/1281 (3.5%), loss: 0.6229, learning_rate: 9.97e-05, epoch: 0.1053, step_time: 603.58s, elapsed_time: 5975.69s, grad_norm: 0.4752 |
|
2025-01-08 21:05:17,810 - INFO - Step 50/1281 (3.9%), loss: 0.5978, learning_rate: 9.96e-05, epoch: 0.1170, step_time: 602.15s, elapsed_time: 6577.85s, grad_norm: 0.3438 |
|
2025-01-08 21:15:19,753 - INFO - Step 55/1281 (4.3%), loss: 0.5847, learning_rate: 9.95e-05, epoch: 0.1287, step_time: 601.94s, elapsed_time: 7179.79s, grad_norm: 0.3600 |
|
2025-01-08 21:25:21,775 - INFO - Step 60/1281 (4.7%), loss: 0.5686, learning_rate: 9.95e-05, epoch: 0.1404, step_time: 602.02s, elapsed_time: 7781.81s, grad_norm: 0.3749 |
|
2025-01-08 21:35:25,945 - INFO - Step 65/1281 (5.1%), loss: 0.5787, learning_rate: 9.94e-05, epoch: 0.1520, step_time: 604.17s, elapsed_time: 8385.98s, grad_norm: 0.3729 |
|
2025-01-08 21:45:26,630 - INFO - Step 70/1281 (5.5%), loss: 0.5608, learning_rate: 9.93e-05, epoch: 0.1637, step_time: 600.69s, elapsed_time: 8986.67s, grad_norm: 0.3449 |
|
2025-01-08 21:55:29,457 - INFO - Step 75/1281 (5.9%), loss: 0.5192, learning_rate: 9.92e-05, epoch: 0.1754, step_time: 602.83s, elapsed_time: 9589.49s, grad_norm: 0.3919 |
|
2025-01-08 22:05:33,796 - INFO - Step 80/1281 (6.2%), loss: 0.5120, learning_rate: 9.90e-05, epoch: 0.1871, step_time: 604.34s, elapsed_time: 10193.83s, grad_norm: 0.3015 |
|
2025-01-08 22:15:37,562 - INFO - Step 85/1281 (6.6%), loss: 0.4869, learning_rate: 9.89e-05, epoch: 0.1988, step_time: 603.77s, elapsed_time: 10797.60s, grad_norm: 0.2931 |
|
2025-01-08 22:25:41,155 - INFO - Step 90/1281 (7.0%), loss: 0.4632, learning_rate: 9.88e-05, epoch: 0.2105, step_time: 603.59s, elapsed_time: 11401.19s, grad_norm: 0.3108 |
|
2025-01-08 22:35:45,900 - INFO - Step 95/1281 (7.4%), loss: 0.4794, learning_rate: 9.86e-05, epoch: 0.2222, step_time: 604.74s, elapsed_time: 12005.94s, grad_norm: 0.3473 |
|
2025-01-08 22:45:48,393 - INFO - Step 100/1281 (7.8%), loss: 0.4609, learning_rate: 9.85e-05, epoch: 0.2339, step_time: 602.49s, elapsed_time: 12608.43s, grad_norm: 0.2963 |
|
2025-01-08 22:55:51,306 - INFO - Step 105/1281 (8.2%), loss: 0.4842, learning_rate: 9.84e-05, epoch: 0.2456, step_time: 602.91s, elapsed_time: 13211.34s, grad_norm: 0.2883 |
|
2025-01-08 23:05:52,794 - INFO - Step 110/1281 (8.6%), loss: 0.4557, learning_rate: 9.82e-05, epoch: 0.2573, step_time: 601.49s, elapsed_time: 13812.83s, grad_norm: 0.2928 |
|
2025-01-08 23:15:55,888 - INFO - Step 115/1281 (9.0%), loss: 0.4644, learning_rate: 9.80e-05, epoch: 0.2690, step_time: 603.09s, elapsed_time: 14415.92s, grad_norm: 0.2669 |
|
2025-01-08 23:25:57,881 - INFO - Step 120/1281 (9.4%), loss: 0.4490, learning_rate: 9.79e-05, epoch: 0.2807, step_time: 601.99s, elapsed_time: 15017.92s, grad_norm: 0.3591 |
|
2025-01-08 23:35:58,659 - INFO - Step 125/1281 (9.8%), loss: 0.4663, learning_rate: 9.77e-05, epoch: 0.2924, step_time: 600.78s, elapsed_time: 15618.70s, grad_norm: 0.2833 |
|
2025-01-08 23:46:00,173 - INFO - Step 130/1281 (10.1%), loss: 0.4461, learning_rate: 9.75e-05, epoch: 0.3041, step_time: 601.51s, elapsed_time: 16220.21s, grad_norm: 0.2706 |
|
2025-01-08 23:56:01,734 - INFO - Step 135/1281 (10.5%), loss: 0.4481, learning_rate: 9.73e-05, epoch: 0.3158, step_time: 601.56s, elapsed_time: 16821.77s, grad_norm: 0.2958 |
|
2025-01-09 00:06:06,317 - INFO - Step 140/1281 (10.9%), loss: 0.4631, learning_rate: 9.71e-05, epoch: 0.3275, step_time: 604.58s, elapsed_time: 17426.35s, grad_norm: 0.2749 |
|
2025-01-09 00:16:09,098 - INFO - Step 145/1281 (11.3%), loss: 0.4503, learning_rate: 9.69e-05, epoch: 0.3392, step_time: 602.78s, elapsed_time: 18029.13s, grad_norm: 0.3135 |
|
2025-01-09 00:26:11,816 - INFO - Step 150/1281 (11.7%), loss: 0.4389, learning_rate: 9.67e-05, epoch: 0.3509, step_time: 602.72s, elapsed_time: 18631.85s, grad_norm: 0.2961 |
|
2025-01-09 00:36:12,847 - INFO - Step 155/1281 (12.1%), loss: 0.4391, learning_rate: 9.64e-05, epoch: 0.3626, step_time: 601.03s, elapsed_time: 19232.88s, grad_norm: 0.2587 |
|
2025-01-09 00:46:15,889 - INFO - Step 160/1281 (12.5%), loss: 0.4372, learning_rate: 9.62e-05, epoch: 0.3743, step_time: 603.04s, elapsed_time: 19835.93s, grad_norm: 0.2949 |
|
2025-01-09 00:56:18,225 - INFO - Step 165/1281 (12.9%), loss: 0.4333, learning_rate: 9.60e-05, epoch: 0.3860, step_time: 602.34s, elapsed_time: 20438.26s, grad_norm: 0.2650 |
|
2025-01-09 01:06:21,912 - INFO - Step 170/1281 (13.3%), loss: 0.4352, learning_rate: 9.57e-05, epoch: 0.3977, step_time: 603.69s, elapsed_time: 21041.95s, grad_norm: 0.2787 |
|
2025-01-09 01:16:26,354 - INFO - Step 175/1281 (13.7%), loss: 0.4215, learning_rate: 9.55e-05, epoch: 0.4094, step_time: 604.44s, elapsed_time: 21646.39s, grad_norm: 0.2737 |
|
2025-01-09 01:26:27,417 - INFO - Step 180/1281 (14.1%), loss: 0.4382, learning_rate: 9.52e-05, epoch: 0.4211, step_time: 601.06s, elapsed_time: 22247.45s, grad_norm: 0.2691 |
|
2025-01-09 01:36:29,557 - INFO - Step 185/1281 (14.4%), loss: 0.4456, learning_rate: 9.49e-05, epoch: 0.4327, step_time: 602.14s, elapsed_time: 22849.59s, grad_norm: 0.2718 |
|
2025-01-09 01:46:30,681 - INFO - Step 190/1281 (14.8%), loss: 0.4134, learning_rate: 9.47e-05, epoch: 0.4444, step_time: 601.12s, elapsed_time: 23450.72s, grad_norm: 0.2703 |
|
2025-01-09 01:56:32,016 - INFO - Step 195/1281 (15.2%), loss: 0.4200, learning_rate: 9.44e-05, epoch: 0.4561, step_time: 601.33s, elapsed_time: 24052.05s, grad_norm: 0.2519 |
|
2025-01-09 02:06:34,240 - INFO - Step 200/1281 (15.6%), loss: 0.4261, learning_rate: 9.41e-05, epoch: 0.4678, step_time: 602.22s, elapsed_time: 24654.28s, grad_norm: 0.3421 |
|
2025-01-09 02:16:36,844 - INFO - Step 205/1281 (16.0%), loss: 0.3964, learning_rate: 9.38e-05, epoch: 0.4795, step_time: 602.60s, elapsed_time: 25256.88s, grad_norm: 0.2663 |
|
2025-01-09 02:26:39,776 - INFO - Step 210/1281 (16.4%), loss: 0.4266, learning_rate: 9.35e-05, epoch: 0.4912, step_time: 602.93s, elapsed_time: 25859.81s, grad_norm: 0.2692 |
|
2025-01-09 02:36:42,346 - INFO - Step 215/1281 (16.8%), loss: 0.4340, learning_rate: 9.32e-05, epoch: 0.5029, step_time: 602.57s, elapsed_time: 26462.38s, grad_norm: 0.2842 |
|
2025-01-09 02:46:44,912 - INFO - Step 220/1281 (17.2%), loss: 0.4246, learning_rate: 9.29e-05, epoch: 0.5146, step_time: 602.57s, elapsed_time: 27064.95s, grad_norm: 0.4175 |
|
2025-01-09 02:56:48,074 - INFO - Step 225/1281 (17.6%), loss: 0.4436, learning_rate: 9.26e-05, epoch: 0.5263, step_time: 603.16s, elapsed_time: 27668.11s, grad_norm: 0.2852 |
|
2025-01-09 03:06:49,000 - INFO - Step 230/1281 (18.0%), loss: 0.4152, learning_rate: 9.23e-05, epoch: 0.5380, step_time: 600.93s, elapsed_time: 28269.04s, grad_norm: 0.2848 |
|
2025-01-09 03:16:50,893 - INFO - Step 235/1281 (18.3%), loss: 0.4013, learning_rate: 9.19e-05, epoch: 0.5497, step_time: 601.89s, elapsed_time: 28870.93s, grad_norm: 0.2704 |
|
2025-01-09 03:26:53,653 - INFO - Step 240/1281 (18.7%), loss: 0.3941, learning_rate: 9.16e-05, epoch: 0.5614, step_time: 602.76s, elapsed_time: 29473.69s, grad_norm: 0.2616 |
|
2025-01-09 03:36:53,946 - INFO - Step 245/1281 (19.1%), loss: 0.4165, learning_rate: 9.12e-05, epoch: 0.5731, step_time: 600.29s, elapsed_time: 30073.98s, grad_norm: 0.2544 |
|
2025-01-09 03:46:56,614 - INFO - Step 250/1281 (19.5%), loss: 0.4177, learning_rate: 9.09e-05, epoch: 0.5848, step_time: 602.67s, elapsed_time: 30676.65s, grad_norm: 0.2776 |
|
2025-01-09 03:56:57,796 - INFO - Step 255/1281 (19.9%), loss: 0.4018, learning_rate: 9.05e-05, epoch: 0.5965, step_time: 601.18s, elapsed_time: 31277.83s, grad_norm: 0.2499 |
|
2025-01-09 04:07:00,066 - INFO - Step 260/1281 (20.3%), loss: 0.4138, learning_rate: 9.02e-05, epoch: 0.6082, step_time: 602.27s, elapsed_time: 31880.10s, grad_norm: 0.2693 |
|
2025-01-09 04:17:02,224 - INFO - Step 265/1281 (20.7%), loss: 0.3984, learning_rate: 8.98e-05, epoch: 0.6199, step_time: 602.16s, elapsed_time: 32482.26s, grad_norm: 0.2744 |
|
2025-01-09 04:27:03,630 - INFO - Step 270/1281 (21.1%), loss: 0.4269, learning_rate: 8.94e-05, epoch: 0.6316, step_time: 601.41s, elapsed_time: 33083.67s, grad_norm: 0.2762 |
|
2025-01-09 04:37:06,555 - INFO - Step 275/1281 (21.5%), loss: 0.3986, learning_rate: 8.91e-05, epoch: 0.6433, step_time: 602.93s, elapsed_time: 33686.59s, grad_norm: 0.2647 |
|
2025-01-09 04:47:08,811 - INFO - Step 280/1281 (21.9%), loss: 0.4057, learning_rate: 8.87e-05, epoch: 0.6550, step_time: 602.26s, elapsed_time: 34288.85s, grad_norm: 0.2787 |
|
2025-01-09 04:57:11,235 - INFO - Step 285/1281 (22.2%), loss: 0.4143, learning_rate: 8.83e-05, epoch: 0.6667, step_time: 602.42s, elapsed_time: 34891.27s, grad_norm: 0.3001 |
|
2025-01-09 05:07:12,645 - INFO - Step 290/1281 (22.6%), loss: 0.4012, learning_rate: 8.79e-05, epoch: 0.6784, step_time: 601.41s, elapsed_time: 35492.68s, grad_norm: 0.2544 |
|
2025-01-09 05:17:14,293 - INFO - Step 295/1281 (23.0%), loss: 0.3942, learning_rate: 8.75e-05, epoch: 0.6901, step_time: 601.65s, elapsed_time: 36094.33s, grad_norm: 0.2604 |
|
2025-01-09 05:27:17,925 - INFO - Step 300/1281 (23.4%), loss: 0.3974, learning_rate: 8.71e-05, epoch: 0.7018, step_time: 603.63s, elapsed_time: 36697.96s, grad_norm: 0.2718 |
|
2025-01-09 05:37:19,535 - INFO - Step 305/1281 (23.8%), loss: 0.3967, learning_rate: 8.67e-05, epoch: 0.7135, step_time: 601.61s, elapsed_time: 37299.57s, grad_norm: 0.2717 |
|
2025-01-09 05:47:20,092 - INFO - Step 310/1281 (24.2%), loss: 0.3765, learning_rate: 8.62e-05, epoch: 0.7251, step_time: 600.56s, elapsed_time: 37900.13s, grad_norm: 0.2735 |
|
2025-01-09 05:57:20,851 - INFO - Step 315/1281 (24.6%), loss: 0.4131, learning_rate: 8.58e-05, epoch: 0.7368, step_time: 600.76s, elapsed_time: 38500.89s, grad_norm: 0.2609 |
|
2025-01-09 06:07:22,985 - INFO - Step 320/1281 (25.0%), loss: 0.3945, learning_rate: 8.54e-05, epoch: 0.7485, step_time: 602.13s, elapsed_time: 39103.02s, grad_norm: 0.2507 |
|
2025-01-09 06:17:24,449 - INFO - Step 325/1281 (25.4%), loss: 0.3916, learning_rate: 8.49e-05, epoch: 0.7602, step_time: 601.46s, elapsed_time: 39704.49s, grad_norm: 0.2386 |
|
2025-01-09 06:27:25,872 - INFO - Step 330/1281 (25.8%), loss: 0.3894, learning_rate: 8.45e-05, epoch: 0.7719, step_time: 601.42s, elapsed_time: 40305.91s, grad_norm: 0.2645 |
|
2025-01-09 06:37:27,281 - INFO - Step 335/1281 (26.2%), loss: 0.3955, learning_rate: 8.41e-05, epoch: 0.7836, step_time: 601.41s, elapsed_time: 40907.32s, grad_norm: 0.2722 |
|
2025-01-09 06:47:28,321 - INFO - Step 340/1281 (26.5%), loss: 0.3725, learning_rate: 8.36e-05, epoch: 0.7953, step_time: 601.04s, elapsed_time: 41508.36s, grad_norm: 0.2430 |
|
2025-01-09 06:57:30,311 - INFO - Step 345/1281 (26.9%), loss: 0.3883, learning_rate: 8.31e-05, epoch: 0.8070, step_time: 601.99s, elapsed_time: 42110.35s, grad_norm: 0.2525 |
|
2025-01-09 07:07:32,983 - INFO - Step 350/1281 (27.3%), loss: 0.3883, learning_rate: 8.27e-05, epoch: 0.8187, step_time: 602.67s, elapsed_time: 42713.02s, grad_norm: 0.2387 |
|
2025-01-09 07:17:34,098 - INFO - Step 355/1281 (27.7%), loss: 0.3906, learning_rate: 8.22e-05, epoch: 0.8304, step_time: 601.12s, elapsed_time: 43314.13s, grad_norm: 0.2725 |
|
2025-01-09 07:27:37,098 - INFO - Step 360/1281 (28.1%), loss: 0.3751, learning_rate: 8.17e-05, epoch: 0.8421, step_time: 603.00s, elapsed_time: 43917.13s, grad_norm: 0.2814 |
|
2025-01-09 07:37:37,150 - INFO - Step 365/1281 (28.5%), loss: 0.3858, learning_rate: 8.13e-05, epoch: 0.8538, step_time: 600.05s, elapsed_time: 44517.19s, grad_norm: 0.2561 |
|
2025-01-09 07:47:40,487 - INFO - Step 370/1281 (28.9%), loss: 0.3629, learning_rate: 8.08e-05, epoch: 0.8655, step_time: 603.34s, elapsed_time: 45120.52s, grad_norm: 0.2712 |
|
2025-01-09 07:57:41,870 - INFO - Step 375/1281 (29.3%), loss: 0.3733, learning_rate: 8.03e-05, epoch: 0.8772, step_time: 601.38s, elapsed_time: 45721.91s, grad_norm: 0.2457 |
|
2025-01-09 08:07:42,687 - INFO - Step 380/1281 (29.7%), loss: 0.3691, learning_rate: 7.98e-05, epoch: 0.8889, step_time: 600.82s, elapsed_time: 46322.72s, grad_norm: 0.2544 |
|
2025-01-09 08:17:46,148 - INFO - Step 385/1281 (30.1%), loss: 0.3768, learning_rate: 7.93e-05, epoch: 0.9006, step_time: 603.46s, elapsed_time: 46926.18s, grad_norm: 0.2821 |
|
2025-01-09 08:27:49,374 - INFO - Step 390/1281 (30.4%), loss: 0.3914, learning_rate: 7.88e-05, epoch: 0.9123, step_time: 603.23s, elapsed_time: 47529.41s, grad_norm: 0.2370 |
|
2025-01-09 08:37:51,424 - INFO - Step 395/1281 (30.8%), loss: 0.3796, learning_rate: 7.83e-05, epoch: 0.9240, step_time: 602.05s, elapsed_time: 48131.46s, grad_norm: 0.2675 |
|
2025-01-09 08:47:53,337 - INFO - Step 400/1281 (31.2%), loss: 0.3701, learning_rate: 7.78e-05, epoch: 0.9357, step_time: 601.91s, elapsed_time: 48733.37s, grad_norm: 0.2477 |
|
2025-01-09 08:57:57,081 - INFO - Step 405/1281 (31.6%), loss: 0.3703, learning_rate: 7.73e-05, epoch: 0.9474, step_time: 603.74s, elapsed_time: 49337.12s, grad_norm: 0.2288 |
|
2025-01-09 09:07:58,310 - INFO - Step 410/1281 (32.0%), loss: 0.3958, learning_rate: 7.68e-05, epoch: 0.9591, step_time: 601.23s, elapsed_time: 49938.35s, grad_norm: 0.2681 |
|
2025-01-09 09:17:59,916 - INFO - Step 415/1281 (32.4%), loss: 0.3704, learning_rate: 7.63e-05, epoch: 0.9708, step_time: 601.61s, elapsed_time: 50539.95s, grad_norm: 0.2619 |
|
2025-01-09 09:28:02,267 - INFO - Step 420/1281 (32.8%), loss: 0.3609, learning_rate: 7.57e-05, epoch: 0.9825, step_time: 602.35s, elapsed_time: 51142.30s, grad_norm: 0.2586 |
|
2025-01-09 09:38:04,706 - INFO - Step 425/1281 (33.2%), loss: 0.3553, learning_rate: 7.52e-05, epoch: 0.9942, step_time: 602.44s, elapsed_time: 51744.74s, grad_norm: 0.2764 |
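In these step lines, `step_time` appears to be the wall-clock time of the whole 5-step logging window, not of a single optimizer step. Each `elapsed_time` equals the previous `elapsed_time` plus the current `step_time`; for example, at steps 5 and 10, 1120.55 s + 612.84 s = 1733.39 s.

The sketch below checks that relation on the last two epoch-1 lines. It then derives seconds per step and a rough estimate of the remaining training time, which ignores evaluation and checkpointing. It is a minimal illustration only: the regex `STEP_RE` and the `parse` helper are hypothetical and not part of the training code.

```python
import re

# Hypothetical pattern for the "Step N/T ... step_time: Xs, elapsed_time: Ys" lines in this log.
STEP_RE = re.compile(
    r"Step (\d+)/(\d+).*?step_time: ([\d.]+)s, elapsed_time: ([\d.]+)s"
)

def parse(line):
    """Return (step, total_steps, window_seconds, elapsed_seconds) for one step line."""
    m = STEP_RE.search(line)
    if m is None:
        raise ValueError("not a step line")
    return int(m.group(1)), int(m.group(2)), float(m.group(3)), float(m.group(4))

prev = parse("Step 420/1281 (32.8%), loss: 0.3609, learning_rate: 7.57e-05, epoch: 0.9825, "
             "step_time: 602.35s, elapsed_time: 51142.30s, grad_norm: 0.2586")
curr = parse("Step 425/1281 (33.2%), loss: 0.3553, learning_rate: 7.52e-05, epoch: 0.9942, "
             "step_time: 602.44s, elapsed_time: 51744.74s, grad_norm: 0.2764")

# elapsed_time accumulates step_time, so the difference should equal the window time.
assert abs((curr[3] - prev[3]) - curr[2]) < 0.05

steps_in_window = curr[0] - prev[0]          # 5 optimizer steps per logging window
sec_per_step = curr[2] / steps_in_window
remaining_steps = curr[1] - curr[0]
eta_hours = remaining_steps * sec_per_step / 3600  # excludes eval and checkpoint time

print(f"{sec_per_step:.1f} s/step, ~{eta_hours:.1f} h of training left (excluding eval)")
```

The estimate assumes the window time stays near 602 s. The log shows it holding steady after the first, longer window.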
|
2025-01-09 11:03:43,855 - INFO - Loss improved from inf to 0.37839 |
|
2025-01-09 11:03:43,856 - INFO - Step 427/1281 (33.3%), epoch: 0.9988, step_time: 5139.15s, elapsed_time: 56883.89s |
|
2025-01-09 11:03:43,858 - INFO - Evaluation Results: |
|
eval_loss: 0.3784 |
|
eval_runtime: 4839.9190 |
|
eval_samples_per_second: 0.3140 |
|
eval_steps_per_second: 0.0790 |
|
epoch: 0.9988 |
|
elapsed_time: 56883.89s |
|
step_time: 5139.15s |
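The evaluation throughput figures can be cross-checked against `eval_runtime`. This sketch assumes each rate is a count divided by `eval_runtime`, which is the convention of the speed metrics in Hugging Face `Trainer`. The log does not say which trainer produced it, so that is an assumption. Under it, multiplying each rate back by the runtime gives the approximate size of the evaluation pass. Because the logged rates are rounded to four decimals, the derived counts are estimates only.

```python
# Logged evaluation metrics (rounded in the log, so derived counts are approximate).
eval_runtime = 4839.9190          # seconds
eval_samples_per_second = 0.3140
eval_steps_per_second = 0.0790

# Assumption: rate = count / runtime, so count is roughly rate * runtime.
approx_samples = eval_samples_per_second * eval_runtime
approx_steps = eval_steps_per_second * eval_runtime
approx_samples_per_step = approx_samples / approx_steps

print(f"~{approx_samples:.0f} eval samples, ~{approx_steps:.0f} eval steps, "
      f"~{approx_samples_per_step:.1f} samples per eval step")
```

The log does not record whether the roughly 4 samples per evaluation step is a per-device or a global batch.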
|
2025-01-09 11:03:43,858 - INFO - Loss improved from inf to 0.37839 |
|
2025-01-09 11:07:38,811 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-427/pytorch_model_fsdp_0 |
|
2025-01-09 11:07:41,993 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-427/pytorch_model_fsdp_0 |
|
2025-01-09 11:07:48,542 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-427/optimizer_0 |
|
2025-01-09 11:07:54,762 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-427/optimizer_0 |
|
2025-01-09 11:12:58,976 - INFO - Step 430/1281 (33.6%), loss: 0.3358, learning_rate: 7.47e-05, epoch: 1.0058, step_time: 555.12s, elapsed_time: 57439.01s, grad_norm: 0.2442 |
|
2025-01-09 11:23:00,412 - INFO - Step 435/1281 (34.0%), loss: 0.3080, learning_rate: 7.41e-05, epoch: 1.0175, step_time: 601.44s, elapsed_time: 58040.45s, grad_norm: 0.2875 |
|
2025-01-09 11:33:01,968 - INFO - Step 440/1281 (34.3%), loss: 0.2953, learning_rate: 7.36e-05, epoch: 1.0292, step_time: 601.56s, elapsed_time: 58642.00s, grad_norm: 0.2805 |
|
2025-01-09 11:43:03,497 - INFO - Step 445/1281 (34.7%), loss: 0.3037, learning_rate: 7.31e-05, epoch: 1.0409, step_time: 601.53s, elapsed_time: 59243.53s, grad_norm: 0.2654 |
|
2025-01-09 11:53:07,041 - INFO - Step 450/1281 (35.1%), loss: 0.2832, learning_rate: 7.25e-05, epoch: 1.0526, step_time: 603.54s, elapsed_time: 59847.08s, grad_norm: 0.2852 |
|
2025-01-09 12:03:07,434 - INFO - Step 455/1281 (35.5%), loss: 0.3054, learning_rate: 7.20e-05, epoch: 1.0643, step_time: 600.39s, elapsed_time: 60447.47s, grad_norm: 0.2682 |
|
2025-01-09 12:13:09,870 - INFO - Step 460/1281 (35.9%), loss: 0.3028, learning_rate: 7.14e-05, epoch: 1.0760, step_time: 602.44s, elapsed_time: 61049.91s, grad_norm: 0.2565 |
|
2025-01-09 12:23:11,111 - INFO - Step 465/1281 (36.3%), loss: 0.3113, learning_rate: 7.09e-05, epoch: 1.0877, step_time: 601.24s, elapsed_time: 61651.15s, grad_norm: 0.3189 |
|
2025-01-09 12:33:13,633 - INFO - Step 470/1281 (36.7%), loss: 0.2981, learning_rate: 7.03e-05, epoch: 1.0994, step_time: 602.52s, elapsed_time: 62253.67s, grad_norm: 0.2654 |
|
2025-01-09 12:43:16,153 - INFO - Step 475/1281 (37.1%), loss: 0.2783, learning_rate: 6.97e-05, epoch: 1.1111, step_time: 602.52s, elapsed_time: 62856.19s, grad_norm: 0.2746 |
|
2025-01-09 12:53:18,734 - INFO - Step 480/1281 (37.5%), loss: 0.2974, learning_rate: 6.92e-05, epoch: 1.1228, step_time: 602.58s, elapsed_time: 63458.77s, grad_norm: 0.2595 |
|
2025-01-09 13:03:21,282 - INFO - Step 485/1281 (37.9%), loss: 0.2939, learning_rate: 6.86e-05, epoch: 1.1345, step_time: 602.55s, elapsed_time: 64061.32s, grad_norm: 0.2647 |
|
2025-01-09 13:13:24,387 - INFO - Step 490/1281 (38.3%), loss: 0.2997, learning_rate: 6.80e-05, epoch: 1.1462, step_time: 603.10s, elapsed_time: 64664.42s, grad_norm: 0.2750 |
|
2025-01-09 13:23:27,147 - INFO - Step 495/1281 (38.6%), loss: 0.3256, learning_rate: 6.75e-05, epoch: 1.1579, step_time: 602.76s, elapsed_time: 65267.18s, grad_norm: 0.2724 |
|
2025-01-09 13:33:28,507 - INFO - Step 500/1281 (39.0%), loss: 0.2977, learning_rate: 6.69e-05, epoch: 1.1696, step_time: 601.36s, elapsed_time: 65868.54s, grad_norm: 0.2810 |
|
2025-01-09 13:43:31,959 - INFO - Step 505/1281 (39.4%), loss: 0.2879, learning_rate: 6.63e-05, epoch: 1.1813, step_time: 603.45s, elapsed_time: 66472.00s, grad_norm: 0.2742 |
|
2025-01-09 13:53:34,193 - INFO - Step 510/1281 (39.8%), loss: 0.2938, learning_rate: 6.57e-05, epoch: 1.1930, step_time: 602.23s, elapsed_time: 67074.23s, grad_norm: 0.2618 |
|
2025-01-09 14:03:36,535 - INFO - Step 515/1281 (40.2%), loss: 0.2940, learning_rate: 6.51e-05, epoch: 1.2047, step_time: 602.34s, elapsed_time: 67676.57s, grad_norm: 0.2717 |
|
2025-01-09 14:13:38,835 - INFO - Step 520/1281 (40.6%), loss: 0.2918, learning_rate: 6.46e-05, epoch: 1.2164, step_time: 602.30s, elapsed_time: 68278.87s, grad_norm: 0.2684 |
|
2025-01-09 14:23:39,190 - INFO - Step 525/1281 (41.0%), loss: 0.2867, learning_rate: 6.40e-05, epoch: 1.2281, step_time: 600.35s, elapsed_time: 68879.23s, grad_norm: 0.4385 |
|
2025-01-09 14:33:43,339 - INFO - Step 530/1281 (41.4%), loss: 0.3066, learning_rate: 6.34e-05, epoch: 1.2398, step_time: 604.15s, elapsed_time: 69483.38s, grad_norm: 0.2842 |
|
2025-01-09 14:43:44,186 - INFO - Step 535/1281 (41.8%), loss: 0.2906, learning_rate: 6.28e-05, epoch: 1.2515, step_time: 600.85s, elapsed_time: 70084.22s, grad_norm: 0.3030 |
|
2025-01-09 14:53:45,212 - INFO - Step 540/1281 (42.2%), loss: 0.2804, learning_rate: 6.22e-05, epoch: 1.2632, step_time: 601.03s, elapsed_time: 70685.25s, grad_norm: 0.2722 |
|
2025-01-09 15:03:47,472 - INFO - Step 545/1281 (42.5%), loss: 0.2889, learning_rate: 6.16e-05, epoch: 1.2749, step_time: 602.26s, elapsed_time: 71287.51s, grad_norm: 0.2555 |
|
2025-01-09 15:13:47,833 - INFO - Step 550/1281 (42.9%), loss: 0.3026, learning_rate: 6.10e-05, epoch: 1.2865, step_time: 600.36s, elapsed_time: 71887.87s, grad_norm: 0.3013 |
|
2025-01-09 15:23:48,804 - INFO - Step 555/1281 (43.3%), loss: 0.2852, learning_rate: 6.04e-05, epoch: 1.2982, step_time: 600.97s, elapsed_time: 72488.84s, grad_norm: 0.2799 |
|
2025-01-09 15:33:51,001 - INFO - Step 560/1281 (43.7%), loss: 0.2935, learning_rate: 5.98e-05, epoch: 1.3099, step_time: 602.20s, elapsed_time: 73091.04s, grad_norm: 0.2852 |
|
2025-01-09 15:43:52,840 - INFO - Step 565/1281 (44.1%), loss: 0.3003, learning_rate: 5.92e-05, epoch: 1.3216, step_time: 601.84s, elapsed_time: 73692.88s, grad_norm: 0.2470 |
|
2025-01-09 15:53:55,220 - INFO - Step 570/1281 (44.5%), loss: 0.2917, learning_rate: 5.86e-05, epoch: 1.3333, step_time: 602.38s, elapsed_time: 74295.26s, grad_norm: 0.2715 |
|
2025-01-09 16:03:55,987 - INFO - Step 575/1281 (44.9%), loss: 0.3041, learning_rate: 5.80e-05, epoch: 1.3450, step_time: 600.77s, elapsed_time: 74896.02s, grad_norm: 0.2821 |
|
2025-01-09 16:13:58,521 - INFO - Step 580/1281 (45.3%), loss: 0.2922, learning_rate: 5.74e-05, epoch: 1.3567, step_time: 602.53s, elapsed_time: 75498.56s, grad_norm: 0.2884 |
|
2025-01-09 16:24:01,691 - INFO - Step 585/1281 (45.7%), loss: 0.2804, learning_rate: 5.68e-05, epoch: 1.3684, step_time: 603.17s, elapsed_time: 76101.73s, grad_norm: 0.2801 |
|
2025-01-09 16:34:03,952 - INFO - Step 590/1281 (46.1%), loss: 0.2981, learning_rate: 5.62e-05, epoch: 1.3801, step_time: 602.26s, elapsed_time: 76703.99s, grad_norm: 0.2860 |
|
2025-01-09 16:44:06,819 - INFO - Step 595/1281 (46.4%), loss: 0.2973, learning_rate: 5.56e-05, epoch: 1.3918, step_time: 602.87s, elapsed_time: 77306.86s, grad_norm: 0.2838 |
|
2025-01-09 16:54:08,700 - INFO - Step 600/1281 (46.8%), loss: 0.2949, learning_rate: 5.50e-05, epoch: 1.4035, step_time: 601.88s, elapsed_time: 77908.74s, grad_norm: 0.2911 |
|
2025-01-09 17:04:09,850 - INFO - Step 605/1281 (47.2%), loss: 0.3150, learning_rate: 5.43e-05, epoch: 1.4152, step_time: 601.15s, elapsed_time: 78509.89s, grad_norm: 0.3110 |
|
2025-01-09 17:14:10,730 - INFO - Step 610/1281 (47.6%), loss: 0.2896, learning_rate: 5.37e-05, epoch: 1.4269, step_time: 600.88s, elapsed_time: 79110.77s, grad_norm: 0.2746 |
|
2025-01-09 17:24:13,756 - INFO - Step 615/1281 (48.0%), loss: 0.2915, learning_rate: 5.31e-05, epoch: 1.4386, step_time: 603.03s, elapsed_time: 79713.79s, grad_norm: 0.2786 |
|
2025-01-09 17:34:14,612 - INFO - Step 620/1281 (48.4%), loss: 0.2944, learning_rate: 5.25e-05, epoch: 1.4503, step_time: 600.86s, elapsed_time: 80314.65s, grad_norm: 0.2823 |
|
2025-01-09 17:44:16,318 - INFO - Step 625/1281 (48.8%), loss: 0.2925, learning_rate: 5.19e-05, epoch: 1.4620, step_time: 601.71s, elapsed_time: 80916.35s, grad_norm: 0.2673 |
|
2025-01-09 17:54:19,676 - INFO - Step 630/1281 (49.2%), loss: 0.2960, learning_rate: 5.13e-05, epoch: 1.4737, step_time: 603.36s, elapsed_time: 81519.71s, grad_norm: 0.3164 |
|
2025-01-09 18:04:20,358 - INFO - Step 635/1281 (49.6%), loss: 0.2874, learning_rate: 5.07e-05, epoch: 1.4854, step_time: 600.68s, elapsed_time: 82120.39s, grad_norm: 0.2758 |
|
2025-01-09 18:14:23,288 - INFO - Step 640/1281 (50.0%), loss: 0.2799, learning_rate: 5.01e-05, epoch: 1.4971, step_time: 602.93s, elapsed_time: 82723.32s, grad_norm: 0.2785 |
|
2025-01-09 18:24:25,568 - INFO - Step 645/1281 (50.4%), loss: 0.2918, learning_rate: 4.94e-05, epoch: 1.5088, step_time: 602.28s, elapsed_time: 83325.60s, grad_norm: 0.2667 |
|
2025-01-09 18:34:27,001 - INFO - Step 650/1281 (50.7%), loss: 0.2853, learning_rate: 4.88e-05, epoch: 1.5205, step_time: 601.43s, elapsed_time: 83927.04s, grad_norm: 0.2973 |
|
2025-01-09 18:44:28,148 - INFO - Step 655/1281 (51.1%), loss: 0.2788, learning_rate: 4.82e-05, epoch: 1.5322, step_time: 601.15s, elapsed_time: 84528.18s, grad_norm: 0.2477 |
|
2025-01-09 18:54:28,745 - INFO - Step 660/1281 (51.5%), loss: 0.2985, learning_rate: 4.76e-05, epoch: 1.5439, step_time: 600.60s, elapsed_time: 85128.78s, grad_norm: 0.2741 |
|
2025-01-09 19:04:31,376 - INFO - Step 665/1281 (51.9%), loss: 0.2794, learning_rate: 4.70e-05, epoch: 1.5556, step_time: 602.63s, elapsed_time: 85731.41s, grad_norm: 0.2912 |
|
2025-01-09 19:14:32,769 - INFO - Step 670/1281 (52.3%), loss: 0.2875, learning_rate: 4.64e-05, epoch: 1.5673, step_time: 601.39s, elapsed_time: 86332.81s, grad_norm: 0.3043 |
|
2025-01-09 19:24:33,372 - INFO - Step 675/1281 (52.7%), loss: 0.2828, learning_rate: 4.58e-05, epoch: 1.5789, step_time: 600.60s, elapsed_time: 86933.41s, grad_norm: 0.3901 |
|
2025-01-09 19:34:36,198 - INFO - Step 680/1281 (53.1%), loss: 0.2810, learning_rate: 4.52e-05, epoch: 1.5906, step_time: 602.83s, elapsed_time: 87536.23s, grad_norm: 0.2815 |
|
2025-01-09 19:44:36,788 - INFO - Step 685/1281 (53.5%), loss: 0.2832, learning_rate: 4.46e-05, epoch: 1.6023, step_time: 600.59s, elapsed_time: 88136.82s, grad_norm: 0.2945 |
|
2025-01-09 19:54:39,428 - INFO - Step 690/1281 (53.9%), loss: 0.2659, learning_rate: 4.39e-05, epoch: 1.6140, step_time: 602.64s, elapsed_time: 88739.46s, grad_norm: 0.2763 |
|
2025-01-09 20:04:39,640 - INFO - Step 695/1281 (54.3%), loss: 0.2869, learning_rate: 4.33e-05, epoch: 1.6257, step_time: 600.21s, elapsed_time: 89339.68s, grad_norm: 0.2753 |
|
2025-01-09 20:14:41,110 - INFO - Step 700/1281 (54.6%), loss: 0.2673, learning_rate: 4.27e-05, epoch: 1.6374, step_time: 601.47s, elapsed_time: 89941.15s, grad_norm: 0.2644 |
|
2025-01-09 20:24:42,365 - INFO - Step 705/1281 (55.0%), loss: 0.2802, learning_rate: 4.21e-05, epoch: 1.6491, step_time: 601.26s, elapsed_time: 90542.40s, grad_norm: 0.2740 |
|
2025-01-09 20:34:43,632 - INFO - Step 710/1281 (55.4%), loss: 0.2733, learning_rate: 4.15e-05, epoch: 1.6608, step_time: 601.27s, elapsed_time: 91143.67s, grad_norm: 0.2736 |
|
2025-01-09 20:44:46,402 - INFO - Step 715/1281 (55.8%), loss: 0.2826, learning_rate: 4.09e-05, epoch: 1.6725, step_time: 602.77s, elapsed_time: 91746.44s, grad_norm: 0.2717 |
|
2025-01-09 20:54:48,061 - INFO - Step 720/1281 (56.2%), loss: 0.2846, learning_rate: 4.03e-05, epoch: 1.6842, step_time: 601.66s, elapsed_time: 92348.10s, grad_norm: 0.2715 |
|
2025-01-09 21:04:49,334 - INFO - Step 725/1281 (56.6%), loss: 0.2996, learning_rate: 3.97e-05, epoch: 1.6959, step_time: 601.27s, elapsed_time: 92949.37s, grad_norm: 0.3027 |
|
2025-01-09 21:14:51,437 - INFO - Step 730/1281 (57.0%), loss: 0.2879, learning_rate: 3.91e-05, epoch: 1.7076, step_time: 602.10s, elapsed_time: 93551.47s, grad_norm: 0.3064 |
|
2025-01-09 21:24:53,587 - INFO - Step 735/1281 (57.4%), loss: 0.2848, learning_rate: 3.85e-05, epoch: 1.7193, step_time: 602.15s, elapsed_time: 94153.62s, grad_norm: 0.3223 |
|
2025-01-09 21:34:54,898 - INFO - Step 740/1281 (57.8%), loss: 0.2834, learning_rate: 3.79e-05, epoch: 1.7310, step_time: 601.31s, elapsed_time: 94754.94s, grad_norm: 0.2773 |
|
2025-01-09 21:44:56,298 - INFO - Step 745/1281 (58.2%), loss: 0.2753, learning_rate: 3.73e-05, epoch: 1.7427, step_time: 601.40s, elapsed_time: 95356.33s, grad_norm: 0.2785 |
|
2025-01-09 21:54:57,042 - INFO - Step 750/1281 (58.5%), loss: 0.2887, learning_rate: 3.67e-05, epoch: 1.7544, step_time: 600.74s, elapsed_time: 95957.08s, grad_norm: 0.3198 |
|
2025-01-09 22:04:58,299 - INFO - Step 755/1281 (58.9%), loss: 0.2719, learning_rate: 3.61e-05, epoch: 1.7661, step_time: 601.26s, elapsed_time: 96558.34s, grad_norm: 0.3117 |
|
2025-01-09 22:14:59,715 - INFO - Step 760/1281 (59.3%), loss: 0.2866, learning_rate: 3.56e-05, epoch: 1.7778, step_time: 601.42s, elapsed_time: 97159.75s, grad_norm: 0.2745 |
|
2025-01-09 22:25:01,904 - INFO - Step 765/1281 (59.7%), loss: 0.2792, learning_rate: 3.50e-05, epoch: 1.7895, step_time: 602.19s, elapsed_time: 97761.94s, grad_norm: 0.3148 |
|
2025-01-09 22:35:02,029 - INFO - Step 770/1281 (60.1%), loss: 0.2677, learning_rate: 3.44e-05, epoch: 1.8012, step_time: 600.13s, elapsed_time: 98362.07s, grad_norm: 0.2906 |
|
2025-01-09 22:45:05,149 - INFO - Step 775/1281 (60.5%), loss: 0.2824, learning_rate: 3.38e-05, epoch: 1.8129, step_time: 603.12s, elapsed_time: 98965.19s, grad_norm: 0.3101 |
|
2025-01-09 22:55:07,287 - INFO - Step 780/1281 (60.9%), loss: 0.2657, learning_rate: 3.32e-05, epoch: 1.8246, step_time: 602.14s, elapsed_time: 99567.32s, grad_norm: 0.3029 |
|
2025-01-09 23:05:10,472 - INFO - Step 785/1281 (61.3%), loss: 0.2659, learning_rate: 3.26e-05, epoch: 1.8363, step_time: 603.18s, elapsed_time: 100170.51s, grad_norm: 0.2829 |
|
2025-01-09 23:15:12,554 - INFO - Step 790/1281 (61.7%), loss: 0.2697, learning_rate: 3.21e-05, epoch: 1.8480, step_time: 602.08s, elapsed_time: 100772.59s, grad_norm: 0.2818 |
|
2025-01-09 23:25:14,358 - INFO - Step 795/1281 (62.1%), loss: 0.2718, learning_rate: 3.15e-05, epoch: 1.8596, step_time: 601.80s, elapsed_time: 101374.40s, grad_norm: 0.3118 |
|
2025-01-09 23:35:16,921 - INFO - Step 800/1281 (62.5%), loss: 0.2820, learning_rate: 3.09e-05, epoch: 1.8713, step_time: 602.56s, elapsed_time: 101976.96s, grad_norm: 0.3433 |
|
2025-01-09 23:45:18,670 - INFO - Step 805/1281 (62.8%), loss: 0.2761, learning_rate: 3.04e-05, epoch: 1.8830, step_time: 601.75s, elapsed_time: 102578.71s, grad_norm: 0.2879 |
|
2025-01-09 23:55:19,147 - INFO - Step 810/1281 (63.2%), loss: 0.2694, learning_rate: 2.98e-05, epoch: 1.8947, step_time: 600.48s, elapsed_time: 103179.18s, grad_norm: 0.3026 |
|
2025-01-10 00:05:20,657 - INFO - Step 815/1281 (63.6%), loss: 0.2738, learning_rate: 2.92e-05, epoch: 1.9064, step_time: 601.51s, elapsed_time: 103780.69s, grad_norm: 0.3056 |
|
2025-01-10 00:15:22,407 - INFO - Step 820/1281 (64.0%), loss: 0.2534, learning_rate: 2.87e-05, epoch: 1.9181, step_time: 601.75s, elapsed_time: 104382.44s, grad_norm: 0.2891 |
|
2025-01-10 00:25:24,252 - INFO - Step 825/1281 (64.4%), loss: 0.2655, learning_rate: 2.81e-05, epoch: 1.9298, step_time: 601.85s, elapsed_time: 104984.29s, grad_norm: 0.2814 |
|
2025-01-10 00:35:26,634 - INFO - Step 830/1281 (64.8%), loss: 0.2700, learning_rate: 2.76e-05, epoch: 1.9415, step_time: 602.38s, elapsed_time: 105586.67s, grad_norm: 0.3045 |
|
2025-01-10 00:45:28,532 - INFO - Step 835/1281 (65.2%), loss: 0.2779, learning_rate: 2.70e-05, epoch: 1.9532, step_time: 601.90s, elapsed_time: 106188.57s, grad_norm: 0.2982 |
|
2025-01-10 00:55:31,220 - INFO - Step 840/1281 (65.6%), loss: 0.2525, learning_rate: 2.65e-05, epoch: 1.9649, step_time: 602.69s, elapsed_time: 106791.26s, grad_norm: 0.2658 |
|
2025-01-10 01:05:31,718 - INFO - Step 845/1281 (66.0%), loss: 0.2487, learning_rate: 2.60e-05, epoch: 1.9766, step_time: 600.50s, elapsed_time: 107391.75s, grad_norm: 0.3206 |
|
2025-01-10 01:15:33,025 - INFO - Step 850/1281 (66.4%), loss: 0.2612, learning_rate: 2.54e-05, epoch: 1.9883, step_time: 601.31s, elapsed_time: 107993.06s, grad_norm: 0.3026 |
|
2025-01-10 01:25:34,410 - INFO - Step 855/1281 (66.7%), loss: 0.2661, learning_rate: 2.49e-05, epoch: 2.0000, step_time: 601.38s, elapsed_time: 108594.45s, grad_norm: 0.3019 |
|
2025-01-10 02:46:13,716 - INFO - Loss improved from 0.37839 to 0.34214 |
|
2025-01-10 02:46:13,717 - INFO - Step 855/1281 (66.7%), epoch: 2.0000, step_time: 4839.31s, elapsed_time: 113433.75s |
|
2025-01-10 02:46:13,718 - INFO - Evaluation Results: |
|
eval_loss: 0.3421 |
|
eval_runtime: 4839.3029 |
|
eval_samples_per_second: 0.3140 |
|
eval_steps_per_second: 0.0790 |
|
epoch: 2.0000 |
|
elapsed_time: 113433.75s |
|
step_time: 4839.31s |
|
2025-01-10 02:46:13,718 - INFO - Loss improved from 0.37839 to 0.34214 |
|
2025-01-10 02:50:12,806 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-855/pytorch_model_fsdp_0 |
|
2025-01-10 02:50:17,176 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-855/pytorch_model_fsdp_0 |
|
2025-01-10 02:50:23,620 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-855/optimizer_0 |
|
2025-01-10 02:50:29,798 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-855/optimizer_0 |
|
2025-01-10 03:00:33,458 - INFO - Step 860/1281 (67.1%), loss: 0.1981, learning_rate: 2.44e-05, epoch: 2.0117, step_time: 859.74s, elapsed_time: 114293.49s, grad_norm: 0.2535 |
|
2025-01-10 03:10:35,937 - INFO - Step 865/1281 (67.5%), loss: 0.1904, learning_rate: 2.38e-05, epoch: 2.0234, step_time: 602.48s, elapsed_time: 114895.97s, grad_norm: 0.2828 |
|
2025-01-10 03:20:37,191 - INFO - Step 870/1281 (67.9%), loss: 0.2030, learning_rate: 2.33e-05, epoch: 2.0351, step_time: 601.25s, elapsed_time: 115497.23s, grad_norm: 0.3518 |
|
2025-01-10 03:30:37,973 - INFO - Step 875/1281 (68.3%), loss: 0.2036, learning_rate: 2.28e-05, epoch: 2.0468, step_time: 600.78s, elapsed_time: 116098.01s, grad_norm: 0.2859 |
|
2025-01-10 03:40:39,195 - INFO - Step 880/1281 (68.7%), loss: 0.1995, learning_rate: 2.23e-05, epoch: 2.0585, step_time: 601.22s, elapsed_time: 116699.23s, grad_norm: 0.2996 |
|
2025-01-10 03:50:40,593 - INFO - Step 885/1281 (69.1%), loss: 0.2006, learning_rate: 2.18e-05, epoch: 2.0702, step_time: 601.40s, elapsed_time: 117300.63s, grad_norm: 0.3570 |
|
2025-01-10 04:00:43,119 - INFO - Step 890/1281 (69.5%), loss: 0.1934, learning_rate: 2.13e-05, epoch: 2.0819, step_time: 602.53s, elapsed_time: 117903.16s, grad_norm: 0.3005 |
|
2025-01-10 04:10:46,069 - INFO - Step 895/1281 (69.9%), loss: 0.2053, learning_rate: 2.08e-05, epoch: 2.0936, step_time: 602.95s, elapsed_time: 118506.11s, grad_norm: 0.3168 |
|
2025-01-10 04:20:49,127 - INFO - Step 900/1281 (70.3%), loss: 0.1915, learning_rate: 2.03e-05, epoch: 2.1053, step_time: 603.06s, elapsed_time: 119109.16s, grad_norm: 0.3223 |
|
2025-01-10 04:30:50,211 - INFO - Step 905/1281 (70.6%), loss: 0.2029, learning_rate: 1.98e-05, epoch: 2.1170, step_time: 601.08s, elapsed_time: 119710.25s, grad_norm: 0.3195 |
|
2025-01-10 04:40:51,258 - INFO - Step 910/1281 (71.0%), loss: 0.2031, learning_rate: 1.93e-05, epoch: 2.1287, step_time: 601.05s, elapsed_time: 120311.29s, grad_norm: 0.3267 |
|
2025-01-10 04:50:52,216 - INFO - Step 915/1281 (71.4%), loss: 0.1888, learning_rate: 1.88e-05, epoch: 2.1404, step_time: 600.96s, elapsed_time: 120912.25s, grad_norm: 0.2925 |
|
2025-01-10 05:00:55,030 - INFO - Step 920/1281 (71.8%), loss: 0.1937, learning_rate: 1.83e-05, epoch: 2.1520, step_time: 602.81s, elapsed_time: 121515.07s, grad_norm: 0.3355 |
|
2025-01-10 05:10:56,901 - INFO - Step 925/1281 (72.2%), loss: 0.1864, learning_rate: 1.79e-05, epoch: 2.1637, step_time: 601.87s, elapsed_time: 122116.94s, grad_norm: 0.2900 |
|
2025-01-10 05:20:59,543 - INFO - Step 930/1281 (72.6%), loss: 0.1960, learning_rate: 1.74e-05, epoch: 2.1754, step_time: 602.64s, elapsed_time: 122719.58s, grad_norm: 0.3226 |
|
2025-01-10 05:31:00,672 - INFO - Step 935/1281 (73.0%), loss: 0.1967, learning_rate: 1.69e-05, epoch: 2.1871, step_time: 601.13s, elapsed_time: 123320.71s, grad_norm: 0.3261 |
|
2025-01-10 05:41:03,279 - INFO - Step 940/1281 (73.4%), loss: 0.2048, learning_rate: 1.65e-05, epoch: 2.1988, step_time: 602.61s, elapsed_time: 123923.32s, grad_norm: 0.3172 |
|
2025-01-10 05:51:04,095 - INFO - Step 945/1281 (73.8%), loss: 0.1904, learning_rate: 1.60e-05, epoch: 2.2105, step_time: 600.82s, elapsed_time: 124524.13s, grad_norm: 0.3258 |
|
2025-01-10 06:01:06,142 - INFO - Step 950/1281 (74.2%), loss: 0.1891, learning_rate: 1.56e-05, epoch: 2.2222, step_time: 602.05s, elapsed_time: 125126.18s, grad_norm: 0.3249 |
|
2025-01-10 06:11:07,571 - INFO - Step 955/1281 (74.6%), loss: 0.2034, learning_rate: 1.51e-05, epoch: 2.2339, step_time: 601.43s, elapsed_time: 125727.61s, grad_norm: 0.3496 |
|
2025-01-10 06:21:09,096 - INFO - Step 960/1281 (74.9%), loss: 0.1983, learning_rate: 1.47e-05, epoch: 2.2456, step_time: 601.52s, elapsed_time: 126329.13s, grad_norm: 0.3015 |
|
2025-01-10 06:31:10,508 - INFO - Step 965/1281 (75.3%), loss: 0.1911, learning_rate: 1.43e-05, epoch: 2.2573, step_time: 601.41s, elapsed_time: 126930.55s, grad_norm: 0.3006 |
|
2025-01-10 06:41:12,406 - INFO - Step 970/1281 (75.7%), loss: 0.1988, learning_rate: 1.39e-05, epoch: 2.2690, step_time: 601.90s, elapsed_time: 127532.44s, grad_norm: 0.3315 |
|
2025-01-10 06:51:14,939 - INFO - Step 975/1281 (76.1%), loss: 0.1972, learning_rate: 1.34e-05, epoch: 2.2807, step_time: 602.53s, elapsed_time: 128134.98s, grad_norm: 0.3325 |
|
2025-01-10 07:01:15,346 - INFO - Step 980/1281 (76.5%), loss: 0.1930, learning_rate: 1.30e-05, epoch: 2.2924, step_time: 600.41s, elapsed_time: 128735.38s, grad_norm: 0.3046 |
|
2025-01-10 07:11:15,654 - INFO - Step 985/1281 (76.9%), loss: 0.1871, learning_rate: 1.26e-05, epoch: 2.3041, step_time: 600.31s, elapsed_time: 129335.69s, grad_norm: 0.3085 |
|
2025-01-10 07:21:19,118 - INFO - Step 990/1281 (77.3%), loss: 0.1877, learning_rate: 1.22e-05, epoch: 2.3158, step_time: 603.46s, elapsed_time: 129939.15s, grad_norm: 0.3576 |
|
2025-01-10 07:31:19,654 - INFO - Step 995/1281 (77.7%), loss: 0.1906, learning_rate: 1.18e-05, epoch: 2.3275, step_time: 600.54s, elapsed_time: 130539.69s, grad_norm: 0.3149 |
|
2025-01-10 07:41:20,229 - INFO - Step 1000/1281 (78.1%), loss: 0.1925, learning_rate: 1.14e-05, epoch: 2.3392, step_time: 600.58s, elapsed_time: 131140.27s, grad_norm: 0.3455 |
|
2025-01-10 07:51:20,945 - INFO - Step 1005/1281 (78.5%), loss: 0.1873, learning_rate: 1.10e-05, epoch: 2.3509, step_time: 600.72s, elapsed_time: 131740.98s, grad_norm: 0.3264 |
|
2025-01-10 08:01:21,404 - INFO - Step 1010/1281 (78.8%), loss: 0.1980, learning_rate: 1.06e-05, epoch: 2.3626, step_time: 600.46s, elapsed_time: 132341.44s, grad_norm: 0.3268 |
|
2025-01-10 08:11:22,239 - INFO - Step 1015/1281 (79.2%), loss: 0.1890, learning_rate: 1.03e-05, epoch: 2.3743, step_time: 600.83s, elapsed_time: 132942.28s, grad_norm: 0.3394 |
|
2025-01-10 08:21:25,622 - INFO - Step 1020/1281 (79.6%), loss: 0.1955, learning_rate: 9.90e-06, epoch: 2.3860, step_time: 603.38s, elapsed_time: 133545.66s, grad_norm: 0.3263 |
|
2025-01-10 08:31:27,973 - INFO - Step 1025/1281 (80.0%), loss: 0.1950, learning_rate: 9.53e-06, epoch: 2.3977, step_time: 602.35s, elapsed_time: 134148.01s, grad_norm: 0.3396 |
|
2025-01-10 08:41:29,715 - INFO - Step 1030/1281 (80.4%), loss: 0.1888, learning_rate: 9.18e-06, epoch: 2.4094, step_time: 601.74s, elapsed_time: 134749.75s, grad_norm: 0.3267 |
|
2025-01-10 08:51:31,975 - INFO - Step 1035/1281 (80.8%), loss: 0.1944, learning_rate: 8.83e-06, epoch: 2.4211, step_time: 602.26s, elapsed_time: 135352.01s, grad_norm: 0.3298 |
|
2025-01-10 09:01:34,084 - INFO - Step 1040/1281 (81.2%), loss: 0.1856, learning_rate: 8.48e-06, epoch: 2.4327, step_time: 602.11s, elapsed_time: 135954.12s, grad_norm: 0.3550 |
|
2025-01-10 09:11:35,933 - INFO - Step 1045/1281 (81.6%), loss: 0.1942, learning_rate: 8.14e-06, epoch: 2.4444, step_time: 601.85s, elapsed_time: 136555.97s, grad_norm: 0.3638 |
|
2025-01-10 09:21:40,343 - INFO - Step 1050/1281 (82.0%), loss: 0.1835, learning_rate: 7.81e-06, epoch: 2.4561, step_time: 604.41s, elapsed_time: 137160.38s, grad_norm: 0.3285 |
|
2025-01-10 09:31:42,184 - INFO - Step 1055/1281 (82.4%), loss: 0.1959, learning_rate: 7.49e-06, epoch: 2.4678, step_time: 601.84s, elapsed_time: 137762.22s, grad_norm: 0.3284 |
|
2025-01-10 09:41:44,621 - INFO - Step 1060/1281 (82.7%), loss: 0.1811, learning_rate: 7.17e-06, epoch: 2.4795, step_time: 602.44s, elapsed_time: 138364.66s, grad_norm: 0.3051 |
|
2025-01-10 09:51:46,469 - INFO - Step 1065/1281 (83.1%), loss: 0.1876, learning_rate: 6.85e-06, epoch: 2.4912, step_time: 601.85s, elapsed_time: 138966.51s, grad_norm: 0.3312 |
|
2025-01-10 10:01:48,298 - INFO - Step 1070/1281 (83.5%), loss: 0.1927, learning_rate: 6.55e-06, epoch: 2.5029, step_time: 601.83s, elapsed_time: 139568.33s, grad_norm: 0.3291 |
|
2025-01-10 10:11:51,069 - INFO - Step 1075/1281 (83.9%), loss: 0.1961, learning_rate: 6.25e-06, epoch: 2.5146, step_time: 602.77s, elapsed_time: 140171.11s, grad_norm: 0.2910 |
|
2025-01-10 10:21:52,197 - INFO - Step 1080/1281 (84.3%), loss: 0.1825, learning_rate: 5.95e-06, epoch: 2.5263, step_time: 601.13s, elapsed_time: 140772.23s, grad_norm: 0.3381 |
|
2025-01-10 10:31:55,103 - INFO - Step 1085/1281 (84.7%), loss: 0.1896, learning_rate: 5.67e-06, epoch: 2.5380, step_time: 602.91s, elapsed_time: 141375.14s, grad_norm: 0.3015 |
|
2025-01-10 10:41:57,124 - INFO - Step 1090/1281 (85.1%), loss: 0.1976, learning_rate: 5.39e-06, epoch: 2.5497, step_time: 602.02s, elapsed_time: 141977.16s, grad_norm: 0.3509 |
|
2025-01-10 10:51:58,065 - INFO - Step 1095/1281 (85.5%), loss: 0.1839, learning_rate: 5.11e-06, epoch: 2.5614, step_time: 600.94s, elapsed_time: 142578.10s, grad_norm: 0.3144 |
|
2025-01-10 11:02:00,662 - INFO - Step 1100/1281 (85.9%), loss: 0.1778, learning_rate: 4.85e-06, epoch: 2.5731, step_time: 602.60s, elapsed_time: 143180.70s, grad_norm: 0.3260 |
|
2025-01-10 11:12:01,817 - INFO - Step 1105/1281 (86.3%), loss: 0.1822, learning_rate: 4.59e-06, epoch: 2.5848, step_time: 601.15s, elapsed_time: 143781.85s, grad_norm: 0.3291 |
|
2025-01-10 11:22:03,278 - INFO - Step 1110/1281 (86.7%), loss: 0.1841, learning_rate: 4.33e-06, epoch: 2.5965, step_time: 601.46s, elapsed_time: 144383.31s, grad_norm: 0.3480 |
|
2025-01-10 11:32:05,794 - INFO - Step 1115/1281 (87.0%), loss: 0.1824, learning_rate: 4.09e-06, epoch: 2.6082, step_time: 602.52s, elapsed_time: 144985.83s, grad_norm: 0.3349 |
|
2025-01-10 11:42:08,498 - INFO - Step 1120/1281 (87.4%), loss: 0.1849, learning_rate: 3.85e-06, epoch: 2.6199, step_time: 602.70s, elapsed_time: 145588.53s, grad_norm: 0.3462 |
|
2025-01-10 11:52:09,441 - INFO - Step 1125/1281 (87.8%), loss: 0.1916, learning_rate: 3.61e-06, epoch: 2.6316, step_time: 600.94s, elapsed_time: 146189.48s, grad_norm: 0.3300 |
|
2025-01-10 12:02:11,015 - INFO - Step 1130/1281 (88.2%), loss: 0.1903, learning_rate: 3.39e-06, epoch: 2.6433, step_time: 601.57s, elapsed_time: 146791.05s, grad_norm: 0.3152 |
|
2025-01-10 12:12:13,603 - INFO - Step 1135/1281 (88.6%), loss: 0.1799, learning_rate: 3.17e-06, epoch: 2.6550, step_time: 602.59s, elapsed_time: 147393.64s, grad_norm: 0.3367 |
|
2025-01-10 12:22:13,925 - INFO - Step 1140/1281 (89.0%), loss: 0.1908, learning_rate: 2.96e-06, epoch: 2.6667, step_time: 600.32s, elapsed_time: 147993.96s, grad_norm: 0.3596 |
|
2025-01-10 12:32:15,406 - INFO - Step 1145/1281 (89.4%), loss: 0.1987, learning_rate: 2.76e-06, epoch: 2.6784, step_time: 601.48s, elapsed_time: 148595.44s, grad_norm: 0.3376 |
|
2025-01-10 12:42:16,184 - INFO - Step 1150/1281 (89.8%), loss: 0.1920, learning_rate: 2.56e-06, epoch: 2.6901, step_time: 600.78s, elapsed_time: 149196.22s, grad_norm: 0.3377 |
|
2025-01-10 12:52:16,533 - INFO - Step 1155/1281 (90.2%), loss: 0.1752, learning_rate: 2.37e-06, epoch: 2.7018, step_time: 600.35s, elapsed_time: 149796.57s, grad_norm: 0.3113 |
|
2025-01-10 13:02:19,791 - INFO - Step 1160/1281 (90.6%), loss: 0.1866, learning_rate: 2.19e-06, epoch: 2.7135, step_time: 603.26s, elapsed_time: 150399.83s, grad_norm: 0.3357 |
|
2025-01-10 13:12:20,485 - INFO - Step 1165/1281 (90.9%), loss: 0.1839, learning_rate: 2.01e-06, epoch: 2.7251, step_time: 600.69s, elapsed_time: 151000.52s, grad_norm: 0.3241 |
|
2025-01-10 13:22:21,568 - INFO - Step 1170/1281 (91.3%), loss: 0.1845, learning_rate: 1.84e-06, epoch: 2.7368, step_time: 601.08s, elapsed_time: 151601.60s, grad_norm: 0.3287 |
|
2025-01-10 13:32:22,990 - INFO - Step 1175/1281 (91.7%), loss: 0.1898, learning_rate: 1.68e-06, epoch: 2.7485, step_time: 601.42s, elapsed_time: 152203.03s, grad_norm: 0.3489 |
|
2025-01-10 13:42:24,898 - INFO - Step 1180/1281 (92.1%), loss: 0.1817, learning_rate: 1.53e-06, epoch: 2.7602, step_time: 601.91s, elapsed_time: 152804.93s, grad_norm: 0.3123 |
|
2025-01-10 13:52:27,715 - INFO - Step 1185/1281 (92.5%), loss: 0.1853, learning_rate: 1.38e-06, epoch: 2.7719, step_time: 602.82s, elapsed_time: 153407.75s, grad_norm: 0.3164 |
|
2025-01-10 14:02:28,431 - INFO - Step 1190/1281 (92.9%), loss: 0.1897, learning_rate: 1.24e-06, epoch: 2.7836, step_time: 600.72s, elapsed_time: 154008.47s, grad_norm: 0.3673 |
|
2025-01-10 14:12:28,620 - INFO - Step 1195/1281 (93.3%), loss: 0.1867, learning_rate: 1.11e-06, epoch: 2.7953, step_time: 600.19s, elapsed_time: 154608.66s, grad_norm: 0.3569 |
|
2025-01-10 14:22:29,665 - INFO - Step 1200/1281 (93.7%), loss: 0.1781, learning_rate: 9.83e-07, epoch: 2.8070, step_time: 601.05s, elapsed_time: 155209.70s, grad_norm: 0.3389 |
|
2025-01-10 14:32:31,064 - INFO - Step 1205/1281 (94.1%), loss: 0.1863, learning_rate: 8.66e-07, epoch: 2.8187, step_time: 601.40s, elapsed_time: 155811.10s, grad_norm: 0.3246 |
|
2025-01-10 14:42:31,726 - INFO - Step 1210/1281 (94.5%), loss: 0.1864, learning_rate: 7.56e-07, epoch: 2.8304, step_time: 600.66s, elapsed_time: 156411.76s, grad_norm: 0.3294 |
|
2025-01-10 14:52:31,923 - INFO - Step 1215/1281 (94.8%), loss: 0.1929, learning_rate: 6.54e-07, epoch: 2.8421, step_time: 600.20s, elapsed_time: 157011.96s, grad_norm: 0.3789 |
|
2025-01-10 15:02:33,206 - INFO - Step 1220/1281 (95.2%), loss: 0.1804, learning_rate: 5.58e-07, epoch: 2.8538, step_time: 601.28s, elapsed_time: 157613.24s, grad_norm: 0.2890 |
|
2025-01-10 15:12:33,869 - INFO - Step 1225/1281 (95.6%), loss: 0.1865, learning_rate: 4.71e-07, epoch: 2.8655, step_time: 600.66s, elapsed_time: 158213.91s, grad_norm: 0.3188 |
|
2025-01-10 15:22:36,112 - INFO - Step 1230/1281 (96.0%), loss: 0.1820, learning_rate: 3.91e-07, epoch: 2.8772, step_time: 602.24s, elapsed_time: 158816.15s, grad_norm: 0.3491 |
|
2025-01-10 15:32:37,318 - INFO - Step 1235/1281 (96.4%), loss: 0.1949, learning_rate: 3.18e-07, epoch: 2.8889, step_time: 601.21s, elapsed_time: 159417.35s, grad_norm: 0.3490 |
|
2025-01-10 15:42:39,200 - INFO - Step 1240/1281 (96.8%), loss: 0.1833, learning_rate: 2.53e-07, epoch: 2.9006, step_time: 601.88s, elapsed_time: 160019.24s, grad_norm: 0.3386 |
|
2025-01-10 15:52:42,532 - INFO - Step 1245/1281 (97.2%), loss: 0.1816, learning_rate: 1.95e-07, epoch: 2.9123, step_time: 603.33s, elapsed_time: 160622.57s, grad_norm: 0.3464 |
|
2025-01-10 16:02:44,549 - INFO - Step 1250/1281 (97.6%), loss: 0.1901, learning_rate: 1.44e-07, epoch: 2.9240, step_time: 602.02s, elapsed_time: 161224.59s, grad_norm: 0.3249 |
|
2025-01-10 16:12:45,583 - INFO - Step 1255/1281 (98.0%), loss: 0.1850, learning_rate: 1.02e-07, epoch: 2.9357, step_time: 601.03s, elapsed_time: 161825.62s, grad_norm: 0.3222 |
|
2025-01-10 16:22:47,835 - INFO - Step 1260/1281 (98.4%), loss: 0.1908, learning_rate: 6.63e-08, epoch: 2.9474, step_time: 602.25s, elapsed_time: 162427.87s, grad_norm: 0.3366 |
|
2025-01-10 16:32:50,447 - INFO - Step 1265/1281 (98.8%), loss: 0.1855, learning_rate: 3.85e-08, epoch: 2.9591, step_time: 602.61s, elapsed_time: 163030.48s, grad_norm: 0.3383 |
|
2025-01-10 16:42:52,016 - INFO - Step 1270/1281 (99.1%), loss: 0.1799, learning_rate: 1.82e-08, epoch: 2.9708, step_time: 601.57s, elapsed_time: 163632.05s, grad_norm: 0.3145 |
|
2025-01-10 16:52:54,137 - INFO - Step 1275/1281 (99.5%), loss: 0.1845, learning_rate: 5.41e-09, epoch: 2.9825, step_time: 602.12s, elapsed_time: 164234.17s, grad_norm: 0.3497 |
|
2025-01-10 17:02:55,809 - INFO - Step 1280/1281 (99.9%), loss: 0.1865, learning_rate: 1.50e-10, epoch: 2.9942, step_time: 601.67s, elapsed_time: 164835.85s, grad_norm: 0.3328 |
|
2025-01-10 17:08:53,883 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-1281/pytorch_model_fsdp_0 |
|
2025-01-10 17:08:57,686 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-1281/pytorch_model_fsdp_0 |
|
2025-01-10 17:09:04,668 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-1281/optimizer_0 |
|
2025-01-10 17:09:10,610 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-1281/optimizer_0 |
|
2025-01-10 18:29:51,495 - INFO - Loss didn't improve. Patience: 1/1 |
|
2025-01-10 18:29:51,496 - INFO - Early stopping triggered! |
|
2025-01-10 18:29:51,496 - INFO - Step 1281/1281 (100.0%), epoch: 2.9965, step_time: 5215.69s, elapsed_time: 170051.53s |
|
2025-01-10 18:29:51,498 - INFO - Evaluation Results: |
|
eval_loss: 0.3573 |
|
eval_runtime: 4839.9044 |
|
eval_samples_per_second: 0.3140 |
|
eval_steps_per_second: 0.0790 |
|
epoch: 2.9965 |
|
elapsed_time: 170051.53s |
|
step_time: 5215.69s |
|
2025-01-10 18:29:51,498 - INFO - Loss didn't improve. Patience: 1/1 |
|
2025-01-10 18:29:51,498 - INFO - Early stopping triggered! |
|
2025-01-10 18:33:52,589 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-1281/pytorch_model_fsdp_0 |
|
2025-01-10 18:33:56,991 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-1281/pytorch_model_fsdp_0 |
|
2025-01-10 18:34:03,411 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-1281/optimizer_0 |
|
2025-01-10 18:34:09,745 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-1281/optimizer_0 |
|
2025-01-10 18:34:10,189 - INFO - Step 1281/1281 (100.0%), epoch: 2.9965, step_time: 258.69s, elapsed_time: 170310.23s |
|
2025-01-10 18:34:10,190 - INFO - Training completed in 170310.23 seconds |
|
|