mistral-sft-lora-fsdp2/logs/training_log.txt
2025-01-08 18:29:22,070 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmpw2j4jae_/test.c -o /tmp/tmpw2j4jae_/test.o
2025-01-08 18:29:22,097 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmpw2j4jae_/test.o -laio -o /tmp/tmpw2j4jae_/a.out
2025-01-08 18:29:22,252 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp3xs2q_w0/test.c -o /tmp/tmp3xs2q_w0/test.o
2025-01-08 18:29:22,279 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp3xs2q_w0/test.o -laio -o /tmp/tmp3xs2q_w0/a.out
2025-01-08 18:29:22,281 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmpoehuhgbl/test.c -o /tmp/tmpoehuhgbl/test.o
2025-01-08 18:29:22,307 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmpoehuhgbl/test.o -laio -o /tmp/tmpoehuhgbl/a.out
2025-01-08 18:29:22,311 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp5eog_7fp/test.c -o /tmp/tmp5eog_7fp/test.o
2025-01-08 18:29:22,334 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp5eog_7fp/test.o -laio -o /tmp/tmp5eog_7fp/a.out
2025-01-08 18:29:22,519 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp7o4d197o/test.c -o /tmp/tmp7o4d197o/test.o
2025-01-08 18:29:22,545 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp7o4d197o/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp7o4d197o/a.out
2025-01-08 18:29:22,683 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmpskkmpgdv/test.c -o /tmp/tmpskkmpgdv/test.o
2025-01-08 18:29:22,710 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmpskkmpgdv/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmpskkmpgdv/a.out
2025-01-08 18:29:22,759 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmp0i19mv7y/test.c -o /tmp/tmp0i19mv7y/test.o
2025-01-08 18:29:22,778 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -O2 -isystem /root/anaconda3/envs/faiss_1.8.0/include -fPIC -c /tmp/tmptzck4dvd/test.c -o /tmp/tmptzck4dvd/test.o
2025-01-08 18:29:22,785 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmp0i19mv7y/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp0i19mv7y/a.out
2025-01-08 18:29:22,795 - INFO - gcc -pthread -B /root/anaconda3/envs/faiss_1.8.0/compiler_compat /tmp/tmptzck4dvd/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmptzck4dvd/a.out
2025-01-08 18:34:07,128 - INFO - Training started
2025-01-08 18:34:07,129 - INFO - Total steps: 2
2025-01-08 18:37:14,796 - INFO - Loss improved from inf to 1.98041
2025-01-08 18:37:14,796 - INFO - Loss improved from inf to 1.98041
2025-01-08 18:37:14,797 - INFO - Loss improved from inf to 1.98041
2025-01-08 18:37:14,798 - INFO - Step 1/2 (50.0%), epoch: 1.0000, step_time: 571.32s, elapsed_time: 571.32s
2025-01-08 18:37:14,799 - INFO - Evaluation Results:
eval_loss: 1.9804
eval_runtime: 24.9974
eval_samples_per_second: 0.3200
eval_steps_per_second: 0.0800
epoch: 1.0000
elapsed_time: 571.32s
step_time: 571.32s
2025-01-08 18:37:14,799 - INFO - Loss improved from inf to 1.98041
2025-01-08 18:40:40,756 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-1/pytorch_model_fsdp_0
2025-01-08 18:40:44,085 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-1/pytorch_model_fsdp_0
2025-01-08 18:40:50,139 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-1/optimizer_0
2025-01-08 18:40:56,423 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-1/optimizer_0
2025-01-08 18:44:56,103 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-2/pytorch_model_fsdp_0
2025-01-08 18:44:59,225 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-2/pytorch_model_fsdp_0
2025-01-08 18:45:05,105 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
2025-01-08 18:45:11,104 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
2025-01-08 18:45:36,527 - INFO - Loss improved from 1.98041 to 1.83309
2025-01-08 18:45:36,527 - INFO - Loss improved from 1.98041 to 1.83309
2025-01-08 18:45:36,527 - INFO - Loss improved from 1.98041 to 1.83309
2025-01-08 18:45:36,528 - INFO - Step 2/2 (100.0%), epoch: 2.0000, step_time: 501.73s, elapsed_time: 1073.05s
2025-01-08 18:45:36,529 - INFO - Evaluation Results:
eval_loss: 1.8331
eval_runtime: 25.1685
eval_samples_per_second: 0.3180
eval_steps_per_second: 0.0790
epoch: 2.0000
elapsed_time: 1073.05s
step_time: 501.73s
2025-01-08 18:45:36,529 - INFO - Loss improved from 1.98041 to 1.83309
2025-01-08 18:48:59,163 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-2/pytorch_model_fsdp_0
2025-01-08 18:49:02,615 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-2/pytorch_model_fsdp_0
2025-01-08 18:49:08,850 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
2025-01-08 18:49:15,280 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
2025-01-08 18:49:15,799 - INFO - Step 2/2 (100.0%), epoch: 2.0000, step_time: 219.27s, elapsed_time: 1292.32s
2025-01-08 18:49:15,801 - INFO - Training completed in 1292.32 seconds