```
# code

https://huggingface.co/vincentoh/llama3_70b_no_robot_fsdp_qlora

# model

wget "https://huggingface.co/vincentoh/llama3-70b-GGUF/resolve/main/vincentoh/llama3-70b-GGUF"

# memory usage

Thu May 16 15:53:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 PCIe               On  | 00000000:08:00.0 Off |                    0 |
| N/A   37C    P0              76W / 350W |  40441MiB / 81559MiB |     24%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     17735      C   ./main                                    40428MiB |
+---------------------------------------------------------------------------------------+

# token speed

<|begin_of_text|>Why is the sky blue? The sky is blue due to a phenomenon called Rayleigh scattering. This scattering refers to the scattering of electromagnetic radiation (light) by particles much smaller than the wavelength of the light. The short-wavelength blue light is scattered more than the other colors of visible light, resulting in more blue light reaching the observer than the other colors of light.<|end_of_text|> [end of text]

llama_print_timings:        load time =    6244.37 ms
llama_print_timings:      sample time =       4.39 ms /    69 runs   (    0.06 ms per token, 15710.38 tokens per second)
llama_print_timings: prompt eval time =      90.86 ms /     7 tokens (   12.98 ms per token,    77.05 tokens per second)
llama_print_timings:        eval time =    2334.73 ms /    68 runs   (   34.33 ms per token,    29.13 tokens per second)
llama_print_timings:       total time =    2486.72 ms /    75 tokens
Log end

```
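The memory figure above is a one-off `nvidia-smi` snapshot: the `./main` process holds 40428 MiB, and the GPU as a whole reports 40441 of 81559 MiB in use (about 49.6%). To record the same two numbers without reading the table by eye, `nvidia-smi` can print them as bare CSV via `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits`. The Python sketch below wraps that query; the helper names (`parse_memory_csv`, `utilization`, `query_gpu_memory`) are illustrative and not part of any library, and the demo parses the numbers from this snapshot instead of running a live query.

```python
import subprocess
from typing import List, Tuple

# nvidia-smi query mode: one line per GPU, values in MiB, no header or unit suffix.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> List[Tuple[int, int]]:
    """Turn lines like '40441, 81559' into (used_mib, total_mib) tuples, one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))  # int() ignores the space
        rows.append((used, total))
    return rows


def utilization(row: Tuple[int, int]) -> float:
    """Fraction of device memory currently in use."""
    used, total = row
    return used / total


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi and parse its output (needs the NVIDIA driver tools on PATH)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)


if __name__ == "__main__":
    # Numbers copied from the snapshot above rather than queried live.
    sample = parse_memory_csv("40441, 81559\n")
    print(f"{utilization(sample[0]):.1%} of device memory in use")  # 49.6% of device memory in use
```

Note that the per-process column (40428 MiB for `./main`) and the device total (40441 MiB) differ slightly; the device-level query above reports the latter.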
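The throughput figures in the timing block are count divided by elapsed time: for generation, 68 runs in 2334.73 ms gives 68 / 2.33473 s ≈ 29.13 tokens/s, or about 34.33 ms per token. The Python sketch below parses `llama_print_timings` lines in the format shown above and recomputes those rates. The regular expression is written against this particular log and may need adjusting for other llama.cpp builds (an assumption, not a documented format), and `parse_timings` / `Timing` are illustrative names. Because it works from the rounded millisecond totals, the last digit can differ from llama.cpp's own figure (prompt eval comes out at about 77.04 rather than 77.05 tokens/s).

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Matches "llama_print_timings: <phase> time = <ms> ms [/ <n> runs|tokens]".
# Written against the log in this README; other llama.cpp builds may differ.
TIMING_RE = re.compile(
    r"llama_print_timings:\s*(?P<name>[a-z ]+?) time\s*=\s*(?P<ms>[\d.]+) ms"
    r"(?:\s*/\s*(?P<n>\d+) (?:runs|tokens))?"
)

LOG = """\
llama_print_timings:        load time =    6244.37 ms
llama_print_timings:      sample time =       4.39 ms /    69 runs   (    0.06 ms per token, 15710.38 tokens per second)
llama_print_timings: prompt eval time =      90.86 ms /     7 tokens (   12.98 ms per token,    77.05 tokens per second)
llama_print_timings:        eval time =    2334.73 ms /    68 runs   (   34.33 ms per token,    29.13 tokens per second)
llama_print_timings:       total time =    2486.72 ms /    75 tokens
"""


@dataclass(frozen=True)
class Timing:
    ms: float          # elapsed milliseconds for this phase
    n: Optional[int]   # runs/tokens processed; None if the line has no "/ N ..." part

    def tokens_per_second(self) -> Optional[float]:
        if not self.n or self.ms == 0:
            return None
        return self.n / (self.ms / 1000.0)

    def ms_per_token(self) -> Optional[float]:
        if not self.n:
            return None
        return self.ms / self.n


def parse_timings(log: str) -> Dict[str, Timing]:
    """Map each phase name ('load', 'sample', 'prompt eval', 'eval', 'total') to its Timing."""
    timings: Dict[str, Timing] = {}
    for line in log.splitlines():
        m = TIMING_RE.search(line)
        if m:
            n = int(m["n"]) if m["n"] else None
            timings[m["name"]] = Timing(float(m["ms"]), n)
    return timings


if __name__ == "__main__":
    gen = parse_timings(LOG)["eval"]
    print(f"{gen.tokens_per_second():.2f} tok/s, {gen.ms_per_token():.2f} ms/token")
    # prints: 29.13 tok/s, 34.33 ms/token
```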