|
[I1021 17:17:47.067897003 debug.cpp:49] [c10d] The debug level is set to INFO. |
|
/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: |
|
No module named 'vllm._version' |
|
from vllm.version import __version__ as VLLM_VERSION |
|
llama3.1 |
|
***************************** |
|
Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/default/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=0, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.") |
|
***************************** |
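
For reference, the settings in the Namespace above map roughly onto the following vLLM calls. This is a minimal sketch under assumptions, not the evaluation script itself; in particular, the mapping of script arguments to vLLM parameters (e.g. init_max_new_tokens -> max_tokens) is guessed from the argument names.

from vllm import LLM, SamplingParams

# Sketch only: the argument-to-parameter mapping is assumed, not taken from the actual script.
llm = LLM(
    model="../../Llama-3.1-8B/",   # model_path
    trust_remote_code=True,
    dtype="bfloat16",
    tensor_parallel_size=1,        # tensor_parallel
    seed=0,
)
sampling = SamplingParams(
    n=1,                           # sample_num
    temperature=0.7,
    top_p=1.0,
    max_tokens=2048,               # init_max_new_tokens (assumed mapping)
    min_tokens=1,                  # min_new_tokens
)
outputs = llm.generate(["Question: ...\nLet's think step by step."], sampling)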
|
WARNING 10-21 17:17:53 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False. |
|
INFO 10-21 17:17:53 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512. |
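
The warning above already names the switch for turning chunked prefill off if it misbehaves. A minimal sketch of both forms, assuming the offline LLM constructor forwards the same engine option as the CLI flag:

# Offline API (kwarg assumed to mirror the CLI flag named in the warning):
llm = LLM(model="../../Llama-3.1-8B/", enable_chunked_prefill=False)

# CLI form, exactly as the warning suggests:
#   --enable-chunked-prefill=False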
|
INFO 10-21 17:17:53 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None) |
|
[I1021 17:17:55.299354664 TCPStore.cpp:312] [c10d - debug] The server has started on port = 39457. |
|
[I1021 17:17:55.299507279 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running |
|
[I1021 17:17:55.300507801 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 39457). |
|
[I1021 17:17:55.300631164 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:39457 on [n117-192-077.byted.org]:63684. |
|
[I1021 17:17:55.303608993 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:39457 |
|
[W1021 17:17:55.304066608 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator()) |
|
[I1021 17:17:55.304127511 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0 |
|
[I1021 17:17:55.304136181 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0 |
|
[rank0]:[I1021 17:17:55.304687998 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 1 |
|
[rank0]:[I1021 17:17:55.304706666 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0 |
|
[rank0]:[I1021 17:17:55.318136354 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 3 |
|
[rank0]:[I1021 17:17:55.318156378 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0 |
|
[rank0]:[I1021 17:17:55.319864667 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 5 |
|
[rank0]:[I1021 17:17:55.319881146 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0 |
|
INFO 10-21 17:17:55 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/... |
|
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s] |
|
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:02, 1.13it/s] |
|
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.66it/s] |
|
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.30it/s] |
|
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.10it/s] |
|
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.18it/s] |
|
|
|
INFO 10-21 17:17:59 model_runner.py:1071] Loading model weights took 14.9888 GB |
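
As a sanity check, the reported weight size is consistent with an ~8B-parameter model held in bfloat16 (2 bytes per parameter). The parameter count below is an assumption for illustration; it is not printed anywhere in this log.

# Rough check of "Loading model weights took 14.9888 GB" (parameter count assumed).
params = 8.03e9          # approximate Llama-3.1-8B parameter count (assumption)
bytes_per_param = 2      # bfloat16
print(params * bytes_per_param / 2**30)   # ≈ 14.96, close to the 14.9888 reported above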
|
INFO 10-21 17:18:00 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048 |
|
INFO 10-21 17:18:00 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x |
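
The 3.67x concurrency figure is likewise consistent with the reported KV-cache capacity, assuming vLLM's default block size of 16 tokens (the block size itself is not printed in this log):

# Rough check of "Maximum concurrency for 131072 tokens per request: 3.67x".
gpu_blocks = 30099
block_size = 16          # vLLM default, assumed (not shown in the log)
max_model_len = 131072
print(gpu_blocks * block_size / max_model_len)   # ≈ 3.67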
|
INFO 10-21 17:18:01 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. |
|
INFO 10-21 17:18:01 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage. |
|
INFO 10-21 17:18:08 model_runner.py:1530] Graph capturing finished in 7 secs. |
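
If the CUDA-graph capture time or its extra 1~3 GiB per GPU is unwanted, the note at model_runner.py:1402 above already says how to avoid it; a minimal sketch:

# Run eagerly instead of capturing CUDA graphs, as suggested in the log above.
llm = LLM(model="../../Llama-3.1-8B/", enforce_eager=True)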
|
../../Llama-3.1-8B/ |
|
load data |
|
Sampled Question: |
|
Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$. |
|
Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct. |
|
|
|
Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$ |
|
Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct. |
|
|
|
Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight? |
|
Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*} |
|
30n&=480\\ |
|
\Rightarrow\qquad n&=480/30=16 |
|
\end{align*} |
|
Final Answer: The answer is $16$. I hope it is correct. |
|
|
|
Question: If the system of equations |
|
|
|
\begin{align*} |
|
6x-4y&=a,\\ |
|
6y-9x &=b. |
|
\end{align*} |
|
has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero. |
|
Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have |
|
$$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$ |
|
Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct. |
|
|
|
Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps? |
|
Let's think step by step. |
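
The log ends before the model's completion for this last question. For reference only (my own arithmetic, not model output), the conversion chain resolves as

$$42 \text{ baps} \cdot \frac{5 \text{ yaps}}{3 \text{ baps}} = 70 \text{ yaps}, \qquad 70 \text{ yaps} \cdot \frac{4 \text{ daps}}{7 \text{ yaps}} = 40 \text{ daps},$$

i.e. 42 baps correspond to 40 daps.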
|
filtered 0 already |
|
100%|██████████| 10/10 [05:17<00:00, 31.79s/it] |
|
|