[I1021 17:17:47.067897003 debug.cpp:49] [c10d] The debug level is set to INFO.
/opt/tiger/sparse_llm/lib/python3.9/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
from vllm.version import __version__ as VLLM_VERSION
llama3.1
*****************************
Namespace(cot_trigger_no=1, dataset='bd_math', data_path='bd_math_test.json', batch_size=64, eval_method='', model_path='../../Llama-3.1-8B/', model_type='llama3.1', output_dir='generate_result/zero_shot/bd_math/default/llama3.1/1/', lora_path='', method='zero_shot', data_question_key='question', data_answer_key='answer', sample_num=1, cuda_ind=0, tensor_parallel=1, cuda_start=0, cuda_num=8, load_in_8bit=False, rewrite=False, use_typewriter=0, temperature=0.7, top_p=1, iter_max_new_tokens=512, init_max_new_tokens=2048, min_new_tokens=1, correct_response_format='The correct response is:', cot_trigger="Let's think step by step.")
*****************************
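(For reference, a minimal sketch, not part of the original run, of sampling parameters that would mirror the Namespace above, assuming vLLM's SamplingParams API; the mapping from the script's argument names is an assumption.)
from vllm import SamplingParams

# Mirrors sample_num=1, temperature=0.7, top_p=1, init_max_new_tokens=2048, min_new_tokens=1
# from the Namespace above (the correspondence to these script flags is assumed).
sampling_params = SamplingParams(
    n=1,               # sample_num
    temperature=0.7,
    top_p=1.0,
    max_tokens=2048,   # init_max_new_tokens
    min_tokens=1,      # min_new_tokens
)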
WARNING 10-21 17:17:53 arg_utils.py:953] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
INFO 10-21 17:17:53 config.py:1005] Chunked prefill is enabled with max_num_batched_tokens=512.
INFO 10-21 17:17:53 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='../../Llama-3.1-8B/', speculative_config=None, tokenizer='../../Llama-3.1-8B/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=../../Llama-3.1-8B/, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=True multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
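(A minimal sketch of how an engine with the configuration logged above might be constructed, assuming vLLM's offline LLM API; the generation script's actual code is not shown in this log.)
from vllm import LLM

llm = LLM(
    model="../../Llama-3.1-8B/",
    trust_remote_code=True,
    dtype="bfloat16",
    tensor_parallel_size=1,
    seed=0,
    # To address the chunked-prefill warning above, one could pass
    # enable_chunked_prefill=False; to skip CUDA graph capture, enforce_eager=True.
)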
[I1021 17:17:55.299354664 TCPStore.cpp:312] [c10d - debug] The server has started on port = 39457.
[I1021 17:17:55.299507279 TCPStoreLibUvBackend.cpp:1067] [c10d - debug] Uv main loop running
[I1021 17:17:55.300507801 socket.cpp:720] [c10d - debug] The client socket will attempt to connect to an IPv6 address of (10.117.192.77, 39457).
[I1021 17:17:55.300631164 socket.cpp:884] [c10d] The client socket has connected to [n117-192-077.byted.org]:39457 on [n117-192-077.byted.org]:63684.
[I1021 17:17:55.303608993 TCPStore.cpp:350] [c10d - debug] TCP client connected to host 10.117.192.77:39457
[W1021 17:17:55.304066608 CUDAAllocatorConfig.h:28] Warning: expandable_segments not supported on this platform (function operator())
[I1021 17:17:55.304127511 ProcessGroupNCCL.cpp:852] [PG 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
[I1021 17:17:55.304136181 ProcessGroupNCCL.cpp:861] [PG 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
[rank0]:[I1021 17:17:55.304687998 ProcessGroupNCCL.cpp:852] [PG 1 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 1
[rank0]:[I1021 17:17:55.304706666 ProcessGroupNCCL.cpp:861] [PG 1 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
[rank0]:[I1021 17:17:55.318136354 ProcessGroupNCCL.cpp:852] [PG 3 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 3
[rank0]:[I1021 17:17:55.318156378 ProcessGroupNCCL.cpp:861] [PG 3 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
[rank0]:[I1021 17:17:55.319864667 ProcessGroupNCCL.cpp:852] [PG 5 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0xb4c4d70, SPLIT_COLOR: 3389850942126204093, PG Name: 5
[rank0]:[I1021 17:17:55.319881146 ProcessGroupNCCL.cpp:861] [PG 5 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.20.5, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: INFO, TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK: 0, TORCH_NCCL_ENABLE_MONITORING: 1, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 600, TORCH_NCCL_TRACE_BUFFER_SIZE: 0, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0
INFO 10-21 17:17:55 model_runner.py:1060] Starting to load model ../../Llama-3.1-8B/...
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:02, 1.13it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.66it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.30it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.10it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.18it/s]
INFO 10-21 17:17:59 model_runner.py:1071] Loading model weights took 14.9888 GB
INFO 10-21 17:18:00 gpu_executor.py:122] # GPU blocks: 30099, # CPU blocks: 2048
INFO 10-21 17:18:00 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.67x
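(For context, assuming vLLM's default 16-token KV-cache block size: 30099 GPU blocks hold 30099 x 16 = 481,584 tokens, and 481,584 / 131,072 ~ 3.67, which is where the 3.67x maximum-concurrency figure above comes from.)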
INFO 10-21 17:18:01 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 10-21 17:18:01 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 10-21 17:18:08 model_runner.py:1530] Graph capturing finished in 7 secs.
../../Llama-3.1-8B/
load data
Sampled Question:
Question: Find the domain of the expression $\frac{\sqrt{x-2}}{\sqrt{5-x}}$.
Let's think step by step. The expressions inside each square root must be non-negative. Therefore, $x-2 \ge 0$, so $x\ge2$, and $5 - x \ge 0$, so $x \le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $[2,5)$. Final Answer: The answer is $[2,5)$. I hope it is correct.
Question: If $\det \mathbf{A} = 2$ and $\det \mathbf{B} = 12,$ then find $\det (\mathbf{A} \mathbf{B}).$
Let's think step by step. We have that $\det (\mathbf{A} \mathbf{B}) = (\det \mathbf{A})(\det \mathbf{B}) = (2)(12) = 24.$ Final Answer: The answer is $24$. I hope it is correct.
Question: Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?
Let's think step by step. If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\cdot 12\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\cdot15\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\begin{align*}
30n&=480\\
\Rightarrow\qquad n&=480/30=16
\end{align*}
Final Answer: The answer is $16$. I hope it is correct.
Question: If the system of equations
\begin{align*}
6x-4y&=a,\\
6y-9x &=b.
\end{align*}
has a solution $(x, y)$ where $x$ and $y$ are both nonzero, find $\frac{a}{b},$ assuming $b$ is nonzero.
Let's think step by step. If we multiply the first equation by $-\frac{3}{2}$, we obtain $$6y-9x=-\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have
$$-\frac{3}{2}a=b\Rightarrow\frac{a}{b}=-\frac{2}{3}.$$
Final Answer: The answer is $-\frac{2}{3}$. I hope it is correct.
Question: If 4 daps = 7 yaps, and 5 yaps = 3 baps, how many daps equal 42 baps?
Let's think step by step.
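(A minimal sketch of how the sampled few-shot prompt above might be sent to the engine; the prompt variable is a placeholder, and only llm.generate and the output fields follow vLLM's documented API.)
# few_shot_prompt: the sampled prompt shown above, ending with "Let's think step by step."
outputs = llm.generate([few_shot_prompt], sampling_params)
print(outputs[0].outputs[0].text)  # the model's continuation for the final question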
filtered 0 already
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 10/10 [05:17<00:00, 31.79s/it]