Li Tan PRO

tanliboy

https://github.com/tanliboy

AI & ML interests

None yet

Recent Activity

liked a model 17 days ago

deepseek-ai/DeepSeek-R1

new activity 17 days ago

moonshotai/Moonlight-16B-A3B: Thank you!

updated a model about 1 month ago

tanliboy/Qwen2.5-14B-Instruct-1M-AWQ

tanliboy's activity

New activity in moonshotai/Moonlight-16B-A3B 17 days ago

Thank you!

#2 opened 17 days ago by

New activity in rombodawg/Rombos-LLM-V2.5-Qwen-72b 4 months ago

what is your "continuous finetuning"

#2 opened 5 months ago by

New activity in google/gemma-2-9b-it 4 months ago

Batch Inference causes degraded performance

#43 opened 7 months ago by

New activity in Qwen/Qwen2.5-7B-Instruct 5 months ago

Scorecard on popular benchmarks

#2 opened 6 months ago by

New activity in ContextualAI/ultrafeedback_clair_32k 6 months ago

Phi-2-Instruct-APO: aligned with Anchored Preference Optimization

#3 opened 6 months ago by

New activity in Qwen/Qwen2.5-Math-RM-72B 6 months ago

Preference Alignment

#6 opened 6 months ago by

New activity in meta-llama/Llama-3.1-8B 6 months ago

Text Classification with LLMs

#30 opened 7 months ago by

New activity in NousResearch/Hermes-3-Llama-3.1-8B 6 months ago

IFEVAL drop

#16 opened 6 months ago by

New activity in Alibaba-NLP/gte-Qwen2-7B-instruct 6 months ago

bfloat16 vs. float32

#34 opened 6 months ago by

New activity in Alibaba-NLP/gte-Qwen2-1.5B-instruct 6 months ago

Qwen 2.5 1.5B retrain?

#12 opened 6 months ago by

New activity in meta-llama/Llama-3.1-8B-Instruct 6 months ago

GSM8K Evaluation Result: 84.5 vs. 76.95

#81 opened 7 months ago by

New activity in Qwen/Qwen2-VL-7B-Instruct 6 months ago

Finetuning script using HuggingFace (No llama-factory)

#32 opened 6 months ago by

New activity in meta-llama/Llama-3.1-8B-Instruct 6 months ago

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

#120 opened 7 months ago by

New activity in Qwen/Qwen2-VL-7B-Instruct 6 months ago

Have you deleted your GitHub page?

#10 opened 6 months ago by

New activity in google/gemma-2-9b-it 6 months ago

Sliding window vs. Global Attention

#41 opened 7 months ago by

New activity in google/gemma-2-2b 7 months ago

Gemma2-2b training uses much more memory!

#23 opened 7 months ago by

New activity in google/gemma-2b 7 months ago

GemmaSdpaAttention vs GemmaAttention

#71 opened 7 months ago by

New activity in meta-llama/Llama-3.1-70B-Instruct 7 months ago

Fix Llama 3.1 Chat Template to Properly Handle add_generation_prompt

#26 opened 7 months ago by

New activity in Qwen/Qwen2-VL-7B-Instruct 7 months ago

🍭 Fine-tuning support for Qwen2-VL-7B-Instruct

#1 opened 7 months ago by

New activity in google/recurrentgemma-9b-it 7 months ago

Evaluation Result

#15 opened 7 months ago by