Li Tan PRO
tanliboy
AI & ML interests
None yet
Recent Activity
updated
a model
8 days ago
tanliboy/Qwen2.5-14B-Instruct-1M-AWQ
published
a model
8 days ago
tanliboy/Qwen2.5-14B-Instruct-1M-AWQ
updated
a model
8 days ago
tanliboy/DeepSeek-R1-Distill-Qwen-32B-AWQ
tanliboy's activity
what is your "continuous finetuning"
7
#2 opened 4 months ago
by
MaziyarPanahi
Batch Inference causes degraded performance
3
#43 opened 6 months ago
by
tanliboy
Scorecard on popular benchmarks
2
#2 opened 5 months ago
by
tanliboy
Phi-2-Instruct-APO: aligned with Anchored Preference Optimization
16
#3 opened 5 months ago
by
rasyosef
Preference Alignment
4
#6 opened 5 months ago
by
tanliboy
Text Classification with LLMs
7
#30 opened 6 months ago
by
dss107
IFEVAL drop
#16 opened 5 months ago
by
tanliboy
bfloat16 vs. float32
#34 opened 5 months ago
by
tanliboy
Qwen 2.5 1.5B retrain?
4
#12 opened 5 months ago
by
tomaarsen
GSM8K Evaluation Result: 84.5 vs. 76.95
17
#81 opened 6 months ago
by
tanliboy
Finetuning script using HuggingFace (No llama-factory)
36
#32 opened 5 months ago
by
2U1
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
8
#120 opened 6 months ago
by
erildo
Have you deleted your GitHub page?
7
#10 opened 5 months ago
by
xwzy6
Sliding window vs. Global Attention
6
#41 opened 6 months ago
by
tanliboy
Gemma2-2b training uses much more memory!
2
#23 opened 6 months ago
by
bubbleseller
GemmaSdpaAttention vs GemmaAttention
2
#71 opened 6 months ago
by
canqin001
Fix Llama 3.1 Chat Template to Properly Handle add_generation_prompt
9
#26 opened 6 months ago
by
Tostino
🍭 Fine-tuning support for Qwen2-VL-7B-Instruct
5
#1 opened 6 months ago
by
study-hjt
Evaluation Result
1
#15 opened 6 months ago
by
tanliboy
How is this dataset supposed to be used to evaluate the model?
4
#1 opened 6 months ago
by
realdanielbyrne