Resources for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
Xin Lai
xinlai
AI & ML interests
Multimodal LLM, LLM Reasoning, Point Cloud Segmentation, Image Segmentation
Recent Activity
upvoted a paper about 2 months ago
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
upvoted a paper 2 months ago
VisionZip: Longer is Better but Not Necessary in Vision Language Models
updated a model 6 months ago
Fanbin/arc_qwen_0.5b
Organizations
None yet
Collections: 1
Papers: 2
Models: 21
xinlai/Qwen2-7B-Instruct-Step-DPO • Text Generation • Updated • 40 • 2
xinlai/Qwen2-57B-A14B-SFT-Step-DPO • Text Generation • Updated • 8 • 1
xinlai/Qwen1.5-32B-SFT-Step-DPO • Text Generation • Updated • 9 • 1
xinlai/Llama-3-70B-SFT-Step-DPO • Text Generation • Updated • 5
xinlai/DeepSeekMath-Base-SFT-Step-DPO • Text Generation • Updated • 13
xinlai/Qwen2-7B-SFT-Step-DPO • Text Generation • Updated • 10
xinlai/Qwen2-72B-Instruct-Step-DPO • Text Generation • Updated • 10
xinlai/DeepSeekMath-RL-Step-DPO • Text Generation • Updated • 9 • 2
xinlai/Qwen2-57B-A14B-SFT • Text Generation • Updated • 11
xinlai/Qwen1.5-32B-SFT • Text Generation • Updated • 7