Was the 7.5T Token Continual Pre-Training Performed on the Instruction-Tuned Model or the Base PLM?
2
#10 opened 2 days ago by Jinhwan
I hope you guys can provide a 32B dense model
1
#9 opened 3 days ago by zletpm
MLX Convert Error
3
#8 opened 4 days ago by baggaindia
main
#7 opened 6 days ago by zwb19820615
Where's the knowledge?
7
4
#5 opened 8 days ago by phil111
Can we expect a 20b~32b parameter minimax model to fit into a single 4090?
32
#3 opened 8 days ago by win10

WHAT a benchmarks graph
13
1
#2 opened 8 days ago by CyborgPaloma
GGUF weights for llama.cpp?
18
#1 opened 8 days ago by segmond