gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B
Text Generation
Models I pre-trained by initialising SMoE models from dense model weights, following the upcycling process used for Qwen1.5-MoE-A2.7B (or something similar).
Note This model hasn't been trained, just initialised using the upcycling process and the weights from Qwen/Qwen1.5-1.8B.
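For reference, a minimal sketch of what plain sparse upcycling looks like: every expert MLP starts as a copy of the dense model's MLP, while the router weights keep their random initialisation. The key patterns below follow the usual Qwen1.5 naming and are assumptions; the actual initialisation script may differ.

```python
import torch


def upcycle_state_dict(dense_sd: dict, num_experts: int) -> dict:
    """Plain sparse upcycling: every MoE expert starts as a copy of the dense MLP.

    Assumption: keys follow the Qwen1.5 pattern
    "model.layers.N.mlp.{gate_proj,up_proj,down_proj}.weight". Router ("gate")
    weights have no dense counterpart and keep their random initialisation.
    """
    upcycled = {}
    for name, weight in dense_sd.items():
        if ".mlp." in name:
            # Replicate the dense FFN projection into each expert slot.
            for expert_idx in range(num_experts):
                expert_name = name.replace(".mlp.", f".mlp.experts.{expert_idx}.")
                upcycled[expert_name] = weight.clone()
        else:
            # Embeddings, attention and norm weights are carried over unchanged.
            upcycled[name] = weight.clone()
    return upcycled
```

Note that Qwen1.5-MoE-A2.7B's experts reportedly use a smaller intermediate size than the dense FFN, so the real recipe presumably also partitions the dense projections into finer-grained experts before copying; the snippet above only shows the simplest variant.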
Note Fine-tuned using LoRA targeting up_proj, gate_proj, down_proj, gate, and shared_expert_gate, with rank 8. About 126M trainable parameters. The dataset used was wiki_demo from LLaMA Factory.
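A hedged sketch of what this configuration looks like with PEFT; only the rank and target modules come from the note above, while lora_alpha and lora_dropout are assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,      # assumption: not stated in the note
    lora_dropout=0.05,  # assumption: not stated in the note
    target_modules=["up_proj", "gate_proj", "down_proj", "gate", "shared_expert_gate"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # per the note above, roughly 126M trainable parameters
```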
Note Fine-tuned using LoRA targeting all the layers, with rank 32. About 500M trainable parameters.
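A sketch of the rank-32 variant; mapping "all the layers" to PEFT's "all-linear" shortcut is an assumption, as is lora_alpha.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,                # assumption: not stated in the note
    target_modules="all-linear",  # assumption: the note only says "all the layers"
    task_type="CAUSAL_LM",
)
```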