This project is built on Meta's llama3-8B-Instruct model: each layer's MLP is duplicated 8 times to form the experts, a randomly initialized router is added, and all remaining weights are kept unchanged, yielding a warm-started MoE model. This approach greatly reduces the cost of training an MoE model from scratch and makes the model easy to fine-tune quickly on downstream tasks.

Here, router_warmboot denotes the variant whose llama3-MoE-Instruct router is initialized with the router parameters from chinese-mixtral-Instruct, while router_random denotes the variant with a randomly initialized router.
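
For illustration only, the sketch below shows the weight-surgery idea described above: each dense MLP is copied into 8 expert slots and a per-layer router is attached. The Mixtral-style parameter names (block_sparse_moe.experts.{i}, block_sparse_moe.gate) and the output file name are assumptions made here for the example; the project's actual conversion script and the layout expected by LlamaMoEForCausalLM are in the GitHub repository.

  # Sketch only: assumes a Mixtral-style MoE parameter layout; the real
  # conversion script and parameter names are defined in the GitHub repository.
  import torch
  from transformers import AutoModelForCausalLM

  dense = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16
  )
  num_experts = 8
  hidden_size = dense.config.hidden_size

  moe_state_dict = {}
  for name, tensor in dense.state_dict().items():
      if ".mlp." in name:
          # Duplicate every dense MLP weight into all 8 expert slots.
          prefix, suffix = name.split(".mlp.")
          for i in range(num_experts):
              moe_state_dict[f"{prefix}.block_sparse_moe.experts.{i}.{suffix}"] = tensor.clone()
      else:
          # Attention, embeddings, norms, and lm_head are reused unchanged.
          moe_state_dict[name] = tensor

  # router_random: one randomly initialized gate per layer.
  for layer in range(dense.config.num_hidden_layers):
      gate = torch.empty(num_experts, hidden_size, dtype=torch.bfloat16).normal_(mean=0.0, std=0.02)
      moe_state_dict[f"model.layers.{layer}.block_sparse_moe.gate.weight"] = gate
  # router_warmboot would instead copy these gate weights from chinese-mixtral-Instruct.

  torch.save(moe_state_dict, "llama3_8x8b_moe_warm_start.pt")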

For details, see the GitHub repository: https://github.com/cooper12121/llama3-8x8b-MoE

generate

  # Make the custom modeling code (modeling_file) importable.
  import sys
  sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")

  from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
  from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

  # Local checkpoint path; replace with wherever the weights are stored.
  model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"

  tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
  model = LlamaMoEForCausalLM.from_pretrained(model_ckpt, device_map="auto", use_cache=False)

  # Llama 3 has no dedicated pad token; reuse the eos token so batched inputs can be padded.
  tokenizer.pad_token = tokenizer.eos_token
  tokenizer.pad_token_id = tokenizer.eos_token_id

  text_list = ["hello, what is your name?", "你好,你叫什么名字"]
  inputs = tokenizer(text_list, return_tensors="pt", padding=True).to("cuda")

  output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
  print(tokenizer.batch_decode(output))

The modeling_file package can be obtained from the GitHub repository.
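
For the Instruct variants, prompts would normally be wrapped in the Llama-3 chat format before generation. Assuming the tokenizer shipped with the checkpoint carries a chat template (an assumption; the exact prompting format is documented in the GitHub repository), the standard transformers call would look like:

  # Assumes the checkpoint's tokenizer ships a Llama-3 chat template; otherwise the
  # prompt must be built manually following the format described in the GitHub repository.
  messages = [{"role": "user", "content": "hello, what is your name?"}]
  prompt_ids = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to("cuda")
  output = model.generate(prompt_ids, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
  print(tokenizer.decode(output[0], skip_special_tokens=True))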
