Base model?
#1
by
mpasila
- opened
What model is this based on? It uses Llama architecture and similar special tokens from Llama 3 series (further look into it, the tokenizer is the same from 3.1). Is this just an upscaled Llama 3 model? If so wouldn't this then use Llama 3's license? Which would make the custom license invalid.
This is pretrained using llama's arch and tokenizer
Can you give any information about pre-training? How many GPUs were used, how big of a dataset was used, filtering etc. Because many of these models appear to just be upscales of existing models like Qwen3, Qwen 2.5, Mixtral etc. (when looking at those different architectures and sizes and context windows)