Base model?

by mpasila - opened 2 days ago

2 days ago

•

What model is this based on? It uses Llama architecture and similar special tokens from Llama 3 series (further look into it, the tokenizer is the same from 3.1). Is this just an upscaled Llama 3 model? If so wouldn't this then use Llama 3's license? Which would make the custom license invalid.

Abhaykoul

HelpingAI org about 17 hours ago

This is pretrained using llama's arch and tokenizer

mpasila

about 16 hours ago

Can you give any information about pre-training? How many GPUs were used, how big of a dataset was used, filtering etc. Because many of these models appear to just be upscales of existing models like Qwen3, Qwen 2.5, Mixtral etc. (when looking at those different architectures and sizes and context windows)

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment