---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
---

Work in progress...

Like version 1, this model will be trained on a single GPU, with the hope of achieving better performance.
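Given the `pipeline_tag: text-generation` in the metadata, a minimal usage sketch might look like the following. This is a hypothetical illustration, not an official example: the repo ID is inferred from this page, and since the checkpoint is a work in progress, generations will be rough.

```python
from transformers import pipeline

# Hypothetical usage sketch: the repo ID is inferred from this page, and the
# checkpoint is a work in progress, so generations will be rough.
generator = pipeline("text-generation", model="Locutusque/TinyMistral-248M-v2")
print(generator("Once upon a time", max_new_tokens=50)[0]["generated_text"])
```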

Roadmap

  • Train on 1,000,000 examples of Skylion007/openwebtext at a learning rate of 3e-4 and a batch size of 32 (a rough sketch of this stage appears after this list)

  • Once perplexity reaches an average of ~100, a cosine scheduler will be applied and the batch size will be increased to 4096

  • After training on 3,000,000 to 5,000,000 examples of Skylion007/openwebtext, the model will be trained on graelo/wikipedia and mattymchen/refinedweb-3m, and the batch size will be increased to 49,152

  • I'm open to any suggestions to modify this roadmap if you feel it isn't sufficient!
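To make the first stage concrete, here is a hypothetical sketch of what the training loop could look like with Hugging Face Transformers' `Trainer` and a streamed copy of Skylion007/openwebtext. This is not the actual training script; the model ID, sequence length, and step count below are assumptions.

```python
import math

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical IDs/values: the real script, sequence length, and step count
# are not published in this README.
model_id = "Locutusque/TinyMistral-248M-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stream the first 1,000,000 OpenWebText examples rather than downloading
# the full dataset.
dataset = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
dataset = dataset.take(1_000_000)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="tinymistral-248m-v2-stage1",
    per_device_train_batch_size=32,   # stage-one batch size from the roadmap
    learning_rate=3e-4,               # stage-one learning rate from the roadmap
    lr_scheduler_type="constant",     # swapped to "cosine" once perplexity ~100
    max_steps=1_000_000 // 32,        # one pass over the 1M streamed examples
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
result = trainer.train()

# Perplexity is exp(mean cross-entropy loss); per the roadmap, the cosine
# scheduler and larger batch sizes kick in once this averages around 100.
print("perplexity:", math.exp(result.training_loss))
```

Later stages would then swap in graelo/wikipedia and mattymchen/refinedweb-3m and raise the batch size as described above.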

Disclaimer

This model may be cancelled if it does not show a performance improvement over its predecessor.