---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
---

Work in progress...

Like version 1, this model will be trained on a single GPU, with the hope of achieving better performance.
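Given the `pipeline_tag: text-generation` in the metadata, a minimal usage sketch might look like the following. This is a hypothetical illustration, not an official example: the repo ID is inferred from this page, and since the checkpoint is a work in progress, generations will be rough.

```python
from transformers import pipeline

# Hypothetical usage sketch: the repo ID is inferred from this page, and the
# checkpoint is a work in progress, so generations will be rough.
generator = pipeline("text-generation", model="Locutusque/TinyMistral-248M-v2")
print(generator("Once upon a time", max_new_tokens=50)[0]["generated_text"])
```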

Roadmap

  • Train on 1,000,000 examples of Skylion007/openwebtext at a learning rate of 3e-4 and a batch size of 32 (a rough sketch of this stage appears after this list)

  • Once perplexity reaches an average of ~100, a cosine scheduler will be applied and the batch size will be increased to 4096

  • After training on 3,000,000 to 5,000,000 examples of Skylion007/openwebtext, the model will be trained on graelo/wikipedia and mattymchen/refinedweb-3m, and the batch size will be increased to 49,152

  • I'm open to any suggestions to modify this roadmap if you feel it isn't sufficient!
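To make the first stage concrete, here is a hypothetical sketch of what the training loop could look like with Hugging Face Transformers' `Trainer` and a streamed copy of Skylion007/openwebtext. This is not the actual training script; the model ID, sequence length, and step count below are assumptions.

```python
import math

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical IDs/values: the real script, sequence length, and step count
# are not published in this README.
model_id = "Locutusque/TinyMistral-248M-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stream the first 1,000,000 OpenWebText examples rather than downloading
# the full dataset.
dataset = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
dataset = dataset.take(1_000_000)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="tinymistral-248m-v2-stage1",
    per_device_train_batch_size=32,   # stage-one batch size from the roadmap
    learning_rate=3e-4,               # stage-one learning rate from the roadmap
    lr_scheduler_type="constant",     # swapped to "cosine" once perplexity ~100
    max_steps=1_000_000 // 32,        # one pass over the 1M streamed examples
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
result = trainer.train()

# Perplexity is exp(mean cross-entropy loss); per the roadmap, the cosine
# scheduler and larger batch sizes kick in once this averages around 100.
print("perplexity:", math.exp(result.training_loss))
```

Later stages would then swap in graelo/wikipedia and mattymchen/refinedweb-3m and raise the batch size as described above.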

Disclaimer

This model may be cancelled if it does not show a performance improvement over its predecessor.