This model is a QLoRA fine-tune of the LLaMA 2 13B base model on a desuarchive dump of the 4channel /mlp/ board. The adapter was merged back into the base model (as most GGML loading apps don't support LoRAs) and quantized for llama.cpp-based frontends. It was trained with a 1024-token context length.

There are two options, depending on the resources you have:

  • Q5_K_M: 5-bit K-quantized model with low quality loss. Max RAM consumption is 11.73 GB; recommended if you have 12 GB of VRAM and want to offload 40 layers (see the loading sketch after this list)
  • Q4_K_S: Compact 4-bit K-quantized model. Max RAM consumption is 9.87 GB

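As a rough illustration, loading the Q5_K_M file through llama-cpp-python with partial GPU offload might look like the sketch below. The file name is an assumption (it will depend on how the quantized file was published), and the layer count simply mirrors the 12 GB VRAM recommendation above.

```python
# Minimal loading sketch using llama-cpp-python; file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="mlp-13b.ggmlv3.q5_K_M.bin",  # hypothetical file name
    n_ctx=1024,       # the model was trained with a 1024 context length
    n_gpu_layers=40,  # offload 40 layers, per the 12 GB VRAM recommendation
)
```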
This is not an instruction-tuned model; it was trained on raw text, so treat it like an autocomplete.

Specifically, the dataset was a dump of all the board's posts, from the board's creation to about late 2019. Prompting it appropriately (e.g. with a greentext-style prefix, as sketched below) will cause it to write greentext.
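Since it is a completion model, you steer it with a prefix rather than an instruction. A hypothetical greentext prompt, continuing the `llm` object from the loading sketch above:

```python
# Raw-text completion: seed the model with a greentext opening and let it continue.
prompt = ">be anon\n>browsing /mlp/ at 3am\n>"
output = llm(prompt, max_tokens=128, temperature=0.8)
print(prompt + output["choices"][0]["text"])
```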
