base_model:
- tiiuae/Falcon3-10B-Base
---

This model uses QLoRA with Unsloth to continue pretraining Falcon3-10B-Base on an additional 30,720 rows from PleIAs/common_corpus, trained in repeated cycles.
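
As a rough sketch of that setup, loading the base model in 4-bit and attaching a LoRA adapter with Unsloth looks like the following; the target modules, alpha, and dropout are assumptions, since this card only states the rank range.

```python
from unsloth import FastLanguageModel

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="tiiuae/Falcon3-10B-Base",
    max_seq_length=32768,   # full context; shorter lengths were also used
    load_in_4bit=True,
)

# Attach a LoRA adapter. Ranks 32-128 were used, 64 and 128 most often;
# the other LoRA hyperparameters here are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```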

The number of rows trained per cycle varied among 2048, 4096, and 8192, with cosine learning-rate decay. A merged model was saved and tested every 10,240 rows.

Adapters ranged from rank 32 to rank 128, with ranks 64 and 128 being the most common. Weight decay was 0.01.
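
A minimal sketch of one such cycle, combining the two paragraphs above, follows. It continues from the setup sketch and uses the older trl-style SFTTrainer signature common in Unsloth examples; the batch size, learning rate, text field, and slicing scheme are assumptions, while the cosine schedule, weight decay, and save cadence are from this card.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

rows_per_cycle = 4096  # varied among 2048 / 4096 / 8192 across cycles

# One chunk of PleIAs/common_corpus; the exact slicing scheme is assumed.
chunk = load_dataset("PleIAs/common_corpus", split=f"train[:{rows_per_cycle}]")

trainer = SFTTrainer(
    model=model,                # PEFT model from the setup sketch
    tokenizer=tokenizer,
    train_dataset=chunk,
    dataset_text_field="text",  # field name assumed
    max_seq_length=32768,
    packing=False,              # sample packing was not used
    args=TrainingArguments(
        output_dir="falcon3-continued",   # path assumed
        per_device_train_batch_size=1,    # assumed
        gradient_accumulation_steps=8,    # assumed
        learning_rate=2e-5,               # assumed
        lr_scheduler_type="cosine",       # cosine decay, per the card
        weight_decay=0.01,                # per the card
        num_train_epochs=1,
    ),
)
trainer.train()

# Every 10,240 rows, a merged checkpoint (adapter folded into the base)
# was saved for testing; save_pretrained_merged is Unsloth's helper for this.
model.save_pretrained_merged("merged-checkpoint", tokenizer,
                             save_method="merged_16bit")
```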

Trained context length ranged from 4096 up to the full 32768, with 32768 being the most common. Sample packing was not used; long documents, if present, were truncated.
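
Spelled out at the tokenizer level (SFTTrainer normally handles this internally), the truncate-rather-than-pack behavior amounts to something like this, continuing the sketch above:

```python
# Without packing, each row is one training example, truncated to the
# context window instead of being concatenated with neighboring documents.
def tokenize_row(example):
    return tokenizer(
        example["text"],   # field name assumed, as above
        truncation=True,
        max_length=32768,  # some cycles used 4096 instead
    )

tokenized = chunk.map(tokenize_row)
```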

Training continued until this method showed no further improvement on eq_bench; most other benchmarks stayed similar.
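
The results header below is lm-evaluation-harness output. A run with the same settings can be reproduced through the harness's Python API, for example (the task list here is illustrative):

```python
import lm_eval

# Mirrors the harness settings shown in the header below.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=Lambent/Falcon3-Continued-0.3-10B-Base,"
        "dtype=auto,trust_remote_code=True"
    ),
    tasks=["eq_bench"],
    num_fewshot=0,
    batch_size="auto",
)
print(results["results"])
```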

hf (pretrained=Lambent/Falcon3-Continued-0.3-10B-Base,dtype=auto,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|