Update README.md
README.md
CHANGED
@@ -8,6 +8,17 @@ base_model:
- tiiuae/Falcon3-10B-Base
---

+This model uses qLoRA with Unsloth to continually pretrain Falcon3-10B-Base, in cycles, on an additional 30,720 rows from PleIAs/common_corpus.
+
+The number of rows trained at a time varied between 2048, 4096, and 8192, with cosine learning-rate decay. A merged model was saved and tested every 10,240 rows.
+
+Adapters ranged from rank 32 to rank 128, with ranks 64 and 128 being the most common. Weight decay was 0.01.
+
+Trained context length ranged from 4096 to the full 32768, with 32768 being the most common. Sample packing was not used.
+Long documents, if present, were truncated.
+
+Training continued until this method showed no further improvement on eq_bench. Most other benchmarks stayed similar.
+
hf (pretrained=Lambent/Falcon3-Continued-0.3-10B-Base,dtype=auto,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto
| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|
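
A minimal sketch of what one training cycle along these lines might look like, assuming Unsloth's qLoRA path plus TRL's `SFTTrainer`. The chunk size, learning rate, batch size, target modules, output paths, and the assumption that the corpus text lives in a `text` column are illustrative, not taken from the card; argument names follow older `trl` releases (newer ones move `dataset_text_field`, `max_seq_length`, and `packing` into `SFTConfig`).

```python
# Illustrative sketch of one continued-pretraining cycle (not the author's script).
from unsloth import FastLanguageModel
from datasets import Dataset, load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

MAX_SEQ_LEN = 32768   # most cycles reportedly used the full context length
CHUNK_ROWS = 8192     # per-cycle row count varied between 2048, 4096, and 8192

# Load the base (or the previously merged checkpoint) in 4-bit for qLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="tiiuae/Falcon3-10B-Base",
    max_seq_length=MAX_SEQ_LEN,
    load_in_4bit=True,
)

# Attach a LoRA adapter; ranks 64 and 128 were the most common.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Stream the next slice of common_corpus and materialize it as a small Dataset.
# Later cycles would .skip() ahead before .take() to advance through the rows.
stream = load_dataset("PleIAs/common_corpus", split="train", streaming=True)
chunk = Dataset.from_list(list(stream.take(CHUNK_ROWS)))

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=chunk,
    dataset_text_field="text",     # assumed column name
    max_seq_length=MAX_SEQ_LEN,    # long documents are truncated; no packing
    packing=False,
    args=TrainingArguments(
        output_dir="falcon3-continued-cycle",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=1e-5,        # illustrative; the card does not state it
        lr_scheduler_type="cosine",
        weight_decay=0.01,
        bf16=True,
        logging_steps=10,
        report_to="none",
    ),
)
trainer.train()

# Merge the adapter into the base weights so the next cycle (and evaluation)
# starts from a standalone checkpoint.
model.save_pretrained_merged(
    "falcon3-continued-merged", tokenizer, save_method="merged_16bit"
)
```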
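
The evaluation header above can be reproduced with the lm-evaluation-harness Python API along the following lines; the benchmark list is not spelled out in the card, so the `tasks` argument here (eq_bench) is only an example.

```python
# Sketch of re-running the evaluation described by the header above;
# the task list is assumed (the card only names eq_bench explicitly).
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=Lambent/Falcon3-Continued-0.3-10B-Base,"
               "dtype=auto,trust_remote_code=True",
    tasks=["eq_bench"],
    num_fewshot=0,
    batch_size="auto",
)
print(results["results"])
```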