base_model:
- tiiuae/Falcon3-10B-Base
---

This model uses QLoRA with Unsloth to continue pretraining Falcon3-10B-Base on an additional 30,720 rows from PleIAs/common_corpus, trained in repeated cycles.
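
As a rough sketch of that setup, loading the base model in 4-bit and attaching a LoRA adapter with Unsloth looks like the following; the target modules, alpha, and dropout are assumptions, since this card only states the rank range.

```python
from unsloth import FastLanguageModel

# Load the base model with 4-bit quantized weights (the "Q" in QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="tiiuae/Falcon3-10B-Base",
    max_seq_length=32768,   # full context; shorter lengths were also used
    load_in_4bit=True,
)

# Attach a LoRA adapter. Ranks 32-128 were used, 64 and 128 most often;
# the other LoRA hyperparameters here are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```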

The number of rows trained per cycle varied among 2048, 4096, and 8192, with cosine learning-rate decay. A merged model was saved and tested every 10,240 rows.

Adapters ranged from rank 32 to rank 128, with ranks 64 and 128 being the most common. Weight decay was 0.01.
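
A minimal sketch of one such cycle, combining the two paragraphs above, follows. It continues from the setup sketch and uses the older trl-style SFTTrainer signature common in Unsloth examples; the batch size, learning rate, text field, and slicing scheme are assumptions, while the cosine schedule, weight decay, and save cadence are from this card.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

rows_per_cycle = 4096  # varied among 2048 / 4096 / 8192 across cycles

# One chunk of PleIAs/common_corpus; the exact slicing scheme is assumed.
chunk = load_dataset("PleIAs/common_corpus", split=f"train[:{rows_per_cycle}]")

trainer = SFTTrainer(
    model=model,                # PEFT model from the setup sketch
    tokenizer=tokenizer,
    train_dataset=chunk,
    dataset_text_field="text",  # field name assumed
    max_seq_length=32768,
    packing=False,              # sample packing was not used
    args=TrainingArguments(
        output_dir="falcon3-continued",   # path assumed
        per_device_train_batch_size=1,    # assumed
        gradient_accumulation_steps=8,    # assumed
        learning_rate=2e-5,               # assumed
        lr_scheduler_type="cosine",       # cosine decay, per the card
        weight_decay=0.01,                # per the card
        num_train_epochs=1,
    ),
)
trainer.train()

# Every 10,240 rows, a merged checkpoint (adapter folded into the base)
# was saved for testing; save_pretrained_merged is Unsloth's helper for this.
model.save_pretrained_merged("merged-checkpoint", tokenizer,
                             save_method="merged_16bit")
```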

Trained context length ranged from 4096 up to the full 32768, with 32768 being the most common. Sample packing was not used; long documents, if present, were truncated.
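
Spelled out at the tokenizer level (SFTTrainer normally handles this internally), the truncate-rather-than-pack behavior amounts to something like this, continuing the sketch above:

```python
# Without packing, each row is one training example, truncated to the
# context window instead of being concatenated with neighboring documents.
def tokenize_row(example):
    return tokenizer(
        example["text"],   # field name assumed, as above
        truncation=True,
        max_length=32768,  # some cycles used 4096 instead
    )

tokenized = chunk.map(tokenize_row)
```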

Training continued until this method showed no further improvement on eq_bench; most other benchmarks stayed similar.
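
The results header below is lm-evaluation-harness output. A run with the same settings can be reproduced through the harness's Python API, for example (the task list here is illustrative):

```python
import lm_eval

# Mirrors the harness settings shown in the header below.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=Lambent/Falcon3-Continued-0.3-10B-Base,"
        "dtype=auto,trust_remote_code=True"
    ),
    tasks=["eq_bench"],
    num_fewshot=0,
    batch_size="auto",
)
print(results["results"])
```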

hf (pretrained=Lambent/Falcon3-Continued-0.3-10B-Base,dtype=auto,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto

| Tasks |Version|Filter|n-shot| Metric | | Value | |Stderr|
|--------|------:|------|-----:|-----------------|---|-------:|---|-----:|