athirdpath committed on
Commit dd9fc5a · verified · 1 Parent(s): dea7640

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -4,4 +4,10 @@ license: llama3.1
 
  Llama 3.1 **Instruct**, continually pretrained with a full epoch (1169 steps @ total batch 115) of the same 1.5gb private dataset that underpins Iambe
 
- Instruction is broken, needs to be reSFT'd
+ Instruction is broken, needs to be reSFT'd
+
+ -----
+
+ Why do this? I have a niche use case where I cannot increase compute over 8b, and L3/3.1 are the only models in this size category that meet my needs for logic. However, both versions of L3/3.1 have the damn repetition/token overconfidence problem, and this is meant to disrupt that certainty without disrupting the model's ability to function.
+
+ By the way, I *think* it's the lm_head that is causing the looping, but it might be the embeddings being too separated. I'm not going to pay two more times to test them separately, however :p