Blackroot committed · Commit 117fe8f · verified · 1 parent: 153ca6d

Update README.md

Test network using differential attention in place of classical attention. Aside from the changes to the attention mechanism, this uses the same configuration as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct
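The differential attention this model swaps in can be sketched as follows — a minimal single-head NumPy sketch, assuming the common two-softmax formulation (an attention map minus a second map scaled by λ). The function name, fixed λ, and lack of multi-head/GQA projections are simplifications for illustration, not this repo's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(q1, k1, q2, k2, v, lam=0.8):
    """Single-head differential attention sketch.

    Two independent attention maps are computed; the second,
    scaled by lam, is subtracted from the first so that noise
    common to both maps cancels out.
    """
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))
    a2 = softmax(q2 @ k2.T / np.sqrt(d))
    return (a1 - lam * a2) @ v
```

Note that the differential map's rows no longer sum to 1 (they sum to 1 − λ), which is part of why training dynamics can differ from classical attention at small scale.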

# Notes:
Compared to the SmolLM2 control model, this model borders on incoherent. Potentially this model size is too small to correctly leverage differential attention. It has clearly picked up on some structure in language, but in terms of human-judged output it is generally worse than the control model using GQA.
  # Training Metrics