SparseLLM/sparsing-law-0.1b-relu

Text Generation


demerzel-iv committed on Nov 4, 2024

Commit a728903 (verified) · 1 parent: cab8a9c

Update README.md

Files changed (1)

README.md +15 -3


@@ -1,3 +1,15 @@
----
-license: mit
----

+---
+license: mit
+language:
+- en
+- zh
+---
+# Model Card for sparsing-law-0.1b-relu
+- **Paper [optional]:** [paper](todo)
+- **Repository and demo code:** [github](https://github.com/thunlp/SparsingLaw)
+This model is ReLU-activated and contains approximately 0.1 billion non-embedding parameters.
+The model was trained from scratch using the pre-training dataset described in our paper, with the WSD (Warmup-Stable-Decay) learning rate scheduler. It represents the final checkpoint of the stable stage in WSD, meaning it has not undergone the decay stage.