nkpz/falcon-thought-7b-v0-2


nkpz committed 9 days ago

Commit 5d12050 · verified · 1 Parent(s): 5888a6a

slightly reorganize readme

Files changed (1)

README.md +3 -3


@@ -8,9 +8,9 @@ base_model:
 Fine-tune of https://huggingface.co/suayptalha/Falcon3-Jessi-v0.4-7B-Slerp using a custom training script + custom optimizer.
-Third try at creating a small reasoning model that uses its own symbolic language to represent different reasoning steps.
-Still in early stages, and I plan on building publicly and releasing as often as possible.
+Small early-stage reasoning model project that uses its own symbolic language to represent different reasoning steps.
+Trained on a synthetic hand-edited dataset with 65 samples.
 **Changes from previous versions**
 - Unlike the prior versions, this only modifies `mlp.gate_proj`, leaving `self_attn` from the original model fully intact.
@@ -38,7 +38,7 @@ This is the syntax for the DSL I trained it on, which is called ROL (Reasoning O
 `↺` Self-Reflect - Reconsider assumptions, mitigate overconfidence. Usually begins with the word "Wait" or "Alternatively" or "Actually". Use this at least 3 times.
 `➤` Output - Structure tone, format, intent.
 ```
-ROL was invented by Deepseek R1 and tweaked by me. The spec for it was not included in the training data - the model figured out how it works based on a synthetic hand-edited dataset with 65 samples.
+ROL was invented by Deepseek R1 and tweaked by me. The spec for it was not included in the training data - the model figured out how it works based on training data.
 The total training time for this was under 15 minutes.
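
The README states that this version only modifies `mlp.gate_proj` and leaves `self_attn` intact. The custom training script is not shown, so the following is only a minimal sketch of how such a selection might look. It assumes Llama-style parameter names such as `model.layers.0.mlp.gate_proj.weight`; the helper `select_trainable` and the example names are hypothetical and not taken from the repository.

```python
from typing import Iterable, List

# Assumption: parameter names follow a Llama-style layout,
# e.g. "model.layers.0.mlp.gate_proj.weight". The README does not confirm this.
TARGET_FRAGMENT = ".mlp.gate_proj."


def select_trainable(param_names: Iterable[str]) -> List[str]:
    """Return only the names that belong to an mlp.gate_proj module."""
    return [name for name in param_names if TARGET_FRAGMENT in name]


# Hypothetical parameter names for one decoder layer, for illustration only.
example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
]

trainable = select_trainable(example_names)
print(trainable)
```

With a PyTorch model, the same filter could be applied over `model.named_parameters()`, calling `param.requires_grad_(True)` for selected names and `param.requires_grad_(False)` for everything else, so that attention weights stay frozen.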