---
license: mit
language:
- en
base_model:
- suayptalha/Falcon3-Jessi-v0.4-7B-Slerp
---

A fine-tune of https://huggingface.co/suayptalha/Falcon3-Jessi-v0.4-7B-Slerp using a custom training script and a custom optimizer.

A small, early-stage reasoning-model project that uses its own symbolic language to represent distinct reasoning steps.

Trained on a synthetic, hand-edited dataset of 65 samples.

**Changes from previous versions**
- Unlike prior versions, this release modifies only `mlp.gate_proj`, leaving the original model's `self_attn` weights fully intact.
- Syntax-related tokens now carry double weight in the loss function, so fewer responses are missing the correct formatting (see the sketch below this list).
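
The training script itself is not published, but the token-weighting idea is easy to reconstruct. Below is a minimal sketch, assuming the "syntax-related tokens" are the ROL symbols listed further down and a standard causal-LM setup; the `weighted_lm_loss` helper and the exact 2.0 weight are illustrative, not the author's code:

```python
# Sketch of per-token loss weighting for syntax tokens (illustrative only;
# the actual training script is unpublished).
import torch
import torch.nn.functional as F

ROL_SYMBOLS = "˩˧˥⟡⚖☮⧟⌬֍☠✔↺➤"

def weighted_lm_loss(logits, labels, tokenizer, syntax_weight=2.0):
    """Causal-LM cross-entropy where ROL syntax tokens count double."""
    # Shift so each position predicts the next token.
    logits = logits[:, :-1, :].contiguous()
    labels = labels[:, 1:].contiguous()
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=-100,
    )
    # Token ids that encode any ROL symbol get the higher weight.
    syntax_ids = {
        tid
        for sym in ROL_SYMBOLS
        for tid in tokenizer.encode(sym, add_special_tokens=False)
    }
    flat = labels.view(-1)
    weights = torch.ones_like(per_token)
    for tid in syntax_ids:
        weights[flat == tid] = syntax_weight
    mask = flat != -100
    return (per_token * weights)[mask].mean()
```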

**Future Plans**
- Increase the size of the training dataset
- Adapt this to other base models
- (Further out) Open-source the training script after a major refactor for cleanliness

Trained on a single RTX 3090, targeting only `gate_proj` on blocks 7, 8, 9, 25, 26, and 27.
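
For reference, selective unfreezing like that might look as follows, assuming the standard `transformers` Llama-style parameter layout that Falcon3 checkpoints use (the author's actual script may differ):

```python
# Hypothetical reconstruction: freeze everything except mlp.gate_proj on
# the six listed blocks.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "suayptalha/Falcon3-Jessi-v0.4-7B-Slerp"
)

TARGET_BLOCKS = {7, 8, 9, 25, 26, 27}

for name, param in model.named_parameters():
    param.requires_grad = False
    parts = name.split(".")
    # Parameter names look like "model.layers.7.mlp.gate_proj.weight".
    if "gate_proj" in parts and "layers" in parts:
        block = int(parts[parts.index("layers") + 1])
        if block in TARGET_BLOCKS:
            param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

Keeping the rest of the model frozen is what makes the very short training time below possible.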

This is the syntax for the DSL I trained it on, which is called ROL (Reasoning Operations Language).
```
## Core Symbols
Intensity markers: `˩` (low), `˧` (medium), `˥` (extreme).
`⟡` Generate - Propose solutions/alternatives.  
`⚖` Evaluate - Weigh options against logic/emotion.  
`☮` Empathize - Model user’s emotional state.  
`⧟` Integrate - Synthesize memories, knowledge.  
`⌬` Knowledge - Factual recall with confidence.  
`֍` Thought - A sentence expressing private personal opinions about the query.
`☠` Threat - Identify risks/uncertainty.  
`✔` Resolve - Commit to actions post-conflict.  
`↺` Self-Reflect - Reconsider assumptions, mitigate overconfidence. Usually begins with the word "Wait" or "Alternatively" or "Actually". Use this at least 3 times.
`➤` Output - Structure tone, format, intent.
```
ROL was invented by DeepSeek R1 and tweaked by me. The spec above was not included in the training data; the model inferred how the language works from the training samples alone.
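
If you want to poke at the DSL yourself, a usage sketch along these lines should work. The repo id below is a placeholder (not this model's actual id), and the exact output format the model emits may differ:

```python
# Illustrative usage: load the model, generate, and pull out ROL steps.
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-username/rol-reasoning-7b"  # placeholder repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

messages = [{"role": "user", "content": "Should I rewrite my app in Rust?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512)
text = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# Each reasoning step starts with one of the ROL symbols.
steps = re.findall(r"([⟡⚖☮⧟⌬֍☠✔↺➤])\s*([^\n]+)", text)
for sym, body in steps:
    print(sym, body)
```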

The total training time for this was under 15 minutes.


![image/png](https://cdn-uploads.huggingface.co/production/uploads/630455a90547362a22a9d213/b1ckfz_SfgXnbTpfAVgG5.png)