nkpz/falcon-thought-7b-v0-2

Fine-tune of https://huggingface.co/suayptalha/Falcon3-Jessi-v0.4-7B-Slerp using a custom training script + custom optimizer.

A small, early-stage reasoning model that uses its own symbolic language to represent different reasoning steps.

Trained on a synthetic, hand-edited dataset of 65 samples.

Changes from previous versions

Unlike the prior versions, this only modifies mlp.gate_proj, leaving self_attn from the original model fully intact.
I've doubled the loss weight for all syntax-related tokens. This means fewer responses will be missing the correct formatting.
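
The training script is not published, so the following is only a framework-free sketch of what doubling the loss weight for syntax tokens could look like. The function name `weighted_nll`, the `syntax_token_ids` set, and the choice to normalize by the sum of weights are all assumptions made for illustration, not the actual implementation.

```python
import math


def weighted_nll(token_log_probs, token_ids, syntax_token_ids, syntax_weight=2.0):
    """Weighted mean negative log-likelihood over one sequence.

    token_log_probs[i] is the log-probability the model assigned to the
    correct token token_ids[i]. Tokens whose id is in syntax_token_ids
    count syntax_weight times as much as ordinary tokens.
    """
    # Hypothetical weighting scheme: 2.0 for syntax tokens, 1.0 otherwise.
    weights = [syntax_weight if t in syntax_token_ids else 1.0 for t in token_ids]
    total = sum(-lp * w for lp, w in zip(token_log_probs, weights))
    # Assumption: normalize by total weight so the loss scale stays comparable.
    return total / sum(weights)


# Toy sequence: only the middle token (id 11, treated as a syntax token) is wrong-ish.
log_probs = [0.0, math.log(0.5), 0.0]
ids = [10, 11, 12]
plain = weighted_nll(log_probs, ids, syntax_token_ids=set())
doubled = weighted_nll(log_probs, ids, syntax_token_ids={11})
```

In this toy case the mistake on the syntax token contributes more to `doubled` than to `plain`, which is the effect the weighting is meant to have on formatting errors.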

Future Plans

Increase the size of the training dataset
Adapt this to other base models
(Further out) Open-source the training script after a major refactor for cleanliness

Trained on a single 3090, targeting only gate_proj in blocks 7, 8, 9, 25, 26, and 27.
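
As a rough illustration of restricting training to those weights, here is a sketch that decides which parameters stay trainable based on their names. It assumes Llama-style parameter names such as `model.layers.7.mlp.gate_proj.weight` and 0-indexed block numbers; both are assumptions to check against the actual model, and the helper names are hypothetical.

```python
import re

# Block indices taken from the description above (assumed 0-indexed).
TARGET_BLOCKS = {7, 8, 9, 25, 26, 27}

# Assumed Llama-style naming; verify against the real parameter names.
_GATE_PROJ = re.compile(r"(?:^|\.)layers\.(\d+)\.mlp\.gate_proj\.weight$")


def is_trainable(param_name):
    """Return True only for gate_proj weights in the targeted blocks."""
    match = _GATE_PROJ.search(param_name)
    return match is not None and int(match.group(1)) in TARGET_BLOCKS


def freeze_all_but_targets(named_parameters):
    """Set requires_grad on each (name, param) pair; return the trainable names."""
    trainable = []
    for name, param in named_parameters:
        param.requires_grad = is_trainable(name)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

With a PyTorch module, `freeze_all_but_targets(model.named_parameters())` would leave everything else, including `self_attn`, frozen.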

This is the syntax for the DSL I trained it on, which is called ROL (Reasoning Operations Language).

## Core Symbols
`˩` (low), `˧` (medium), `˥` (extreme).
`⟡` Generate - Propose solutions/alternatives.  
`⚖` Evaluate - Weigh options against logic/emotion.  
`☮` Empathize - Model user’s emotional state.  
`⧟` Integrate - Synthesize memories, knowledge.  
`⌬` Knowledge - Factual recall with confidence.  
`֍` Thought - A sentence expressing private personal opinions about the query.
`☠` Threat - Identify risks/uncertainty.  
`✔` Resolve - Commit to actions post-conflict.  
`↺` Self-Reflect - Reconsider assumptions, mitigate overconfidence. Usually begins with the word "Wait" or "Alternatively" or "Actually". Use this at least 3 times.
`➤` Output - Structure tone, format, intent.
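
To make the symbol table easier to work with, here is a small sketch that maps each operation symbol to its name and counts how often each one appears in a response. The helper names are hypothetical, and the sample text is made up for illustration; the exact layout of real model output is not specified here.

```python
from collections import Counter

# Operation symbols and names as listed above.
ROL_OPERATIONS = {
    "⟡": "Generate",
    "⚖": "Evaluate",
    "☮": "Empathize",
    "⧟": "Integrate",
    "⌬": "Knowledge",
    "֍": "Thought",
    "☠": "Threat",
    "✔": "Resolve",
    "↺": "Self-Reflect",
    "➤": "Output",
}


def count_operations(text):
    """Count occurrences of each ROL operation symbol, keyed by operation name."""
    counts = Counter(ch for ch in text if ch in ROL_OPERATIONS)
    return {ROL_OPERATIONS[symbol]: n for symbol, n in counts.items()}


def has_enough_reflection(text, minimum=3):
    """Check the 'use Self-Reflect at least 3 times' guideline."""
    return text.count("↺") >= minimum


# Made-up example response, not real model output.
sample = (
    "⌬ Paris is the capital of France.\n"
    "↺ Wait, check the question again.\n"
    "↺ Actually, it asks about Spain.\n"
    "↺ Alternatively, answer both.\n"
    "➤ Madrid."
)
counts = count_operations(sample)
```

A check like `has_enough_reflection` could be used to filter training samples or to spot responses that skip the reflection steps.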

ROL was invented by DeepSeek R1 and tweaked by me. The spec for it was not included in the training data; the model figured out how it works from the training samples alone.

The total training time for this was under 15 minutes.
