These repos are public because I hit the private storage limit, but feel free to try them.

This model uses the Mistral V7 prompt format.

It was trained on DeepSeek R1 RP logs and character cards, plus some funny shit.

Default system prompt: "You are MistralThinker, a Large Language Model (LLM) created by Undi.\nYour knowledge base was last updated on 2023-10-01. Current date: {date}.\n\nWhen unsure, state you don't know."

I recommend putting information about the persona and yourself in the system prompt to let the magic happen.

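As an illustration, the `{date}` placeholder in the default prompt can be filled at runtime and persona details appended afterwards. This is just a sketch; the persona and user descriptions below are made-up examples, not from the training data:

```python
from datetime import date

# Default system prompt from this README, with {date} filled in at runtime.
DEFAULT_SYSTEM = (
    "You are MistralThinker, a Large Language Model (LLM) created by Undi.\n"
    "Your knowledge base was last updated on 2023-10-01. "
    f"Current date: {date.today().isoformat()}.\n\n"
    "When unsure, state you don't know."
)

# Hypothetical persona/user descriptions; swap in your own character card.
persona = "You are roleplaying as Nero, a sarcastic vampire librarian."
user_info = "The user plays a curious traveler visiting the library."

system_prompt = f"{DEFAULT_SYSTEM}\n\n{persona}\n{user_info}"
print(system_prompt)
```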
Sadly, I have a problem with the prompt format in the `tokenizer_config.json`.

I tried to recreate what DeepSeek did with their distills: they added `<think>` at the beginning of each assistant reply and cut the thinking part out of the context.

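That context trimming can be sketched in a few lines, assuming a standard `messages` list of role/content dicts (the helper name is mine, not from the model's code):

```python
import re

# Match a <think>...</think> block plus any trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(messages):
    """Remove <think>...</think> blocks from earlier assistant turns
    so reasoning traces don't pile up in the context window."""
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "<think>Greet warmly.</think>Hello there!"},
]
print(strip_thinking(history)[1]["content"])  # -> Hello there!
```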
I did the same, but on my side the first `<think>` doesn't appear when using "Chat completion".

Other than that, the model seems fully functional. Feel free to try it, but be sure to prefill `<think>` one way or another.
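One way to prefill is to switch to text completion and append `<think>` to the prompt yourself. The layout below is a sketch of a Mistral V7-style prompt, not the exact template; verify the special tokens against the model's `tokenizer_config.json`:

```python
def build_prompt(system: str, user: str) -> str:
    # Sketch of a Mistral V7-style prompt with the assistant turn
    # prefilled with <think>; check the real chat template for exact tokens.
    return (
        f"<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]"
        f"[INST]{user}[/INST]<think>"
    )

prompt = build_prompt("You are MistralThinker.", "Write a haiku about rain.")
print(prompt)
```

With chat-completion backends that support assistant-prefix continuation, sending a final assistant message whose content is just `<think>` should achieve the same effect.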