This model series is part of active research. The accompanying research paper will be published between versions 0.5 and 1.0 under the title "Multilingual Babies: Analyzing Multilingual GPT Models with under one Billion Parameters".

The model is still in its testing phase. We don't recommend it for high-end production environments; it is intended only for story generation. We suspect this model may be slightly overfitted, which will be fixed in the coming version. The model was trained using LLaMA-Factory by Asadullah Hamzah at XeTute Technologies.


Phantasor V0.2

We introduce Phantasor V0.2, the continuation of Phantasor V0.1. It has been trained on top of V0.1 using a new dataset (more details below) alongside the old datasets.
Licensed under MIT, feel free to use it in your personal projects, both commercially and privately. Since this is V0.2, we're open to feedback to improve our project(s).
The chat template used is Alpaca. For correct usage, insert your prompt as the system prompt (Alpaca's instruction slot). The model can also be used without any template to continue a sequence of text. A sketch of the prompt format follows below.
You can find the FP32 version here.
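For clarity, here is a minimal sketch of the standard Alpaca prompt format in Python. The header wording follows the common Alpaca convention and is an assumption rather than a string confirmed verbatim for this model; your story premise goes into the instruction ("system prompt") slot.

```python
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    # Standard Alpaca-style prompt. The header wording is the common Alpaca
    # convention, assumed here rather than confirmed for this exact model.
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

# The story premise fills the instruction ("system prompt") slot:
prompt = alpaca_prompt("A young detective investigates a series of bizarre murders in a small town.")
```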

Dataset Details

As noted above, to create this model we took V0.1 and fine-tuned it on the datasets used before, but this time:

  • We also used MatanP/emotion_mapped_story_dataset, which strongly enriched the mix of short-story-relevant generation in English
  • We instructed the old (V0.1) model to generate stories and keywords 132 times, then let a 'teacher' model refine those outputs to bring them to the highest quality, which resulted in this dataset. This model was trained on that dataset too; a sketch of the loop follows this list.
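For illustration, the sketch below shows what that generate-then-refine loop could look like. The model ID, prompt, sampling settings, and the refine_with_teacher stub are hypothetical placeholders, not the exact pipeline we ran:

```python
from transformers import pipeline

# Hypothetical sketch of the generate-then-refine loop described above.
# The model ID, prompt, and sampling settings are illustrative assumptions.
student = pipeline("text-generation", model="XeTute/Phantasor_V0.1-137M")

def refine_with_teacher(draft: str) -> str:
    # Placeholder for the 'teacher' pass: in practice, a larger
    # instruction-tuned model would polish the draft here.
    return draft

samples = []
for _ in range(132):  # the card states 132 generations
    draft = student(
        "Write a short story and a list of keywords describing it.",
        max_new_tokens=512,
        do_sample=True,
        temperature=0.9,
    )[0]["generated_text"]
    samples.append(refine_with_teacher(draft))
```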

Example =)

"System-Prompt": "A young detective investigates a series of bizarre murders in a small town.",
"Response":
"As the investigation unfolds, Detective Darlene Thompson finds herself drawn to the enigmatic figure known as "The Black Widow." The Black Widow, with her dark hair and piercing blue eyes, is said to possess an uncanny ability to manipulate reality. She possesses a hidden knowledge of the past, and she is determined to uncover the truth behind the events that have taken place.
Detective Darlene Thompson's quest begins when she meets a mysterious woman named Amelia. Amelia has a peculiar history, and she believes that the Black Widow is responsible for the murders that took place in the town. However, she also knows that the Black Widow has a secret plan to bring justice to the town.
Detective Darlene Thompson confronts Amelia about her involvement in the murders, and she realizes that Amelia's motives are not just selfish but also dangerous. She must confront the Black Widow before she can make a decision."
"System-Prompt": "Given a paragraph, continue the story."
"Input":
"But they knew that they were leaving with something even more precious than their survival – the bonds they had formed with each other.
They hugged and said their goodbyes, knowing that they had become friends for life."
"Response":
"As they went on, they talked about their journey and the importance of friendship, but it was not until they had reached the end of the road that they realized the true meaning of friendship. They had made friends who had become close friends, and they had shared their experiences and memories with each other."

Training

This model was trained on all samples and tokens included in the datasets described above, for exactly 4.0 epochs on all model parameters. The loss curve, updated at each training step across all four epochs, is shown in training_loss.png. Instead of AdamW, which is often used for large GPTs, we used SGD, which enabled the model to generalize better; this is noticeable when using the model on prompts outside the training data. A rough sketch of this setup follows.
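As an illustration of that optimizer choice, the sketch below performs one full-parameter training step with plain SGD instead of AdamW. The hyperparameters are assumptions; the actual run used LLaMA-Factory's training loop:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Full-parameter fine-tuning step with plain SGD instead of AdamW.
# Learning rate and momentum are illustrative assumptions.
tok = AutoTokenizer.from_pretrained("gpt2")  # same architecture family as Phantasor
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

batch = tok("Once upon a time", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # causal-LM loss over the batch
out.loss.backward()
optimizer.step()
optimizer.zero_grad()
```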

Finished Model

  • ~137M parameters, all of which are trainable
  • 1024 (1k) tokens of input context length, all of which were used during training
  • A final loss of ~1.2 on all samples (see Files => train_results.json)

This is very good performance for V0.2.
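Since this is a GGUF release, here is a minimal local-inference sketch using llama-cpp-python with the full 1024-token context. The file name is a placeholder for whichever quantization you download from this repository:

```python
from llama_cpp import Llama

# Load the GGUF file locally; the path below is a placeholder.
llm = Llama(model_path="Phantasor_V0.2-137M.gguf", n_ctx=1024)  # full 1k context

prompt = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n"
    "A young detective investigates a series of bizarre murders in a small town.\n\n"
    "### Response:\n"
)
out = llm(prompt, max_tokens=256, temperature=0.8)
print(out["choices"][0]["text"])
```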

Socials

BlueSky | YouTube | HuggingFace 🤗 | Ko-Fi / Financially Support Us

Our Platforms

Our Webpage | PhantasiaAI

Have a great day!
