|
---
license: apache-2.0
base_model: pszemraj/verysmol_llama-v7-KIx2
tags:
- generated_from_trainer
metrics:
- accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.85
    repetition_penalty: 1.35
    no_repeat_ngram_size: 5
    eta_cutoff: 0.0006
    renormalize_logits: true
widget:
- text: My name is El Microondas the Wise and
  example_title: El Microondas
- text: Kennesaw State University is a public
  example_title: Kennesaw State University
- text: >-
    Bungie Studios is an American video game developer. They are most famous
    for developing the award winning Halo series of video games. They also made
    Destiny. The studio was founded
  example_title: Bungie
- text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- text: >-
    The Harry Potter series, written by J.K. Rowling, begins with the book
    titled
  example_title: Harry Potter Series
- text: >-
    Question: I have cities, but no houses. I have mountains, but no trees. I
    have water, but no fish. What am I?

    Answer:
  example_title: Riddle
- text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- text: >-
    Jane went to the store to buy some groceries. She picked up apples,
    oranges, and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- text: >-
    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
    and another train leaves Station B at 10:00 AM and travels at 80 mph, when
    will they meet if the distance between the stations is 300 miles?

    To determine
  example_title: Math Problem
- text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
pipeline_tag: text-generation
datasets:
- JeanKaddour/minipile
---
|
|
|
|
|
|
# BEE-spoke-data/verysmol_llama-v8-minipile_x2 |
|
|
|
This is still a work-in-progress and should be treated as such. |
|
|
|
## Model description |
|
|
|
This is an autoregressive smol language model. It generates text.
|
|
|
It achieves the following results on the evaluation set: |
|
- Loss: 2.7521
- Accuracy: 0.4686
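
Below is a minimal usage sketch with the `transformers` library; the sampling parameters simply mirror the inference settings declared in this card's metadata and are otherwise untuned.

```python
# minimal sketch: load the model and generate with the card's default sampling settings
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/verysmol_llama-v8-minipile_x2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Mona Lisa is a world-renowned painting created by", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.85,
    repetition_penalty=1.35,
    no_repeat_ngram_size=5,
    eta_cutoff=0.0006,
    renormalize_logits=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```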
|
|
|
## Intended uses & limitations |
|
|
|
Intended for lightweight text generation and experimentation. The main limitation is that it is smol.
|
|
|
Additionally, _<insert generic, emotionless, and corporate statement about bias in language models here>_. |
|
|
|
## Data |
|
|
|
The most recent training run was on `JeanKaddour/minipile` for 2 epochs. Otherwise, please refer to the quote below:
|
|
|
> UnFoRtUnAtElY We'rE UnAbLe tO ShArE DeTaIlS AbOuT ThE TrAiNiNg aNd tHe dAtAsEtS (eXtRaCtEd fRoM ThE OpEn wEb) DuE To tHe hIgHlY CoMpEtItIvE NaTuRe oF ThE FiElD. |
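
For reference, the dataset from the latest run is public and can be pulled with the `datasets` library; a minimal sketch:

```python
# minimal sketch: load the dataset used for the most recent training run
from datasets import load_dataset

minipile = load_dataset("JeanKaddour/minipile")
print(minipile)  # splits and sizes as published on the Hub
```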
|
|
|
|
|
## evals |
|
|
|
| **eval metrics**        | **value**  |
| ----------------------- | ---------- |
| epoch                   | 2.0        |
| eval_accuracy           | 0.4685     |
| eval_loss               | 2.7521     |
| eval_runtime            | 0:00:03.89 |
| eval_samples            | 300        |
| eval_samples_per_second | 77.049     |
| eval_steps_per_second   | 9.759      |
| perplexity              | 15.675     |
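
As a quick sanity check, the reported perplexity is just the exponential of the evaluation loss:

```python
# perplexity = exp(cross-entropy loss)
import math

print(math.exp(2.7521))  # ≈ 15.68, matching the reported perplexity
```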
|
|
|
|
|
### harness |
|
|
|
> Some improvements and some degradations compared with previous versions. This may indicate that the last dataset in the curriculum matters and needs to be chosen deliberately.
|
|
|
`hf-causal-experimental (pretrained=BEE-spoke-data/verysmol_llama-v8-minipile_x2,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16` |
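
A hedged sketch of how this run could be reproduced: it assumes the older (~0.3.x) `lm-evaluation-harness` Python API that accepts `hf-causal-experimental`; newer releases use a different interface.

```python
# sketch only: reproduce the zero-shot run described by the config string above
# (assumes EleutherAI lm-evaluation-harness ~0.3.x; newer versions changed the API)
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=BEE-spoke-data/verysmol_llama-v8-minipile_x2,revision=main,trust_remote_code=True,dtype='float'",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=16,
)
print(results["results"])
```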
|
|
|
|
|
| Task          |Version| Metric |  Value  |   |Stderr|
|---------------|------:|--------|--------:|---|-----:|
|arc_easy       |      0|acc     |   0.3662|±  |0.0099|
|               |       |acc_norm|   0.3460|±  |0.0098|
|boolq          |      1|acc     |   0.6052|±  |0.0085|
|lambada_openai |      0|ppl     | 156.8153|±  |6.5985|
|               |       |acc     |   0.2010|±  |0.0056|
|openbookqa     |      0|acc     |   0.1280|±  |0.0150|
|               |       |acc_norm|   0.2660|±  |0.0198|
|piqa           |      0|acc     |   0.5865|±  |0.0115|
|               |       |acc_norm|   0.5805|±  |0.0115|
|winogrande     |      0|acc     |   0.5217|±  |0.0140|
|
|
|
|
|
| Task        |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.1877|±  |0.0114|
|             |       |acc_norm|0.2235|±  |0.0122|
|
|
|
| Task     |Version| Metric |Value |   |Stderr|
|----------|------:|--------|-----:|---|-----:|
|hellaswag |      0|acc     |0.2622|±  |0.0088|
|          |       |acc_norm|0.2777|±  |0.0089|
|
|
|
| Task        |Version|Metric|Value |   |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc|      1|mc1   |0.2705|±  |0.0156|
|             |       |mc2   |0.4729|±  |0.0155|
|
|
|
--- |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 0.00015
- train_batch_size: 8
- eval_batch_size: 8
- seed: 5404
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2.0
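
For convenience, here is a hedged sketch of how these settings roughly map onto `transformers.TrainingArguments`; it is reconstructed from the list above rather than taken from the original training script, and the output path is hypothetical.

```python
# approximate mapping of the listed hyperparameters onto TrainingArguments
# (reconstructed from the card; not the original training script)
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./verysmol_llama-v8-minipile_x2",  # hypothetical output path
    learning_rate=0.00015,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=5404,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=2.0,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-07,
)
```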
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 2.7625        | 0.02  | 200   | 2.8982          | 0.4457   |
| 2.7377        | 0.03  | 400   | 2.8812          | 0.4477   |
| 2.6883        | 0.05  | 600   | 2.8774          | 0.4489   |
| 2.7654        | 0.06  | 800   | 2.8811          | 0.4479   |
| 2.744         | 0.08  | 1000  | 2.8838          | 0.4464   |
| 2.6922        | 0.09  | 1200  | 2.8921          | 0.4461   |
| 2.7416        | 0.11  | 1400  | 2.8930          | 0.4464   |
| 2.7337        | 0.12  | 1600  | 2.8972          | 0.4465   |
| 2.7046        | 0.14  | 1800  | 2.8933          | 0.4472   |
| 2.673         | 0.15  | 2000  | 2.8926          | 0.4483   |
|
|
|
|
|
... |
|
|
|
| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 2.5155        | 1.88  | 24800 | 2.7524          | 0.4685   |
| 2.5092        | 1.89  | 25000 | 2.7522          | 0.4686   |
| 2.5093        | 1.91  | 25200 | 2.7523          | 0.4685   |
| 2.4574        | 1.92  | 25400 | 2.7521          | 0.4686   |
| 2.5137        | 1.94  | 25600 | 2.7522          | 0.4686   |
| 2.4598        | 1.95  | 25800 | 2.7521          | 0.4686   |
| 2.515         | 1.97  | 26000 | 2.7521          | 0.4685   |
| 2.5429        | 1.98  | 26200 | 2.7521          | 0.4686   |
| 2.4789        | 2.0   | 26400 | 2.7521          | 0.4686   |