---
library_name: transformers
language:
- sv
- en
---
# Lynx 2B (micro)

## Model Details
### Model Description
This is the first release in a series of Swedish large language models we call "Lynx". Micro is a small model (2 billion parameters), but it punches way above its weight!
Lynx micro is a fine-tune of Google DeepMind's Gemma 2B and scores just below GPT-3.5 Turbo on [Scandeval](https://scandeval.com/swedish-nlg/). In fact, the only non-OpenAI model (currently) topping the Swedish NLG board on Scandeval is a fine-tune of Llama-3 by AI Sweden based on our data recipe.
We believe this is a really good model for its size, but keep in mind that it is still a small model and has not memorized as much as larger models tend to.
- **Funded, Developed and shared by:** [42 Labs](https://www.42labs.ai)
- **Model type:** Auto-regressive transformer
- **Language(s) (NLP):** Swedish and English
- **License:** Gemma terms of use
- **Finetuned from model:** [Gemma 2B, 1.1 instruct](https://huggingface.co/google/gemma-1.1-2b-it)
## How to Get Started with the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_name = 'four-two-labs/lynx-micro'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='cuda',
    torch_dtype=torch.bfloat16,
    attn_implementation='flash_attention_2',  # Remove if flash attention isn't available
)

# The streamer prints tokens to stdout as they are generated.
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    streamer=TextStreamer(tokenizer=tokenizer),
)

# Alternative example prompts are commented out; keep exactly one active.
messages = [
    # {'role': 'user', 'content': 'Lös ekvationen 2x^2-5 = 9'},
    # {'role': 'user', 'content': 'Vad är fel med denna mening: "Hej! Jag idag bra mår."'},
    # {'role': 'user', 'content': """Översätt till svenska: Hayashi, the Japanese government spokesperson, said Monday that Tokyo is answering the Chinese presence around the islands with vessels of its own.\n\n“We ensure a comprehensive security system for territorial waters by deploying Coast Guard patrol vessels that are consistently superior to other party’s capacity,” Hayashi said.\n\nAny Japanese-Chinese incident in the Senkakus raises the risk of a wider conflict, analysts note, due to Japan’s mutual defense treaty with the United States.\n\nWashington has made clear on numerous occasions that it considers the Senkakus to be covered by the mutual defense pact."""},
    # {'role': 'user', 'content': """Vad handlar texten om?\n\nHayashi, the Japanese government spokesperson, said Monday that Tokyo is answering the Chinese presence around the islands with vessels of its own.\n\n“We ensure a comprehensive security system for territorial waters by deploying Coast Guard patrol vessels that are consistently superior to other party’s capacity,” Hayashi said.\n\nAny Japanese-Chinese incident in the Senkakus raises the risk of a wider conflict, analysts note, due to Japan’s mutual defense treaty with the United States.\n\nWashington has made clear on numerous occasions that it considers the Senkakus to be covered by the mutual defense pact."""},
    # {'role': 'user', 'content': """Skriv en sci-fi novell som utspelar sig över millenium på en planet runt ett binärt stjärnsystem."""},
    {'role': 'user', 'content': 'Hur många helikoptrar kan en människa äta på en gång?'},
]

# Greedy decoding; stop when the model closes its turn.
r = pipe(
    messages,
    max_length=4096,
    do_sample=False,
    eos_token_id=tokenizer.convert_tokens_to_ids('<end_of_turn>'),
)
```
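The snippet above stops generation at `<end_of_turn>` because Gemma-style chat models wrap each turn in turn markers. The following is a minimal sketch of that prompt layout, assuming Lynx micro inherits the Gemma 1.1 instruct chat format unchanged. The helper `format_gemma_prompt` is hypothetical and only for illustration; in practice `tokenizer.apply_chat_template` (which the pipeline calls for you) does this, and the real template also prepends the BOS token, which is omitted here.

```python
# Assumption: Lynx micro keeps the Gemma 1.1 instruct turn markers.
START, END = '<start_of_turn>', '<end_of_turn>'


def format_gemma_prompt(messages):
    """Render chat messages in Gemma-style turn format (illustrative helper)."""
    parts = []
    for message in messages:
        # Gemma names the assistant role "model".
        role = 'model' if message['role'] == 'assistant' else message['role']
        parts.append(f"{START}{role}\n{message['content'].strip()}{END}\n")
    # Open a model turn so that generation continues as the assistant.
    parts.append(f"{START}model\n")
    return ''.join(parts)


prompt = format_gemma_prompt([{'role': 'user', 'content': 'Hej!'}])
print(prompt)
```

Since the model ends its own reply with `<end_of_turn>`, passing that token's id as `eos_token_id` makes generation stop at the end of the answer instead of running on to `max_length`.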
## Training Details
### Training Data
[More Information Needed]
### Training Procedure
#### Preprocessing [optional]
[More Information Needed]
#### Training Hyperparameters
- **Training regime:** [More Information Needed]
[More Information Needed]
## Evaluation
The model has been evaluated on the Swedish subset of [Scandeval](https://scandeval.com/swedish-nlg/).


## Environmental Impact
- **Hardware Type:** 8xH100
- **Hours used:** ~96 GPU hours
- **Cloud Provider:** runpod.io
- **Compute Region:** Canada
- **Carbon Emitted:** Minimal
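
For readers who want a rough number behind these figures, the usual back-of-the-envelope estimate is energy = GPU hours × power draw, and emissions = energy × grid carbon intensity. The sketch below applies that to the values listed above. Everything beyond the 8 GPUs and ~96 GPU hours is an assumption: the 700 W per GPU is an assumed upper bound (the card does not state the H100 variant), the grid intensity is a hypothetical placeholder, and the function `training_footprint` is illustrative only. It also assumes that "~96 GPU hours" is the total across all eight GPUs, and it ignores CPU, cooling and other datacenter overhead.

```python
def training_footprint(gpu_hours, num_gpus, watts_per_gpu, grid_g_co2_per_kwh):
    """Return (wall-clock hours, energy in kWh, emissions in kg CO2)."""
    wall_clock_hours = gpu_hours / num_gpus
    energy_kwh = gpu_hours * watts_per_gpu / 1000
    co2_kg = energy_kwh * grid_g_co2_per_kwh / 1000
    return wall_clock_hours, energy_kwh, co2_kg


hours, kwh, co2 = training_footprint(
    gpu_hours=96,            # from the list above (approximate)
    num_gpus=8,              # from the list above
    watts_per_gpu=700,       # assumption: upper-bound board power
    grid_g_co2_per_kwh=100,  # hypothetical placeholder, not a measured value
)
print(f"{hours:.0f} h wall clock, {kwh:.1f} kWh, {co2:.1f} kg CO2")
```

Under these assumptions the run corresponds to about 12 hours of wall-clock time on the 8-GPU node; substitute the actual power draw and the regional grid intensity to get a meaningful emissions figure.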