---
license: apache-2.0
language:
- en
tags:
- pytorch
- llama
- llama-3.2
---

# Llama 3.2 From Scratch

This repository contains a from-scratch, educational PyTorch implementation of **Llama 3.2 text models** with **minimal code dependencies**. The implementation is **optimized for readability** and intended for learning and research purposes.

The from-scratch Llama 3.2 code is based on my implementation in [standalone-llama32-mem-opt.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/standalone-llama32-mem-opt.ipynb).

![Llama 3.2 From Scratch](https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/llama32.webp)

The model weights included here are PyTorch state dicts converted from the official weights provided by Meta. For the original weights, usage terms, and license information, please refer to the original model repositories linked below:

- [https://huggingface.co/meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- [https://huggingface.co/meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)
- [https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
- [https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)

&nbsp;
## Usage

The sections below explain how the model weights can be used via the from-scratch implementation provided in the [`model.py`](model.py) and [`tokenizer.py`](tokenizer.py) files.

Alternatively, you can also modify and run the [`generate_example.py`](generate_example.py) file via:

```bash
python generate_example.py
```
which uses the Llama 3.2 1B Instruct model by default and prints:

```
Time: 4.12 sec
Max memory allocated: 2.91 GB


Output text:

Llamas are herbivores, which means they primarily eat plants. Their diet consists mainly of:

1. Grasses: Llamas love to graze on various types of grasses, including tall grasses and grassy meadows.
2. Hay: Llamas also eat hay, which is a dry, compressed form of grass or other plants.
3. Alfalfa: Alfalfa is a legume that is commonly used as a hay substitute in llama feed.
4. Other plants: Llamas will also eat other plants, such as clover, dandelions, and wild grasses.

It's worth noting that the specific diet of llamas can vary depending on factors such as the breed,
```
&nbsp;
### 1) Setup

The only dependencies are `torch`, `tiktoken`, and `blobfile`, which can be installed as follows:

```bash
pip install torch tiktoken blobfile
```

Optionally, you can install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package if you prefer not to have the `model.py` and `tokenizer.py` files in your local directory:

```bash
pip install llms_from_scratch
```
76
+
77
+
78
+  
79
+ ### 2) Model and text generation settings
80
+
81
+ Specify which model to use:
82
+
83
+ ```python
84
+ MODEL_FILE = "llama3.2-1B-instruct.pth"
85
+ # MODEL_FILE = "llama3.2-1B-base.pth"
86
+ # MODEL_FILE = "llama3.2-3B-instruct.pth"
87
+ # MODEL_FILE = "llama3.2-3B-base.pth"
88
+ ```
89
+
90
+ Basic text generation settings that can be defined by the user. Note that the recommended 8192-token context size requires approximately 3 GB of VRAM for the text generation example.
91
+
92
+ ```
93
+ MODEL_CONTEXT_LENGTH = 8192 # Supports up to 131_072
94
+
95
+ # Text generation settings
96
+ if "instruct" in MODEL_FILE:
97
+ PROMPT = "What do llamas eat?"
98
+ else:
99
+ PROMPT = "Llamas eat"
100
+
101
+ MAX_NEW_TOKENS = 150
102
+ TEMPERATURE = 0.
103
+ TOP_K = 1
104
+ ```
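
As a side note on these settings: with `TOP_K = 1` and `TEMPERATURE = 0.`, generation is effectively greedy decoding. The standalone sketch below (an illustration, not the actual `generate` implementation in `model.py`) shows how top-k filtering and temperature scaling typically interact:

```python
import math
import random

def sample_next_token(logits, temperature=0.0, top_k=None):
    # Optionally keep only the top_k highest-scoring tokens
    if top_k is not None:
        cutoff = sorted(logits, reverse=True)[top_k - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    if temperature > 0.0:
        # Softmax over temperature-scaled logits, then sample
        exps = [math.exp(l / temperature) for l in logits]
        total = sum(exps)
        return random.choices(range(len(logits)), weights=[e / total for e in exps])[0]
    # temperature == 0 corresponds to greedy decoding (argmax)
    return max(range(len(logits)), key=lambda i: logits[i])

print(sample_next_token([1.0, 3.0, 2.0]))  # 1, the index of the largest logit
```

Higher temperatures flatten the distribution (more diverse output), while a small `top_k` restricts sampling to the most likely tokens.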
&nbsp;
### 3) Weight download and loading

The following code automatically downloads the weight file based on the model choice above:

```python
import os
import urllib.request

url = f"https://huggingface.co/rasbt/llama-3.2-from-scratch/resolve/main/{MODEL_FILE}"

if not os.path.exists(MODEL_FILE):
    urllib.request.urlretrieve(url, MODEL_FILE)
    print(f"Downloaded to {MODEL_FILE}")
```

The model weights are then loaded as follows:

```python
import torch
from model import Llama3Model
# Alternatively:
# from llms_from_scratch.llama3 import Llama3Model

if "1B" in MODEL_FILE:
    from model import LLAMA32_CONFIG_1B as LLAMA32_CONFIG
elif "3B" in MODEL_FILE:
    from model import LLAMA32_CONFIG_3B as LLAMA32_CONFIG
else:
    raise ValueError("Incorrect model file name")

LLAMA32_CONFIG["context_length"] = MODEL_CONTEXT_LENGTH

model = Llama3Model(LLAMA32_CONFIG)
model.load_state_dict(torch.load(MODEL_FILE, weights_only=True))

device = (
    torch.device("cuda") if torch.cuda.is_available() else
    torch.device("mps") if torch.backends.mps.is_available() else
    torch.device("cpu")
)
model.to(device)
```
&nbsp;
### 4) Initialize tokenizer

The following code downloads and initializes the tokenizer:

```python
from tokenizer import Llama3Tokenizer, ChatFormat, clean_text
# Alternatively:
# from llms_from_scratch.llama3 import Llama3Tokenizer, ChatFormat, clean_text

TOKENIZER_FILE = "tokenizer.model"

url = f"https://huggingface.co/rasbt/llama-3.2-from-scratch/resolve/main/{TOKENIZER_FILE}"

if not os.path.exists(TOKENIZER_FILE):
    urllib.request.urlretrieve(url, TOKENIZER_FILE)
    print(f"Downloaded to {TOKENIZER_FILE}")

tokenizer = Llama3Tokenizer(TOKENIZER_FILE)

if "instruct" in MODEL_FILE:
    tokenizer = ChatFormat(tokenizer)
```
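
For context, `ChatFormat` wraps the prompt in the Llama 3 chat template before generation. The sketch below is a plain-string approximation (hypothetical, not the actual `ChatFormat` code, which operates on special token IDs) of roughly what the instruct models expect:

```python
def format_chat_prompt(user_message):
    # Rough, string-level approximation of the Llama 3 chat template;
    # the real ChatFormat in tokenizer.py inserts the corresponding
    # special tokens at the token-ID level
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(format_chat_prompt("What do llamas eat?"))
```

This wrapping is why the `PROMPT` for the instruct models can be a bare question, while the base models expect a plain text continuation prompt.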
170
+
171
+  
172
+ ### 5) Generating text
173
+
174
+ Lastly, we can generate text via the following code:
175
+
176
+ ```python
177
+ from model import (
178
+ generate,
179
+ text_to_token_ids,
180
+ token_ids_to_text
181
+ )
182
+ # Alternatively:
183
+ # from llms_from_scratch.ch05 import (
184
+ # generate,
185
+ # text_to_token_ids,
186
+ # token_ids_to_text
187
+ # )
188
+
189
+ torch.manual_seed(123)
190
+
191
+ start = time.time()
192
+
193
+ token_ids = generate(
194
+ model=model,
195
+ idx=text_to_token_ids(PROMPT, tokenizer).to(device),
196
+ max_new_tokens=MAX_NEW_TOKENS,
197
+ context_size=LLAMA32_CONFIG["context_length"],
198
+ top_k=TOP_K,
199
+ temperature=TEMPERATURE
200
+ )
201
+
202
+ print(f"Time: {time.time() - start:.2f} sec")
203
+
204
+ if torch.cuda.is_available():
205
+ max_mem_bytes = torch.cuda.max_memory_allocated()
206
+ max_mem_gb = max_mem_bytes / (1024 ** 3)
207
+ print(f"Max memory allocated: {max_mem_gb:.2f} GB")
208
+
209
+ output_text = token_ids_to_text(token_ids, tokenizer)
210
+
211
+ if "instruct" in MODEL_FILE:
212
+ output_text = clean_text(output_text)
213
+
214
+ print("\n\nOutput text:\n\n", output_text)
215
+ ```
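
If you also want to report throughput, you can derive tokens per second from the measured wall-clock time. A small helper for this (hypothetical, not part of `model.py`):

```python
def tokens_per_second(num_new_tokens, elapsed_sec):
    # Throughput = generated tokens / wall-clock seconds
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    return num_new_tokens / elapsed_sec

# E.g., 150 new tokens in 4.12 sec (the timing from the example output)
print(f"{tokens_per_second(150, 4.12):.1f} tok/sec")  # 36.4 tok/sec
```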
When using the Llama 3.2 1B Instruct model, the output should look similar to the one shown below:

```
Time: 4.12 sec
Max memory allocated: 2.91 GB


Output text:

Llamas are herbivores, which means they primarily eat plants. Their diet consists mainly of:

1. Grasses: Llamas love to graze on various types of grasses, including tall grasses and grassy meadows.
2. Hay: Llamas also eat hay, which is a dry, compressed form of grass or other plants.
3. Alfalfa: Alfalfa is a legume that is commonly used as a hay substitute in llama feed.
4. Other plants: Llamas will also eat other plants, such as clover, dandelions, and wild grasses.

It's worth noting that the specific diet of llamas can vary depending on factors such as the breed,
```