Update README.md
  - vi
pipeline_tag: text-generation
---

# Model Card for WangChanGLM 🐘 - The Thai-Turned-Multilingual Instruction-Following Model

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

WangChanGLM is a Thai-turned-multilingual, instruction-finetuned version of Facebook's XGLM-7.5B, trained on open-source, commercially permissible datasets (LAION OIG chip2 and infill_dbpedia, DataBricks Dolly v2, OpenAI TL;DR, and Hello-SimpleAI HC3; about 400k examples) and released under CC-BY SA 4.0. The models are trained to perform a subset of instruction-following tasks we found most relevant, namely reading comprehension, brainstorming, and creative writing. We provide the weights for a model finetuned on an English-only dataset ([wangchanglm-7.5B-sft-en](https://huggingface.co/pythainlp/wangchanglm-7.5B-sft-en)) and another checkpoint further finetuned on a Google-Translated Thai dataset ([wangchanglm-7.5B-sft-enth](https://huggingface.co/pythainlp/wangchanglm-7.5B-sft-enth)). We perform Vicuna-style evaluation using both humans and ChatGPT (in our case, `gpt-3.5-turbo`, since we are still on the waitlist for `gpt-4`) and observe some discrepancies between the two types of annotators. All training and evaluation code is shared under the [Apache-2.0 license](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE) on our GitHub, as well as datasets and model weights on [HuggingFace](https://huggingface.co/pythainlp). See our live demo [here]().

- **Developed by:** [PyThaiNLP](https://www.github.com/pythainlp) and [VISTEC-depa AI Research Institute of Thailand](https://huggingface.co/airesearch)
- **Model type:** Finetuned [XGLM-7.5B](https://huggingface.co/facebook/xglm-7.5B)
- **Language(s) (NLP):** `en`, `th`, `ja`, `vi` capabilities evaluated; theoretically, all 30 languages of [XGLM-7.5B](https://huggingface.co/facebook/xglm-7.5B)
- **License:** [CC-BY SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm)
- **Blog:** [Medium]()
- **Demo:** [Colab notebook]()

## Uses

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

Intended to be used as an instruction-following model for reading comprehension, brainstorming, and creative writing.
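
For illustration, instructions for the three supported task types might look like the hypothetical prompts below; any of them can be passed as the `text` input in the generation code under How to Get Started with the Model.

```python
# Hypothetical example instructions for the three supported task types.
reading_comprehension = (
    "Read the passage below and answer the question.\n"
    "Passage: ...\n"
    "Question: ..."
)
brainstorming = "List five ideas for a science project about renewable energy."
creative_writing = "Write a short story about an elephant who learns to code."
```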

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

The model can be finetuned for any typical instruction-following use case; see the LoRA sketch under Training Hyperparameters below.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

We do not expect the models to perform well on math problems, reasoning, and factuality; we intentionally filtered out training examples from these use cases.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

We noticed limitations similar to those of other finetuned instruction followers, namely on math problems, reasoning, and factuality. Even though the models do not perform at a level we expect to invite abuse, they do contain undesirable biases and toxicity and should be further optimized for your particular use cases.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "pythainlp/wangchanglm-7.5B-sft-en"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    load_in_8bit=True,  # requires bitsandbytes
    device_map="auto",  # requires accelerate
    torch_dtype=torch.float16,
    offload_folder="./",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

text = "เล่นหุ้นยังไงให้รวย"  # "How do I trade stocks to get rich?"
batch = tokenizer(text, return_tensors="pt")

max_gen_len = 512
top_p = 0.95
temperature = 0.9
exclude_ids = []  # token ids to suppress at the start of generation, if any

with torch.cuda.amp.autocast():
    output_tokens = model.generate(
        input_ids=batch["input_ids"],
        max_new_tokens=max_gen_len,
        begin_suppress_tokens=exclude_ids,
        no_repeat_ngram_size=2,
        # oasst k50
        top_k=50,
        top_p=top_p,
        typical_p=1.,
        temperature=temperature,
        # # oasst typical3
        # typical_p=0.3,
        # temperature=0.8,
        # repetition_penalty=1.2,
    )
tokenizer.decode(output_tokens[0], skip_special_tokens=True)
```
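
The same snippet works with the Thai-augmented checkpoint by pointing `model_name` at [pythainlp/wangchanglm-7.5B-sft-enth](https://huggingface.co/pythainlp/wangchanglm-7.5B-sft-enth). Loading in 8-bit with `device_map="auto"` roughly halves weight memory versus fp16; if you have enough VRAM, `load_in_8bit=True` can be dropped.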

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

Finetuning datasets are sourced from [LAION OIG chip2 and infill_dbpedia](https://huggingface.co/datasets/laion/OIG) ([Apache-2.0](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE)), [DataBricks Dolly v2](https://github.com/databrickslabs/dolly) ([Apache-2.0](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE)), [OpenAI TL;DR](https://github.com/openai/summarize-from-feedback) ([MIT](https://opensource.org/license/mit/)), and [Hello-SimpleAI HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) ([CC-BY SA](https://creativecommons.org/licenses/by-sa/4.0/)).
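
As a rough sketch of how this mixture could be assembled with the Hugging Face `datasets` library (the file and config names below are assumptions, not our exact pipeline; see [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm) for the real preprocessing):

```python
from datasets import load_dataset

# LAION OIG subsets; the jsonl file name is an assumption about the OIG layout.
chip2 = load_dataset("laion/OIG", data_files="unified_chip2.jsonl", split="train")
# Databricks Dolly v2 instruction data.
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
# Hello-SimpleAI HC3 human/ChatGPT answer pairs.
hc3 = load_dataset("Hello-SimpleAI/HC3", "all", split="train")
# OpenAI TL;DR is distributed via the openai/summarize-from-feedback
# repository rather than the Hugging Face Hub.

print(len(chip2), len(dolly), len(hc3))
```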

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

See [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm).

#### Training Hyperparameters

- **Training regime:** LoRA with 4 GPUs. See more details at [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm).
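
As a minimal sketch of what the LoRA setup could look like with the PEFT library (the rank, alpha, dropout, and target-module values are assumptions; the exact configuration is in [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm)):

```python
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "facebook/xglm-7.5B", torch_dtype=torch.float16, device_map="auto"
)
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                     # assumed adapter rank
    lora_alpha=32,            # assumed scaling factor
    lora_dropout=0.05,        # assumed dropout
    target_modules=["q_proj", "v_proj"],  # XGLM attention projections
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```

Only the adapter weights are updated during training, which is what makes finetuning a 7.5B-parameter model on 4 GPUs tractable.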

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

We performed automatic evaluation in the style of [Vicuna](https://vicuna.lmsys.org/) as well as human evaluation. See more details in our [blog]().
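
As a sketch of what the Vicuna-style automatic judging could look like with the legacy (`openai<1.0`) client (the judge prompt wording here is an assumption; our actual evaluation scripts are on GitHub):

```python
import openai  # legacy (<1.0) client; assumes OPENAI_API_KEY is set

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask gpt-3.5-turbo to compare two model answers, Vicuna-style."""
    prompt = (
        "You are a helpful and precise assistant for checking the quality "
        "of two answers.\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Rate each answer on a scale of 1 to 10 and briefly explain why."
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]
```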
#### Summary
## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

[More Information Needed]

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]
## Model Card Contact

[More Information Needed]