Text Generation
Transformers
PyTorch
xglm
cstorm125 committed
Commit 31e4df4 · 1 Parent(s): e48de3b

Update README.md

Files changed (1)
  1. README.md +55 -80

README.md CHANGED
@@ -11,11 +11,7 @@ language:
   - vi
 pipeline_tag: text-generation
 ---
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

 ## Model Details

@@ -23,22 +19,20 @@ This modelcard aims to be a base template for new models. It has been generated

 <!-- Provide a longer summary of what this model is. -->

- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]

 <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

 ## Uses

@@ -48,25 +42,25 @@

 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

- ### Downstream Use [optional]

 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

 ### Out-of-Scope Use

 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]

 ## Bias, Risks, and Limitations

 <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

 ### Recommendations

@@ -78,7 +72,40 @@ Users (both direct and downstream) should be made aware of the risks, biases and

 Use the code below to get started with the model.

- [More Information Needed]

 ## Training Details

@@ -86,65 +113,31 @@ Use the code below to get started with the model.

 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

 ### Training Procedure

 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

 #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]

 ## Evaluation

 <!-- This section describes the evaluation protocols and provides the results. -->

- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Data Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]

 #### Summary

-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
 ## Environmental Impact

 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
@@ -175,7 +168,7 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

 [More Information Needed]

- ## Citation [optional]

 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

@@ -183,24 +176,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

 [More Information Needed]

- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
 ## Model Card Contact

 [More Information Needed]
 
  - vi
 pipeline_tag: text-generation
 ---
+ # Model Card for WangChanGLM 🐘 - The Thai-Turned-Multilingual Instruction-Following Model

 ## Model Details

 <!-- Provide a longer summary of what this model is. -->

+ WangChanGLM is a Thai-turned-multilingual, instruction-finetuned Facebook XGLM-7.5B using open-source, commercially permissible datasets (LAION OIG chip2 and infill_dbpedia, DataBricks Dolly v2, OpenAI TL;DR, and Hello-SimpleAI HC3; about 400k examples), released under CC-BY SA 4.0. The models are trained to perform a subset of instruction-following tasks we found most relevant, namely: reading comprehension, brainstorming, and creative writing. We provide the weights for a model finetuned on an English-only dataset ([wangchanglm-7.5B-sft-en](https://huggingface.co/pythainlp/wangchanglm-7.5B-sft-en)) and another checkpoint further finetuned on a Google-Translated Thai dataset ([wangchanglm-7.5B-sft-enth](https://huggingface.co/pythainlp/wangchanglm-7.5B-sft-enth)). We perform Vicuna-style evaluation using both humans and ChatGPT (in our case, `gpt-3.5-turbo`, since we are still on the waitlist for `gpt-4`) and observe some discrepancies between the two types of annotators. All training and evaluation code is shared under the [Apache-2.0 license](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE) on our GitHub, and all datasets and model weights are on [HuggingFace](https://huggingface.co/pythainlp). See our live demo [here]().

+ - **Developed by:** [PyThaiNLP](https://www.github.com/pythainlp) and [VISTEC-depa AI Research Institute of Thailand](https://huggingface.co/airesearch)
+ - **Model type:** Finetuned [XGLM-7.5B](https://huggingface.co/facebook/xglm-7.5B)
+ - **Language(s) (NLP):** `en`, `th`, `ja`, and `vi` capabilities evaluated; theoretically all 30 languages of [XGLM-7.5B](https://huggingface.co/facebook/xglm-7.5B)
+ - **License:** [CC-BY SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)

+ ### Model Sources

 <!-- Provide the basic links for the model. -->

+ - **Repository:** [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm)
+ - **Blog:** [Medium]()
+ - **Demo:** [Colab notebook]()

 ## Uses

 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ Intended to be used as an instruction-following model for reading comprehension, brainstorming, and creative writing.

+ ### Downstream Use

 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ The model can be finetuned for any typical instruction-following use case.

 ### Out-of-Scope Use

 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ We do not expect the models to perform well on math problems, reasoning, or factual recall; we intentionally filtered out training examples for these use cases.

 ## Bias, Risks, and Limitations

 <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ We noticed limitations similar to those of other finetuned instruction followers, such as with math problems, reasoning, and factual recall. Even though the models do not perform at a level we expect to invite abuse, they do contain undesirable biases and toxicity and should be further optimized for your particular use cases.

 ### Recommendations

 Use the code below to get started with the model.

+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ model_name = "pythainlp/wangchanglm-7.5B-sft-en"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     return_dict=True,
+     load_in_8bit=True,
+     device_map="auto",
+     torch_dtype=torch.float16,
+     offload_folder="./",
+     low_cpu_mem_usage=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ 
+ text = "เล่นหุ้นยังไงให้รวย"  # "How do I get rich trading stocks?"
+ batch = tokenizer(text, return_tensors="pt")
+ with torch.cuda.amp.autocast():
+     output_tokens = model.generate(
+         input_ids=batch["input_ids"],
+         max_new_tokens=512,
+         # begin_suppress_tokens=exclude_ids,  # optional: token ids to suppress at the start of generation
+         no_repeat_ngram_size=2,
+         # oasst k50 sampling preset
+         do_sample=True,
+         top_k=50,
+         top_p=0.95,
+         typical_p=1.,
+         temperature=0.9,
+         # oasst typical3 preset (alternative):
+         # typical_p=0.3,
+         # temperature=0.8,
+         # repetition_penalty=1.2,
+     )
+ print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
+ ```
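+ 
+ Instruction-finetuned models are sensitive to the prompt template used during training. The `<human>:`/`<bot>:` template below is an assumption drawn from the training repository, not a documented guarantee; verify it against the scripts in [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm) before relying on it:
+ 
+ ```python
+ # Assumed instruction template; verify against the training scripts.
+ prompt = "<human>: {instruction}\n<bot>: ".format(instruction="เล่นหุ้นยังไงให้รวย")
+ batch = tokenizer(prompt, return_tensors="pt")  # tokenizer from the snippet above
+ ```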

 ## Training Details

 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ Finetuning datasets are sourced from [LAION OIG chip2 and infill_dbpedia](https://huggingface.co/datasets/laion/OIG) ([Apache-2.0](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE)), [DataBricks Dolly v2](https://github.com/databrickslabs/dolly) ([Apache-2.0](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE)), [OpenAI TL;DR](https://github.com/openai/summarize-from-feedback) ([MIT](https://opensource.org/license/mit/)), and [Hello-SimpleAI HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) ([CC-BY SA](https://creativecommons.org/licenses/by-sa/4.0/)).
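+ 
+ As a rough sketch of how these sources can be pulled with the `datasets` library (the OIG shard file name and the HC3 config name below are assumptions based on the dataset pages, not our exact loading code; see the repository for the actual preprocessing):
+ 
+ ```python
+ from datasets import load_dataset
+ 
+ # LAION OIG is distributed as per-source jsonl shards; chip2 is one of them (file name assumed).
+ oig_chip2 = load_dataset("laion/OIG", data_files="unified_chip2.jsonl", split="train")
+ # Hello-SimpleAI HC3 requires a config name; "all" combines the domains (config name assumed).
+ hc3 = load_dataset("Hello-SimpleAI/HC3", "all", split="train")
+ ```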

 ### Training Procedure

 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ #### Preprocessing

+ See [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm).

 #### Training Hyperparameters

+ - **Training regime:** LoRA with 4 GPUs; a minimal sketch follows. See more details at [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm).
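+ 
+ A minimal sketch of LoRA finetuning with the PEFT library, assuming [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B) as the base model (the rank, alpha, and target modules are illustrative assumptions, not our exact configuration; the real training scripts live in the repository):
+ 
+ ```python
+ from peft import LoraConfig, get_peft_model
+ from transformers import AutoModelForCausalLM
+ 
+ base = AutoModelForCausalLM.from_pretrained("facebook/xglm-7.5B")
+ lora_config = LoraConfig(
+     r=16,                                 # assumed adapter rank
+     lora_alpha=32,                        # assumed scaling factor
+     target_modules=["q_proj", "v_proj"],  # XGLM attention projections (assumed choice)
+     lora_dropout=0.05,
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base, lora_config)
+ model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
+ ```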
 
 
 ## Evaluation

 <!-- This section describes the evaluation protocols and provides the results. -->

+ We performed automatic evaluation in the style of [Vicuna](https://vicuna.lmsys.org/) as well as human evaluation. See more details in our [blog]().
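+ 
+ For the automatic side, a Vicuna-style judge asks ChatGPT to compare two models' answers to the same question. Below is a simplified sketch assuming the pre-1.0 `openai` client; the judging prompt and scoring scheme are illustrative, not our exact evaluation script:
+ 
+ ```python
+ import openai  # pre-1.0 client exposing openai.ChatCompletion
+ 
+ def judge(question: str, answer_a: str, answer_b: str) -> str:
+     """Ask gpt-3.5-turbo to rate two assistants' answers, Vicuna-style."""
+     prompt = (
+         f"Question: {question}\n\n"
+         f"Assistant A: {answer_a}\n\n"
+         f"Assistant B: {answer_b}\n\n"
+         "Rate each assistant from 1 to 10 and briefly justify the scores."
+     )
+     response = openai.ChatCompletion.create(
+         model="gpt-3.5-turbo",
+         messages=[{"role": "user", "content": prompt}],
+         temperature=0,  # deterministic judging
+     )
+     return response["choices"][0]["message"]["content"]
+ ```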

 #### Summary

 ## Environmental Impact

 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

 [More Information Needed]

+ ## Citation

 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

 [More Information Needed]

 ## Model Card Contact

 [More Information Needed]