Commit ac2bd19
Parent(s): ab3f058
Update README.md
README.md CHANGED
@@ -120,30 +120,6 @@ Text classification model based on [`xlm-roberta-base`](https://huggingface.co/x
|
|
120 |
|
121 |
The model was fine-tuned on the "X-GENRE" dataset, which consists of three genre datasets: CORE, FTD and GINCO. Each dataset has its own genre schema, so they were combined into a joint schema (the "X-GENRE" schema) based on a comparison of labels and cross-dataset experiments (described in detail [here](https://github.com/TajaKuzman/Genre-Datasets-Comparison)).
|
122 |
|
123 |
-
## X-GENRE categories
|
124 |
-
|
125 |
-
List of labels:
|
126 |
-
```
|
127 |
-
labels_list=['Other', 'Information/Explanation', 'News', 'Instruction', 'Opinion/Argumentation', 'Forum', 'Prose/Lyrical', 'Legal', 'Promotion']
|
128 |
-
|
129 |
-
labels_map={'Other': 0, 'Information/Explanation': 1, 'News': 2, 'Instruction': 3, 'Opinion/Argumentation': 4, 'Forum': 5, 'Prose/Lyrical': 6, 'Legal': 7, 'Promotion': 8}
|
130 |
-
|
131 |
-
```
|
132 |
-
|
133 |
-
Description of labels:
|
134 |
-
|
135 |
-
| Label | Description | Examples |
|
136 |
-
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
137 |
-
| Information/Explanation | An objective text that describes or presents an event, a person, a thing, a concept etc. Its main purpose is to inform the reader about something. Common features: objective/factual, explanation/definition of a concept (x is …), enumeration. | research article, encyclopedia article, informational blog, product specification, course materials, general information, job description, manual, horoscope, travel guide, glossaries, historical article, biographical story/history. |
|
138 |
-
| Instruction | An objective text which instructs the readers on how to do something. Common features: multiple steps/actions, chronological order, 1st person plural or 2nd person, modality (must, have to, need to, can, etc.), adverbial clauses of manner (in a way that), of condition (if), of time (after …). | how-to texts, recipes, technical support |
|
139 |
-
| Legal | An objective formal text that contains legal terms and is clearly structured. The name of the text type is often included in the headline (contract, rules, amendment, general terms and conditions, etc.). Common features: objective/factual, legal terms, 3rd person. | small print, software license, proclamation, terms and conditions, contracts, law, copyright notices, university regulation |
|
140 |
-
| News | An objective or subjective text which reports on an event recent at the time of writing or coming in the near future. Common features: adverbs/adverbial clauses of time and/or place (dates, places), many proper nouns, direct or reported speech, past tense. | news report, sports report, travel blog, reportage, police report, announcement |
|
141 |
-
| Opinion/Argumentation | A subjective text in which the authors convey their opinion or narrate their experience. It includes promotion of an ideology and other non-commercial causes. This genre includes a subjective narration of a personal experience as well. Common features: adjectives/adverbs that convey opinion, words that convey (un)certainty (certainly, surely), 1st person, exclamation marks. | review, blog (personal blog, travel blog), editorial, advice, letter to editor, persuasive article or essay, formal speech, pamphlet, political propaganda, columns, political manifesto |
|
142 |
-
| Promotion | A subjective text intended to sell or promote an event, product, or service. It addresses the readers, often trying to convince them to participate in something or buy something. Common features: contains adjectives/adverbs that promote something (high-quality, perfect, amazing), comparative and superlative forms of adjectives and adverbs (the best, the greatest, the cheapest), addressing the reader (usage of 2nd person), exclamation marks. | advertisement, promotion of a product (e-shops), promotion of an accommodation, promotion of company's services, invitation to an event |
|
143 |
-
| Forum | A text in which people discuss a certain topic in the form of comments. Common features: multiple authors, informal language, subjective (the writers express their opinions), written in 1st person. | discussion forum, reader/viewer responses, QA forum |
|
144 |
-
| Prose/Lyrical | A literary text that consists of paragraphs or verses. A literary text is deemed to have no other practical purpose than to give pleasure to the reader. Often the author pays attention to the aesthetic appearance of the text. It can be considered as art. | lyrics, poem, prayer, joke, novel, short story |
|
145 |
-
| Other | A text that does not fall under any of the other genre categories. | |
|
146 |
-
|
147 |
### Fine-tuning hyperparameters
|
148 |
|
149 |
Fine-tuning was performed with `simpletransformers`. Beforehand, a brief hyperparameter optimization was performed and the presumed optimal hyperparameters are:
|
@@ -196,6 +172,31 @@ predictions
|
|
196 |
|
197 |
A usage example for prediction on a dataset, using batch processing, is available on [Google Colab](https://colab.research.google.com/drive/1yC4L_p2t3oMViC37GqSjJynQH-EWyhLr?usp=sharing).
|
198 |
|
199 |
## Performance
|
200 |
|
201 |
### Comparison with other models at in-dataset and cross-dataset experiments
|
|
|
120 |
|
121 |
The model was fine-tuned on the "X-GENRE" dataset, which consists of three genre datasets: CORE, FTD and GINCO. Each dataset has its own genre schema, so they were combined into a joint schema (the "X-GENRE" schema) based on a comparison of labels and cross-dataset experiments (described in detail [here](https://github.com/TajaKuzman/Genre-Datasets-Comparison)).
|
122 |
|
123 |
### Fine-tuning hyperparameters
|
124 |
|
125 |
Fine-tuning was performed with `simpletransformers`. Beforehand, a brief hyperparameter optimization was performed and the presumed optimal hyperparameters are:
|
|
|
172 |
|
173 |
A usage example for prediction on a dataset, using batch processing, is available on [Google Colab](https://colab.research.google.com/drive/1yC4L_p2t3oMViC37GqSjJynQH-EWyhLr?usp=sharing).
|
174 |
|
175 |
+
## X-GENRE categories
|
176 |
+
|
177 |
+
List of labels:
|
178 |
+
```
|
179 |
+
labels_list=['Other', 'Information/Explanation', 'News', 'Instruction', 'Opinion/Argumentation', 'Forum', 'Prose/Lyrical', 'Legal', 'Promotion']
|
180 |
+
|
181 |
+
labels_map={'Other': 0, 'Information/Explanation': 1, 'News': 2, 'Instruction': 3, 'Opinion/Argumentation': 4, 'Forum': 5, 'Prose/Lyrical': 6, 'Legal': 7, 'Promotion': 8}
|
182 |
+
|
183 |
+
```
|
184 |
+
|
185 |
+
Description of labels:
|
186 |
+
|
187 |
+
| Label | Description | Examples |
|
188 |
+
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
189 |
+
| Information/Explanation | An objective text that describes or presents an event, a person, a thing, a concept etc. Its main purpose is to inform the reader about something. Common features: objective/factual, explanation/definition of a concept (x is …), enumeration. | research article, encyclopedia article, informational blog, product specification, course materials, general information, job description, manual, horoscope, travel guide, glossaries, historical article, biographical story/history. |
|
190 |
+
| Instruction | An objective text which instructs the readers on how to do something. Common features: multiple steps/actions, chronological order, 1st person plural or 2nd person, modality (must, have to, need to, can, etc.), adverbial clauses of manner (in a way that), of condition (if), of time (after …). | how-to texts, recipes, technical support |
|
191 |
+
| Legal | An objective formal text that contains legal terms and is clearly structured. The name of the text type is often included in the headline (contract, rules, amendment, general terms and conditions, etc.). Common features: objective/factual, legal terms, 3rd person. | small print, software license, proclamation, terms and conditions, contracts, law, copyright notices, university regulation |
|
192 |
+
| News | An objective or subjective text which reports on an event recent at the time of writing or coming in the near future. Common features: adverbs/adverbial clauses of time and/or place (dates, places), many proper nouns, direct or reported speech, past tense. | news report, sports report, travel blog, reportage, police report, announcement |
|
193 |
+
| Opinion/Argumentation | A subjective text in which the authors convey their opinion or narrate their experience. It includes promotion of an ideology and other non-commercial causes. This genre includes a subjective narration of a personal experience as well. Common features: adjectives/adverbs that convey opinion, words that convey (un)certainty (certainly, surely), 1st person, exclamation marks. | review, blog (personal blog, travel blog), editorial, advice, letter to editor, persuasive article or essay, formal speech, pamphlet, political propaganda, columns, political manifesto |
|
194 |
+
| Promotion | A subjective text intended to sell or promote an event, product, or service. It addresses the readers, often trying to convince them to participate in something or buy something. Common features: contains adjectives/adverbs that promote something (high-quality, perfect, amazing), comparative and superlative forms of adjectives and adverbs (the best, the greatest, the cheapest), addressing the reader (usage of 2nd person), exclamation marks. | advertisement, promotion of a product (e-shops), promotion of an accommodation, promotion of company's services, invitation to an event |
|
195 |
+
| Forum | A text in which people discuss a certain topic in the form of comments. Common features: multiple authors, informal language, subjective (the writers express their opinions), written in 1st person. | discussion forum, reader/viewer responses, QA forum |
|
196 |
+
| Prose/Lyrical | A literary text that consists of paragraphs or verses. A literary text is deemed to have no other practical purpose than to give pleasure to the reader. Often the author pays attention to the aesthetic appearance of the text. It can be considered as art. | lyrics, poem, prayer, joke, novel, short story |
|
197 |
+
| Other | A text that does not fall under any of the other genre categories. | |
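
The label list and map above can be used to turn numeric predictions back into genre names. The following is a minimal sketch, assuming the classifier returns integer class ids that follow `labels_map`; the names `id_to_label`, `decode_predictions` and `example_ids` are illustrative and do not come from the README.

```python
# Label-to-id mapping copied from the "List of labels" block.
labels_map = {
    'Other': 0, 'Information/Explanation': 1, 'News': 2, 'Instruction': 3,
    'Opinion/Argumentation': 4, 'Forum': 5, 'Prose/Lyrical': 6, 'Legal': 7,
    'Promotion': 8,
}

# Invert the mapping so that integer ids can be looked up as label names.
id_to_label = {idx: label for label, idx in labels_map.items()}


def decode_predictions(predicted_ids):
    """Translate a sequence of integer class ids into X-GENRE label names."""
    return [id_to_label[int(i)] for i in predicted_ids]


# Hypothetical model output (assumption: one integer id per input text).
example_ids = [2, 8, 0]
print(decode_predictions(example_ids))  # ['News', 'Promotion', 'Other']
```

Building the inverse dictionary once keeps the lookup independent of the order of `labels_list`, so it should stay correct even if that list is reordered.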
|
198 |
+
|
199 |
+
|
200 |
## Performance
|
201 |
|
202 |
### Comparison with other models at in-dataset and cross-dataset experiments
|