update README text color
Browse files
README.md CHANGED
@@ -12,13 +12,13 @@ tags:
 license: cc-by-nc-sa-4.0
 ---
 
-# BertForSequenceClassification model (Classical Chinese)
+# <font color="IndianRed"> BertForSequenceClassification model (Classical Chinese) </font>
 [](https://colab.research.google.com/drive/1jVu2LrNwkLolItPALKGNjeT6iCfzF8Ic?usp=sharing/)
 
-This BertForSequenceClassification Classical Chinese model is intended to predict whether a Classical Chinese sentence is a letter title (书信标题) or not. The model inherits from the BERT base Chinese model (MLM), was fine-tuned on a large Classical Chinese corpus (a 3GB textual dataset), and was then extended with the BertForSequenceClassification architecture to perform a binary classification task.
+This BertForSequenceClassification Classical Chinese model is intended to predict whether a Classical Chinese sentence is <font color="IndianRed"> a letter title (书信标题) </font> or not. The model inherits from the BERT base Chinese model (MLM), was fine-tuned on a large Classical Chinese corpus (a 3GB textual dataset), and was then extended with the BertForSequenceClassification architecture to perform a binary classification task.
-* Labels: 0 = non-letter, 1 = letter
+* <font color="Salmon"> Labels: 0 = non-letter, 1 = letter </font>
 
-## Model description
+## <font color="IndianRed"> Model description </font>
 
 The BertForSequenceClassification model architecture inherits the BERT base model and appends a fully-connected linear layer to perform a binary classification task. More precisely, it
 was pretrained with two objectives:
@@ -27,17 +27,17 @@ was pretrained with two objectives:
 
 - Sequence classification: the model appends a fully-connected linear layer to output the probability of each class. In our binary classification task, the final linear layer has two classes.
 
-## Intended uses & limitations
+## <font color="IndianRed"> Intended uses & limitations </font>
 
 Note that this model is primarily aimed at predicting whether a Classical Chinese sentence is a letter title (书信标题) or not.
 
-### How to use
+### <font color="IndianRed"> How to use </font>
 
 Note that this model is primarily aimed at predicting whether a Classical Chinese sentence is a letter title (书信标题) or not.
 
 Here is how to use this model to get the features of a given text in PyTorch:
 
-1. Import model and packages
+<font color="cornflowerblue"> 1. Import model and packages </font>
 ```python
 from transformers import BertTokenizer
 from transformers import BertForSequenceClassification
@@ -51,7 +51,7 @@ model = BertForSequenceClassification.from_pretrained('cbdb/ClassicalChineseLett
 output_hidden_states=False)
 ```
 
-2. Make a prediction
+<font color="cornflowerblue"> 2. Make a prediction </font>
 ```python
 max_seq_len = 512
 
@@ -86,7 +86,7 @@ label2idx = {'not-letter': 0,'letter': 1}
 idx2label = {v:k for k,v in label2idx.items()}
 ```
 
-3. Change your sentence here
+<font color="cornflowerblue"> 3. Change your sentence here </font>
 ```python
 label2idx = {'not-letter': 0,'letter': 1}
 idx2label = {v:k for k,v in label2idx.items()}
@@ -97,8 +97,10 @@ print(f'The predicted probability for the {list(pred_class_proba.keys())[0]} cla
 print(f'The predicted probability for the {list(pred_class_proba.keys())[1]} class: {list(pred_class_proba.values())[1]}')
 >>> The predicted probability for the not-letter class: 0.002029061783105135
 >>> The predicted probability for the letter class: 0.9979709386825562
-
+```
+```python
 pred_class = idx2label[np.argmax(list(pred_class_proba.values()))]
 print(f'The predicted class is: {pred_class}')
 >>> The predicted class is: letter
-```
+```
+
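The step-2 hunk shows `max_seq_len = 512` but elides how the sequence budget is applied before inference. As a minimal sketch of the standard BERT-style truncation this implies — using a hypothetical token-id list in place of real `BertTokenizer` output, and the conventional BERT special-token ids:

```python
# Hypothetical token ids standing in for BertTokenizer output on a long sentence;
# 101 and 102 are the conventional BERT ids for [CLS] and [SEP].
CLS_ID, SEP_ID = 101, 102
max_seq_len = 512

content_ids = list(range(1000, 1600))  # 600 content tokens: over the budget

# Reserve two slots for [CLS] and [SEP], then truncate the content tokens
# so the full input fits within max_seq_len positions.
truncated = content_ids[:max_seq_len - 2]
input_ids = [CLS_ID] + truncated + [SEP_ID]
print(len(input_ids))  # 512
```

In practice `BertTokenizer` handles this itself when called with `truncation=True` and `max_length=512`; the sketch only makes the budget arithmetic explicit.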
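The hunks also elide the code between the model call and the printed probabilities in step 3. As a self-contained sketch of the post-processing that step relies on — a softmax over the classifier's two logits, then the README's `label2idx`/`idx2label` mapping — using hypothetical logit values in place of real `BertForSequenceClassification` output:

```python
import math

# Hypothetical logits standing in for the model's two-class output on one
# sentence; real values come from the fine-tuned classification head.
logits = [-3.1, 3.1]

# Softmax turns the two raw scores into probabilities that sum to 1.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
probs = [e / sum(exps) for e in exps]

# Same label mapping as in the README.
label2idx = {'not-letter': 0, 'letter': 1}
idx2label = {v: k for k, v in label2idx.items()}
pred_class_proba = {idx2label[i]: p for i, p in enumerate(probs)}

# np.argmax over the probability list in the README reduces to picking
# the highest-probability label.
pred_class = max(pred_class_proba, key=pred_class_proba.get)
print(pred_class)  # with these example logits: letter
```

The dict insertion order follows `label2idx`, which is why the README can index `list(pred_class_proba.keys())[0]` for the not-letter class and `[1]` for the letter class.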