---
library_name: tf-keras
license: mit
language:
- en
---

## Model description

This model is DistilBERT with some custom layers, fine-tuned to classify financial documents.

The model was trained on the following labels:

- ESG/Sustainability Report
- Annual Report
- Quarterly Report
- Financial Report
- Other Document

The output is a vector of probabilities, one per class. The output ids do not follow the order of the list above; interpret them as follows:

- 0 -> ESG/Sustainability Report
- 1 -> Annual Report
- 2 -> Other Document
- 3 -> Quarterly Report
- 4 -> Financial Report

## Example use

Download the model:

```python
from huggingface_hub import from_pretrained_keras
from transformers import DistilBertTokenizer

model_name = "esg-x/distilbert-esg-documents-classifier"

# Load the tokenizer and the Keras model from the Hugging Face Hub.
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = from_pretrained_keras(model_name)
model.compile()
```

Get the model output:

```python
input_text = "Your input text"

# Pad short texts and truncate long ones to the model's 512-token context size.
inputs = tokenizer(
    input_text,
    return_tensors="tf",
    padding="max_length",
    truncation=True,
    max_length=512,
)

output = model(inputs["input_ids"])
```

Convert the output to a readable label:

```python
import numpy as np

# Map output ids to class names.
labels = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report",
}


def get_label(probabilities):
    # Return the name of the most probable class.
    return labels[int(np.argmax(probabilities))]


get_label(output)
```
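
When the top class alone is not enough, it can help to see the probability attached to every label. The sketch below is one way to do that, assuming the output for a single document has shape `(1, 5)` or `(5,)` and holds class probabilities as described in the model description. `rank_labels` and `LABELS` are hypothetical names used for illustration, not part of the model's API, and the example probabilities are made up.

```python
import numpy as np

# Same id-to-label mapping as in the model description.
LABELS = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report",
}


def rank_labels(probabilities):
    """Return (label, probability) pairs sorted from most to least likely.

    Assumes `probabilities` holds one document's class probabilities,
    shaped (5,) or (1, 5). This is an illustrative helper only.
    """
    probs = np.asarray(probabilities, dtype=float).reshape(-1)
    if probs.shape[0] != len(LABELS):
        raise ValueError(
            f"expected {len(LABELS)} probabilities, got {probs.shape[0]}"
        )
    order = np.argsort(probs)[::-1]  # indices from highest to lowest probability
    return [(LABELS[int(i)], float(probs[i])) for i in order]


# Made-up probabilities for illustration only, not real model output.
example = [[0.70, 0.15, 0.05, 0.06, 0.04]]
for label, prob in rank_labels(example):
    print(f"{label}: {prob:.2f}")
```

With a real prediction, `rank_labels(output)` should work the same way, provided the model returns one row of five probabilities.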

## Limitations

The maximum context size of the model is 512 tokens.
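
Documents longer than 512 tokens therefore have to be truncated or split before classification. One common workaround, which this model card does not prescribe, is to split the token ids into overlapping windows, classify each window, and average the resulting probabilities. The sketch below covers only the windowing and averaging steps, in plain Python. `split_into_windows`, `average_probabilities`, and the `stride` value are assumptions for illustration. The 510-token window leaves room for the `[CLS]` and `[SEP]` tokens that the DistilBERT tokenizer adds.

```python
def split_into_windows(token_ids, window=510, stride=255):
    """Split a list of token ids into overlapping windows.

    `window=510` leaves room for [CLS] and [SEP] within the 512-token limit.
    `stride` is how far each window start advances. Both defaults are
    assumptions, not values taken from the model card.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # the last window reached the end of the document
        start += stride
    return windows


def average_probabilities(per_window_probs):
    """Average class probabilities across windows, class by class."""
    if not per_window_probs:
        raise ValueError("need at least one window of probabilities")
    n = len(per_window_probs)
    return [sum(column) / n for column in zip(*per_window_probs)]


# Illustration with dummy token ids and made-up probabilities.
windows = split_into_windows(list(range(1200)))
print([len(w) for w in windows])
print(average_probabilities([
    [0.8, 0.1, 0.1, 0.0, 0.0],
    [0.6, 0.2, 0.1, 0.1, 0.0],
]))
```

In practice the token ids could come from `tokenizer.encode(text, add_special_tokens=False)`. Each window would then need its special tokens and padding added before being passed to the model. Averaging is only one possible aggregation; whether it suits a given document type would need to be checked.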