---
library_name: tf-keras
license: mit
language:
- en
---
## Model description
This model is DistilBERT with custom classification layers on top, fine-tuned to classify financial documents.
The model was trained on the following labels:
- ESG/Sustainability Report
- Annual Report
- Quarterly Report
- Financial Report
- Other Document

The output is a probability for each class. The class ids should be interpreted as follows:
- 0 -> ESG/Sustainability Report
- 1 -> Annual Report
- 2 -> Other Document
- 3 -> Quarterly Report
- 4 -> Financial Report
## Example use
Download model:
```python
from huggingface_hub import from_pretrained_keras
from transformers import DistilBertTokenizer

model_name = "esg-x/distilbert-esg-documents-classifier"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = from_pretrained_keras(model_name)
model.compile()
```
Get model output:
```python
input_text = "Your input text"
inputs = tokenizer(input_text,
                   return_tensors="tf",
                   truncation=True,
                   padding="max_length",
                   max_length=512)
output = model(inputs["input_ids"])
```
Convert output to a readable label:
```python
import numpy as np

labels = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report"
}

def get_label(probabilities):
    return labels[np.argmax(probabilities)]

get_label(output)
```
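To check the label mapping without loading the model, `get_label` can be exercised on a mock probability array (the array below is illustrative, not real model output):

```python
import numpy as np

labels = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report"
}

def get_label(probabilities):
    # argmax over the whole array works for a batch of one
    return labels[np.argmax(probabilities)]

# Mock output: batch of one, five class probabilities
mock_output = np.array([[0.05, 0.70, 0.10, 0.10, 0.05]])
print(get_label(mock_output))  # Annual Report
```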
## Limitations
The maximum context size of the model is 512 tokens, so longer documents must be truncated before inference and are classified based on their beginning only.
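Passing `truncation=True` to the tokenizer handles this limit automatically. A minimal sketch of the effect, using plain Python lists of hypothetical token ids for illustration:

```python
MAX_LENGTH = 512

def truncate_ids(token_ids, max_length=MAX_LENGTH):
    """Keep at most max_length token ids, mirroring tokenizer truncation."""
    return token_ids[:max_length]

long_doc = list(range(2000))  # stand-in for a 2000-token document
print(len(truncate_ids(long_doc)))  # 512
```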