---
library_name: tf-keras
license: mit
language:
- en
---

## Model description
This model is DistilBERT with custom classification layers, fine-tuned to classify financial documents.
The model was trained on the following labels:
- ESG/Sustainability Report
- Annual Report
- Quarterly Report
- Financial Report
- Other Document

The output is a probability for each class. The output ids map to labels as follows:
- 0 -> ESG/Sustainability Report
- 1 -> Annual Report
- 2 -> Other Document
- 3 -> Quarterly Report
- 4 -> Financial Report

## Example use

Download the model and tokenizer:
``` python
from huggingface_hub import from_pretrained_keras
from transformers import DistilBertTokenizer

model_name = "esg-x/distilbert-esg-documents-classifier"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = from_pretrained_keras(model_name)
model.compile()
```

Get model output:
``` python
input_text = "Your input text"

# "inputs" avoids shadowing the built-in input();
# truncation ensures texts longer than 512 tokens don't raise an error
inputs = tokenizer(input_text,
                   return_tensors="tf",
                   padding="max_length",
                   truncation=True,
                   max_length=512)

output = model(inputs["input_ids"])
```

Convert output to a readable label:
``` python
import numpy as np

labels = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report"
}

def get_label(probabilities):
    return labels[np.argmax(probabilities)]

print(get_label(output))
```

## Limitations
The maximum context size of the model is 512 tokens, so longer inputs must be truncated and only the first 512 tokens of a document can inform the classification.
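For documents longer than 512 tokens, one common workaround (not part of this model card, just a sketch) is to split the token sequence into fixed-size chunks, classify each chunk separately, and average the class probabilities. Here `classify_chunk` is a hypothetical callable standing in for the model call shown above:

``` python
import numpy as np

def chunk_ids(token_ids, chunk_size=512):
    # Split a token id sequence into fixed-size chunks;
    # the last chunk may be shorter than chunk_size.
    return [token_ids[i:i + chunk_size]
            for i in range(0, len(token_ids), chunk_size)]

def classify_long_document(token_ids, classify_chunk, chunk_size=512):
    # classify_chunk: callable mapping a chunk of token ids
    # to a probability vector (one entry per class).
    probs = np.stack([classify_chunk(chunk)
                      for chunk in chunk_ids(token_ids, chunk_size)])
    # Average the per-chunk probabilities into one document-level vector.
    return probs.mean(axis=0)
```

Averaging is only one possible aggregation; taking the most confident chunk or a length-weighted mean may work better depending on the documents.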