Model description

This model is DistilBERT with additional custom layers, fine-tuned to classify financial documents. It was trained on the following labels:

  • ESG/Sustainability Report
  • Annual Report
  • Quarterly Report
  • Financial Report
  • Other Document

The output is a probability for each class. The output indices map to labels as follows:

  • 0 -> ESG/Sustainability Report
  • 1 -> Annual Report
  • 2 -> Other Document
  • 3 -> Quarterly Report
  • 4 -> Financial Report

Example use

Download model:

from huggingface_hub import from_pretrained_keras
from transformers import DistilBertTokenizer

model_name = "esg-x/distilbert-esg-documents-classifier"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = from_pretrained_keras(model_name)
model.compile()

Get model output:

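input_text = "Your input text"
# Tokenize to exactly 512 tokens: pad shorter inputs, truncate longer ones.
inputs = tokenizer(input_text,
                   return_tensors="tf",
                   padding="max_length",
                   truncation=True,
                   max_length=512)

# Pass the token ids to the model.
output = model(inputs["input_ids"])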

Convert output to a readable label:

import numpy as np

# Index-to-label mapping, matching the list in the model description.
labels = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report"
}

def get_label(probabilities):
  return labels[np.argmax(probabilities)]

get_label(output)
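
To inspect every class score rather than only the top label, the probabilities can be paired with their labels and sorted. The following is a minimal sketch, not part of the original model card: `rank_labels` is a hypothetical helper, and the example probabilities are made up for illustration.

```python
# Index-to-label mapping as listed in this model card.
LABELS = {
    0: "ESG/Sustainability Report",
    1: "Annual Report",
    2: "Other Document",
    3: "Quarterly Report",
    4: "Financial Report",
}


def rank_labels(probabilities):
    """Return (label, probability) pairs sorted from most to least likely.

    `probabilities` is a flat sequence of five floats for one document.
    (Hypothetical helper, not provided by the model repository.)
    """
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per label")
    pairs = [(LABELS[i], float(p)) for i, p in enumerate(probabilities)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up probabilities for illustration only, not real model output.
example = [0.05, 0.10, 0.05, 0.70, 0.10]
for label, prob in rank_labels(example):
    print(f"{label}: {prob:.2f}")
```

With the real model, something like `rank_labels(output.numpy()[0])` should work, assuming the output is a TensorFlow tensor of shape (1, 5).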

Limitations

The model's maximum context size is 512 tokens, so longer inputs must be truncated or split before inference.
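
One possible way to work within this limit is to split a long document's token ids into overlapping windows, classify each window, and average the resulting probabilities. The sketch below is an assumption for illustration, not a method described by the model authors: `split_into_windows` and `average_probabilities` are hypothetical helpers, the stride is arbitrary, and the default window of 510 tokens assumes two positions are reserved for the `[CLS]` and `[SEP]` special tokens.

```python
def split_into_windows(token_ids, max_length=510, stride=255):
    """Split a list of token ids into overlapping windows.

    Each window holds at most `max_length` ids; consecutive windows start
    `stride` ids apart. The default of 510 is an assumption that leaves
    room for [CLS] and [SEP] within the 512-token limit.
    """
    if max_length <= 0 or stride <= 0:
        raise ValueError("max_length and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the input.
        if start + max_length >= len(token_ids):
            break
        start += stride
    return windows


def average_probabilities(per_window_probs):
    """Average class probabilities element-wise across windows."""
    n = len(per_window_probs)
    return [sum(column) / n for column in zip(*per_window_probs)]


# Illustration with placeholder ids and made-up per-window probabilities.
windows = split_into_windows(list(range(1000)))
print(len(windows), [len(w) for w in windows])
print(average_probabilities([[0.2, 0.8], [0.4, 0.6]]))
```

Averaging is only one aggregation choice. Taking the most confident window, or classifying just the first window, may suit some document types better.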
