esg-x/distilbert-esg-documents-classifier

	---
	library_name: tf-keras
	license: mit
	language:
	- en
	---

	## Model description
	This model is DistilBERT with some custom layers, fine-tuned to classify financial documents.
	The model was trained on the following labels:
	- ESG/Sustainability Report
	- Annual Report
	- Quarterly Report
	- Financial Report
	- Other Document

	The output is a vector of probabilities, one per class. The output indices map to labels as follows:
	- 0 -> ESG/Sustainability Report
	- 1 -> Annual Report
	- 2 -> Other Document
	- 3 -> Quarterly Report
	- 4 -> Financial Report

	## Example use

	Download model:
	``` python
	from huggingface_hub import from_pretrained_keras
	from transformers import DistilBertTokenizer

	model_name = "esg-x/distilbert-esg-documents-classifier"
	tokenizer = DistilBertTokenizer.from_pretrained(model_name)
	model = from_pretrained_keras(model_name)
	model.compile()
	```

	Get model output:
	``` python
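	input_text = "Your input text"
	# Tokenize to exactly 512 tokens: pad short inputs, truncate long ones.
	inputs = tokenizer(
	    input_text,
	    return_tensors="tf",
	    padding="max_length",
	    truncation=True,
	    max_length=512,
	)

	# The model takes the token ids and returns class probabilities.
	output = model(inputs["input_ids"])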
	```

	Convert output to a readable label:
	``` python
	import numpy as np

	# Index-to-label mapping, matching the ids listed in the model description.
	labels = {
	    0: "ESG/Sustainability Report",
	    1: "Annual Report",
	    2: "Other Document",
	    3: "Quarterly Report",
	    4: "Financial Report",
	}

	def get_label(probabilities):
	    # Pick the class with the highest probability.
	    return labels[int(np.argmax(probabilities))]

	get_label(output)
	```

	## Limitations
	The model accepts at most 512 tokens per input. Longer text must be truncated or split before inference.
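
One possible way to work within the 512-token limit is to split a long document into overlapping windows, classify each window, and average the per-window probabilities. The sketch below is an illustration only and is not part of the original model card: the helper names `split_into_windows` and `average_probabilities`, the default `stride` of 256, and the unweighted mean are all assumptions, and the model may not have been trained or evaluated on windowed input.

``` python
from typing import List, Sequence


def split_into_windows(token_ids: Sequence[int], cls_id: int, sep_id: int,
                       pad_id: int, max_length: int = 512,
                       stride: int = 256) -> List[List[int]]:
    """Split token ids (no special tokens) into padded windows of max_length.

    Hypothetical helper: each window is [CLS] + tokens + [SEP], then padding.
    """
    body = max_length - 2  # room left after the [CLS] and [SEP] tokens
    if not 0 < stride <= body:
        raise ValueError("stride must be between 1 and max_length - 2")
    windows = []
    start = 0
    while True:
        piece = list(token_ids[start:start + body])
        window = [cls_id] + piece + [sep_id]
        window += [pad_id] * (max_length - len(window))
        windows.append(window)
        if start + body >= len(token_ids):
            break  # this window reached the end of the document
        start += stride
    return windows


def average_probabilities(per_window: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of the class probabilities across windows (an assumption)."""
    n = len(per_window)
    return [sum(column) / n for column in zip(*per_window)]


# Possible wiring with the tokenizer and model loaded earlier (not run here):
#   ids = tokenizer(long_text, add_special_tokens=False)["input_ids"]
#   windows = split_into_windows(ids, tokenizer.cls_token_id,
#                                tokenizer.sep_token_id, tokenizer.pad_token_id)
#   probabilities = model(tf.constant(windows)).numpy().tolist()
#   document_probabilities = average_probabilities(probabilities)
```

The averaged vector can be mapped to a label in the same way as a single output. A smaller `stride` gives more overlap between windows at the cost of more forward passes.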