---
language:
- en
tags:
- financial NLP
- named entity recognition
- sequence labeling
- structured extraction
- hierarchical taxonomy
- XBRL
- iXBRL
- SEC filings
- financial-information-extraction
datasets:
- AAU-NLP/HiFi-KPI
model_name: "Cal-BERT-SL1000"
library_name: "transformers"
pipeline_tag: "token-classification"
base_model: "bert-base-uncased"
task_categories:
- token-classification
task_ids:
- named-entity-recognition
- financial-information-extraction
pretty_name: "Cal-BERT-SL1000: Sequence Labeling for Calculation Taxonomy KPI Extraction"
size_categories: "1M<n<10M"
languages:
- en
dataset_name: "HiFi-KPI"
model_description: |
  Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the **HiFi-KPI dataset** for extracting
  **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in identifying
  entities that are one level up the calculation taxonomy, such as revenueAbstract, earnings, and financial ratios, using **token classification**.
  This model is trained specifically on n=1 with the **calculation taxonomy labels** from **HiFi-KPI**, focusing on structured extraction.
dataset_link: "https://huggingface.co/datasets/AAU-NLP/HiFi-KPI"
repo_link: "https://github.com/rasmus393/HiFi-KPI"
---
## **Cal-BERT-SL1000**
### **Model Description**
Cal-BERT-SL1000 is a **BERT-based sequence labeling model** (base model: `bert-base-uncased`) fine-tuned on the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)** for extracting **financial key performance indicators (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It uses **token classification** to identify entities such as revenue and earnings.
The model targets the calculation-layer taxonomy with n=1, that is, labels one level up the calculation taxonomy (for example, `revenueAbstract`).
### **Use Cases**
- Extracting **financial KPIs** using **iXBRL calculation taxonomy**
- **Financial document parsing** with entity recognition
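
The use cases above can be tried with the `transformers` token-classification pipeline. The following is a minimal sketch, not the authors' reference code: the repository ID `AAU-NLP/Cal-BERT-SL1000` is an assumption inferred from the dataset's organization and this card's title, and the helper names `load_tagger`, `summarize`, and `extract_kpis` are hypothetical. It also assumes the pipeline's `aggregation_strategy="simple"` grouping suits this model's label scheme.

```python
from typing import Callable, Dict, List

# Assumption: repository ID inferred from this card; adjust it if the model
# is hosted under a different name.
MODEL_ID = "AAU-NLP/Cal-BERT-SL1000"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )


def summarize(entities: List[Dict]) -> List[Dict]:
    """Keep label, surface text, and score, ordered by position in the text."""
    ordered = sorted(entities, key=lambda e: e["start"])
    return [
        {
            "label": e["entity_group"],
            "text": e["word"],
            "score": round(float(e["score"]), 3),
        }
        for e in ordered
    ]


def extract_kpis(text: str, tagger: Callable[[str], List[Dict]]) -> List[Dict]:
    """Run any tagger callable on `text` and return a compact summary."""
    return summarize(tagger(text))
```

To run it against the model, call `extract_kpis("Total revenue was $4.2 million for the quarter.", load_tagger())`. The example sentence is illustrative only and is not taken from HiFi-KPI.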
### **Performance**
- Trained on the **1,000 most frequent labels** from the **[HiFi-KPI dataset](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)**, using n=1 in the calculation taxonomy (labels one level up the hierarchy).
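
Restricting a label set to its most frequent entries, as described above, can be sketched as follows. This is an illustration only and not the authors' preprocessing: the function name `top_k_label_map` is hypothetical, and mapping the remaining labels to an `"O"` (outside) tag is an assumption, since this card does not say how rarer labels are handled.

```python
from collections import Counter
from typing import Dict, Iterable


def top_k_label_map(labels: Iterable[str], k: int = 1000, other: str = "O") -> Dict[str, str]:
    """Map each label to itself if it is among the k most frequent, else to `other`."""
    counts = Counter(labels)
    kept = {label for label, _ in counts.most_common(k)}
    return {label: (label if label in kept else other) for label in counts}


# Toy label names for illustration; not drawn from HiFi-KPI.
toy_labels = ["Revenues"] * 3 + ["Assets"] * 2 + ["Goodwill"]
toy_mapping = top_k_label_map(toy_labels, k=2)
```

In this toy case `toy_mapping` keeps `Revenues` and `Assets` and maps `Goodwill` to `"O"`.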
### **Dataset & Code**
- **Dataset**: [HiFi-KPI on Hugging Face](https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
- **Code example**: [HiFi-KPI GitHub Repository](https://github.com/rasmus393/HiFi-KPI)