metadata
language:
- en
tags:
- financial NLP
- named entity recognition
- sequence labeling
- structured extraction
- hierarchical taxonomy
- XBRL
- iXBRL
- SEC filings
- financial-information-extraction
datasets:
- AAU-NLP/HiFi-KPI
model_name: Cal-BERT-SL1000
library_name: transformers
pipeline_tag: token-classification
base_model: bert-base-uncased
task_categories:
- token-classification
task_ids:
- named-entity-recognition
- financial-information-extraction
pretty_name: 'Cal-BERT-SL1000: Sequence Labeling for Calculation Taxonomy KPI Extraction'
size_categories: 1M<n<10M
languages:
- en
dataset_name: HiFi-KPI
model_description: >
Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the
**HiFi-KPI dataset** for extracting
**financial key performance indicators (KPIs)** from **SEC earnings filings
(10-K & 10-Q)**. It specializes in identifying
entities that are one level up the calculation taxonomy, such as
revenueAbstract, earnings, and financial ratios, using **token
classification**.
This model is trained specifically on n=1 with the **calculation taxonomy
labels** from **HiFi-KPI**, focusing on structured extraction.
dataset_link: https://huggingface.co/datasets/AAU-NLP/HiFi-KPI
repo_link: https://github.com/rasmus393/HiFi-KPI
Cal-BERT-SL1000
Model Description
Cal-BERT-SL1000 is a BERT-based sequence labeling model fine-tuned on the HiFi-KPI dataset to extract financial key performance indicators (KPIs) from SEC earnings filings (10-K and 10-Q). It uses token classification to identify entities such as revenue and earnings, labeled with tags one level up (n=1) in the calculation taxonomy.
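To make "n=1" concrete, here is a minimal sketch of rolling a tag up a calculation hierarchy by a fixed number of levels. The `CALC_PARENT` map and the `roll_up` helper are hypothetical and use a toy hierarchy for illustration; they are not taken from the HiFi-KPI dataset or its code.

```python
from typing import Dict, Optional

# Hypothetical toy fragment of a calculation hierarchy (child -> parent).
# The names are illustrative only and do not come from HiFi-KPI.
CALC_PARENT: Dict[str, str] = {
    "Revenues": "GrossProfit",
    "CostOfRevenue": "GrossProfit",
    "GrossProfit": "OperatingIncomeLoss",
    "OperatingExpenses": "OperatingIncomeLoss",
}


def roll_up(tag: str, n: int = 1, parents: Dict[str, str] = CALC_PARENT) -> Optional[str]:
    """Return the ancestor of `tag` that sits `n` levels up, or None if the hierarchy ends first."""
    current: Optional[str] = tag
    for _ in range(n):
        if current is None:
            return None
        current = parents.get(current)
    return current
```

With this toy map, `roll_up("Revenues", n=1)` returns `"GrossProfit"`, which is the kind of one-level-up label the model is described as predicting.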
Use Cases
- Extracting financial KPIs using the iXBRL calculation taxonomy
- Financial document parsing with entity recognition
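The use cases above could be approached with the standard `transformers` token-classification pipeline. The sketch below assumes the checkpoint is published under the Hub id `AAU-NLP/Cal-BERT-SL1000`, which this card does not state, so replace it with the actual path. The helper names `build_tagger` and `to_kpis` and the `min_score` threshold are illustrative choices, not part of the released code.

```python
from typing import Dict, List

# Assumption: the checkpoint lives under this Hub id. The card does not give
# the repository path, so substitute the real one.
MODEL_ID = "AAU-NLP/Cal-BERT-SL1000"


def build_tagger(model_id: str = MODEL_ID):
    """Create a token-classification pipeline that merges word pieces into spans."""
    from transformers import pipeline  # imported lazily; downloads weights on first use

    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def to_kpis(text: str, predictions: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep confident spans and return label, surface text, and character offsets."""
    kpis = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue  # drop low-confidence spans
        kpis.append({
            "label": pred["entity_group"],
            "text": text[pred["start"]:pred["end"]],
            "start": pred["start"],
            "end": pred["end"],
        })
    return kpis


# Usage (downloads the model, so it is left commented out):
# sentence = "Total revenue was $4.2 million for the quarter."
# tagger = build_tagger()
# print(to_kpis(sentence, tagger(sentence)))
```

Slicing the original text by character offsets, rather than using the pipeline's `word` field, preserves the casing and punctuation of figures as they appear in the filing.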
Performance
- Trained on the 1,000 most frequent labels in the HiFi-KPI dataset, using calculation taxonomy labels at n=1
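One way to restrict training to the most frequent labels is sketched below. It assumes each example carries one string label per token and that labels outside the top 1,000 fall back to the outside tag `O`. The card does not say how rare labels were handled, so that fallback, along with the helper names `build_label_vocab` and `encode`, is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List

OUTSIDE = "O"  # conventional tag for tokens that carry no label


def build_label_vocab(label_sequences: Iterable[List[str]], top_k: int = 1000) -> Dict[str, int]:
    """Map the `top_k` most frequent labels to ids, reserving id 0 for OUTSIDE."""
    counts = Counter(
        label for seq in label_sequences for label in seq if label != OUTSIDE
    )
    vocab = {OUTSIDE: 0}
    for label, _ in counts.most_common(top_k):
        vocab[label] = len(vocab)
    return vocab


def encode(labels: List[str], vocab: Dict[str, int]) -> List[int]:
    """Convert labels to ids; labels outside the vocabulary fall back to OUTSIDE."""
    return [vocab.get(label, vocab[OUTSIDE]) for label in labels]
```

Capping the label set keeps the classification head small; the trade-off is that entities with rarer labels are treated as unlabeled tokens.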
Dataset & Code
- Dataset: HiFi-KPI on Hugging Face (https://huggingface.co/datasets/AAU-NLP/HiFi-KPI)
- Code example: HiFi-KPI GitHub Repository (https://github.com/rasmus393/HiFi-KPI)