metadata
language:
  - en
tags:
  - financial NLP
  - named entity recognition
  - sequence labeling
  - structured extraction
  - hierarchical taxonomy
  - XBRL
  - iXBRL
  - SEC filings
  - financial-information-extraction
datasets:
  - AAU-NLP/HiFi-KPI
model_name: Cal-BERT-SL1000
library_name: transformers
pipeline_tag: token-classification
base_model: bert-base-uncased
task_categories:
  - token-classification
task_ids:
  - named-entity-recognition
  - financial-information-extraction
pretty_name: 'Cal-BERT-SL1000: Sequence Labeling for Calculation Taxonomy KPI Extraction'
size_categories: 1M<n<10M
languages:
  - en
dataset_name: HiFi-KPI
model_description: >
  Cal-BERT-SL1000 is a **BERT-based sequence labeling model** fine-tuned on the
  **HiFi-KPI dataset** for extracting **financial key performance indicators
  (KPIs)** from **SEC earnings filings (10-K & 10-Q)**. It specializes in
  identifying entities that are one level up the calculation taxonomy, such as
  revenueAbstract, earnings, and financial ratios, using **token
  classification**.

  This model is trained specifically on n=1 (labels one level up the
  calculation taxonomy), using the **calculation taxonomy labels** from
  **HiFi-KPI**, and focuses on structured extraction.
dataset_link: https://huggingface.co/datasets/AAU-NLP/HiFi-KPI
repo_link: https://github.com/rasmus393/HiFi-KPI

Cal-BERT-SL1000

Model Description

Cal-BERT-SL1000 is a BERT-based sequence labeling model fine-tuned on the HiFi-KPI dataset to extract financial key performance indicators (KPIs) from SEC earnings filings (10-K and 10-Q). It identifies entities such as revenue and earnings through token classification, and targets the calculation taxonomy at n=1, that is, labels one level up the taxonomy.

Use Cases

  • Extracting financial KPIs using the iXBRL calculation taxonomy
  • Financial document parsing with entity recognition
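
The use cases above could be exercised with the `transformers` token-classification pipeline, as in the minimal sketch below. The repository id `AAU-NLP/Cal-BERT-SL1000` is an assumption (this card names only the model and the dataset organization), and `extract_kpis` and `summarize` are hypothetical helper names introduced for illustration.

```python
from typing import Dict, List, Tuple


def extract_kpis(text: str, model_id: str = "AAU-NLP/Cal-BERT-SL1000") -> List[Dict]:
    """Run token classification over `text` and return grouped entity spans.

    NOTE: the default `model_id` is an assumed Hub repository id; replace it
    with the actual location of the checkpoint.
    """
    # Imported lazily so `summarize` can be used without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )
    return tagger(text)


def summarize(entities: List[Dict]) -> List[Tuple[str, str, float]]:
    """Reduce aggregated pipeline output to (label, text, score) tuples."""
    return [
        (e["entity_group"], e["word"], round(float(e["score"]), 3))
        for e in entities
    ]


# Example call (downloads the checkpoint on first use; the sentence is made up):
# for row in summarize(extract_kpis("Total revenue for the quarter was $4.2 billion.")):
#     print(row)
```

With `aggregation_strategy="simple"`, adjacent word pieces that share a label are merged, so each returned row corresponds to one predicted span and not to a single sub-word token.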

Performance

  • Trained on the 1,000 most frequent labels in the HiFi-KPI dataset, with n=1 in the calculation taxonomy
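
Restricting training to the most frequent labels can be sketched as follows. This is only an illustration of the idea, not the authors' preprocessing: it assumes labels are given as one list of strings per sentence, that an outside tag `"O"` exists, and that labels falling outside the top `k` are mapped to that tag. `build_label_vocab` and `encode` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_label_vocab(
    label_sequences: Iterable[List[str]], k: int = 1000, outside: str = "O"
) -> Dict[str, int]:
    """Map the k most frequent non-outside labels to ids; id 0 is the outside tag."""
    counts = Counter(
        label for seq in label_sequences for label in seq if label != outside
    )
    vocab = {outside: 0}
    for label, _ in counts.most_common(k):
        vocab[label] = len(vocab)
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], outside: str = "O") -> List[int]:
    """Encode labels as ids, sending out-of-vocabulary labels to the outside tag.

    Assumption: rare labels are collapsed to the outside tag, not dropped.
    """
    return [vocab.get(label, vocab[outside]) for label in seq]
```

The resulting mapping has at most `k + 1` entries, which would be the size of the classification head for a model trained this way.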

Dataset & Code