---
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
datasets:
- DFKI-SLT/few-nerd
metrics:
- precision
- recall
- f1
widget:
- text: In response, in May or June 1125, a 3,000-strong Crusader coalition commanded
    by King Baldwin II of Jerusalem confronted and defeated the 15,000-strong Muslim
    coalition at the Battle of Azaz, raising the siege of the town.
- text: Cardenal made several visits to Jesuit universities in the United States,
    including the University of Detroit Mercy in 2013, and the John Carroll University
    in 2014.
- text: Other super-spreaders, defined as those that transmit SARS to at least eight
    other people, included the incidents at the Hotel Metropole in Hong Kong, the
    Amoy Gardens apartment complex in Hong Kong and one in an acute care hospital
    in Toronto, Ontario, Canada.
- text: The District Court for the Northern District of California rejected 321 Studios'
    claims for declaratory relief, holding that both DVD Copy Plus and DVD-X Copy
    violated the DMCA and that the DMCA was not unconstitutional.
- text: The Sunday Edition is a television programme broadcast on the ITV Network
    in the United Kingdom focusing on political interview and discussion, produced
    by ITV Productions.
pipeline_tag: token-classification
model-index:
- name: SpanMarker
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: Unknown
      type: DFKI-SLT/few-nerd
      split: test
    metrics:
    - type: f1
      value: 0.703084859534267
      name: F1
    - type: precision
      value: 0.7034273336857051
      name: Precision
    - type: recall
      value: 0.7027427186979075
      name: Recall
---

# SpanMarker

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [DFKI-SLT/few-nerd](https://huggingface.co/datasets/DFKI-SLT/few-nerd) dataset that can be used for Named Entity Recognition.

## Model Details

### Model Description
- **Model Type:** SpanMarker
<!-- - **Encoder:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 256 tokens
- **Maximum Entity Length:** 8 words
- **Training Dataset:** [DFKI-SLT/few-nerd](https://huggingface.co/datasets/DFKI-SLT/few-nerd)
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
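
SpanMarker classifies candidate spans rather than tagging tokens one at a time, and the **Maximum Entity Length** caps how many words a candidate may cover. The sketch below only illustrates that idea, assuming the candidates are every contiguous run of 1 to 8 words; `enumerate_spans` is a hypothetical helper, not part of the `span_marker` API, and the real library additionally has to fit everything within the 256-token limit.

```python
from typing import List, Tuple


def enumerate_spans(words: List[str], max_entity_length: int = 8) -> List[Tuple[int, int]]:
    """Return (start, end) word indices, end exclusive, for every span of 1..max_entity_length words."""
    spans = []
    for start in range(len(words)):
        # A span may neither run past the end of the sentence nor exceed the length cap.
        for end in range(start + 1, min(start + max_entity_length, len(words)) + 1):
            spans.append((start, end))
    return spans


words = "The Sunday Edition is broadcast on the ITV Network".split()
spans = enumerate_spans(words)
print(len(spans))  # 44: all 45 spans of a 9-word sentence except the full 9-word one
print(" ".join(words[1:3]))  # "Sunday Edition", the candidate span (1, 3)
```

With the cap, each starting word contributes at most 8 candidates, so the number of candidates grows linearly with sentence length instead of quadratically.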

### Model Sources

- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

### Model Labels
| Label                                    | Examples                                                                                                 |
|:-----------------------------------------|:---------------------------------------------------------------------------------------------------------|
| art-broadcastprogram                     | "Street Cents", "Corazones", "The Gale Storm Show : Oh , Susanna"                                        |
| art-film                                 | "L'Atlantide", "Shawshank Redemption", "Bosch"                                                           |
| art-music                                | "Champion Lover", "Atkinson , Danko and Ford ( with Brockie and Hilton )", "Hollywood Studio Symphony"   |
| art-other                                | "Aphrodite of Milos", "The Today Show", "Venus de Milo"                                                  |
| art-painting                             | "Production/Reproduction", "Cofiwch Dryweryn", "Touit"                                                   |
| art-writtenart                           | "Time", "Imelda de ' Lambertazzi", "The Seven Year Itch"                                                 |
| building-airport                         | "Sheremetyevo International Airport", "Luton Airport", "Newark Liberty International Airport"            |
| building-hospital                        | "Yeungnam University Hospital", "Memorial Sloan-Kettering Cancer Center", "Hokkaido University Hospital" |
| building-hotel                           | "Radisson Blu Sea Plaza Hotel", "Flamingo Hotel", "The Standard Hotel"                                   |
| building-library                         | "British Library", "Berlin State Library", "Bayerische Staatsbibliothek"                                 |
| building-other                           | "Communiplex", "Henry Ford Museum", "Alpha Recording Studios"                                            |
| building-restaurant                      | "Carnegie Deli", "Trumbull", "Fatburger"                                                                 |
| building-sportsfacility                  | "Sports Center", "Boston Garden", "Glenn Warner Soccer Facility"                                         |
| building-theater                         | "Sanders Theatre", "Pittsburgh Civic Light Opera", "National Paris Opera"                                |
| event-attack/battle/war/militaryconflict | "Vietnam War", "Jurist", "Easter Offensive"                                                              |
| event-disaster                           | "1990s North Korean famine", "the 1912 North Mount Lyell Disaster", "1693 Sicily earthquake"             |
| event-election                           | "1982 Mitcham and Morden by-election", "Elections to the European Parliament", "March 1898 elections"    |
| event-other                              | "Eastwood Scoring Stage", "Union for a Popular Movement", "Masaryk Democratic Movement"                  |
| event-protest                            | "French Revolution", "Iranian Constitutional Revolution", "Russian Revolution"                           |
| event-sportsevent                        | "World Cup", "National Champions", "Stanley Cup"                                                         |
| location-GPE                             | "Mediterranean Basin", "the Republic of Croatia", "Croatian"                                             |
| location-bodiesofwater                   | "Arthur Kill", "Atatürk Dam Lake", "Norfolk coast"                                                       |
| location-island                          | "Staten Island", "new Samsat district", "Laccadives"                                                     |
| location-mountain                        | "Miteirya Ridge", "Ruweisat Ridge", "Salamander Glacier"                                                 |
| location-other                           | "Northern City Line", "Victoria line", "Cartuther"                                                        |
| location-park                            | "Painted Desert Community Complex Historic District", "Gramercy Park", "Shenandoah National Park"        |
| location-road/railway/highway/transit    | "NJT", "Newark-Elizabeth Rail Link", "Friern Barnet Road"                                                |
| organization-company                     | "Church 's Chicken", "Texas Chicken", "Dixy Chicken"                                                     |
| organization-education                   | "Barnard College", "MIT", "Belfast Royal Academy and the Ulster College of Physical Education"           |
| organization-government/governmentagency | "Diet", "Supreme Court", "Congregazione dei Nobili"                                                      |
| organization-media/newspaper             | "Al Jazeera", "Clash", "TimeOut Melbourne"                                                               |
| organization-other                       | "Defence Sector C", "4th Army", "IAEA"                                                                   |
| organization-politicalparty              | "Al Wafa ' Islamic", "Shimpotō", "Kenseitō"                                                              |
| organization-religion                    | "Jewish", "UPCUSA", "Christian"                                                                          |
| organization-showorganization            | "Mr. Mister", "Lizzy", "Bochumer Symphoniker"                                                            |
| organization-sportsleague                | "NHL", "First Division", "China League One"                                                              |
| organization-sportsteam                  | "Arsenal", "Luc Alphand Aventures", "Tottenham"                                                          |
| other-astronomything                     | "Algol", "Zodiac", "`` Caput Larvae ''"                                                                  |
| other-award                              | "Order of the Republic of Guinea and Nigeria", "GCON", "Grand Commander of the Order of the Niger"       |
| other-biologything                       | "Amphiphysin", "BAR", "N-terminal lipid"                                                                 |
| other-chemicalthing                      | "sulfur", "uranium", "carbon dioxide"                                                                    |
| other-currency                           | "$", "Travancore Rupee", "lac crore"                                                                     |
| other-disease                            | "hypothyroidism", "bladder cancer", "French Dysentery Epidemic of 1779"                                  |
| other-educationaldegree                  | "BSc ( Hons ) in physics", "Master", "Bachelor"                                                          |
| other-god                                | "El", "Raijin", "Fujin"                                                                                  |
| other-language                           | "Latin", "English", "Breton-speaking"                                                                    |
| other-law                                | "United States Freedom Support Act", "Thirty Years ' Peace", "Leahy–Smith America Invents Act ( AIA"     |
| other-livingthing                        | "insects", "monkeys", "patchouli"                                                                        |
| other-medical                            | "pediatrician", "Pediatrics", "amitriptyline"                                                            |
| person-actor                             | "Edmund Payne", "Tchéky Karyo", "Ellaline Terriss"                                                       |
| person-artist/author                     | "Gaetano Donizett", "George Axelrod", "Hicks"                                                            |
| person-athlete                           | "Tozawa", "Jaguar", "Neville"                                                                            |
| person-director                          | "Bob Swaim", "Frank Darabont", "Richard Quine"                                                           |
| person-other                             | "Holden", "Richard Benson", "Campbell"                                                                   |
| person-politician                        | "Rivière", "Emeric", "William"                                                                           |
| person-scholar                           | "Stalmine", "Wurdack", "Stedman"                                                                         |
| person-soldier                           | "Krukenberg", "Joachim Ziegler", "Helmuth Weidling"                                                      |
| product-airplane                         | "EC135T2 CPDS", "Spey-equipped FGR.2s", "Luton"                                                          |
| product-car                              | "100EX", "Corvettes - GT1 C6R", "Phantom"                                                                |
| product-food                             | "yakiniku", "V. labrusca", "red grape"                                                                   |
| product-game                             | "Airforce Delta", "Splinter Cell", "Hardcore RPG"                                                        |
| product-other                            | "X11", "Fairbottom Bobs", "PDP-1"                                                                        |
| product-ship                             | "Essex", "HMS `` Chinkara ''", "Congress"                                                                |
| product-software                         | "Wikipedia", "Apdf", "AmiPDF"                                                                            |
| product-train                            | "High Speed Trains", "Royal Scots Grey", "55022"                                                         |
| product-weapon                           | "ZU-23-2M Wróbel", "AR-15 's", "ZU-23-2MR Wróbel II"                                                     |
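
Each label joins a coarse FewNERD type and a fine-grained type at the first hyphen, for example `location-road/railway/highway/transit`. If you only need coarse types, a minimal sketch like the following can map predicted labels back to them. `split_label` and `count_coarse` are hypothetical helpers, and the split assumes the coarse part never contains a hyphen, which holds for every label in the table above.

```python
from collections import Counter
from typing import Iterable, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split e.g. 'person-athlete' into ('person', 'athlete') at the first hyphen."""
    coarse, fine = label.split("-", 1)
    return coarse, fine


def count_coarse(labels: Iterable[str]) -> Counter:
    """Count how often each coarse type occurs among fine-grained labels."""
    return Counter(split_label(label)[0] for label in labels)


print(split_label("location-road/railway/highway/transit"))  # ('location', 'road/railway/highway/transit')
print(count_coarse(["person-actor", "person-director", "art-film"]))  # Counter({'person': 2, 'art': 1})
```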

## Evaluation

### Metrics
| Label                                    | Precision | Recall | F1     |
|:-----------------------------------------|:----------|:-------|:-------|
| **all**                                  | 0.7034    | 0.7027 | 0.7031 |
| art-broadcastprogram                     | 0.6024    | 0.5904 | 0.5963 |
| art-film                                 | 0.7761    | 0.7533 | 0.7645 |
| art-music                                | 0.7825    | 0.7551 | 0.7685 |
| art-other                                | 0.4193    | 0.3327 | 0.3710 |
| art-painting                             | 0.5882    | 0.5263 | 0.5556 |
| art-writtenart                           | 0.6819    | 0.6488 | 0.6649 |
| building-airport                         | 0.8064    | 0.8352 | 0.8205 |
| building-hospital                        | 0.7282    | 0.8022 | 0.7634 |
| building-hotel                           | 0.7033    | 0.7245 | 0.7138 |
| building-library                         | 0.7550    | 0.7380 | 0.7464 |
| building-other                           | 0.5867    | 0.5840 | 0.5853 |
| building-restaurant                      | 0.6205    | 0.5216 | 0.5667 |
| building-sportsfacility                  | 0.6113    | 0.7976 | 0.6921 |
| building-theater                         | 0.7060    | 0.7495 | 0.7271 |
| event-attack/battle/war/militaryconflict | 0.7945    | 0.7395 | 0.7660 |
| event-disaster                           | 0.5604    | 0.5604 | 0.5604 |
| event-election                           | 0.4286    | 0.1484 | 0.2204 |
| event-other                              | 0.4885    | 0.4400 | 0.4629 |
| event-protest                            | 0.3798    | 0.4759 | 0.4225 |
| event-sportsevent                        | 0.6198    | 0.6162 | 0.6180 |
| location-GPE                             | 0.8157    | 0.8552 | 0.8350 |
| location-bodiesofwater                   | 0.7268    | 0.7690 | 0.7473 |
| location-island                          | 0.7504    | 0.6842 | 0.7158 |
| location-mountain                        | 0.7352    | 0.7298 | 0.7325 |
| location-other                           | 0.4427    | 0.3104 | 0.3649 |
| location-park                            | 0.7153    | 0.6856 | 0.7001 |
| location-road/railway/highway/transit    | 0.7090    | 0.7324 | 0.7205 |
| organization-company                     | 0.6963    | 0.7061 | 0.7012 |
| organization-education                   | 0.7994    | 0.7986 | 0.7990 |
| organization-government/governmentagency | 0.5524    | 0.4533 | 0.4980 |
| organization-media/newspaper             | 0.6513    | 0.6656 | 0.6584 |
| organization-other                       | 0.5978    | 0.5375 | 0.5661 |
| organization-politicalparty              | 0.6793    | 0.7315 | 0.7044 |
| organization-religion                    | 0.5575    | 0.6131 | 0.5840 |
| organization-showorganization            | 0.6035    | 0.5839 | 0.5935 |
| organization-sportsleague                | 0.6393    | 0.6610 | 0.6499 |
| organization-sportsteam                  | 0.7259    | 0.7796 | 0.7518 |
| other-astronomything                     | 0.7794    | 0.8024 | 0.7907 |
| other-award                              | 0.7180    | 0.6649 | 0.6904 |
| other-biologything                       | 0.6864    | 0.6238 | 0.6536 |
| other-chemicalthing                      | 0.5688    | 0.6036 | 0.5856 |
| other-currency                           | 0.6996    | 0.8423 | 0.7643 |
| other-disease                            | 0.6591    | 0.7410 | 0.6977 |
| other-educationaldegree                  | 0.6114    | 0.6198 | 0.6156 |
| other-god                                | 0.6486    | 0.7181 | 0.6816 |
| other-language                           | 0.6507    | 0.8313 | 0.7300 |
| other-law                                | 0.6934    | 0.7331 | 0.7127 |
| other-livingthing                        | 0.6019    | 0.6605 | 0.6298 |
| other-medical                            | 0.5124    | 0.5214 | 0.5169 |
| person-actor                             | 0.8384    | 0.8051 | 0.8214 |
| person-artist/author                     | 0.7122    | 0.7531 | 0.7321 |
| person-athlete                           | 0.8318    | 0.8422 | 0.8370 |
| person-director                          | 0.7083    | 0.7365 | 0.7221 |
| person-other                             | 0.6833    | 0.6737 | 0.6785 |
| person-politician                        | 0.6807    | 0.6836 | 0.6822 |
| person-scholar                           | 0.5397    | 0.5209 | 0.5301 |
| person-soldier                           | 0.5053    | 0.5920 | 0.5452 |
| product-airplane                         | 0.6617    | 0.6692 | 0.6654 |
| product-car                              | 0.7313    | 0.7132 | 0.7222 |
| product-food                             | 0.5787    | 0.5787 | 0.5787 |
| product-game                             | 0.7364    | 0.7140 | 0.7250 |
| product-other                            | 0.5567    | 0.4210 | 0.4795 |
| product-ship                             | 0.6842    | 0.6842 | 0.6842 |
| product-software                         | 0.6495    | 0.6648 | 0.6570 |
| product-train                            | 0.5942    | 0.5924 | 0.5933 |
| product-weapon                           | 0.6435    | 0.5353 | 0.5844 |
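
In each row, F1 is the harmonic mean of precision and recall, `F1 = 2 * P * R / (P + R)`. The short check below recomputes the **all** row from the unrounded values in this card's `model-index` metadata. Per-label rows may differ from the formula in the last digit, because their precision and recall are already rounded to four decimals; `f1_score` is a hypothetical helper written for this check.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded overall precision and recall from the model-index metadata.
overall = f1_score(0.7034273336857051, 0.7027427186979075)
print(round(overall, 4))  # 0.7031

# A per-label row (art-film), using the rounded table values.
print(round(f1_score(0.7761, 0.7533), 4))  # 0.7645
```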

## Uses

### Direct Use for Inference

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("supreethrao/instructNER_fewnerd_xl")
# Run inference
entities = model.predict("The Sunday Edition is a television programme broadcast on the ITV Network in the United Kingdom focusing on political interview and discussion, produced by ITV Productions.")
```
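
`model.predict` returns one dictionary per detected entity. The sketch below post-processes such a list by keeping only confident predictions and grouping them by label. It assumes each dictionary has `span`, `label` and `score` keys (the format shown in the SpanMarker documentation); the `sample` list is hand-written for illustration rather than real output of this model, and the 0.5 threshold is an arbitrary assumption.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group predicted span texts by label, dropping predictions scored below min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hypothetical predictions in the assumed format; not real model output.
sample = [
    {"span": "The Sunday Edition", "label": "art-broadcastprogram", "score": 0.91},
    {"span": "ITV Network", "label": "organization-media/newspaper", "score": 0.78},
    {"span": "United Kingdom", "label": "location-GPE", "score": 0.97},
    {"span": "ITV Productions", "label": "organization-company", "score": 0.42},
]
print(group_entities(sample))
```

With a real model, pass the `entities` returned by `model.predict` instead of `sample`.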

### Downstream Use
You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("supreethrao/instructNER_fewnerd_xl")

# Specify a Dataset with "tokens" and "ner_tags" columns whose labels match this
# model's labels. As an example, use FewNERD's fine-grained tags as "ner_tags".
dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
dataset = dataset.remove_columns("ner_tags").rename_column("fine_ner_tags", "ner_tags")

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("supreethrao/instructNER_fewnerd_xl-finetuned")
```
</details>

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Set Metrics
| Training set          | Min | Mean    | Max |
|:----------------------|:----|:--------|:----|
| Sentence length       | 1   | 24.4945 | 267 |
| Entities per sentence | 0   | 2.5832  | 88  |
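
Statistics like these can be derived from a token-classification dataset in a few lines. The sketch below computes the minimum, mean and maximum of sentence length and of entities per sentence, assuming IO-style tags where an entity is a maximal run of consecutive tokens sharing the same non-`O` label. `count_entities`, `corpus_stats` and the toy corpus are hypothetical; this is not the SpanMarker card generator's own code.

```python
from statistics import mean
from typing import Dict, List, Tuple


def count_entities(tags: List[str]) -> int:
    """Count maximal runs of identical non-'O' tags (IO tagging)."""
    count = 0
    previous = "O"
    for tag in tags:
        # A new entity starts whenever a non-O tag differs from the preceding tag.
        if tag != "O" and tag != previous:
            count += 1
        previous = tag
    return count


def corpus_stats(sentences: List[Tuple[List[str], List[str]]]) -> Dict[str, Tuple[int, float, int]]:
    """Return (min, mean, max) for sentence length and entities per sentence."""
    lengths = [len(tokens) for tokens, _ in sentences]
    entities = [count_entities(tags) for _, tags in sentences]
    return {
        "Sentence length": (min(lengths), float(mean(lengths)), max(lengths)),
        "Entities per sentence": (min(entities), float(mean(entities)), max(entities)),
    }


# Toy corpus (hypothetical) of (tokens, tags) pairs.
toy = [
    (["Frank", "Darabont", "directed", "Shawshank", "Redemption"],
     ["person-director", "person-director", "O", "art-film", "art-film"]),
    (["It", "rained", "."], ["O", "O", "O"]),
]
print(corpus_stats(toy))
```

Note that under IO tagging two adjacent entities of the same type cannot be told apart, so they are counted as one.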

### Training Hyperparameters
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
- mixed_precision_training: Native AMP
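
With 2 devices and a per-device batch size of 16, the total batch size is 2 × 16 = 32, assuming no gradient accumulation (none is listed). The sketch below shows roughly how `lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1` turns into per-step learning rates. It follows the usual Hugging Face `transformers` convention (warmup steps rounded up from the ratio, then linear decay to zero), ignores how a distributed sampler pads the last batch, and uses a hypothetical dataset size rather than the real training-set size.

```python
import math


def schedule(num_examples: int, per_device_batch: int = 16, num_devices: int = 2,
             num_epochs: int = 3, warmup_ratio: float = 0.1):
    """Return (total_steps, warmup_steps) for data-parallel training without gradient accumulation."""
    total_batch = per_device_batch * num_devices  # 16 * 2 = 32
    steps_per_epoch = math.ceil(num_examples / total_batch)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


def linear_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float = 5e-5) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical dataset size, chosen only for illustration.
total, warmup = schedule(num_examples=32_000)
print(total, warmup)  # 3000 300
print(linear_lr(150, total, warmup))     # halfway through warmup
print(linear_lr(warmup, total, warmup))  # peak learning rate, 5e-05
```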

### Framework Versions
- Python: 3.10.13
- SpanMarker: 1.5.0
- Transformers: 4.35.2
- PyTorch: 2.1.1
- Datasets: 2.15.0
- Tokenizers: 0.15.0
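
To compare a local environment with the versions listed above, a small standard-library helper such as the hypothetical `installed_versions` below can report what is installed. The distribution names (`span-marker`, `torch`, and so on) are assumptions about how each framework is published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "span-marker": "1.5.0",
    "transformers": "4.35.2",
    "torch": "2.1.1",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def installed_versions(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in expected:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(EXPECTED).items():
    status = "ok" if installed == EXPECTED[name] else "differs"
    print(f"{name}: expected {EXPECTED[name]}, installed {installed} ({status})")
```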

## Citation

### BibTeX
```
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->