---
license: mit
base_model: camembert-base
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: Camembert-base-frenchNER_4entities
  results: []
datasets:
- CATIE-AQ/frenchNER_4entities
language:
- fr
widget:
- text: "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
inference:
  parameters:
    aggregation_strategy: "max"
library_name: transformers
pipeline_tag: token-classification
co2_eq_emissions: 20
---

# Camembert-base-frenchNER_4entities

## Model Description

We present **Camembert-base-frenchNER_4entities**, a [CamemBERT base](https://huggingface.co/camembert-base) model fine-tuned for Named Entity Recognition in French on four French NER datasets covering 4 entity types (LOC, PER, ORG, MISC).
These datasets were concatenated and cleaned into a single dataset that we called [frenchNER_4entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities).
It contains **384,773** rows in total: **328,757** for training, **24,131** for validation and **31,885** for testing.
Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).

## Dataset

The dataset used is [frenchNER_4entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_4entities), which contains ~385k sentences labeled with 4 entity categories, plus a background label for everything else:
* PER: person
* LOC: location
* ORG: organization
* MISC: miscellaneous
* O: background (outside any entity)

The distribution of the entities is as follows:

| Splits | O | PER | LOC | ORG | MISC |
|---|---|---|---|---|---|
| train | 7,539,692 | 307,144 | 286,746 | 127,089 | 799,494 |
| validation | 544,580 | 24,034 | 21,585 | 5,927 | 18,221 |
| test | 720,623 | 32,870 | 29,683 | 7,911 | 21,760 |
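
To make the class imbalance in the distribution above easier to see, the following sketch converts the raw counts into per-split shares. The numbers are copied from the table; the names `COUNTS`, `label_shares` and `SHARES` are illustrative and not part of the dataset or of any library. The card does not state the counting unit, so the sketch simply treats each cell as a raw count.

```python
# Label counts copied from the distribution table (one dict per split).
COUNTS = {
    "train": {"O": 7_539_692, "PER": 307_144, "LOC": 286_746, "ORG": 127_089, "MISC": 799_494},
    "validation": {"O": 544_580, "PER": 24_034, "LOC": 21_585, "ORG": 5_927, "MISC": 18_221},
    "test": {"O": 720_623, "PER": 32_870, "LOC": 29_683, "ORG": 7_911, "MISC": 21_760},
}


def label_shares(counts):
    """Return each label's fraction of the total count for one split."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical helper structure: shares per split, keyed like COUNTS.
SHARES = {split: label_shares(counts) for split, counts in COUNTS.items()}

for split, shares in SHARES.items():
    row = ", ".join(f"{label}: {share:.1%}" for label, share in shares.items())
    print(f"{split:<10} {row}")
```

In every split the `O` label accounts for more than 80% of the counts, which is why per-entity scores are more informative than a single accuracy figure for this kind of data.
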

| Model | Metrics | PER | LOC | ORG | MISC | O | Overall |
|---|---|---|---|---|---|---|---|
| Camembert-base-frenchNER_4entities | Precision | A | B | C | D | E | F |
| | Recall | A | B | C | D | E | F |
| | F1 | A | B | C | D | E | F |
| | Number | A | B | C | D | E | F |
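
As a reading aid for the Precision, Recall, F1 and Number rows, here is a minimal sketch of how such per-label scores can be computed from gold and predicted labels. It assumes token-level scoring with one label per token, which the presence of an `O` column suggests but which this card does not state, and it takes "Number" to mean the count of gold tokens per label. The function `per_label_scores` and the toy `gold`/`pred` sequences are hypothetical, and the "Overall" column is left out because its aggregation method is not specified here.

```python
from collections import Counter

LABELS = ["PER", "LOC", "ORG", "MISC", "O"]


def per_label_scores(gold, pred, labels=LABELS):
    """Token-level precision, recall, F1 and support for each label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for label g
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was expected but missed

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]  # tokens predicted as this label
        support = tp[label] + fn[label]    # gold tokens with this label
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "number": support,
        }
    return scores


# Toy example (made-up labels, not model output).
gold = ["PER", "PER", "O", "LOC", "O", "ORG"]
pred = ["PER", "O", "O", "LOC", "O", "MISC"]
scores = per_label_scores(gold, pred)

for label in LABELS:
    s = scores[label]
    print(f"{label:<5} P={s['precision']:.2f} R={s['recall']:.2f} "
          f"F1={s['f1']:.2f} N={s['number']}")
```

Entity-level scoring, where a multi-token entity counts as correct only if all of its tokens match, would give different numbers; the sketch above deliberately uses the simpler token-level definition.
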