---
title: relation_extraction
datasets:
- none
tags:
- evaluate
- metric
description: >-
  This metric evaluates relation extraction output by computing micro and
  macro F1 scores between predictions and references.
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
license: apache-2.0
---
# Metric Card for relation_extraction evaluation

This metric evaluates the quality of relation extraction output by computing micro and macro F1 scores over the extracted relations.
## Metric Description

This metric compares predicted relations against reference (ground-truth) relations and reports precision, recall, and F1, either per relation type or aggregated over all types.

## How to Use

This metric takes two inputs, predictions and references (the ground truth). Both are lists of lists of dictionaries, where each dictionary describes one relation by its entity names and entity types:
```python
import evaluate

metric_path = "Ikala-allen/relation_extraction"
module = evaluate.load(metric_path)

references = [
    [
        {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

evaluation_scores = module.compute(predictions=predictions, references=references, mode="strict")
```
### Inputs

- **predictions** (`list` of `list` of `dict`): predicted relations, one inner list per example; each dictionary gives a relation's type (`type`) and its entities (`head`, `head_type`, `tail`, `tail_type`).
- **references** (`list` of `list` of `dict`): ground-truth relations, in the same format as predictions.
- **mode** (`str`): evaluation mode, either `"strict"` or `"boundaries"`. Strict mode also compares `"head_type"` and `"tail_type"`, while boundaries mode ignores them (see the sketch after this list).
- **only_all** (`bool`): if `True`, output only the overall (`"ALL"`) score; if `False`, also output a score for every relation type. Defaults to `True`.
- **relation_types** (`list`): the relation types to evaluate. If not given, the types are constructed from the ground truth. Defaults to `[]`.
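For intuition, here is a minimal sketch of how the two modes differ when comparing a single predicted relation against a reference. This is an illustration only, not the metric's actual implementation, and the `matches` helper is hypothetical:

```python
# Hypothetical helper for illustration; the real metric aggregates counts
# over all predictions and references rather than matching pairs directly.
def matches(pred, ref, mode="strict"):
    keys = ["head", "type", "tail"]
    if mode == "strict":
        keys += ["head_type", "tail_type"]  # strict mode also compares entity types
    return all(pred[k] == ref[k] for k in keys)

pred = {"head": "phipigments", "head_type": "product", "type": "sell",
        "tail": "國際認證之色乳", "tail_type": "product"}
ref = {"head": "phipigments", "head_type": "brand", "type": "sell",
      "tail": "國際認證之色乳", "tail_type": "product"}
print(matches(pred, ref, mode="strict"))      # False: head_type differs
print(matches(pred, ref, mode="boundaries"))  # True: entity types are ignored
```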
### Output Values

**output** (`dict` of `dict`s): one entry per relation type (when `only_all=False`) plus an overall `ALL` entry.
- **sell** (`dict`): scores for the relation type `sell`
  - **tp**: true positive count
  - **fp**: false positive count
  - **fn**: false negative count
  - **p**: precision
  - **r**: recall
  - **f1**: F1 score
- **ALL** (`dict`): scores aggregated over all relation types (here `sell` and `belongs_to`)
  - **tp**: true positive count
  - **fp**: false positive count
  - **fn**: false negative count
  - **p**: micro precision
  - **r**: micro recall
  - **f1**: micro F1 score
  - **Macro_f1**: macro F1 score
  - **Macro_p**: macro precision
  - **Macro_r**: macro recall
Output example:
```python
{'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}
```
Note: `Macro_f1`, `Macro_p`, `Macro_r`, `p`, `r`, and `f1` are percentages between 0 and 100, while `tp`, `fp`, and `fn` are counts that depend on the number of input relations.
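The scores follow the standard precision/recall/F1 definitions, reported as percentages. A minimal illustrative sketch (not the metric's code) of how the counts map to scores:

```python
# Illustrative only: derive p, r, f1 (as percentages) from the counts.
def prf(tp, fp, fn):
    p = 100 * tp / (tp + fp) if tp + fp else 0
    r = 100 * tp / (tp + fn) if tp + fn else 0
    f1 = 2 * p * r / (p + r) if p + r else 0
    return p, r, f1

print(prf(1, 1, 1))  # (50.0, 50.0, 50.0), matching the output example above
```

The `ALL` entry's `p`/`r`/`f1` are computed from the pooled (micro) counts, while the `Macro_*` values average the per-type scores, as the examples below show.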
### Examples

Example 1: a single prediction/reference pair, `mode="strict"`, outputting only the overall (`ALL`) score:
```python
metric_path = "Ikala-allen/relation_extraction"
module = evaluate.load(metric_path)

# Example references (ground truth)
references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
    ]
]

# Example predictions
predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

# Calculate evaluation scores using the loaded metric
evaluation_scores = module.compute(predictions=predictions, references=references, mode="strict", only_all=True, relation_types=[])
print(evaluation_scores)
>>> {'tp': 1, 'fp': 1, 'fn': 2, 'p': 50.0, 'r': 33.333333333333336, 'f1': 40.0, 'Macro_f1': 25.0, 'Macro_p': 25.0, 'Macro_r': 25.0}
```
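In strict mode only the tinadaviespigments relation matches exactly, so tp = 1; the phipigments prediction has the wrong head_type (one fp plus one fn), and the unmatched belongs_to reference adds another fn. The `Macro_*` values average the per-type scores of sell (50.0) and belongs_to (0), giving 25.0.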
Example 2: the same prediction/reference pair, `mode="boundaries"`, outputting only the overall (`ALL`) score:
```python
metric_path = "Ikala-allen/relation_extraction"
module = evaluate.load(metric_path)

# Example references (ground truth)
references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
    ]
]

# Example predictions
predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

# Calculate evaluation scores using the loaded metric
evaluation_scores = module.compute(predictions=predictions, references=references, mode="boundaries", only_all=True, relation_types=[])
print(evaluation_scores)
>>> {'tp': 2, 'fp': 0, 'fn': 1, 'p': 100.0, 'r': 66.66666666666667, 'f1': 80.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}
```
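In boundaries mode the head_type mismatch on phipigments is ignored, so both sell predictions count as true positives (tp = 2) and only the belongs_to reference goes unmatched (fn = 1); the macro scores average the sell scores (100.0) with belongs_to (0).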
Example 3: multiple predictions and references, `mode="boundaries"`, `only_all=False`, outputting a score for every relation type:
```python
metric_path = "Ikala-allen/relation_extraction"
module = evaluate.load(metric_path)

# Example references (ground truth)
references = [
    [
        {"head": "phipigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
        {'head': 'A醛賦活緊緻精華', 'tail': 'Serum', 'head_type': 'product', 'tail_type': 'category', 'type': 'belongs_to'},
    ]
]

# Example predictions
predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ],
    [
        {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
        {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
    ]
]

# Calculate evaluation scores using the loaded metric
evaluation_scores = module.compute(predictions=predictions, references=references, mode="boundaries", only_all=False, relation_types=[])
print(evaluation_scores)
>>> {'sell': {'tp': 3, 'fp': 1, 'fn': 0, 'p': 75.0, 'r': 100.0, 'f1': 85.71428571428571}, 'belongs_to': {'tp': 0, 'fp': 0, 'fn': 1, 'p': 0, 'r': 0, 'f1': 0}, 'ALL': {'tp': 3, 'fp': 1, 'fn': 1, 'p': 75.0, 'r': 75.0, 'f1': 75.0, 'Macro_f1': 42.857142857142854, 'Macro_p': 37.5, 'Macro_r': 50.0}}
```
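With `only_all=False`, each relation type gets its own entry alongside `ALL`. The `ALL` p/r/f1 values come from the pooled counts (tp = 3, fp = 1, fn = 1), while the `Macro_*` values average the per-type scores, e.g. Macro_p = (75.0 + 0) / 2 = 37.5.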
Example 4: multiple predictions and references, using the default arguments:
```python
>>> metric_path = "Ikala-allen/relation_extraction"
>>> module = evaluate.load(metric_path)
>>> references = [
...     [
...         {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
...     ],
...     [
...         {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
...     ]
... ]
>>> predictions = [
...     [
...         {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
...         {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
...     ],
...     [
...         {'head': 'SABONTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'},
...         {'head': 'SNTAIWAN', 'tail': '大馬士革玫瑰有機光燦系列', 'head_type': 'brand', 'tail_type': 'product', 'type': 'sell'}
...     ]
... ]
>>> evaluation_scores = module.compute(predictions=predictions, references=references)
>>> print(evaluation_scores)
{'sell': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146}, 'ALL': {'tp': 2, 'fp': 2, 'fn': 1, 'p': 50.0, 'r': 66.66666666666667, 'f1': 57.142857142857146, 'Macro_f1': 57.142857142857146, 'Macro_p': 50.0, 'Macro_r': 66.66666666666667}}
```
## Limitations and Bias

This metric uses exact matching: if any field of a predicted relation (head, tail, and type, plus head_type and tail_type in strict mode) differs from the reference, the prediction is counted as a false positive and the unmatched reference as a false negative. For example, "phip igments" and "phipigments" differ only by a space but are treated as different entities.
## Citation
```bibtex
@misc{taille2020stop,
  author = {Taill{\'e}, Bruno and Guigue, Vincent and Scoutheeten, Geoffrey and Gallinari, Patrick},
  title  = {Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!},
  year   = {2020},
  url    = {https://arxiv.org/abs/2009.10684}
}
```
## Further References

This implementation adapts the evaluation code from the SINCERE repository:
https://github.com/btaille/sincere/blob/master/code/utils/evaluation.py