Commit ee82c0f · Parent(s): 0d196a8 · Update README.md
README.md CHANGED
Before:

@@ -5,11 +5,14 @@ datasets:
 tags:
 - evaluate
 - metric
-description:
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
 pinned: false
 ---

 # Metric Card for relation_extraction evaluation
@@ -31,16 +34,14 @@ This metric takes 2 inputs, prediction and references (ground truth). Both of the
 ... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... ]
 ... ]
-
 >>> predictions = [
 ... [
 ... {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... ]
 ... ]
-
->>> evaluation_scores
->>> print(evaluation_scores)
 {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
 ```
@@ -126,10 +127,17 @@ Example with two or more predictions and references:
 ```

 ## Limitations and Bias
-This metric has

 ## Citation
-
-
 ## Further References
-
After:

@@ -5,11 +5,14 @@ datasets:
 tags:
 - evaluate
 - metric
+description: >-
+  This metric is used for evaluating the F1 score of predictions against the
+  input references.
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
 pinned: false
+license: apache-2.0
 ---

 # Metric Card for relation_extraction evaluation
@@ -31,16 +34,14 @@ This metric takes 2 inputs, prediction and references (ground truth). Both of the
 ... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... ]
 ... ]
 >>> predictions = [
 ... [
 ... {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
 ... ]
 ... ]
+>>> evaluation_scores = module.compute(predictions=predictions, references=references)
+>>> print(evaluation_scores)
 {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
 ```
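For context outside the doctest, `module` above is the loaded metric object. Below is a minimal sketch of how it might be loaded and called with the `evaluate` library; the Space id is a placeholder, not one specified by this commit:

```python
import evaluate

# Placeholder Space id: substitute the actual path of this metric Space.
module = evaluate.load("<namespace>/relation_extraction")

# Each sample is a list of relation dicts with head, head_type, type, tail, tail_type.
references = [[
    {"head": "phipigments", "head_type": "product", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]
predictions = [[
    {"head": "phipigments", "head_type": "product", "type": "sell",
     "tail": "國際認證之色乳", "tail_type": "product"},
]]

# The result has one entry per relation type plus an "ALL" summary
# with micro and macro precision, recall and F1.
evaluation_scores = module.compute(predictions=predictions, references=references)
print(evaluation_scores)
```

With identical predictions and references, every relation is an exact match, so all reported scores should be 100.0.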
@@ -126,10 +127,17 @@ Example with two or more predictions and references:
 ```

 ## Limitations and Bias
+This metric uses a strict matching rule: if any field of a predicted relation (head, head_type, type, tail, or tail_type) is not exactly the same as in the reference, the prediction is counted as a false positive (fp) and the unmatched reference as a false negative (fn).

 ## Citation
+```bibtex
+@article{taille2020stop,
+  author = {Bruno Taillé and Vincent Guigue and Geoffrey Scoutheeten and Patrick Gallinari},
+  title  = {Let's Stop Incorrect Comparisons in End-to-end Relation Extraction!},
+  year   = {2020},
+  url    = {https://arxiv.org/abs/2009.10684},
+}
+```
 ## Further References
+This evaluation metric implementation is based on
+*https://github.com/btaille/sincere/blob/master/code/utils/evaluation.py*