Commit d772cf1 · Update README.md (parent: 88b0888)

# Metric Card for relation_extraction evaluation

This metric is for evaluating the quality of relation extraction output. It computes the micro and macro F1 scores of the extracted relations to assess that quality.

## Metric Description

This metric can be used to evaluate relation extraction systems: it scores predicted relations against reference relations and reports per-relation-type precision, recall, and F1, together with micro- and macro-averaged summaries.
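
As a rough illustration of the micro and macro scores, the sketch below uses the conventional definitions: micro-F1 pools true/false positives and false negatives over all relation types, while macro-F1 is the unweighted mean of per-type F1. This is a minimal sketch with made-up counts, not the module's actual implementation.

```
def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical per-type (tp, fp, fn) counts, purely for illustration
counts = {"sell": (8, 2, 4), "produce": (3, 1, 1)}

# Micro: pool counts over types; Macro: average per-type F1
micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
macro = sum(f1(*c) for c in counts.values()) / len(counts)
print(round(100 * micro, 2), round(100 * macro, 2))  # 73.33 73.86
```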

## How to Use

This metric takes two inputs, predictions and references (ground truth). Each is a list of lists of dictionaries, where every dictionary describes one relation by its head and tail entity names and their types:

```
import evaluate

# load metric
metric_path = "Ikala-allen/relation_extraction"
module = evaluate.load(metric_path)

# Define your references (ground truth)
references = [
    [
        {"head": "phip igments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

# Define your predictions
predictions = [
    [
        {"head": "phipigments", "head_type": "product", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
        {"head": "tinadaviespigments", "head_type": "brand", "type": "sell", "tail": "國際認證之色乳", "tail_type": "product"},
    ]
]

# Calculate evaluation scores using the loaded metric
evaluation_scores = module.compute(predictions=predictions, references=references)

print(evaluation_scores)
>>> {'sell': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0}, 'ALL': {'tp': 1, 'fp': 1, 'fn': 1, 'p': 50.0, 'r': 50.0, 'f1': 50.0, 'Macro_f1': 50.0, 'Macro_p': 50.0, 'Macro_r': 50.0}}
```
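
In the example above, the second predicted relation matches its reference exactly (one true positive), while the first prediction differs from its reference in `head` and `head_type`, giving one false positive and one unmatched reference (a false negative). Under the same conventional formulas as in the sketch above (again an assumption about the definitions, not the module's internal code), those counts reproduce the reported scores:

```
tp, fp, fn = 1, 1, 1
p = 100 * tp / (tp + fp)    # 50.0
r = 100 * tp / (tp + fn)    # 50.0
f1 = 2 * p * r / (p + r)    # 50.0
# With a single relation type ("sell"), the macro averages equal these per-type values.
```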

### Inputs

- **predictions** (`list` of `list`s of `dict`s): predicted relations for each example; every relation is a dictionary with the head and tail entity names, their types, and the relation type (see the sketch after this list).
- **references** (`list` of `list`s of `dict`s): ground-truth relations for each example, in the same format.
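
As a reference for the expected keys, a single-relation input might look like the following; the keys mirror the usage example above and the values here are purely illustrative:

```
predictions = [
    [   # one inner list per input example (assumed grouping)
        {
            "head": "tinadaviespigments",   # head entity name
            "head_type": "brand",           # head entity type
            "type": "sell",                 # relation type
            "tail": "國際認證之色乳",         # tail entity name
            "tail_type": "product",         # tail entity type
        }
    ]
]
```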

### Output Values

*Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*