- README.md +22 -30
- requirements.txt +2 -1
- signwriting_similarity.py +95 -51
- tests.py +43 -11
README.md
CHANGED
@@ -1,48 +1,40 @@
- ---
- title: SignWriting Similarity
- tags:
- - evaluate
- - metric
- description: "TODO: add a description here"
- sdk: gradio
- sdk_version: 3.19.1
- app_file: app.py
- pinned: false
- ---
-
  # Metric Card for SignWriting Similarity

- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
  ## Metric Description
-

  ## How to Use
-
-
- *Provide simplest possible example for using the metric*

  ### Inputs
-
-

  ### Output Values

-

-

-
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*

-
- *

  ## Limitations and Bias
-

  ## Citation
- *Cite the source where this metric was introduced.*

-
- *Add any useful further references.*
  # Metric Card for SignWriting Similarity

  ## Metric Description
+ The Symbol Distance Metric is a novel evaluation metric specifically designed for SignWriting, a visual writing system for signed languages. Unlike traditional string-based metrics (e.g., BLEU, chrF), this metric directly considers the visual and spatial properties of individual symbols used in SignWriting, such as base shape, orientation, rotation, and position. It is primarily used to evaluate model outputs in SignWriting transcription and translation tasks, offering a similarity score between a predicted and a reference sign.

  ## How to Use
+ The metric is used by passing two SignWriting signs (as sets of symbols) and computing a similarity score that reflects how closely they match in terms of symbol content and layout.

  ### Inputs
+
+ * **hypothesis** *(List\[Symbol]):* The output sign, represented as a list of symbols with visual and spatial properties.
+ * **reference** *(List\[Symbol]):* The gold/reference sign, in the same format.
+ * **alpha** *(float, default=2.0):* Controls exponential scaling of symbol distance normalization.
+ * **beta** *(float, default=2.0):* Controls the penalty for sign length mismatches.
+ * **gamma** *(float, default=1.0):* Controls final exponential scaling of the overall score.
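
The `Symbol` type named in the inputs is not defined in this README. As a rough illustration only, the sketch below assumes that signs arrive as Formal SignWriting (FSW) strings, as in the module's docstring examples, and splits one string into its symbols with a regular expression. The `Symbol` dataclass, its field names, and `parse_fsw` are hypothetical and are not taken from the signwriting-evaluation package.

```python
import re
from dataclasses import dataclass
from typing import List


# Hypothetical container; the field names are illustrative only.
@dataclass(frozen=True)
class Symbol:
    base: str      # three hex digits identifying the base shape, e.g. "15a"
    fill: int      # fill variant (fourth character of the symbol key)
    rotation: int  # rotation variant (fifth character, hexadecimal)
    x: int         # horizontal position
    y: int         # vertical position


# One FSW symbol: "S", base (3 hex digits), fill (0-5), rotation (0-f), then "XXXxYYY".
_SYMBOL_RE = re.compile(r"S([1-3][0-9a-f]{2})([0-5])([0-9a-f])(\d{3})x(\d{3})")


def parse_fsw(sign: str) -> List[Symbol]:
    """Extract every symbol found in an FSW string; malformed input yields an empty list."""
    return [
        Symbol(base, int(fill), int(rotation, 16), int(x), int(y))
        for base, fill, rotation, x, y in _SYMBOL_RE.findall(sign)
    ]


symbols = parse_fsw("M530x538S37602508x462S15a11493x494S20e00488x510S22f03469x517")
print(len(symbols))  # 4
print(symbols[0])    # Symbol(base='376', fill=0, rotation=2, x=508, y=462)
```

Under this assumption, a string with no well-formed symbols, such as the invalid input used in the tests, simply parses to an empty list.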

  ### Output Values

+ Returns a dictionary like:

+ ```python
+ {"score": 0.83}
+ ```

+ This metric outputs a score between 0 and 1:

+ * **1.0**: Perfect similarity (identical signs)
+ * **0.0**: Complete dissimilarity
+ Higher scores are better. A score above 0.8 is typically considered very good for single sign comparisons.

  ## Limitations and Bias
+
+ * The metric relies on a manually defined distance function for symbol attributes, which may not fully capture perceptual similarity.
+ * Performance has primarily been validated qualitatively; quantitative alignment with human judgment is ongoing.
+ * It assumes symbol independence and uses a Hungarian matching algorithm, which may miss some higher-order structural patterns in complex signs.
+ * Currently more suitable for evaluating single signs than continuous signing sequences.
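
The matching step mentioned in the limitations can be pictured as an assignment problem: pair each hypothesis symbol with at most one reference symbol so that the summed distance is minimal. The sketch below is only an illustration of that idea, not the package's implementation. It uses a brute-force search over permutations (the Hungarian algorithm finds the same optimum in polynomial time, for example through `scipy.optimize.linear_sum_assignment`), and it uses plain Euclidean distance between `(x, y)` positions as a stand-in for the real symbol distance function, which is an assumption. The name `best_assignment` is hypothetical.

```python
from itertools import permutations
from math import dist


def best_assignment(hyp, ref, distance):
    """Return (total_cost, pairs) for the cheapest one-to-one matching.

    pairs is a list of (hyp_index, ref_index). Brute force: fine for a
    handful of symbols, far too slow for large inputs.
    """
    if len(hyp) > len(ref):
        # Match the shorter side against the longer one, then flip the pairs back.
        cost, pairs = best_assignment(ref, hyp, lambda a, b: distance(b, a))
        return cost, [(j, i) for i, j in pairs]
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(ref)), len(hyp)):
        cost = sum(distance(hyp[i], ref[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_cost, best_pairs


# Toy positions only; real symbols also carry shape and rotation.
hyp = [(0, 0), (10, 10)]
ref = [(10, 11), (0, 1)]
cost, pairs = best_assignment(hyp, ref, dist)
print(cost, pairs)  # 2.0 [(0, 1), (1, 0)]
```

Because the pairing is chosen by cost rather than by position in the list, reordering the symbols of one sign leaves the total unchanged, which fits the documented behaviour that identical signs in a different order score 1.0. Any symbols left unmatched on the longer side are ignored here; how the metric penalises them is not covered by this sketch.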

  ## Citation

+ Amit Moryossef, Rotem Zilberman, Ohad Langer (2024). *Effective Sign Language Evaluation via SignWriting*. [arXiv:2410.13668](https://arxiv.org/abs/2410.13668)
requirements.txt
CHANGED
@@ -1 +1,2 @@
- git+https://github.com/huggingface/evaluate@main
+ git+https://github.com/huggingface/evaluate@main
+ git+https://github.com/sign-language-processing/signwriting-evaluation
signwriting_similarity.py
CHANGED
@@ -11,85 +11,129 @@
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
- """

  import evaluate
  import datasets

-
- # TODO: Add BibTeX citation
  _CITATION = """\
- @
- title
-
- year={
  }
  """

- # TODO: Add description of the module here
  _DESCRIPTION = """\
-
  """

-
- # TODO: Add description of the arguments of the module here
  _KWARGS_DESCRIPTION = """
-
  Args:
- predictions
-
- references
-
  Returns:
-
- another_score: description of the second score,
  Examples:
-
-

-
-
-
-
-

-
-


  @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
  class SignWritingSimilarity(evaluate.Metric):
-

      def _info(self):
-         # TODO: Specifies the evaluate.EvaluationModuleInfo object
          return evaluate.MetricInfo(
-             # This is the description that will appear on the modules page.
              module_type="metric",
              description=_DESCRIPTION,
              citation=_CITATION,
              inputs_description=_KWARGS_DESCRIPTION,
-
-             features=
-
-
-
-
-
-
-
-
          )

-     def _download_and_prepare(self, dl_manager):
-         """Optional: download external resources useful to compute the scores"""
-         # TODO: Download external resources if needed
-         pass
-
      def _compute(self, predictions, references):
-
-
-
-         return {
-             "accuracy": accuracy,
-         }
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
+ """SignWriting Similarity metric from the signwriting-evaluation package"""

  import evaluate
  import datasets
+ from signwriting_evaluation.metrics.similarity import SignWritingSimilarityMetric

  _CITATION = """\
+ @misc{moryossef2024signwritingevaluationeffectivesignlanguage,
+       title={signwriting-evaluation: Effective Sign Language Evaluation via SignWriting},
+       author={Amit Moryossef and Rotem Zilberman and Ohad Langer},
+       year={2024},
+       eprint={2410.13668},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2410.13668},
  }
  """

  _DESCRIPTION = """\
+ SignWriting Similarity metric from the signwriting-evaluation package
  """

  _KWARGS_DESCRIPTION = """
+ Produces similarity scores for hypotheses given reference translations.
+
  Args:
+     predictions (list of str):
+         The predicted sentences.
+     references (list of list of str):
+         The references. There should be one reference sub-list for each prediction sentence.
  Returns:
+     score (float): The similarity score between 0 and 1
  Examples:
+     Example 1 -- basic similarity score:
+         >>> predictions = ["M530x538S37602508x462S15a11493x494S20e00488x510S22f03469x517"]
+         >>> references = [["M519x534S37900497x466S3770b497x485S15a51491x501S22f03481x513"]]
+         >>> metric = evaluate.load("signwriting_similarity")
+         >>> results = metric.compute(predictions=predictions, references=references)
+         >>> print(results)
+         {'score': 0.5509574768254414}

+     Example 2 -- identical signs in different order:
+         >>> predictions = ["M530x538S37602508x462S15a11493x494S20e00488x510S22f03469x517"]
+         >>> references = [["M530x538S22f03469x517S37602508x462S20e00488x510S15a11493x494"]]
+         >>> metric = evaluate.load("signwriting_similarity")
+         >>> results = metric.compute(predictions=predictions, references=references)
+         >>> print(results)
+         {'score': 1.0}
+
+     Example 3 -- slightly different symbols:
+         >>> predictions = ["M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"]
+         >>> references = [["M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517"]]
+         >>> metric = evaluate.load("signwriting_similarity")
+         >>> results = metric.compute(predictions=predictions, references=references)
+         >>> print(results)
+         {'score': 0.8326259781509948}
+
+     Example 4 -- multiple references, one good and one bad:
+         >>> predictions = ["M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"]
+         >>> references = [["M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517"], ["M530x538S17600508x462"]]
+         >>> metric = evaluate.load("signwriting_similarity")
+         >>> results = metric.compute(predictions=predictions, references=references)
+         >>> print(results)
+         {'score': 0.8326259781509948}

+     Example 5 -- multiple signs in hypothesis:
+         >>> predictions = ["M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517 M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"]
+         >>> references = [["M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517"]]
+         >>> metric = evaluate.load("signwriting_similarity")
+         >>> results = metric.compute(predictions=predictions, references=references)
+         >>> print(results)
+         {'score': 0.4163129890754974}
+
+     Example 6 -- sign order does not affect similarity:
+         >>> predictions = ["M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517 M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517"]
+         >>> references = [["M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517 M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"]]
+         >>> metric = evaluate.load("signwriting_similarity")
+         >>> results = metric.compute(predictions=predictions, references=references)
+         >>> print(results)
+         {'score': 1.0}
+
+     Example 7 -- invalid FSW input should result in 0 score:
+         >>> predictions = ["M<s><s>M<s>p483"]
+         >>> references = [["M<s><s>M<s>p483"]]
+         >>> metric = evaluate.load("signwriting_similarity")
+         >>> results = metric.compute(predictions=predictions, references=references)
+         >>> print(results)
+         {'score': 0.0}
+ """


  @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
  class SignWritingSimilarity(evaluate.Metric):
+     metric = SignWritingSimilarityMetric()

      def _info(self):
          return evaluate.MetricInfo(
              module_type="metric",
              description=_DESCRIPTION,
              citation=_CITATION,
              inputs_description=_KWARGS_DESCRIPTION,
+             homepage="https://github.com/sign-language-processing/signwriting-evaluation",
+             features=[
+                 datasets.Features(
+                     {
+                         "predictions": datasets.Value("string", id="sequence"),
+                         "references": datasets.Sequence(datasets.Value("string", id="sequence"), id="references"),
+                     }
+                 ),
+                 datasets.Features(
+                     {
+                         "predictions": datasets.Value("string", id="sequence"),
+                         "references": datasets.Value("string", id="sequence"),
+                     }
+                 ),
+             ],
+             codebase_urls=["https://github.com/sign-language-processing/signwriting-evaluation"],
+             reference_urls=[
+                 "https://github.com/sign-language-processing/signwriting-evaluation",
+             ],
          )

      def _compute(self, predictions, references):
+         score = self.metric.corpus_score(predictions, references)
+
+         return {"score": score}
tests.py
CHANGED
@@ -1,17 +1,49 @@
  test_cases = [
      {
-         "predictions": [
-         "references": [
-         "result": {"
      },
      {
-         "predictions": [
-         "references": [
-         "result": {"
      },
      {
-         "predictions": [
-         "references": [
-         "result": {"
-     }
-
  test_cases = [
      {
+         "predictions": ["M530x538S37602508x462S15a11493x494S20e00488x510S22f03469x517"],
+         "references": ["M519x534S37900497x466S3770b497x485S15a51491x501S22f03481x513"],
+         "result": {"score": 0.5509574768254414},
      },
      {
+         "predictions": ["M530x538S37602508x462S15a11493x494S20e00488x510S22f03469x517"],
+         "references": ["M530x538S22f03469x517S37602508x462S20e00488x510S15a11493x494"],
+         "result": {"score": 1.0},
      },
      {
+         "predictions": ["M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"],
+         "references": ["M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517"],
+         "result": {"score": 0.8326259781509948},
+     },
+     {
+         "predictions": ["M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"],
+         "references": [
+             "M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517",
+             "M530x538S17600508x462"
+         ],
+         "result": {"score": 0.8326259781509948},
+     },
+     {
+         "predictions": [
+             "M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517 "
+             "M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"
+         ],
+         "references": ["M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517"],
+         "result": {"score": 0.4163129890754974},
+     },
+     {
+         "predictions": [
+             "M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517 "
+             "M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517"
+         ],
+         "references": [
+             "M530x538S17600508x462S12a11493x494S20e00488x510S22f13469x517 "
+             "M530x538S17600508x462S15a11493x494S20e00488x510S22f03469x517"
+         ],
+         "result": {"score": 1.0},
+     },
+     {
+         "predictions": ["M<s><s>M<s>p483"],
+         "references": ["M<s><s>M<s>p483"],
+         "result": {"score": 0.0},
+     },
+ ]
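
The commit does not show how `test_cases` is consumed. One plausible way to run such a list, sketched here under assumptions, is a small loop that calls a compute function for each case and compares the score with a tolerance, since expected values such as `0.5509574768254414` may not reproduce bit-for-bit on every platform. The names `check_cases` and `stub_compute` are hypothetical; in real use `compute` would presumably be the `compute` method of the loaded metric.

```python
import math


def check_cases(compute, cases, rel_tol=1e-9):
    """Return the indices of the cases whose computed score differs from the expected one."""
    failures = []
    for index, case in enumerate(cases):
        result = compute(predictions=case["predictions"], references=case["references"])
        expected = case["result"]["score"]
        if not math.isclose(result["score"], expected, rel_tol=rel_tol, abs_tol=1e-12):
            failures.append(index)
    return failures


# Stand-in for the real metric so the sketch runs without extra packages:
# it scores 1.0 on exact equality and 0.0 otherwise.
def stub_compute(predictions, references):
    return {"score": 1.0 if predictions == references else 0.0}


demo_cases = [
    {"predictions": ["a"], "references": ["a"], "result": {"score": 1.0}},
    {"predictions": ["a"], "references": ["b"], "result": {"score": 0.5}},
]
print(check_cases(stub_compute, demo_cases))  # [1]
```

Returning the failing indices rather than raising on the first mismatch makes it easier to see every broken case in one run.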