---
pipeline_tag: translation
library_name: comet
language:
- multilingual
- af
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- om
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- ug
- uk
- ur
- uz
- vi
- xh
- yi
- zh
license: apache-2.0
base_model:
- FacebookAI/xlm-roberta-large
---

# PreCOMET-disc [![Paper](https://img.shields.io/badge/📜%20paper-481.svg)](https://arxiv.org/abs/2501.18251)

This is a source-only COMET model used for efficient evaluation subset selection.
Specifically, this model predicts `discriminability`, distilled from an IRT model fitted on WMT data up to WMT2022 (inclusive).
The lower the score, the better the segment is for evaluation, because it is more likely to distinguish between two models of similar quality.
It is not compatible with Unbabel's original COMET; to run it, you have to install [github.com/zouharvi/PreCOMET](https://github.com/zouharvi/PreCOMET):
```bash
pip3 install git+https://github.com/zouharvi/PreCOMET.git
```

You can then use it in Python:
```python
import precomet
model = precomet.load_from_checkpoint(precomet.download_model("zouharvi/PreCOMET-disc"))
model.predict([
  {"src": "This is an easy source sentence."},
  {"src": "this is a much more complicated source sen-tence that will pro·bably lead to loww scores 🤪"}
])["scores"]
# > [1.4137403964996338, 0.6074056625366211]
```
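
To make the selection rule concrete, here is a minimal sketch of how predicted scores could be turned into a fixed-budget subset. It does not call PreCOMET: the scores are simply the two values from the example output, and `select_by_score` is a hypothetical helper that is not part of `precomet` or `subset2evaluate`. It assumes, as stated earlier, that lower scores mark more useful segments.

```python
def select_by_score(items, scores, budget):
    """Return up to `budget` items with the lowest predicted scores.

    Hypothetical helper for illustration; not part of any package.
    """
    if len(items) != len(scores):
        raise ValueError("items and scores must have the same length")
    # Sort indices by ascending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(items)), key=lambda i: scores[i])
    return [items[i] for i in order[:budget]]


sources = [
    {"src": "This is an easy source sentence."},
    {"src": "this is a much more complicated source sen-tence that will pro·bably lead to loww scores 🤪"},
]
# Scores copied from the example output in this model card.
scores = [1.4137403964996338, 0.6074056625366211]

selected = select_by_score(sources, scores, budget=1)
print(selected[0]["src"])
```

With a budget of one, the sketch keeps the second, noisier sentence because it has the lower score. In practice, the `subset2evaluate` package handles the selection for you.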

This model is primarily used through the [subset2evaluate](https://github.com/zouharvi/subset2evaluate) package:

```python
import subset2evaluate

data_full = subset2evaluate.utils.load_data("wmt23/en-cs")
data_random = subset2evaluate.select_subset.basic(data_full, method="random")
subset2evaluate.evaluate.eval_subset_clusters(data_random[:100])
# > 1
subset2evaluate.evaluate.eval_subset_correlation(data_random[:100], data_full)
# > 0.71
```
Random selection gives us only one cluster and a system-level Spearman correlation of 0.71 when we have a budget of only 100 segments. However, by using this model:
```python
data_precomet = subset2evaluate.select_subset.basic(data_full, method="precomet_disc")
subset2evaluate.evaluate.eval_subset_clusters(data_precomet[:100])
# > 1
subset2evaluate.evaluate.eval_subset_correlation(data_precomet[:100], data_full)
# > 0.75
```
we get a higher correlation (0.75 instead of 0.71), although still only one cluster.
Note that this is not the best PreCOMET model and you can expect a bigger effect on a larger scale, as described in the paper.
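
For intuition about the correlation figures, the following is a rough sketch of a system-level Spearman correlation between a subset and the full test set: average each system's segment scores on both, then compare the two rankings. The data layout, the function names, and the toy numbers are all assumptions made for illustration. This is not the `subset2evaluate` implementation, and ties are not handled.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation for two equal-length lists without ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def system_means(segment_scores, indices):
    """Average each system's scores over the chosen segment indices."""
    return {
        system: sum(scores[i] for i in indices) / len(indices)
        for system, scores in segment_scores.items()
    }


# Toy data (hypothetical): per-segment quality scores for three systems.
segment_scores = {
    "sysA": [0.9, 0.8, 0.7, 0.9],
    "sysB": [0.6, 0.9, 0.5, 0.6],
    "sysC": [0.4, 0.3, 0.6, 0.2],
}
systems = sorted(segment_scores)

full = system_means(segment_scores, range(4))
subset = system_means(segment_scores, [1])  # a one-segment subset

corr = spearman([subset[s] for s in systems], [full[s] for s in systems])
print(corr)
```

On this toy data, the one-segment subset swaps the order of `sysA` and `sysB` relative to the full set, so the correlation drops to 0.5. A well-chosen subset keeps the system ranking close to the one obtained from all segments.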


This work is described in [How to Select Datapoints for Efficient Human Evaluation of NLG Models?](https://arxiv.org/abs/2501.18251).
Cite as:
```bibtex
@misc{zouhar2025selectdatapointsefficienthuman,
    title={How to Select Datapoints for Efficient Human Evaluation of NLG Models?}, 
    author={Vilém Zouhar and Peng Cui and Mrinmaya Sachan},
    year={2025},
    eprint={2501.18251},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2501.18251}, 
}
```