Commit 76dcef5
Parent(s): e3cbd20
Update README.md
README.md
CHANGED
@@ -12,10 +12,12 @@ datasets:
 ---
 
 This is a [ruBERT-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) model trained on the mixture of 3 paraphrase detection datasets:
-- [ru_paraphraser](https://huggingface.co/merionum/ru_paraphraser)
+- [ru_paraphraser](https://huggingface.co/merionum/ru_paraphraser) (with classes -1 and 0 merged)
 - [RuPAWS](https://github.com/ivkrotova/rupaws_dataset)
 - A dataset containing crowdsourced evaluation of content preservation in Russian text detoxification by [Dementieva et al, 2022](https://www.dialog-21.ru/media/5755/dementievadplusetal105.pdf).
 
+The model can be used to assess semantic similarity of Russian sentences.
+
 Training notebook: `task_oriented_TST/similarity/cross_encoders/russian/train_russian_paraphrase_detector__fixed.ipynb` (in a private repo).
 
 Training parameters:
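The note added in this commit says that ru_paraphraser classes -1 and 0 were merged. The training notebook is private, so the following is only a minimal sketch of what such a merge could look like, not the author's code. It assumes the three classes are -1, 0 and 1, that the merged class is encoded as 0 and class 1 as 1, and that labels may arrive as strings. The helper name `merge_negative_classes`, the row field names, and the toy rows are hypothetical.

```python
from typing import Union


def merge_negative_classes(label: Union[int, str]) -> int:
    """Collapse three-way labels (-1, 0, 1) into a binary target.

    Assumption: classes -1 and 0 become 0 and class 1 becomes 1.
    The encoding actually used in the private notebook is not shown
    in the README.
    """
    value = int(label)  # accept both 1 and "1"
    if value not in (-1, 0, 1):
        raise ValueError(f"unexpected ru_paraphraser class: {label!r}")
    return 1 if value == 1 else 0


# Toy rows with hypothetical field names; these are not real dataset entries.
rows = [
    {"text_1": "sentence A", "text_2": "sentence B", "class": "-1"},
    {"text_1": "sentence C", "text_2": "sentence D", "class": "0"},
    {"text_1": "sentence E", "text_2": "sentence F", "class": "1"},
]

binary_labels = [merge_negative_classes(row["class"]) for row in rows]
print(binary_labels)
```

Merging the two classes turns the three-way task into a binary one, which would put ru_paraphraser on the same footing as binary paraphrase datasets in the training mixture, if the other two datasets are indeed binary.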