Update README.md
README.md
CHANGED
|
@@ -7,27 +7,46 @@ datasets:
|
|
| 7 |
metrics:
|
| 8 |
- recall
|
| 9 |
tags:
|
| 10 |
-
-
|
| 11 |
-
- sentence-similarity
|
| 12 |
library_name: sentence-transformers
|
| 13 |
---
|
| 14 |
|
| 15 |
-
|
| 16 |
|
| 17 |
-
|
| 18 |
-
<h4 align="center">
|
| 19 |
-
<p>
|
| 20 |
-
<a href=#usage>🛠️ Usage</a> |
|
| 21 |
-
<a href="#evaluation">📊 Evaluation</a> |
|
| 22 |
-
<a href="#train">🤖 Training</a> |
|
| 23 |
-
<a href="#citation">🔗 Citation</a>
|
| 24 |
-
</p>
|
| 25 |
-
</h4>
|
| 26 |
-
|
| 27 |
-
This is a [sentence-transformers](https://www.SBERT.net) model. It maps questions and paragraphs to 768-dimensional dense vectors and is intended for semantic search.
|
| 28 |
The model uses a [CamemBERT-L10](https://huggingface.co/antoinelouis/camembert-L10) backbone, which is a pruned version of the pre-trained [CamemBERT](https://huggingface.co/camembert-base)
|
| 29 |
-
checkpoint with 13% fewer parameters, obtained by [dropping the top layers](https://doi.org/10.48550/arXiv.2004.03844) from the original model.
|
| 30 |
-
The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) retrieval dataset.
|
| 31 |
|
| 32 |
## Usage
|
| 33 |
|
|
|
|
| 7 |
metrics:
|
| 8 |
- recall
|
| 9 |
tags:
|
| 10 |
+
- passage-retrieval
|
| 11 |
library_name: sentence-transformers
|
| 12 |
+
base_model: antoinelouis/camembert-L10
|
| 13 |
+
model-index:
|
| 14 |
+
- name: biencoder-camembert-L10-mmarcoFR
|
| 15 |
+
results:
|
| 16 |
+
- task:
|
| 17 |
+
type: sentence-similarity
|
| 18 |
+
name: Passage Retrieval
|
| 19 |
+
dataset:
|
| 20 |
+
type: unicamp-dl/mmarco
|
| 21 |
+
name: mMARCO-fr
|
| 22 |
+
config: french
|
| 23 |
+
split: validation
|
| 24 |
+
metrics:
|
| 25 |
+
- type: recall_at_500
|
| 26 |
+
name: Recall@500
|
| 27 |
+
value: 87.8
|
| 28 |
+
- type: recall_at_100
|
| 29 |
+
name: Recall@100
|
| 30 |
+
value: 76.7
|
| 31 |
+
- type: recall_at_10
|
| 32 |
+
name: Recall@10
|
| 33 |
+
value: 49.5
|
| 34 |
+
- type: mrr_at_10
|
| 35 |
+
name: MRR@10
|
| 36 |
+
value: 27.5
|
| 37 |
+
- type: ndcg_at_10
|
| 38 |
+
name: nDCG@10
|
| 39 |
+
value: 32.5
|
| 40 |
+
- type: map_at_10
|
| 41 |
+
name: MAP@10
|
| 42 |
+
value: 27.0
|
| 43 |
---
|
| 44 |
|
| 45 |
+
# biencoder-camembert-L10-mmarcoFR
|
| 46 |
|
| 47 |
+
This is a lightweight dense single-vector bi-encoder model for French. It maps questions and paragraphs to 768-dimensional dense vectors and is intended for semantic search.
|
| 48 |
The model uses a [CamemBERT-L10](https://huggingface.co/antoinelouis/camembert-L10) backbone, which is a pruned version of the pre-trained [CamemBERT](https://huggingface.co/camembert-base)
|
| 49 |
+
checkpoint with 13% fewer parameters, obtained by [dropping the top layers](https://doi.org/10.48550/arXiv.2004.03844) from the original model.
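
As a rough sanity check on the 13% figure, the parameter count can be estimated by hand. The sketch below is not taken from the model card: it assumes that "L10" means 10 of the original 12 encoder layers are kept, and it assumes the usual CamemBERT-base configuration (hidden size 768, feed-forward size 3072, vocabulary of 32005 tokens, 514 positions, 1 token type). It ignores the pooler, so the result is approximate.

```python
# Back-of-the-envelope estimate of the parameter reduction from dropping
# the top encoder layers. All configuration values are ASSUMPTIONS about
# CamemBERT-base, not numbers stated in the README.

def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Parameters in one BERT-style encoder layer (weights and biases)."""
    attention = 4 * (hidden * hidden + hidden)       # Q, K, V and output projections
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer_norms = 2 * (2 * hidden)                   # two LayerNorms, scale and shift each
    return attention + feed_forward + layer_norms


def embedding_params(vocab: int, max_pos: int, type_vocab: int, hidden: int) -> int:
    """Parameters in the embedding block, including its LayerNorm."""
    return (vocab + max_pos + type_vocab) * hidden + 2 * hidden


HIDDEN, INTERMEDIATE = 768, 3072              # assumed
VOCAB, MAX_POS, TYPE_VOCAB = 32005, 514, 1    # assumed
FULL_LAYERS, KEPT_LAYERS = 12, 10             # assumed meaning of "L10"

per_layer = encoder_layer_params(HIDDEN, INTERMEDIATE)
embeddings = embedding_params(VOCAB, MAX_POS, TYPE_VOCAB, HIDDEN)

full = embeddings + FULL_LAYERS * per_layer
pruned = embeddings + KEPT_LAYERS * per_layer
reduction_pct = 100 * (full - pruned) / full

print(f"full: {full:,}  pruned: {pruned:,}  reduction: {reduction_pct:.1f}%")
```

Under these assumptions the two dropped layers account for roughly 13% of the encoder's parameters, which is consistent with the figure quoted above. The embedding block is untouched by layer dropping, which is why removing one sixth of the layers removes less than one sixth of the parameters.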
|
| 50 |
|
| 51 |
## Usage
|
| 52 |
|