howard-hou/IACC-ranker-small

Feature Extraction


howard-hou committed on Aug 28, 2023

Commit 39d1084 · 1 parent: fb163e5

Update README.md

Files changed (1)

README.md +41 -1

README.md CHANGED

@@ -3,4 +3,44 @@ license: apache-2.0
 language:
 - zh
 library_name: transformers
----
+---
+### RankingPrompter
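+RankingPrompter is an open-source reranking (fine-grained ranking) model developed by the Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Shenzhen Guangming Laboratory).
+- Trained on a dataset of roughly 15 million Chinese sentence pairs.
+- Achieves the best results on several Chinese test sets.
+For RankingPrompter's fuller functionality, such as the complete document encoding, retrieval, and reranking pipeline, we recommend the companion code base (to be released).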
+### How to use
+You can use this model directly as a strong reranker. Note that it currently supports Chinese only.
+```python
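+from transformers import AutoTokenizer, AutoModel
+
+# If the RankingPrompter architecture is not bundled with your transformers version,
+# loading may also require trust_remote_code=True, depending on how the checkpoint is packaged
+tokenizer = AutoTokenizer.from_pretrained("howard-hou/RankingPrompterForPreTraining-small")
+model = AutoModel.from_pretrained("howard-hou/RankingPrompterForPreTraining-small")
+
+# Candidate documents to rerank (the model supports Chinese only)
+documents = [
+    '水库诱发地震的震中多在库底和水库边缘。',  # epicenters of reservoir-induced earthquakes
+    '双标紫斑蝶广泛分布于南亚、东南亚、澳洲、新几内亚等地。台湾地区于本岛中海拔地区可见，多以特有亚种归类。',  # range of a butterfly species
+    '月经停止是怀孕最显著也是最早的一个信号，如果在无避孕措施下进行了性生活而出现月经停止的话，很可能就是怀孕了。',  # earliest sign of pregnancy
+]
+question = "什么是怀孕最显著也是最早的信号？"  # "What is the most obvious and earliest sign of pregnancy?"
+# Tokenize the question and the candidate documents
+question_input = tokenizer(question, return_tensors="pt")
+docs_input = tokenizer(documents, padding="max_length", return_tensors="pt")
+# Document inputs must have shape [batch_size, num_docs, seq_len],
+# so for a single group of documents, add a batch dimension with unsqueeze(0)
+output = model(
+    document_input_ids=docs_input.input_ids.unsqueeze(0),
+    document_attention_mask=docs_input.attention_mask.unsqueeze(0),
+    question_input_ids=question_input.input_ids,
+    question_attention_mask=question_input.attention_mask,
+)
+# output.logits holds the reranking scores for the documents
+print("reranking scores: ", output.logits)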
+```
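The usage example in the README only prints the raw scores. To turn them into a reranked list, the documents can be sorted by score. The sketch below assumes that `output.logits` has shape `[batch_size, num_docs]` and that a higher score means a more relevant document; neither is stated in the model card, and the helper name `rerank` is hypothetical. It works on plain Python lists, so a single group's scores could be passed in as `output.logits[0].tolist()`.

```python
from typing import List, Sequence, Tuple


def rerank(documents: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort from highest to lowest score.

    Hypothetical helper, not part of the RankingPrompter release; it assumes
    that a higher score means a more relevant document.
    """
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    # sorted() is stable (also with reverse=True), so ties keep their input order
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


# Made-up toy scores standing in for output.logits[0].tolist()
docs = ["doc about earthquakes", "doc about butterflies", "doc about pregnancy"]
toy_scores = [-2.1, -3.5, 4.2]
for doc, score in rerank(docs, toy_scores):
    print(f"{score:+.2f}  {doc}")
```

Keeping the helper independent of `torch` makes the ranking step easy to test separately from model loading.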
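The model card mentions a complete document encoding, retrieval, and reranking pipeline in a companion code base that is not yet released. The sketch below only illustrates the general two-stage pattern under stated assumptions: a cheap first-stage retriever narrows the corpus to `top_k` candidates, and a more expensive reranker scores just those candidates. The character-overlap retriever and the names `char_overlap_score`, `retrieve_then_rerank`, and `prefers_d` are hypothetical stand-ins, not RankingPrompter's actual components; a real reranker callable could wrap the `model(...)` call from the README example and return its scores as a list.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a reranker maps (question, candidates) to one score per candidate
Reranker = Callable[[str, Sequence[str]], List[float]]


def char_overlap_score(question: str, document: str) -> float:
    """Toy first-stage score: fraction of the question's distinct characters found in the document."""
    q_chars = set(question)
    if not q_chars:
        return 0.0
    return len(q_chars & set(document)) / len(q_chars)


def retrieve_then_rerank(
    question: str,
    corpus: Sequence[str],
    reranker: Reranker,
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    # Stage 1: cheap retrieval keeps only the top_k candidates
    candidates = sorted(corpus, key=lambda doc: char_overlap_score(question, doc), reverse=True)[:top_k]
    # Stage 2: the (expensive) reranker scores only those candidates
    scores = reranker(question, candidates)
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


def prefers_d(question: str, documents: Sequence[str]) -> List[float]:
    """Toy reranker standing in for a model: scores documents containing "d" higher."""
    return [1.0 if "d" in doc else 0.0 for doc in documents]


print(retrieve_then_rerank("ab", ["abc", "abd", "xyz"], prefers_d))
# → [('abd', 1.0), ('abc', 0.0)]
```

Character-level overlap is used here only because it needs no tokenizer and works on Chinese text; a production first stage would more likely use a dense or sparse retriever.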