Update BM25S model
- README.md +6 -6
- corpus.jsonl +2 -2
- corpus.mmindex.json +0 -0
- data.csc.index.npy +2 -2
- indices.csc.index.npy +2 -2
- indptr.csc.index.npy +2 -2
- params.index.json +2 -2
- vocab.index.json +2 -2
README.md
CHANGED
@@ -11,7 +11,7 @@ tags:

 # BM25S Index

-This is a BM25S index created with the [`bm25s` library](https://github.com/xhluca/bm25s) (version `0.2.
+This is a BM25S index created with the [`bm25s` library](https://github.com/xhluca/bm25s) (version `0.2.7post1`), an ultra-fast implementation of BM25. It can be used for lexical retrieval tasks.

 BM25S Related Links:

@@ -26,10 +26,10 @@ BM25S Related Links:
 You can install the `bm25s` library with `pip`:

 ```bash
-pip install "bm25s==0.2.
+pip install "bm25s==0.2.7post1"

 # Include extra dependencies like stemmer
-pip install "bm25s[full]==0.2.
+pip install "bm25s[full]==0.2.7post1"

 # For huggingface hub usage
 pip install huggingface_hub
@@ -123,9 +123,9 @@ This dataset was created using the following data:

 | Statistic | Value |
 | --- | --- |
-| Number of documents |
-| Number of tokens |
-| Average tokens per document | 10.
+| Number of documents | 750312 |
+| Number of tokens | 7592215 |
+| Average tokens per document | 10.12 |

 ## Parameters
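The "Average tokens per document" row is the number of tokens divided by the number of documents. The following is a small check that assumes only POSIX `awk`. It recomputes the rounded value from the other two rows. The helper name `avg_tokens` is hypothetical and used only for illustration.

```bash
# Hypothetical helper: recompute "Average tokens per document" from the
# token and document counts in the README table.
avg_tokens() {
  # $1 = number of tokens, $2 = number of documents; print with 2 decimals
  awk -v t="$1" -v d="$2" 'BEGIN { printf "%.2f\n", t / d }'
}

avg_tokens 7592215 750312   # prints 10.12
```

The unrounded quotient is about 10.119, so the table's two-decimal value agrees with the new document and token counts.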
corpus.jsonl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:2b999015937322493801414a7628aea03373b140623efa2d7e247af03e4eb2b2
+size 73690346
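The diff above changes a Git LFS pointer rather than the corpus itself. A pointer has three parts: a `version` line, an `oid sha256:` line that holds the content hash, and a `size` line that gives the length in bytes. The sketch below checks a downloaded file against a saved copy of its pointer text. It makes three assumptions: GNU coreutils `sha256sum` is available, you have both files locally, and the function name `verify_lfs` is hypothetical.

```bash
# Hypothetical helper: succeed only if a file matches both the
# "oid sha256:" hash and the "size" byte count of a Git LFS pointer.
verify_lfs() {
  pointer="$1"   # text file containing the version/oid/size lines
  file="$2"      # the actual downloaded content
  want_oid=$(sed -n 's/^oid sha256://p' "$pointer")
  want_size=$(sed -n 's/^size //p' "$pointer")
  got_oid=$(sha256sum "$file" | cut -d ' ' -f 1)
  got_size=$(wc -c < "$file" | tr -d ' ')   # strip padding some wc versions add
  [ "$got_oid" = "$want_oid" ] && [ "$got_size" = "$want_size" ]
}
```

As an example, `verify_lfs corpus.pointer corpus.jsonl` returns success only when both the hash and the size match. The file names in that example are illustrative. Checking the size first would be cheaper on large files, but the sketch checks both fields in every case for simplicity.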
corpus.mmindex.json
CHANGED
The diff for this file is too large to render.
data.csc.index.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:e30a1c11c7cc998cca6c57896b569da0e32ca94e6c59b92d14648706c2d670aa
+size 30368988
indices.csc.index.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:5b2f427f0d29bab6a4744d4d14cad7f6205617efb8f2381fa0e82664698e1f92
+size 30368988
indptr.csc.index.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:e01faf6b39cfa30b9e13fb00c062117e7487e6a6e8b4539522a37f6b30591a52
+size 559348
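The three `*.csc.index.npy` files are named after the arrays of a compressed sparse column (CSC) matrix:

- `data` holds the stored values.
- `indices` holds the row index of each value.
- `indptr` holds the offset where each column starts.

Their LFS sizes allow a rough consistency check. The sketch below assumes each file has a 128-byte `.npy` header and 4-byte elements. That means float32 for `data` and int32 for the other two, following `dtype` and `int_dtype` in `params.index.json`. The diff states neither the header size nor the dtype of `indptr`, and the helper `npy_elems` is hypothetical.

```bash
# Hypothetical helper: estimate element count of a .npy file from its byte size.
# Assumed: 128-byte header, 4 bytes per element (float32 / int32).
npy_elems() {
  # $1 = file size in bytes, $2 = header bytes, $3 = bytes per element
  echo $(( ($1 - $2) / $3 ))
}

npy_elems 30368988 128 4   # data.csc.index.npy    -> prints 7592215
npy_elems 30368988 128 4   # indices.csc.index.npy -> prints 7592215
npy_elems 559348 128 4     # indptr.csc.index.npy  -> prints 139805
```

Under these assumptions, `data` and `indices` each hold 7,592,215 entries, which is the same figure as the README's "Number of tokens" row. `indptr` holds 139,805 offsets, which would correspond to 139,804 columns. If the header or element size differs, these counts change accordingly.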
params.index.json
CHANGED
@@ -6,7 +6,7 @@
 "idf_method": "lucene",
 "dtype": "float32",
 "int_dtype": "int32",
-"num_docs":
-"version": "0.2.
+"num_docs": 750312,
+"version": "0.2.7post1",
 "backend": "numpy"
 }
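`params.index.json` sets `"idf_method": "lucene"`. Lucene's BM25 IDF is ln(1 + (N - df + 0.5) / (df + 0.5)), where N is the number of documents and df is the number of documents that contain the term. The sketch below takes N from `num_docs`. It assumes that the bm25s `lucene` option follows this formula, which the option's name suggests but this diff does not show. The helper `lucene_idf` and the example df values are hypothetical.

```bash
# Hypothetical helper: Lucene-style BM25 IDF, ln(1 + (N - df + 0.5) / (df + 0.5)).
lucene_idf() {
  # $1 = N (num_docs), $2 = df (documents containing the term)
  awk -v n="$1" -v df="$2" 'BEGIN { printf "%.4f\n", log(1 + (n - df + 0.5) / (df + 0.5)) }'
}

N=750312                 # "num_docs" from params.index.json
lucene_idf "$N" 1        # a term that appears in a single document
lucene_idf "$N" 375156   # a term in exactly half the documents: prints 0.6931
```

When df is exactly N/2, the ratio equals 1, so the IDF is ln 2. Rarer terms receive larger weights. The leading 1 inside the logarithm keeps the IDF positive even for a term that appears in every document.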
vocab.index.json
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d224cd8071210250b768abb4e03969c446bef1a2d382ad711325ad63c54af4c9
+size 2283889