Update BM25S model
- README.md +6 -6
- corpus.jsonl +2 -2
- corpus.mmindex.json +0 -0
- data.csc.index.npy +2 -2
- indices.csc.index.npy +2 -2
- indptr.csc.index.npy +2 -2
- params.index.json +2 -2
- vocab.index.json +2 -2
README.md
CHANGED
@@ -11,7 +11,7 @@ tags:
 
 # BM25S Index
 
-This is a BM25S index created with the [`bm25s` library](https://github.com/xhluca/bm25s) (version `0.2.
+This is a BM25S index created with the [`bm25s` library](https://github.com/xhluca/bm25s) (version `0.2.6`), an ultra-fast implementation of BM25. It can be used for lexical retrieval tasks.
 
 BM25S Related Links:
 
@@ -26,10 +26,10 @@ BM25S Related Links:
 You can install the `bm25s` library with `pip`:
 
 ```bash
-pip install "bm25s==0.2.
+pip install "bm25s==0.2.6"
 
 # Include extra dependencies like stemmer
-pip install "bm25s[full]==0.2.
+pip install "bm25s[full]==0.2.6"
 
 # For huggingface hub usage
 pip install huggingface_hub
@@ -123,9 +123,9 @@ This dataset was created using the following data:
 
 | Statistic | Value |
 | --- | --- |
-| Number of documents |
-| Number of tokens |
-| Average tokens per document |
+| Number of documents | 210801 |
+| Number of tokens | 1247449 |
+| Average tokens per document | 5.92 |
 
 ## Parameters
 
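The updated statistics in the README hunk can be cross-checked against each other: dividing the token count by the document count should reproduce the reported average. A minimal sketch of that arithmetic, using only the figures from the diff (the variable names are illustrative):

```python
# Figures copied from the updated README statistics table in this commit.
num_documents = 210801
num_tokens = 1247449

# Average tokens per document, rounded to two decimals like the README value.
avg_tokens_per_doc = round(num_tokens / num_documents, 2)
print(avg_tokens_per_doc)
```

The rounded result agrees with the `5.92` in the table, which suggests the three statistics were regenerated together.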
corpus.jsonl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:34a38e542a64bfd8e7e4f75c31ec7f9cae03b26196701e40904bc580cd971150
+size 14491479
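The `corpus.jsonl` hunk changes a Git LFS pointer rather than the file itself: the pointer records the SHA-256 digest and byte size of the real object. As a rough sketch, a downloaded file could be checked against such a pointer as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the pointer text comes from the diff.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents as they appear on the "+" side of the corpus.jsonl hunk.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:34a38e542a64bfd8e7e4f75c31ec7f9cae03b26196701e40904bc580cd971150
size 14491479
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])
```

To verify a local copy, one would read the file's bytes and pass them to `matches_pointer`. For very large files, hashing in chunks would avoid loading everything into memory.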
corpus.mmindex.json
CHANGED
The diff for this file is too large to render.
data.csc.index.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:99d46b75c6d39fd6b858be4b194d6385f77073896e33955d707086366a09801c
+size 4989924
indices.csc.index.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:6d615201000cc91e6961a8b2a102d79cffbfa87610645fbfe414753bcc592d6e
+size 4989924
indptr.csc.index.npy
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:ee295a3f6f37a7988013298b018103c98aef8d5861590e4d87391285d40d266c
+size 248616
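The pointer sizes of the three CSC arrays can be related to the README statistics with a little arithmetic. The sketch below rests on two assumptions that the diff does not state: that each `.npy` file starts with a 128-byte header (typical for small one-dimensional arrays saved by NumPy), and that every element occupies 4 bytes, in line with the `float32` and `int32` dtypes listed in `params.index.json`. The function name `element_count` is illustrative.

```python
NPY_HEADER_BYTES = 128  # assumption: usual header size for a small 1-D .npy file
ITEM_BYTES = 4          # float32 and int32 both use 4 bytes per element


def element_count(file_size: int) -> int:
    """Estimate how many 4-byte elements a .npy file of the given size holds."""
    payload = file_size - NPY_HEADER_BYTES
    if payload < 0 or payload % ITEM_BYTES != 0:
        raise ValueError("size does not fit the assumed header and item size")
    return payload // ITEM_BYTES


# Sizes taken from the LFS pointers in this commit.
data_len = element_count(4989924)     # data.csc.index.npy
indices_len = element_count(4989924)  # indices.csc.index.npy
indptr_len = element_count(248616)    # indptr.csc.index.npy
print(data_len, indices_len, indptr_len)
```

Under these assumptions, `data` and `indices` each hold 1247449 entries, the same figure the README reports as the number of tokens. `indptr` holds 62122 entries, which in a CSC layout implies 62121 columns.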
params.index.json
CHANGED
@@ -6,7 +6,7 @@
 "idf_method": "lucene",
 "dtype": "float32",
 "int_dtype": "int32",
-"num_docs":
-"version": "0.2.
+"num_docs": 210801,
+"version": "0.2.6",
 "backend": "numpy"
 }
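The parameters keep `idf_method` set to `"lucene"` while `num_docs` is updated to 210801. As a rough illustration of how the document count enters the scoring, the Lucene-style BM25 IDF is commonly written as ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term. The sketch below assumes this is the formula `bm25s` applies for this setting. The document frequencies are made up for illustration.

```python
import math


def lucene_idf(num_docs: int, doc_freq: int) -> float:
    """Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


NUM_DOCS = 210801  # "num_docs" from the updated params.index.json

# Hypothetical document frequencies: a rare term, a common term,
# and a term that appears in every document.
for doc_freq in (1, 1000, NUM_DOCS):
    print(doc_freq, round(lucene_idf(NUM_DOCS, doc_freq), 4))
```

This variant stays positive even for a term that occurs in every document, so very common terms still contribute a small, non-negative weight.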
vocab.index.json
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:03d5613c6c043f095d746de405aab145523713820d2756469e919e871e02f5e4
+size 995887