mavihsrr committed on
Commit 1b42c5c · verified · 1 Parent(s): 6a6d840

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,585 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:174064
8
+ - loss:CosineSimilarityLoss
9
+ base_model: BAAI/bge-base-en
10
+ widget:
11
+ - source_sentence: Garlic Ganhiali Pickle. Description :Nutty Yogi Garlic Ganhiali
12
+ Pickle is hygienically processed from best quality gandhialo and garlic. This
13
+ traditional style pickle is a spicy and tasty complement to any meal. Its sour
14
+ and hot taste titillates the taste buds. This mouthwatering pickle is made from
15
+ gandhialo and garlic shreds, mixed with other spices and ingredients. It has natural
16
+ antioxidants, improving digestive health, providing vitamin C, and helps in generating
17
+ healthy gut-flora.!
18
+ sentences:
19
+ - Microwaveable Plastic Multiutility Bowl - Blue, New Coral, L2271 BL. Description
20
+ :It is made of high-quality food grade virgin plastic. These bowls come in beautiful
21
+ bright colours. Store fruits, dry fruits, snacks, biscuits etc. in these bowls.
22
+ These coral bowls are microwave safe, easy to clean and maintain.!
23
+ - Kung Pao Sauce. Description :Bechef Kung Pao Sauce is our take on Classic Sichuan
24
+ Chinese Dish of Kung Pao Chicken. This pure vegetarian sauce is a classic example
25
+ of Indo Chinese fusion where we have adapted the taste to Indian palate yet be
26
+ loyal to the original taste.!
27
+ - Popular Aluminium Outer Lid Pressure Cooker (10003). Description :The highly appreciated
28
+ Prestige Popular Plus comes with an induction base and host of other top-of-the-line
29
+ features. Manufactured from virgin Aluminium to ensure zero contamination, this
30
+ cooker boasts of the highest quality of raw materials used. Perfected over a period
31
+ of time with trademark Prestige engineering, expected the very best with up-to-date
32
+ innovations and features. Add to that, the elegant design and splendid finish,
33
+ the cooker is a sheer pleasure to cook with.Prestige Popular Plus base is Stainless
34
+ steel perforated (holes) plate, machine pressed to Aluminium cooker base. Suitable
35
+ for Induction cooktops and gas cooking, this versatile cooker provides you with
36
+ maximum utility.It is the first level of safety feature to release pressure above
37
+ 1kg/cm2, which makes the cooking safe and time saving for you. It is made up of
38
+ brass with steel coating on it for durability.This is the 2nd level of safety
39
+ provide in Prestige pressure cooker, in case there is any blockage of vent tube,
40
+ the gasket will bulge and steam will release through the hole at the top. Thus,
41
+ making it extremely safe for usage.It is the 3rd level of safety feature top fitted
42
+ to the lid, to release excess steam when pressure, rises beyond a safe level.
43
+ Thus, making it extremely safe to use.Benefits: More Economical.Faster Cooking.!
44
+ - source_sentence: Biscuits - Marie. Description :Bisk Farm Marie Biscuits are half
45
+ sweet in taste and are arranged using the best normal ingredients. These biscuits
46
+ are a wonderful mixture of wheat flour, vegetable oil, sugar. These Marie biscuits
47
+ are light and crunchy and are low on calories. Without these biscuits, the tea
48
+ feels unfinished.!
49
+ sentences:
50
+ - Dark Waffy Premium Vanilla Flavoured Choco Wafer. Description :Dukes Dark Waffy
51
+ Vanilla Flavoured Wafers is a premium wafer that is layered with a yummy, creamy
52
+ vanilla flavour with a whole new mix of a chocolate dark crunchy wafer. It comes
53
+ with a delightful aroma and a delicious taste to cherish about. It is a wonderful
54
+ snack for a hungry stomach during the day or while going on short journeys.!
55
+ - Organic Maai Ka Ladoo - Sugar-Free, Ultra-Low GI, No Preservatives. Description
56
+ :Known all over India as an irresistible classic, we gave this traditional sweet
57
+ a healthy upgrade! Expertly crafted with only the best organic dry fruits, every
58
+ bite of this Maai ka Laddu is generously scrumptious and entirely guilt-free!
59
+ Our goal is to bring you an unparalleled experience of a comforting homemade dessert.
60
+ Using selective no nasty and all nutritious ingredients, it leaves you feeling
61
+ healthy and satisfied! Apt for people on carbohydrate-controlled diets and with
62
+ diabetes, D-Alive's Maai ka Laddu is your new one-stop craving, Helps lower and
63
+ stabilise blood sugar levels, keeps you fuller for longer and aids weight management.
64
+ In spite of being a low Glycemic Index (GI) & Low Carb, our pride lies in the
65
+ taste of the product.  Using selective ‘no nasty’ organic, ultra-low GI, nutrient-dense,
66
+ slow-releasing ingredients, this superfood leaves you feeling satisfied!!
67
+ - 'Organic - Til/Ellu White. Description :Sesame is an important ingredient in cooking.
68
+ Sesame seeds give a rich delicate nutty flavour. Almost invisible crunch to your
69
+ dishes. Sesame seeds are excellent sources of copper. A very good source of manganese
70
+ calcium phosphorus magnesium iron zinc molybdenum vitamin B1 selenium. Although
71
+ much of its calorie comes from fats, sesame contains several notable health-benefiting
72
+ nutrients, minerals, antioxidants and vitamins.
73
+
74
+ The seeds are especially rich in mono-unsaturated fatty acid, oleic acid, which
75
+ comprises up to 50 percent of fatty acids in them. Oleic acid helps lower LDL
76
+ or "bad cholesterol" and increases HDL or "good cholesterol" in the blood. The
77
+ seeds are also very valuable sources of dietary protein with fine quality amino
78
+ acids that are essential for growth, especially in children.!'
79
+ - source_sentence: Amritsari Mutton Curry 130 g + Goan Chicken Cafreal 115 g. Description
80
+ :Amritsari Mutton Curry:Full of flavour, free of preservatives, our mutton curry
81
+ is a 100% homestyle recipe, wrapped in the flavours and textures of Amritsari
82
+ homes. This ready to cook, easy to use curry paste gets a sumptuous meal for 4,
83
+ ready in just 20 minutes. Feed your urge to cook something new today with tasty
84
+ tales.  Explore. Play. Create.Goan Chicken Cafreal:Full of flavour and free of
85
+ preservatives, our Chicken Cafreal is a native homestyle recipe, wrapped in the
86
+ flavours & textures of a Goan home. This ready to cook, easy to use curry paste
87
+ gets a sumptuous meal for 4, ready in just 20 minutes. Feed your urge to cook
88
+ something new today with Tasty Tales - Explore. Play. Create.!
89
+ sentences:
90
+ - Trendy Stainless Steel Bottle With Sipper Cap - Steel Matt Finish, PXP 1002 DQ.
91
+ Description :Now free your environment, and yourself from the unhealthy plastic
92
+ bottles and get a healthier one-time product for all your needs. These high-grade
93
+ stainless steel bottles are here to enhance your dining and travelling experience,
94
+ saving you from the negative effects of plastic. The single-walled steel bottles
95
+ are perfect add-ons to your kitchen collection if you are looking for light-weighed,
96
+ durable, classy looking product. The bottle comes with sipper & wide mouth steel
97
+ cap, catering to double usage. Be it going to the gym, or sending it with the
98
+ kids to school, the colourful sipper can always make it a very convenient, handy
99
+ and more importantly a style-statement product. You can take it to the office
100
+ or just keep on the dinner table. Open the wide mouth lid and use at ease. The
101
+ bottles come with the major USP of inter-changeable lid facility. Now you can
102
+ make the same steel cap bottle as a sipper bottle by just interchanging the lids.
103
+ Hence, get 2 of two-in-one featured bottles of same model and capacities and get
104
+ the best of both, with a variety of colours!!
105
+ - Round Plastic Container - Black. Description :These storage containers are made
106
+ from high-quality plastic for everyday use. It seals the food effectively and
107
+ has an easily stackable design for smart storage. The food-grade quality of these
108
+ multipurpose airtight storage containers, with attractive design, make them hygienic
109
+ for use. They are freezer safe without lid and have a strong and durable body
110
+ for longevity.!
111
+ - Stainless Steel Lunch Box/Tiffin Set - Blue, BB 575 2. Description :Easily pack
112
+ lunch for your loved ones with this blue lunch box by Tedemei. Made of high-quality
113
+ stainless steel, the lunch box is sturdy, durable, and easy to clean. The lunch
114
+ box is airtight which helps in keeping the food fresh for long. It is a single
115
+ layer lunch box which lets you pack solid and liquid food separately. The lunch
116
+ box features flap and lock design which makes it easy to open and carry the lunch
117
+ box. The modish looking lunch boxes catches the eye.!
118
+ - source_sentence: New Extra Large. Description :Pampers baby diaper pants are the
119
+ only pants in India with new air channels providing your baby with a new type
120
+ of dryness overnight; breathable dryness. Magic Gel that locks the wetness away
121
+ for up to 12 hours of dryness. The new and improved product design enables a comfortable
122
+ fit, closer to the baby's body.A flexible waistband that adapts to the baby's
123
+ movements for a comfortable fit. Baby lotion with aloe vera helps protect your
124
+ baby's delicate skin from diaper rash and irritation. A top layer with the cotton-like
125
+ soft material, for a comfortable nights sleep. Fun exterior graphics; fun designs
126
+ and characters to enjoy with your baby.!
127
+ sentences:
128
+ - 'Nature''s Super Foods Organic Chana Dal. Description :Aashirvaad introduces an
129
+ organic certified range of Nature''s Super Foods, Organic Chana Dal. Chana Dal
130
+ is one of the most loved Indian food. It is used in multiple cultural foods across
131
+ the nation. It has proteins, carbohydrate, good fibre, Iron and vitamins to enrich
132
+ your daily intake of essential nutrients. Chana dal tastes like kernels of small
133
+ corns, it is light and easily digestible. 
134
+
135
+ Aashirvaad organic products are sourced and packed hygienically to ensure you
136
+ get the best taste and nutrition with its premium quality products.!'
137
+ - Spanish Olives - Pitted Green. Description :Fragata Pitted Green Olives are cured
138
+ or pickled and are well-known for their rich and mouth-watering flavoring as well
139
+ as for their dietary benefit. Olives include abundant antioxidants identified
140
+ as polyphones.!
141
+ - Surface Cleaner - Jasmine. Description :Special formulation kills maximum germs,
142
+ leaves a pleasant aroma as well as removes toughest stains!
143
+ - source_sentence: Extra Virgin Coconut Oil. Description :This cold pressed, pure,
144
+ natural, extra virgin coconut oil because of its high saturated fat content, it
145
+ is slow to oxidize and, thus, resistant to rancidification, lasting up to six
146
+ months at 24 DegreeC without spoiling. This is the purest form of coconut oil,
147
+ which retains all of its goodness.!
148
+ sentences:
149
+ - Wax Candles - Metal, Smokeless, White, CD 05. Description :Enrich the ambience
150
+ of the place as you place these captivating looking tealight candles. These are
151
+ white in colour and round in shape. They are filled with wax and wick inside.
152
+ It is suitable for decorating the house during the festive occasions and parties.
153
+ These are smokeless candles that do not leave any soot residue behind. Also, the
154
+ tealight candles burn fully without damaging your furniture or floor. Also, it
155
+ has 25 pieces.!
156
+ - Pearl - Skin Whitening Facial Kit. Description :With Organic Harvests 30 Minute
157
+ Makeover Pearl-Skin Whitening Facial Kit, you achieve quick salon-like results
158
+ at home, without spending your precious time in the salon. It comes with an assurance
159
+ that only certified organic ingredients come in contact with your skin. 30 minutes,
160
+ and you will feel that your skin is lily-white.!
161
+ - Rubber Gloves - Cotton Lined, Soft & Non Slip, Medium. Description :The Super
162
+ Strong Elbow Grease Rubber Gloves in large protect the hands from bacteria and
163
+ chemicals during cleaning tasks, ideal for dishwashing, scrubbing task and using
164
+ harmful chemicals. These high-quality designs are cotton lined, soft & non-slip
165
+ gloves for ease to use. The Elbow Grease Rubber Gloves are the only gloves you
166
+ will ever need. Great for domestic or commercial cleaning purpose.!
167
+ pipeline_tag: sentence-similarity
168
+ library_name: sentence-transformers
169
+ ---
170
+
171
+ # SentenceTransformer based on BAAI/bge-base-en
172
+
173
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
174
+
175
+ ## Model Details
176
+
177
+ ### Model Description
178
+ - **Model Type:** Sentence Transformer
179
+ - **Base model:** [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) <!-- at revision b737bf5dcc6ee8bdc530531266b4804a5d77b5d8 -->
180
+ - **Maximum Sequence Length:** 512 tokens
181
+ - **Output Dimensionality:** 768 dimensions
182
+ - **Similarity Function:** Cosine Similarity
183
+ <!-- - **Training Dataset:** Unknown -->
184
+ <!-- - **Language:** Unknown -->
185
+ <!-- - **License:** Unknown -->
186
+
187
+ ### Model Sources
188
+
189
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
190
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
191
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
192
+
193
+ ### Full Model Architecture
194
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
202
+
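The architecture above pools with the `[CLS]` token (`pooling_mode_cls_token` is `True`) and then L2-normalizes the result. A minimal NumPy sketch of those two steps follows, assuming a batch of token embeddings shaped `(batch, seq_len, 768)`. The function name `cls_pool_and_normalize` and the random inputs are illustrative only and are not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Take the first ([CLS]) token vector of each sequence and L2-normalize it."""
    cls_vectors = token_embeddings[:, 0, :]                      # (batch, dim)
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)   # (batch, 1)
    return cls_vectors / np.clip(norms, 1e-12, None)             # guard against zero norm

# Random stand-in for the transformer's per-token outputs (hypothetical data).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 16, 768))
sentence_embeddings = cls_pool_and_normalize(tokens)
print(sentence_embeddings.shape)  # (3, 768)
```

Because every output vector has unit length, the cosine similarity between two embeddings reduces to their dot product.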
203
+ ## Usage
204
+
205
+ ### Direct Usage (Sentence Transformers)
206
+
207
+ First install the Sentence Transformers library:
208
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
212
+
213
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("mavihsrr/bge-final-small-retail-v2")
+ # Run inference
+ sentences = [
+     'Extra Virgin Coconut Oil. Description :This cold pressed, pure, natural, extra virgin coconut oil because of its high saturated fat content, it is slow to oxidize and, thus, resistant to rancidification, lasting up to six months at 24 DegreeC without spoiling. This is the purest form of coconut oil, which retains all of its goodness.!',
+     'Rubber Gloves - Cotton Lined, Soft & Non Slip, Medium. Description :The Super Strong Elbow Grease Rubber Gloves in large protect the hands from bacteria and chemicals during cleaning tasks, ideal for dishwashing, scrubbing task and using harmful chemicals. These high-quality designs are cotton lined, soft & non-slip gloves for ease to use. The Elbow Grease Rubber Gloves are the only gloves you will ever need. Great for domestic or commercial cleaning purpose.!',
+     'Wax Candles - Metal, Smokeless, White, CD 05. Description :Enrich the ambience of the place as you place these captivating looking tealight candles. These are white in colour and round in shape. They are filled with wax and wick inside. It is suitable for decorating the house during the festive occasions and parties. These are smokeless candles that do not leave any soot residue behind. Also, the tealight candles burn fully without damaging your furniture or floor. Also, it has 25 pieces.!',
+ ]
+ embeddings = model.encode(sentences)  # NumPy array by default
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
234
+
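The `similarities` matrix holds one score per pair of inputs, so row `i` ranks every other input against input `i`. The sketch below reads off the closest match per row. It uses a hand-written 3×3 matrix in place of real model output, so the values are made up for illustration, and `nearest_other` is a hypothetical helper name.

```python
import numpy as np

# Made-up symmetric similarity matrix standing in for model output.
similarities = np.array([
    [1.00, 0.21, 0.35],
    [0.21, 1.00, 0.62],
    [0.35, 0.62, 1.00],
])

def nearest_other(sim: np.ndarray) -> np.ndarray:
    """Return, for each row, the index of the most similar *other* item."""
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore each item's similarity with itself
    return masked.argmax(axis=1)

best = nearest_other(similarities)
print(best.tolist())  # [2, 2, 1]
```

With real output, the similarity tensor would first need converting to a NumPy array before being passed to a helper like this.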
235
+ <!--
236
+ ### Direct Usage (Transformers)
237
+
238
+ <details><summary>Click to see the direct usage in Transformers</summary>
239
+
240
+ </details>
241
+ -->
242
+
243
+ <!--
244
+ ### Downstream Usage (Sentence Transformers)
245
+
246
+ You can finetune this model on your own dataset.
247
+
248
+ <details><summary>Click to expand</summary>
249
+
250
+ </details>
251
+ -->
252
+
253
+ <!--
254
+ ### Out-of-Scope Use
255
+
256
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
257
+ -->
258
+
259
+ <!--
260
+ ## Bias, Risks and Limitations
261
+
262
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
263
+ -->
264
+
265
+ <!--
266
+ ### Recommendations
267
+
268
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
269
+ -->
270
+
271
+ ## Training Details
272
+
273
+ ### Training Dataset
274
+
275
+ #### Unnamed Dataset
276
+
277
+
278
+ * Size: 174,064 training samples
279
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
280
+ * Approximate statistics based on the first 1000 samples:
281
+ | | sentence1 | sentence2 | score |
282
+ |:--------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------|
283
+ | type | string | string | float |
284
+ | details | <ul><li>min: 12 tokens</li><li>mean: 116.54 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 111.22 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 0.1</li><li>mean: 0.64</li><li>max: 0.97</li></ul> |
285
+ * Samples:
286
+ | sentence1 | sentence2 | score |
287
+ |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------|
288
+ | <code>Oil Clear Mud Face Pack. Description :Himalaya Oil Clear Mud Face Pack Rejuvenate your dead skin with Himalaya Oil Clear Mud Face Pack. This herbal formulation deep cleanses facial skin and clears clogged pores by absorbing excess oil and removing impurities. It helps maintain the natural pH of the skin and has deep cleansing and detoxifying properties, leaving the skin cleansed and revitalized. Fullers Earth removes deep-seated dirt and pollutants. It absorbs oil, clears clogged pores and blemishes and helps remove dead skin. Fullers Earth also helps lighten tanned skin caused by UV rays.!</code> | <code>Pure White Mineral Clay Anti Pollution Purity Face Wash Foam. Description :Giving your skin an oil overhaul doesn't have to be a drag. Oil stuck in your pores is what makes your skin feel oily again after a wash! POND'S Clay Foam and Mask is the most fun way to say goodbye oil and hello to an all-day matte glow. Made with 100% natural Moroccan clay that has 4x oil absorption power, it sucks out dirt and oil stuck deep within your pores. What's left behind? Skin that's glowing and matte all day long! Pond's Clay Foam is the most enjoyable and effective way to keep your skin oil-free for longer. Revolutionise face washing with the enriching power of Mineral Clay. One of the most efficacious ingredients in deep cleansing. Its enriched with skin-loving minerals to give you a bouncy glow. So, step up your deep cleansing regimen for an oil-free glow. The clay range comes in two exciting formats. The Pond's white beauty mineral clay foam brightens and smoothens your skin for an oil-free glow!...</code> | <code>0.9511584211850151</code> |
289
+ | <code>Essence - Butter Scotch. Description :Concentrate Butterscotch Essence For Sauces, Desserts, Baking And Cakes.Butterscotch Adds A Luscious Flavor Note To Mochas, Lattes And Other Hot, Frozen And Chilled Drinks.!</code> | <code>product<br>Icing Sugar Icing Sugar. Description :Icing Sugar is finel...<br>Icing Sugar Icing Sugar. Description :This finely granulat...<br>Name: combined, dtype: object</code> | <code>0.9643093974992689</code> |
290
+ | <code>Marie Light Biscuit - Vita Orange. Description :Sunfeast Marie Light orange offer crisp & light biscuits completed with the choicest golden grains of sun-ripened oranges and wheat. It presents the only Marie biscuit in India with a stimulating, delicious orange flavour. Whats more, there is 0% transfat and 0% cholesterol making it an appetisingly vigorous biscuit.!</code> | <code>Premium Wafer Bites - Dark Choco 100 g + Strawberry 100 g + Tiramisu 100 g. Description :Tasties brings you the Delicious Creamy & Crunchy Wafer Bites. Indulge in the taste of 5 wafers and 4 cream layered mini wafer bites with mouth-melting dark chocolate filling.<br>Tasties brings you the Delicious Creamy & Crunchy Wafer Bites. Indulge in the taste of 5 wafers and 4 cream layered mini wafer bites with mouth-melting strawberry filling.<br>Tasties brings you the Delicious Creamy & Crunchy Wafer Bites. Indulge in the taste of 5 wafers and 4 cream layered mini wafer bites with mouth Tiramisu hazelnut filling.<br><br>Munch on this and say bye to your small hunger pangs.!</code> | <code>0.8838966912863657</code> |
291
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
297
+
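`CosineSimilarityLoss` with `MSELoss` compares the cosine similarity of the two sentence embeddings against the `score` column and penalizes the squared difference. The NumPy sketch below works through that computation on toy vectors. The helper name `cosine_similarity_mse` and the inputs are illustrative; the library itself operates on PyTorch tensors.

```python
import numpy as np

def cosine_similarity_mse(emb1: np.ndarray, emb2: np.ndarray, scores: np.ndarray) -> float:
    """Mean squared error between pairwise cosine similarity and target scores."""
    dot = np.sum(emb1 * emb2, axis=1)
    norms = np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    cos = dot / norms
    return float(np.mean((cos - scores) ** 2))

# Toy batch of two pairs: identical vectors and orthogonal vectors.
emb1 = np.array([[1.0, 0.0], [1.0, 0.0]])
emb2 = np.array([[1.0, 0.0], [0.0, 1.0]])
scores = np.array([1.0, 0.5])  # hypothetical target similarities

loss = cosine_similarity_mse(emb1, emb2, scores)
print(loss)  # 0.125
```

The first pair matches its target exactly (cosine 1.0 against score 1.0), while the second contributes (0 - 0.5)² = 0.25, giving a batch mean of 0.125.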
298
+ ### Evaluation Dataset
299
+
300
+ #### Unnamed Dataset
301
+
302
+
303
+ * Size: 21,759 evaluation samples
304
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
305
+ * Approximate statistics based on the first 1000 samples:
306
+ | | sentence1 | sentence2 | score |
307
+ |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------|
308
+ | type | string | string | float |
309
+ | details | <ul><li>min: 10 tokens</li><li>mean: 121.44 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 112.13 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 0.1</li><li>mean: 0.61</li><li>max: 0.97</li></ul> |
310
+ * Samples:
311
+ | sentence1 | sentence2 | score |
312
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------|
313
+ | <code>Rose Glycerin Soap For Clean & Refreshed Skin -Cold Processed, 100% Natural & Organic. Description :Feel fresh, clean and refreshed with Rose which will leave your skin delicately scented with an uplifting rose fragrance. This soap does not use any animal product like milk or honey. It is completely vegan. Rose helps to improve the skin's appearance and to perfume the skin. It contains glycerin that softens the skin. It has no added preservatives and SLS and is 100% natural herbs and essential oils. It is a vegan product and is SLS free.!</code> | <code>product<br>Relax Moisturising Hand Wash - Lavender & Ylang-Ylang Relax Moisturising Hand Wash - Lavender & Ylan...<br>Relax Moisturising Hand Wash - Lavender & Ylang-Ylang Relax Moisturising Hand Wash - Lavender & Ylan...<br>Name: combined, dtype: object</code> | <code>0.9641479761938232</code> |
314
+ | <code>Dog Food - Focus Starter, Super Premium. Description :The Drools Focus, Super premium all breed formula for Puppies is formulated with the finest natural ingredients to help your dog live a long and healthy life. The result of exhaustive scientific research carried out over the years, by some of the most experienced veterinarians and nutritionists. Just like the rest of the Drools products, this one too is manufactured with a keen eye for detail and utmost care at Asias largest and most modern plant.!</code> | <code>Erina - Coat Cleanser. Description :Action : Dandruff control : Erina prevents the formation of dandruff on your pets skin and hair coat. Antimicrobial : Its antiseptic and antibacterial cleansing eliminates germs and improves overall skin hygiene. Erina protects the body against commonly found pathogens that cause itching and bacterial infections. Deodorant : Erinas deodorizing properties eliminate foul odor. Indications : For controlling dandruff in the hair coat. Prevention and management of pruritus (itching) and pyoder(superficial bacterial infection). Used in routine bathing as a cleanser to maintain a healthy coat.!</code> | <code>0.9112330093194662</code> |
315
+ | <code>Fruit & Food Nibbler With Silicone Sack - Green. Description :Introducing new foods to your babys diet can be a fun learning experience as it provides him or her with new varying tastes and flavours. With Mee Mee fruit and food nibbler, your child can safely enjoy fruit and other kinds of whole foods, without the risk of choking or hurting his or her mouth.!</code> | <code>Trendy Stainless Steel Bottle With Sipper Cap - Steel Matt Finish, PXP 1002 DQ. Description :Now free your environment, and yourself from the unhealthy plastic bottles and get a healthier one-time product for all your needs. These high-grade stainless steel bottles are here to enhance your dining and travelling experience, saving you from the negative effects of plastic. The single-walled steel bottles are perfect add-ons to your kitchen collection if you are looking for light-weighed, durable, classy looking product. The bottle comes with sipper & wide mouth steel cap, catering to double usage. Be it going to the gym, or sending it with the kids to school, the colourful sipper can always make it a very convenient, handy and more importantly a style-statement product. You can take it to the office or just keep on the dinner table. Open the wide mouth lid and use at ease. The bottles come with the major USP of inter-changeable lid facility. Now you can make the same steel cap bottle as ...</code> | <code>0.14806349984585232</code> |
316
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
322
+
323
+ ### Training Hyperparameters
324
+ #### Non-Default Hyperparameters
325
+
326
+ - `eval_strategy`: steps
327
+ - `per_device_train_batch_size`: 64
328
+ - `per_device_eval_batch_size`: 64
329
+ - `learning_rate`: 2e-05
330
+ - `warmup_ratio`: 0.1
331
+ - `bf16`: True
332
+
333
+ #### All Hyperparameters
334
+ <details><summary>Click to expand</summary>
335
+
336
+ - `overwrite_output_dir`: False
337
+ - `do_predict`: False
338
+ - `eval_strategy`: steps
339
+ - `prediction_loss_only`: True
340
+ - `per_device_train_batch_size`: 64
341
+ - `per_device_eval_batch_size`: 64
342
+ - `per_gpu_train_batch_size`: None
343
+ - `per_gpu_eval_batch_size`: None
344
+ - `gradient_accumulation_steps`: 1
345
+ - `eval_accumulation_steps`: None
346
+ - `torch_empty_cache_steps`: None
347
+ - `learning_rate`: 2e-05
348
+ - `weight_decay`: 0.0
349
+ - `adam_beta1`: 0.9
350
+ - `adam_beta2`: 0.999
351
+ - `adam_epsilon`: 1e-08
352
+ - `max_grad_norm`: 1.0
353
+ - `num_train_epochs`: 3
354
+ - `max_steps`: -1
355
+ - `lr_scheduler_type`: linear
356
+ - `lr_scheduler_kwargs`: {}
357
+ - `warmup_ratio`: 0.1
358
+ - `warmup_steps`: 0
359
+ - `log_level`: passive
360
+ - `log_level_replica`: warning
361
+ - `log_on_each_node`: True
362
+ - `logging_nan_inf_filter`: True
363
+ - `save_safetensors`: True
364
+ - `save_on_each_node`: False
365
+ - `save_only_model`: False
366
+ - `restore_callback_states_from_checkpoint`: False
367
+ - `no_cuda`: False
368
+ - `use_cpu`: False
369
+ - `use_mps_device`: False
370
+ - `seed`: 42
371
+ - `data_seed`: None
372
+ - `jit_mode_eval`: False
373
+ - `use_ipex`: False
374
+ - `bf16`: True
375
+ - `fp16`: False
376
+ - `fp16_opt_level`: O1
377
+ - `half_precision_backend`: auto
378
+ - `bf16_full_eval`: False
379
+ - `fp16_full_eval`: False
380
+ - `tf32`: None
381
+ - `local_rank`: 0
382
+ - `ddp_backend`: None
383
+ - `tpu_num_cores`: None
384
+ - `tpu_metrics_debug`: False
385
+ - `debug`: []
386
+ - `dataloader_drop_last`: False
387
+ - `dataloader_num_workers`: 0
388
+ - `dataloader_prefetch_factor`: None
389
+ - `past_index`: -1
390
+ - `disable_tqdm`: False
391
+ - `remove_unused_columns`: True
392
+ - `label_names`: None
393
+ - `load_best_model_at_end`: False
394
+ - `ignore_data_skip`: False
395
+ - `fsdp`: []
396
+ - `fsdp_min_num_params`: 0
397
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
398
+ - `fsdp_transformer_layer_cls_to_wrap`: None
399
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
400
+ - `deepspeed`: None
401
+ - `label_smoothing_factor`: 0.0
402
+ - `optim`: adamw_torch
403
+ - `optim_args`: None
404
+ - `adafactor`: False
405
+ - `group_by_length`: False
406
+ - `length_column_name`: length
407
+ - `ddp_find_unused_parameters`: None
408
+ - `ddp_bucket_cap_mb`: None
409
+ - `ddp_broadcast_buffers`: False
410
+ - `dataloader_pin_memory`: True
411
+ - `dataloader_persistent_workers`: False
412
+ - `skip_memory_metrics`: True
413
+ - `use_legacy_prediction_loop`: False
414
+ - `push_to_hub`: False
415
+ - `resume_from_checkpoint`: None
416
+ - `hub_model_id`: None
417
+ - `hub_strategy`: every_save
418
+ - `hub_private_repo`: None
419
+ - `hub_always_push`: False
420
+ - `gradient_checkpointing`: False
421
+ - `gradient_checkpointing_kwargs`: None
422
+ - `include_inputs_for_metrics`: False
423
+ - `include_for_metrics`: []
424
+ - `eval_do_concat_batches`: True
425
+ - `fp16_backend`: auto
426
+ - `push_to_hub_model_id`: None
427
+ - `push_to_hub_organization`: None
428
+ - `mp_parameters`:
429
+ - `auto_find_batch_size`: False
430
+ - `full_determinism`: False
431
+ - `torchdynamo`: None
432
+ - `ray_scope`: last
433
+ - `ddp_timeout`: 1800
434
+ - `torch_compile`: False
435
+ - `torch_compile_backend`: None
436
+ - `torch_compile_mode`: None
437
+ - `dispatch_batches`: None
438
+ - `split_batches`: None
439
+ - `include_tokens_per_second`: False
440
+ - `include_num_input_tokens_seen`: False
441
+ - `neftune_noise_alpha`: None
442
+ - `optim_target_modules`: None
443
+ - `batch_eval_metrics`: False
444
+ - `eval_on_start`: False
445
+ - `use_liger_kernel`: False
446
+ - `eval_use_gather_object`: False
447
+ - `average_tokens_across_devices`: False
448
+ - `prompts`: None
449
+ - `batch_sampler`: batch_sampler
450
+ - `multi_dataset_batch_sampler`: proportional
451
+
452
+ </details>
453
+
454
+ ### Training Logs
455
+ | Epoch | Step | Training Loss | Validation Loss |
456
+ |:------:|:----:|:-------------:|:---------------:|
457
+ | 0.0460 | 500 | 0.1008 | - |
458
+ | 0.0092 | 100 | 0.0515 | 0.0453 |
459
+ | 0.0184 | 200 | 0.0532 | - |
460
+ | 0.0368 | 100 | 0.0491 | 0.0393 |
461
+ | 0.0735 | 200 | 0.0427 | 0.0333 |
462
+ | 0.1103 | 300 | 0.0373 | 0.0257 |
463
+ | 0.1471 | 400 | 0.0294 | 0.0188 |
464
+ | 0.1838 | 500 | 0.0212 | 0.0169 |
465
+ | 0.2206 | 600 | 0.0174 | 0.0131 |
466
+ | 0.2574 | 700 | 0.0145 | 0.0123 |
467
+ | 0.2941 | 800 | 0.0125 | 0.0094 |
468
+ | 0.3309 | 900 | 0.0109 | 0.0103 |
469
+ | 0.3676 | 1000 | 0.0102 | 0.0086 |
470
+ | 0.4044 | 1100 | 0.0075 | 0.0088 |
471
+ | 0.4412 | 1200 | 0.0077 | 0.0076 |
472
+ | 0.4779 | 1300 | 0.0071 | 0.0070 |
473
+ | 0.5147 | 1400 | 0.007 | 0.0072 |
474
+ | 0.5515 | 1500 | 0.0065 | 0.0068 |
475
+ | 0.5882 | 1600 | 0.0058 | 0.0073 |
476
+ | 0.625 | 1700 | 0.0064 | 0.0075 |
477
+ | 0.6618 | 1800 | 0.0057 | 0.0062 |
478
+ | 0.6985 | 1900 | 0.0055 | 0.0060 |
479
+ | 0.7353 | 2000 | 0.0054 | 0.0071 |
480
+ | 0.7721 | 2100 | 0.0055 | 0.0062 |
481
+ | 0.8088 | 2200 | 0.005 | 0.0065 |
482
+ | 0.8456 | 2300 | 0.0064 | 0.0061 |
483
+ | 0.8824 | 2400 | 0.0046 | 0.0056 |
484
+ | 0.9191 | 2500 | 0.0045 | 0.0051 |
485
+ | 0.9559 | 2600 | 0.0042 | 0.0051 |
486
+ | 0.9926 | 2700 | 0.0046 | 0.0055 |
487
+ | 1.0294 | 2800 | 0.0041 | 0.0053 |
488
+ | 1.0662 | 2900 | 0.005 | 0.0057 |
489
+ | 1.1029 | 3000 | 0.0033 | 0.0055 |
490
+ | 1.1397 | 3100 | 0.0037 | 0.0054 |
491
+ | 1.1765 | 3200 | 0.004 | 0.0052 |
492
+ | 1.2132 | 3300 | 0.0038 | 0.0049 |
493
+ | 1.25 | 3400 | 0.0038 | 0.0047 |
494
+ | 1.2868 | 3500 | 0.0035 | 0.0052 |
495
+ | 1.3235 | 3600 | 0.0034 | 0.0048 |
496
+ | 1.3603 | 3700 | 0.0035 | 0.0049 |
497
+ | 1.3971 | 3800 | 0.0034 | 0.0045 |
498
+ | 1.4338 | 3900 | 0.0037 | 0.0048 |
499
+ | 1.4706 | 4000 | 0.0036 | 0.0047 |
500
+ | 1.5074 | 4100 | 0.0031 | 0.0046 |
501
+ | 1.5441 | 4200 | 0.0039 | 0.0045 |
502
+ | 1.5809 | 4300 | 0.0033 | 0.0046 |
503
+ | 1.6176 | 4400 | 0.0033 | 0.0047 |
504
+ | 1.6544 | 4500 | 0.0035 | 0.0047 |
505
+ | 1.6912 | 4600 | 0.0029 | 0.0047 |
506
+ | 1.7279 | 4700 | 0.0035 | 0.0046 |
507
+ | 1.7647 | 4800 | 0.0033 | 0.0046 |
508
+ | 1.8015 | 4900 | 0.003 | 0.0046 |
509
+ | 1.8382 | 5000 | 0.0027 | 0.0045 |
510
+ | 1.875 | 5100 | 0.003 | 0.0043 |
511
+ | 1.9118 | 5200 | 0.0031 | 0.0046 |
512
+ | 1.9485 | 5300 | 0.0029 | 0.0045 |
513
+ | 1.9853 | 5400 | 0.003 | 0.0044 |
514
+ | 2.0221 | 5500 | 0.0031 | 0.0044 |
515
+ | 2.0588 | 5600 | 0.0028 | 0.0044 |
516
+ | 2.0956 | 5700 | 0.0032 | 0.0044 |
517
+ | 2.1324 | 5800 | 0.0027 | 0.0043 |
518
+ | 2.1691 | 5900 | 0.0032 | 0.0043 |
519
+ | 2.2059 | 6000 | 0.0029 | 0.0043 |
520
+ | 2.2426 | 6100 | 0.0028 | 0.0043 |
521
+ | 2.2794 | 6200 | 0.0028 | 0.0045 |
522
+ | 2.3162 | 6300 | 0.0032 | 0.0043 |
523
+ | 2.3529 | 6400 | 0.0026 | 0.0043 |
524
+ | 2.3897 | 6500 | 0.0026 | 0.0043 |
525
+ | 2.4265 | 6600 | 0.0024 | 0.0044 |
526
+ | 2.4632 | 6700 | 0.0024 | 0.0042 |
527
+ | 2.5 | 6800 | 0.0028 | 0.0043 |
528
+ | 2.5368 | 6900 | 0.0026 | 0.0043 |
529
+ | 2.5735 | 7000 | 0.0028 | 0.0042 |
530
+ | 2.6103 | 7100 | 0.0024 | 0.0043 |
531
+ | 2.6471 | 7200 | 0.0023 | 0.0042 |
532
+ | 2.6838 | 7300 | 0.0027 | 0.0041 |
533
+ | 2.7206 | 7400 | 0.0024 | 0.0041 |
534
+ | 2.7574 | 7500 | 0.003 | 0.0041 |
535
+ | 2.7941 | 7600 | 0.003 | 0.0041 |
536
+ | 2.8309 | 7700 | 0.0028 | 0.0041 |
537
+ | 2.8676 | 7800 | 0.0029 | 0.0041 |
538
+ | 2.9044 | 7900 | 0.0026 | 0.0041 |
539
+ | 2.9412 | 8000 | 0.0022 | 0.0041 |
540
+ | 2.9779 | 8100 | 0.0023 | 0.0041 |
541
+
542
+
543
+ ### Framework Versions
544
+ - Python: 3.10.12
545
+ - Sentence Transformers: 3.3.1
546
+ - Transformers: 4.47.1
547
+ - PyTorch: 2.1.0+cu118
548
+ - Accelerate: 1.2.1
549
+ - Datasets: 3.2.0
550
+ - Tokenizers: 0.21.0
551
+
552
+ ## Citation
553
+
554
+ ### BibTeX
555
+
556
+ #### Sentence Transformers
557
+ ```bibtex
558
+ @inproceedings{reimers-2019-sentence-bert,
559
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
560
+ author = "Reimers, Nils and Gurevych, Iryna",
561
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
562
+ month = "11",
563
+ year = "2019",
564
+ publisher = "Association for Computational Linguistics",
565
+ url = "https://arxiv.org/abs/1908.10084",
566
+ }
567
+ ```
568
+
569
+ <!--
570
+ ## Glossary
571
+
572
+ *Clearly define terms in order to be accessible across audiences.*
573
+ -->
574
+
575
+ <!--
576
+ ## Model Card Authors
577
+
578
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
579
+ -->
580
+
581
+ <!--
582
+ ## Model Card Contact
583
+
584
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
585
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "_name_or_path": "BAAI/bge-base-en",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "LABEL_0"
14
+ },
15
+ "initializer_range": 0.02,
16
+ "intermediate_size": 3072,
17
+ "label2id": {
18
+ "LABEL_0": 0
19
+ },
20
+ "layer_norm_eps": 1e-12,
21
+ "max_position_embeddings": 512,
22
+ "model_type": "bert",
23
+ "num_attention_heads": 12,
24
+ "num_hidden_layers": 12,
25
+ "pad_token_id": 0,
26
+ "position_embedding_type": "absolute",
27
+ "torch_dtype": "float32",
28
+ "transformers_version": "4.47.1",
29
+ "type_vocab_size": 2,
30
+ "use_cache": true,
31
+ "vocab_size": 30522
32
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.3.1",
4
+ "transformers": "4.47.1",
5
+ "pytorch": "2.1.0+cu118"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:516ca4b0f6e9558ce7fa41ae376a06b61a452f0f97f23a476a6d14087cd51da1
3
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
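Read together with `1_Pooling/config.json` (`pooling_mode_cls_token: true`), the three modules above form a pipeline: the Transformer produces one vector per token, Pooling keeps the `[CLS]` vector, and Normalize scales it to unit length. A minimal sketch of the last two stages in plain Python, using hypothetical 2-dimensional token vectors in place of the model's 768-dimensional outputs:

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector ([CLS])."""
    return token_embeddings[0]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sentence_embedding(token_embeddings):
    """Pooling followed by Normalize, mirroring modules 1 and 2."""
    return l2_normalize(cls_pool(token_embeddings))


# Toy per-token outputs (hypothetical): row 0 plays the role of [CLS].
emb_a = sentence_embedding([[3.0, 4.0], [10.0, -2.0]])
emb_b = sentence_embedding([[4.0, 3.0], [0.5, 0.5]])

# With unit-length vectors, cosine similarity reduces to a dot product.
similarity = sum(x * y for x, y in zip(emb_a, emb_b))
```

Because the final module normalises every embedding, the `cosine` similarity function named in `config_sentence_transformers.json` yields the same scores as a plain dot product.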
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": true
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff