prdev committed · Commit 812f5fd · verified · 1 Parent(s): 7fbe6f9

Update README.md

Files changed (1): README.md (+24 -69)
README.md CHANGED
@@ -8654,12 +8654,11 @@ language:
---

# Mini-GTE

- This is a distillbert-based model trained from GTE-base. It can be used as a faster query encoder for the GTE series or as a standalone unit (MTEB scores are for standalone).

## Model Details
-
- ### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) <!-- at revision 12040accade4e8a0f71eabdb258fecc2e7e948be -->
- **Maximum Sequence Length:** 512 tokens
@@ -8667,20 +8666,24 @@ This is a distillbert-based model trained from GTE-base. It can be used as a fas
- **Similarity Function:** Cosine Similarity

## Usage

- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

- Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

- # Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
@@ -8689,54 +8692,14 @@ sentences = [
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 768]

- # Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
```

- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
## Training Details
-
- ### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0.dev0
@@ -8746,23 +8709,15 @@ You can finetune this model on your own dataset.
- Tokenizers: 0.21.0

## Citation

- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->

---

# Mini-GTE

+ ## Overview
+ This is the first model developed by QTACK and serves as a proof of concept for our distillation approach! Built on a DistilBERT-based architecture, Mini-GTE is distilled from GTE and designed for efficiency at only 66M parameters, without sacrificing accuracy. As a standalone sentence transformer, it ranks 2nd on the MTEB classic leaderboard in the <100M parameter category and 63rd overall, making it a strong choice for real-time query encoding, semantic search, and similarity tasks.

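The card doesn't spell out the distillation objective, so the following is a minimal, hypothetical sketch of one common recipe: training the student to reproduce the teacher's sentence embeddings with an MSE loss. The teacher checkpoint, training data, and loss are assumptions for illustration, not details taken from this card:

```python
# Hypothetical embedding-distillation sketch (not QTACK's published recipe):
# the student is trained to reproduce the teacher's sentence embeddings.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

teacher = SentenceTransformer("thenlper/gte-base")        # assumed teacher checkpoint
student = SentenceTransformer("distilbert-base-uncased")  # student backbone from the card

sentences = ["He drove to the stadium.", "The weather is lovely today."]
teacher_embeddings = teacher.encode(sentences)

# Pair each sentence with its teacher embedding as the regression target
examples = [InputExample(texts=[s], label=e) for s, e in zip(sentences, teacher_embeddings)]
loader = DataLoader(examples, batch_size=2)

# losses.MSELoss is the library's stock distillation loss
student.fit(train_objectives=[(loader, losses.MSELoss(model=student))], epochs=1)
```
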
## Model Details
- **Model Type:** Sentence Transformer
- **Base model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) <!-- at revision 12040accade4e8a0f71eabdb258fecc2e7e948be -->
- **Maximum Sequence Length:** 512 tokens
- **Similarity Function:** Cosine Similarity
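
Since the similarity function is cosine, `model.similarity` is just a dot product over L2-normalized embeddings. A quick sketch for intuition (using the same placeholder model id as the quick start below):

```python
# Cosine similarity by hand: L2-normalize the embeddings, then take dot products.
# The result should agree with model.similarity(...) up to floating-point precision.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id from this card
emb = model.encode(["A query.", "A matching passage."])        # shape: [2, 768]

normalized = emb / np.linalg.norm(emb, axis=1, keepdims=True)
print(normalized @ normalized.T)  # 2x2 matrix of cosine similarities
```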
 
## Usage
+ - Optimized for fast, real-time inference
+ - Quickly generates high-quality sentence encodings
+ - Easy to plug and play, since it is distilled from GTE (see the retrieval sketch after the quick start below)

+ ## Getting Started
+ ### Installation
+ Mini-GTE is built on the [Sentence Transformers](https://www.sbert.net/) framework. To install the required package, run:

```bash
pip install -U sentence-transformers
```

+ ### Quick Start
+ Here's a quick example to get you started:

```python
from sentence_transformers import SentenceTransformer

+ # Download directly from Hugging Face
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    # … (earlier example sentences elided in this hunk)
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
+ print(embeddings.shape)  # Expected: [3, 768]

+ # Compute the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)  # Expected: [3, 3]
```
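
Beyond pairwise similarity, the same two calls cover the semantic-search use case called out above: embed the corpus once, then encode each incoming query and rank documents by cosine score. A minimal retrieval sketch (the corpus and model id are illustrative, not from the card):

```python
# Illustrative retrieval sketch: embed a corpus once, then encode queries
# at request time and rank documents by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence_transformers_model_id")  # replace with the actual model id

corpus = [
    "The stadium hosts football games every weekend.",
    "Photosynthesis converts sunlight into chemical energy.",
]
corpus_embeddings = model.encode(corpus)

query_embedding = model.encode("Where can I watch a football match?")
scores = util.cos_sim(query_embedding, corpus_embeddings)  # shape: [1, len(corpus)]
best = scores.argmax().item()
print(corpus[best])  # expected: the stadium sentence
```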

## Training Details
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0.dev0
- Tokenizers: 0.21.0
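
To approximate this environment, the released packages can be pinned as below (a hedged sketch: `4.48.0.dev0` was a development build, so the nearest stable release is substituted):

```bash
pip install "sentence-transformers==3.3.1" "tokenizers==0.21.0" "transformers==4.48.0"
```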
 
## Citation
+ ```bibtex
+ @misc{mini-gte2025,
+   title={Mini-GTE: A Fast and Efficient Distilled Sentence Transformer},
+   author={QTACK},
+   year={2025},
+   note={Available on the Hugging Face Hub}
+ }
+ ```
 
+ ## Getting Help
+ For any questions, suggestions, or issues, please contact the QTACK team directly through our [contact page](https://www.qtack.com/contact).