---
language: []
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dataset_size:10K<n<100K
- loss:CosineSimilarityLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: Tech Writer
  sentences:
  - Tech Specialist
  - Architectural Historian
  - Order Selector (Picker)
- source_sentence: Controller
  sentences:
  - Assistant Controller
  - Key Accounts Supervisor
  - Cosmetologist - Plus Tips
- source_sentence: Accountant
  sentences:
  - Financial Accountant
  - Manager, Corporate Sales
  - Materials Sourcing Lead
- source_sentence: Planner III
  sentences:
  - Strategist
  - Product Finance Manager
  - Materials Sourcing Lead
- source_sentence: AP Analyst
  sentences:
  - AP Specialist
  - ESCO Service Coordinator
  - Boiler Tender Lead
pipeline_tag: sentence-similarity
---

# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 84f2bcc00d77236f9e89c8a360a00fb1139bf47d -->
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("JuanIgnacioSolerno/all-mpnet-base-v2-sts")
# Run inference
sentences = [
    'AP Analyst',
    'AP Specialist',
    'ESCO Service Coordinator',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
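The same API can be used to rank a set of candidate job titles against a query title. The minimal sketch below reuses titles from this card's widget examples; the `query`/`candidates` variable names and the ranking loop are illustrative, not part of the model itself.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("JuanIgnacioSolerno/all-mpnet-base-v2-sts")

# Illustrative query and candidate job titles (taken from the widget examples above)
query = "AP Analyst"
candidates = ["AP Specialist", "ESCO Service Coordinator", "Boiler Tender Lead"]

# Encode the query and the candidates, then score every pair with the model's
# similarity function (cosine similarity)
query_embedding = model.encode([query])
candidate_embeddings = model.encode(candidates)
scores = model.similarity(query_embedding, candidate_embeddings)[0]

# Sort candidates from most to least similar to the query
ranked = sorted(zip(candidates, scores.tolist()), key=lambda pair: pair[1], reverse=True)
for title, score in ranked:
    print(f"{score:.3f}  {title}")
```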
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 11,923 training samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1                                                                         | sentence2                                                                       | score                                                          |
  |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------|
  | type    | string                                                                            | string                                                                          | float                                                          |
  | details | <ul><li>min: 3 tokens</li><li>mean: 7.17 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 4.0 tokens</li><li>max: 4 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.04</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence1                                                                     | sentence2                   | score            |
  |:-------------------------------------------------------------------------------|:------------------------------|:-------------------|
  | <code>Land Coordinator, Renewable Development</code>                         | <code>Energy Analyst</code> | <code>0.0</code> |
  | <code>Customer Service Advocate - Remote within the state of Colorado</code> | <code>Energy Analyst</code> | <code>0.0</code> |
  | <code>Global Head of Infrastructure</code>                                   | <code>Energy Analyst</code> | <code>0.0</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```

### Evaluation Dataset

#### Unnamed Dataset

* Size: 2,981 evaluation samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1                                                                         | sentence2                                                                       | score                                                          |
  |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------|
  | type    | string                                                                            | string                                                                          | float                                                          |
  | details | <ul><li>min: 3 tokens</li><li>mean: 7.21 tokens</li><li>max: 28 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 4.0 tokens</li><li>max: 4 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.05</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence1                                                             | sentence2                   | score            |
  |:-------------------------------------------------------------------------|:------------------------------|:-------------------|
  | <code>IT Data Coordinator - Customer Data & Integrations Team</code> | <code>Energy Analyst</code> | <code>0.0</code> |
  | <code>Warehouse Associate</code>                                     | <code>Energy Analyst</code> | <code>0.0</code> |
  | <code>Human Resources Manager</code>                                 | <code>Energy Analyst</code> | <code>0.0</code> |
* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
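### Reproducing the Training Setup (Sketch)

The exact training script and hyperparameters are not recorded in this card. As a rough, non-authoritative sketch, a comparable fine-tuning run with `CosineSimilarityLoss` (which regresses the cosine similarity of the two embeddings onto the gold `score` via MSE, per the parameters above) could look like the following. The toy dataset rows, the assumed positive pair, and all hyperparameter values are placeholders, not the values used for this model.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Toy dataset with the same columns as the unnamed training data: sentence1, sentence2, score.
# The first row mirrors a sample from the card; the second is an assumed positive pair for illustration.
train_dataset = Dataset.from_dict({
    "sentence1": ["Land Coordinator, Renewable Development", "AP Analyst"],
    "sentence2": ["Energy Analyst", "AP Specialist"],
    "score": [0.0, 1.0],
})

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# CosineSimilarityLoss compares the cosine similarity of the two embeddings
# against the gold score with an MSE loss, as described above.
loss = losses.CosineSimilarityLoss(model)

# Placeholder hyperparameters; the original run's settings are not documented here.
args = SentenceTransformerTrainingArguments(
    output_dir="all-mpnet-base-v2-sts",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```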
### Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.0
- Transformers: 4.41.2
- PyTorch: 2.0.0.post200
- Accelerate: 0.30.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->