---
base_model: sentence-transformers/all-mpnet-base-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5579240
- loss:CachedMultipleNegativesRankingLoss
widget:
- source_sentence: Program Coordinator RN
  sentences:
  - discuss the medical history of the healthcare user, evidence-based approach in
    general practice, apply various lifting techniques, establish daily priorities,
    manage time, demonstrate disciplinary expertise, tolerate sitting for long periods,
    think critically, provide professional care in nursing, attend meetings, represent
    union members, nursing science, manage a multidisciplinary team involved in patient
    care, implement nursing care, customer service, work under supervision in care,
    keep up-to-date with training subjects, evidence-based nursing care, operate lifting
    equipment, follow code of ethics for biomedical practices, coordinate care, provide
    learning support in healthcare
  - provide written content, prepare visual data, design computer network, deliver
    visual presentation of data, communication, operate relational database management
    system, ICT communications protocols, document management, use threading techniques,
    search engines, computer science, analyse network bandwidth requirements, analyse
    network configuration and performance, develop architectural plans, conduct ICT
    code review, hardware architectures, computer engineering, video-games functionalities,
    conduct web searches, use databases, use online tools to collaborate
  - nursing science, administer appointments, administrative tasks in a medical environment,
    intravenous infusion, plan nursing care, prepare intravenous packs, work with
    nursing staff, supervise nursing staff, clinical perfusion
- source_sentence: Director of Federal Business Development and Capture Mgmt
  sentences:
  - develop business plans, strive for company growth, develop personal skills, channel
    marketing, prepare financial projections, perform market research, identify new
    business opportunities, market research, maintain relationship with customers,
    manage government funding, achieve sales targets, build business relationships,
    expand the network of providers, make decisions, guarantee customer satisfaction,
    collaborate in the development of marketing strategies, analyse business plans,
    think analytically, develop revenue generation strategies, health care legislation,
    align efforts towards business development, assume responsibility, solve problems,
    deliver business research proposals, identify potential markets for companies
  - operate warehouse materials, goods transported from warehouse facilities, organise
    social work packages, coordinate orders from various suppliers, warehouse operations,
    work in assembly line teams, work in a logistics team, footwear materials
  - manufacturing plant equipment, use hand tools, assemble hardware components, use
    traditional toolbox tools, perform product testing, control panel components,
    perform pre-assembly quality checks, oversee equipment operation, assemble mechatronic
    units, arrange equipment repairs, assemble machines, build machines, resolve equipment
    malfunctions, electromechanics, develop assembly instructions, install hydraulic
    systems, revise quality control systems documentation, detect product defects,
    operate hydraulic machinery controls, show an exemplary leading role in an organisation,
    assemble manufactured pipeline parts, types of pallets, perform office routine
    activities, conform with production requirements, comply with quality standards
    related to healthcare practice
- source_sentence: director of production
  sentences:
  - use customer relationship management software, sales strategies, create project
    specifications, document project progress, attend trade fairs, building automation,
    sales department processes, work independently, develop account strategy, build
    business relationships, facilitate the bidding process, close sales at auction,
    satisfy technical requirements, results-based management, achieve sales targets,
    manage sales teams, liaise with specialist contractors for well operations, sales
    activities, use sales forecasting softwares, guarantee customer satisfaction,
    integrate building requirements in the architectural design, participate actively
    in civic life, customer relationship management, implement sales strategies
  - translate strategy into operation, lead the brand strategic planning process,
    assist in developing marketing campaigns, implement sales strategies, sales promotion
    techniques, negotiate with employment agencies, perform market research, communicate
    with customers, develop media strategy, change power distribution systems, beverage
    products, project management, provide advertisement samples, devise military tactics,
    use microsoft office, market analysis, manage sales teams, create brand guidelines,
    brand marketing techniques, use sales forecasting softwares, supervise brand management,
    analyse packaging requirements, provide written content, hand out product samples,
    channel marketing
  - use microsoft office, use scripting programming, build team spirit, operate games,
    production processes, create project specifications, analyse production processes
    for improvement, manage production enterprise, Agile development, apply basic
    programming skills, document project progress, supervise game operations, work
    to develop physical ability to perform at the highest level in sport, fix meetings,
    office software, enhance production workflow, manage a team, set production KPI,
    manage commercial risks, work in teams, teamwork principles, address identified
    risks, meet deadlines, consult with production director
- source_sentence: Nursing Assistant
  sentences:
  - supervise medical residents, observe healthcare users, provide domestic care,
    prepare health documentation, position patients undergoing interventions, work
    with broad variety of personalities, supervise food in healthcare, tend to elderly
    people, monitor patient's vital signs, transfer patients, show empathy, provide
    in-home support for disabled individuals, hygiene in a health care setting, supervise
    housekeeping operations, perform cleaning duties, monitor patient's health condition,
    provide basic support to patients, work with nursing staff, involve service users
    and carers in care planning, use electronic health records management system,
    arrange in-home services for patients, provide nursing care in community settings
    , work in shifts, supervise nursing staff
  - manage relationships with stakeholders, use microsoft office, maintain records
    of financial transactions, software components suppliers, tools for software configuration
    management, attend to detail, keep track of expenses, build business relationships,
    issue sales invoices, financial department processes, supplier management, process
    payments, perform records management, manage standard enterprise resource planning
    system
  - inspect quality of products, apply HACCP, test package, follow verbal instructions,
    laboratory equipment, assist in the production of laboratory documentation, ensure
    quality control in packaging, develop food safety programmes, packaging engineering,
    appropriate packaging of dangerous goods, maintain laboratory equipment, SAP Data
    Services, calibrate laboratory equipment, analyse packaging requirements, write
    English
- source_sentence: Branch Manager
  sentences:
  - support employability of people with disabilities, schedule shifts, issue licences,
    funding methods, maintain correspondence records, computer equipment, decide on
    providing funds, tend filing machine, use microsoft office, lift stacks of paper,
    transport office equipment, tend to guests with special needs, provide written
    content, foreign affairs policy development, provide charity services, philanthropy,
    maintain financial records, meet deadlines, manage fundraising activities, assist
    individuals with disabilities in community activities, report on grants, prepare
    compliance documents, manage grant applications, tolerate sitting for long periods,
    follow work schedule
  - cook pastry products, create new recipes, food service operations, assess shelf
    life of food products, apply requirements concerning manufacturing of food and
    beverages, food waste monitoring systems, maintain work area cleanliness, comply
    with food safety and hygiene, coordinate catering, maintain store cleanliness,
    work according to recipe, health, safety and hygiene legislation, install refrigeration
    equipment, prepare desserts, measure precise food processing operations, conform
    with production requirements, work in an organised manner, demand excellence from
    performers, refrigerants, attend to detail, ensure food quality, manufacture prepared
    meals
  - teamwork principles, office administration, delegate responsibilities, create
    banking accounts, manage alarm system, make independent operating decisions, use
    microsoft office, offer financial services, ensure proper document management,
    own management skills, use spreadsheets software, manage cash flow, integrate
    community outreach, manage time, perform multiple tasks at the same time, carry
    out calculations, assess customer credibility, maintain customer service, team
    building, digitise documents, promote financial products, communication, assist
    customers, follow procedures in the event of an alarm, office equipment
---

# SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model trained specifically for job title matching and similarity. It is fine-tuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on a large dataset of job titles paired with their associated skills and requirements. The model maps job titles and descriptions to a 1024-dimensional dense vector space and can be used for semantic job title matching, job similarity search, and related HR/recruitment tasks.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Maximum Sequence Length:** 64 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:** 5.5M+ pairs of job titles and associated skills
- **Primary Use Case:** Job title matching and similarity
- **Performance:** Achieves 0.6457 MAP on the TalentCLEF benchmark
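
For readers unfamiliar with the metric, the sketch below shows how mean average precision (MAP) is commonly computed for a ranking task. It uses the textbook definition and made-up toy data; the function names are illustrative, and the TalentCLEF evaluation may differ in details such as how unretrieved relevant items are handled.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of precision@k taken
    at every rank k where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevance):
    """MAP: the mean of the per-query average precision values.
    Both arguments are dicts keyed by query id."""
    scores = [average_precision(rankings[q], relevance[q]) for q in rankings]
    return sum(scores) / len(scores)


# Toy data (hypothetical, not taken from any benchmark).
rankings = {
    "q1": ["a", "b", "c", "d"],  # system ranking for query q1
    "q2": ["x", "y", "z"],
}
relevance = {
    "q1": ["a", "c"],  # ground-truth relevant items for q1
    "q2": ["z"],
}

map_score = mean_average_precision(rankings, relevance)
print(round(map_score, 4))  # 0.5833
```

Here `q1` scores (1/1 + 2/3) / 2 and `q2` scores 1/3, so the mean is 7/12.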

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Asym(
    (anchor-0): Dense({'in_features': 768, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
    (positive-0): Dense({'in_features': 768, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  )
)
```
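
The last two stages of the architecture above (mean pooling, then a branch-specific dense layer with Tanh) can be sketched in NumPy. This is only an illustration of the shapes involved: the weights are random placeholders rather than the trained parameters, the token embeddings stand in for the output of the MPNet encoder, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors over the sequence, ignoring padded positions."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def dense_tanh(x, weight, bias):
    """Dense layer followed by Tanh: maps 768-d vectors to 1024-d vectors."""
    return np.tanh(x @ weight + bias)


# Random placeholder parameters, one (weight, bias) pair per branch.
branches = {
    "anchor": (rng.normal(scale=0.02, size=(768, 1024)), np.zeros(1024)),
    "positive": (rng.normal(scale=0.02, size=(768, 1024)), np.zeros(1024)),
}

# Stand-in for the encoder output: 2 sequences padded to 64 tokens.
batch_size, seq_len = 2, 64
token_embeddings = rng.normal(size=(batch_size, seq_len, 768))
attention_mask = np.zeros((batch_size, seq_len), dtype=np.int64)
attention_mask[0, :5] = 1  # first input has 5 real tokens
attention_mask[1, :9] = 1  # second input has 9 real tokens

pooled = mean_pool(token_embeddings, attention_mask)
anchor_embeddings = dense_tanh(pooled, *branches["anchor"])
positive_embeddings = dense_tanh(pooled, *branches["positive"])

print(pooled.shape, anchor_embeddings.shape, positive_embeddings.shape)
```

Because the two branches have separate weights, the same pooled vector yields different embeddings depending on which branch it is routed through.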

## Usage

### Direct Usage (Sentence Transformers)

First, install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then load the model and encode job titles. The helper functions below call the model directly and set `text_keys` to `"anchor"` so that inputs pass through the anchor branch of the `Asym` layer:

```python
import torch
import numpy as np
from tqdm.auto import tqdm
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import batch_to_device, cos_sim

# Load the model
model = SentenceTransformer("TechWolf/JobBERT-v2")

def encode_batch(jobbert_model, texts):
    # Tokenize the batch and move the tensors to the model's device
    features = jobbert_model.tokenize(texts)
    features = batch_to_device(features, jobbert_model.device)
    # Route the batch through the "anchor" branch of the Asym layer
    features["text_keys"] = ["anchor"]
    with torch.no_grad():
        out_features = jobbert_model.forward(features)
    return out_features["sentence_embedding"].cpu().numpy()

def encode(jobbert_model, texts, batch_size: int = 8):
    # Sort texts by length and keep track of original indices
    sorted_indices = np.argsort([len(text) for text in texts])
    sorted_texts = [texts[i] for i in sorted_indices]

    embeddings = []

    # Encode in batches
    for i in tqdm(range(0, len(sorted_texts), batch_size)):
        batch = sorted_texts[i:i+batch_size]
        embeddings.append(encode_batch(jobbert_model, batch))

    # Concatenate embeddings and reorder to original indices
    sorted_embeddings = np.concatenate(embeddings)
    original_order = np.argsort(sorted_indices)
    return sorted_embeddings[original_order]

# Example usage
job_titles = [
    'Software Engineer',
    'Senior Software Developer',
    'Product Manager',
    'Data Scientist'
]

# Get embeddings
embeddings = encode(model, job_titles)

# Calculate cosine similarity matrix
similarities = cos_sim(embeddings, embeddings)
print(similarities)
```

The output is a similarity matrix in which each value is the cosine similarity between two job titles:

```
tensor([[1.0000, 0.8723, 0.4821, 0.5447],
        [0.8723, 1.0000, 0.4822, 0.5019],
        [0.4821, 0.4822, 1.0000, 0.4328],
        [0.5447, 0.5019, 0.4328, 1.0000]])
```

In this example:
- The diagonal values are 1.0000, since each title is perfectly similar to itself
- 'Software Engineer' and 'Senior Software Developer' have high similarity (0.8723)
- 'Product Manager' and 'Data Scientist' are less similar to the other roles (0.43 to 0.54)
- Cosine similarity ranges from -1 to 1; all values in this example fall between 0 and 1, and higher values indicate greater similarity
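
To make the matrix easier to interpret, here is a minimal NumPy sketch of how a cosine similarity matrix is computed: each row is scaled to unit length, and the matrix of dot products is taken. The 2-d vectors are toy values chosen by hand, not model embeddings, and the function name is illustrative.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy vectors: the third points in the opposite direction to the first.
vectors = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [-1.0, 0.0],
])

sims = cosine_similarity_matrix(vectors, vectors)
print(np.round(sims, 4))
```

The result is symmetric with ones on the diagonal. Opposite vectors score -1, which shows that cosine similarity is not limited to the range 0 to 1 in general.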

### Example Use Cases

1. **Job Title Matching**: Find similar job titles for standardization or matching
2. **Job Search**: Match job seekers with relevant positions based on title similarity
3. **HR Analytics**: Analyze job title patterns and similarities across organizations
4. **Talent Management**: Identify similar roles for career development and succession planning
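
As an illustration of the first use case, the sketch below maps a free-text title to its closest entries in a list of canonical titles by ranking cosine similarities. The 3-d vectors are hand-picked placeholders standing in for real 1024-dimensional embeddings, and `top_k_matches` is a hypothetical helper, not part of any library.

```python
import numpy as np


def top_k_matches(query_embedding, candidate_embeddings, candidate_titles, k=3):
    """Return the k candidate titles most similar to the query, best first."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = candidate_embeddings / np.linalg.norm(
        candidate_embeddings, axis=1, keepdims=True
    )
    scores = c @ q                    # cosine similarity to each candidate
    order = np.argsort(-scores)[:k]   # indices sorted by descending score
    return [(candidate_titles[i], float(scores[i])) for i in order]


canonical_titles = ["Software Engineer", "Product Manager", "Data Scientist"]

# Placeholder vectors; in practice these would come from the model.
canonical_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.2, 0.1, 0.9],
])
query_embedding = np.array([0.8, 0.2, 0.1])  # e.g. "Senior Software Developer"

matches = top_k_matches(query_embedding, canonical_embeddings, canonical_titles, k=2)
for title, score in matches:
    print(f"{title}: {score:.4f}")
```

For large title catalogues, the candidate embeddings would typically be computed once and reused across queries.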

## Training Details

### Training Dataset

#### generator
- Dataset: 5,579,240 training pairs (5.5M+)
- Format: Anchor job titles paired with related skills/requirements
- Training objective: Learn semantic similarity between job titles and their associated skills
- Loss: CachedMultipleNegativesRankingLoss with cosine similarity
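
The objective behind this loss can be sketched in a few lines of NumPy: within a batch, each anchor should score its own positive higher than every other positive, which amounts to cross-entropy over the scaled similarity matrix with the diagonal as the target. The sketch below is an assumption-laden illustration, not the training code: it omits the gradient caching that the "Cached" variant adds to fit large batches in memory, the `scale` of 20 is the Sentence Transformers default rather than a value stated in this card, and the vectors are random.

```python
import numpy as np


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives loss: cross-entropy over scaled cosine similarities,
    where the correct 'class' for anchor i is positive i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
positives = rng.normal(size=(4, 8))

# Anchors that nearly coincide with their own positives.
aligned_anchors = positives + 0.01 * rng.normal(size=(4, 8))
# Anchors shifted by one row, so each one matches the wrong positive.
mismatched_anchors = np.roll(positives, 1, axis=0)

loss_aligned = multiple_negatives_ranking_loss(aligned_anchors, positives)
loss_mismatched = multiple_negatives_ranking_loss(mismatched_anchors, positives)
print(loss_aligned, loss_mismatched)
```

The aligned batch gives a much lower loss than the mismatched one. Larger batches supply more in-batch negatives per anchor.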

### Training Hyperparameters
- Batch Size: 2048
- Learning Rate: 5e-05
- Epochs: 1
- FP16 Training: Enabled
- Optimizer: AdamW

### Framework Versions
- Python: 3.9.19
- Sentence Transformers: 3.1.0
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu118
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```