---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:182
- loss:CosineSimilarityLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: What documents must contractors/vendors provide?
  sentences:
  - 1. ESH representatives will carry out the training when new employees need to be trained, or on an annual basis.
  - "1. Safe Operating Procedure (SOP). \n2. Risk Assessment ( Hazard Identification, Risk Assessment, & Risk control / HIRARC) / JSA / Job Safety Analysis. \n3. Valid licenses (If applicable). \n4. Certification of Fitness-CF (For all types of cranes). \n5. Crane Operator Competency License. (If applicable). \n6. All scaffolding must be erected as per the statutory regulations. \n7. Lifting Supervisor Competency Certificate. (If applicable). \n8. Signal Man Competency Certificate. (If applicable. \n9. Rigger Competency Certificate. (If applicable). \n10. Lifting plan (If applicable). \n11. Scaffolder Level 1/2/3 Certificate. (If applicable)."
  - 1. To ensure the specific employees are aware of the correct procedures associated with chemical handling and waste management.
- source_sentence: What is the guideline for shirts and blouses?
  sentences:
  - 1. ESH representatives will carry out the training when new employees need to be trained, or on an annual basis.
  - 1. Employees in CLEAN ROOM are NOT ALLOWED to use/wear makeup/bangles.
  - "1. 1. Formal or casual shirts with sleeves. \n2. 2. Collared T-shirts and blouses/sleeveless tops (for ladies). \n3. 3. Round-neck T-shirts are allowed for non-office personnel. \n4. 4. Clothing with the company logo is encouraged. \n5. 5. Sport Team. \n6. 6. University. \n7. 7. Fashion brands on clothing are generally acceptable."
- source_sentence: What is the lunch schedule for the 1st shift in the normal schedule in M-site?
  sentences:
  - 12 days.
  - '1. Categorization of Machine: Identify the location of the machine, its function, and all necessary items needed for it to run (e.g., lubricants, saw blades, etc). 2. Authorization: Ensure that all personnel operating the machine have received the appropriate training. 3. Hazard & Risks associated with equipment/machinery/techniques/process: Identify all hazards and risks associated, and implement sufficient controls according to the hierarchy of controls (e.g., warning labels and symbols). 4. Pre-work procedure: Ensure that the machine is in proper, running condition before starting work. 5. During work procedure: Follow the correct standard operating procedure for carrying out that work activity. 6. After work procedure: Ensure that the machine remains in a neat and tidy condition at all times. 7. Work Area: Identify the area where the work is being done. 8. PPE: Ensure that appropriate PPE is available for all personnel handling the machine. 9. Emergency Procedure: Ensure sufficient emergency features are available on the machine (e.g., emergency stop button). 10. After work hour: Ensure the machine system is in shutdown/standby mode when the machine is not running. 11. Housekeeping: Ensure basic housekeeping is done at the work area. 12. Scheduled waste: Any scheduled waste generated by the process should be disposed of according to Carsem waste management procedure.'
  - 1. Lunch (Tengah Hari) for the 1st shift is from 12:00 PM to 1:00 PM, lasting 60 minutes.
- source_sentence: What is the meal schedule for M-site?
  sentences:
  - 2 days.
  - "1. 1st Shift: -Dinner (Malam): 8:00PM - 8:40PM, -Supper(Lewat Malam): 1:00AM - 1:30 AM -Breakfast(Pagi): 8:00AM - 8:30AM -Lunch(Tengah Hari): 12:50PM - 1:30PM. \n2. 2nd Shift: -Dinner(Malam): 8:50PM - 9:30PM -Supper(Lewat Malam): 1:40AM - 2:10AM -Breakfast(Pagi): 8:40AM - 9:10AM -Lunch(Tengah Hari): 1:40PM - 2:20PM. \n3. 3rd Shift: -Dinner(Malam): 9:40PM - 10:20PM -Supper(Lewat Malam): 2:20AM - 2:50AM -Breakfast(Pagi): 9:20AM - 9:50AM -Lunch(Tengah Hari): 2:30PM - 3:10PM. \n4. 4th Shift: -Dinner(Malam): 10:30PM - 11:10PM -Supper(Lewat Malam): 3:00AM - 3:30AM -Breakfast(Pagi): 10:00AM - 10:30AM -Lunch(Tengah Hari): 3:20PM - 4:00PM."
  - "1. The mechanical safety guidelines include: \n2. 1. Lock-Out Tag-Out (LOTO): Always practice LOTO procedures when performing maintenance or repairs on machines. \n3. 2. Preventive Maintenance: Conduct regular preventive maintenance on all machinery to ensure proper functioning. \n4. 3. Pinch Points Awareness: Identify all possible pinch points on machinery, and ensure they are properly labeled. \n5. 4. Production Area Organization: Keep the production area neat and organized at all times. \n6. 5. Operator Training: Provide adequate training to operators before allowing them to handle machines. \n7. 6. Machine Guarding: Ensure all safety guards are in place before starting machine operations."
- source_sentence: Can employees wear traditional attire?
  sentences:
  - "1. N03 : Monday to Friday, 8am to 5:30pm.\n2. N04 : Tuesday to Saturday, 8am to 5:30pm.\n3. N05 : Monday to Friday, 8:30am to 6pm.\n4. N06 : Monday to Friday, 9am to 6:30pm.\n5. N07 : Tuesday to Saturday, 8:30am to 6pm.\n6. N08 : Tuesday to Saturday, 9am to 6.30pm.\n7. N6 : Tuesday to Saturday, 8:30pm to 6:15pm.\n8. N9: 5 working days 2 days off, 7:30am to 5:15pm , 10:30am to 8:15pm.\n9. N10: 5 working days 2 days off, 10:30am to 8:15pm , 7:30am to 5:15pm.\n10. AA/BB/CC/A/B/C : 4 working days 2 days off, 6:30am to 6:30pm , 6:30pm to 6:30am.\n11. AA1/BB1/CC1/A1/B1/C1 : 4 working days 2 days off, 6:30am to 6:30pm , 6:30pm to 6:30am.\n12. GG/HH/II/GG1/HH1/II1 : 4 working days 2 days off, 7:30am to 7:30pm , 7:30pm to 7:30am.\n13. P1 : Monday to Thursday (4 working days 2 days off), 6:30am to 6:30pm , 6:30pm to 6:30am.\n14. P2 : Tuesday to Friday (4 working days 2 days off), 6:30am to 6:30pm , 6:30pm to 6:30am. \n15. U1/U2/U3/UU1/UU2/UU3 : 4 working days 2 days off, 7:30am to 7.30pm. \n16. V1/V2/V3/VV1/VV2/VV3 : 4 working days 2 days off, 8.30am to 8.30pm. \n17. W1/W2/W3/WW1/WW2/WW3 : 4 working days 2 days off, 6.30am to 6.30pm. \n18. H1 : Monday to Thursday (4 working days 2 days off), 6.30am to 6.30pm. \n19. H2 : Tuesday to Friday (4 working days 2 days off), 6.30am to 6.30pm. \n20. H3 : Wednesday to Saturday (4 working days 2 days off), 6.30am to 6.30pm. \n21. H6(applicable in S only) : Monday to Thursday (4 working days 2 days off), 7.30am to 7.30pm. \n22. H6(applicable in M only) : Monday to Thursday (4 working days 2 days off), 7.30am to 7.30pm."
  - "1. 1st Shift: -Dinner (Malam): 8:00PM - 8:40PM, -Supper(Lewat Malam): 1:00AM - 1:30 AM -Breakfast(Pagi): 8:30AM - 9:00AM -Lunch(Tengah Hari): 1:40PM - 2:20PM. \n2. 2nd Shift: -Dinner(Malam): 8:50PM - 9:30PM -Supper(Lewat Malam): 1:40AM - 2:10AM -Breakfast(Pagi): 9:10AM - 9:40AM -Lunch(Tengah Hari): 2:30PM - 3:10PM. \n3. 3rd Shift: -Dinner(Malam): 9:40PM - 10:20PM -Supper(Lewat Malam): 2:20AM - 2:50AM -Breakfast(Pagi): 9:50AM - 10:20AM -Lunch(Tengah Hari): 3:20PM - 4:00PM."
  - "1. 1. Yes, acceptable traditional attire includes: \n2. 1. Malaysian Traditional Attire. \n3. 2.Malay Baju Kurung. \n4. 3. Baju Melayu for Muslim men. \n5. 4.Indian Saree. \n6. 5. Punjabi Suit. \n7. Chinese Cheongsam are acceptable."
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("PeYing/model1_v2")
# Run inference
sentences = [
    'Can employees wear traditional attire?',
    '1. 1. Yes, acceptable traditional attire includes: \n2. 1. Malaysian Traditional Attire. \n3. 2.Malay Baju Kurung. \n4. 3. Baju Melayu for Muslim men. \n5. 4.Indian Saree. \n6. 5. Punjabi Suit. \n7. Chinese Cheongsam are acceptable.',
    '1. N03 : Monday to Friday, 8am to 5:30pm.\n2. N04 : Tuesday to Saturday, 8am to 5:30pm.\n3. N05 : Monday to Friday, 8:30am to 6pm.\n4. N06 : Monday to Friday, 9am to 6:30pm.\n5. N07 : Tuesday to Saturday, 8:30am to 6pm.\n6. N08 : Tuesday to Saturday, 9am to 6.30pm.\n7. N6 : Tuesday to Saturday, 8:30pm to 6:15pm.\n8. N9: 5 working days 2 days off, 7:30am to 5:15pm , 10:30am to 8:15pm.\n9. N10: 5 working days 2 days off, 10:30am to 8:15pm , 7:30am to 5:15pm.\n10. AA/BB/CC/A/B/C : 4 working days 2 days off, 6:30am to 6:30pm , 6:30pm to 6:30am.\n11. AA1/BB1/CC1/A1/B1/C1 : 4 working days 2 days off, 6:30am to 6:30pm , 6:30pm to 6:30am.\n12. GG/HH/II/GG1/HH1/II1 : 4 working days 2 days off, 7:30am to 7:30pm , 7:30pm to 7:30am.\n13. P1 : Monday to Thursday (4 working days 2 days off), 6:30am to 6:30pm , 6:30pm to 6:30am.\n14. P2 : Tuesday to Friday (4 working days 2 days off), 6:30am to 6:30pm , 6:30pm to 6:30am. \n15. U1/U2/U3/UU1/UU2/UU3 : 4 working days 2 days off, 7:30am to 7.30pm. \n16. V1/V2/V3/VV1/VV2/VV3 : 4 working days 2 days off, 8.30am to 8.30pm. \n17. W1/W2/W3/WW1/WW2/WW3 : 4 working days 2 days off, 6.30am to 6.30pm. \n18. H1 : Monday to Thursday (4 working days 2 days off), 6.30am to 6.30pm. \n19. H2 : Tuesday to Friday (4 working days 2 days off), 6.30am to 6.30pm. \n20. H3 : Wednesday to Saturday (4 working days 2 days off), 6.30am to 6.30pm. \n21. H6(applicable in S only) : Monday to Thursday (4 working days 2 days off), 7.30am to 7.30pm. \n22. H6(applicable in M only) : Monday to Thursday (4 working days 2 days off), 7.30am to 7.30pm.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 182 training samples
* Columns: `sentence_0`, `sentence_1`, and `label`
* Approximate statistics based on the first 182 samples:

  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string     | string     | int   |
  | details |            |            |       |

* Samples:

  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | List out all the work schedule for Carsem. | 1. N03 : Monday to Friday, 8am to 5:30pm.<br>2. N04 : Tuesday to Saturday, 8am to 5:30pm.<br>3. N05 : Monday to Friday, 8:30am to 6pm.<br>4. N06 : Monday to Friday, 9am to 6:30pm.<br>5. N07 : Tuesday to Saturday, 8:30am to 6pm.<br>6. N08 : Tuesday to Saturday, 9am to 6.30pm.<br>7. N6 : Tuesday to Saturday, 8:30pm to 6:15pm.<br>8. N9: 5 working days 2 days off, 7:30am to 5:15pm , 10:30am to 8:15pm.<br>9. N10: 5 working days 2 days off, 10:30am to 8:15pm , 7:30am to 5:15pm.<br>10. AA/BB/CC/A/B/C : 4 working days 2 days off, 6:30am to 6:30pm , 6:30pm to 6:30am.<br>11. AA1/BB1/CC1/A1/B1/C1 : 4 working days 2 days off, 6:30am to 6:30pm , 6:30pm to 6:30am.<br>12. GG/HH/II/GG1/HH1/II1 : 4 working days 2 days off, 7:30am to 7:30pm , 7:30pm to 7:30am.<br>13. P1 : Monday to Thursday (4 working days 2 days off), 6:30am to 6:30pm , 6:30pm to 6:30am.<br>14. P2 : Tuesday to Friday (4 working days 2 days off), 6:30am to 6:30pm , 6:30pm to 6:30am.<br>15. U1/U2/U3/UU1/UU2/UU3 : 4 working days 2 days off, 7:30am to 7.30pm.<br>16. V1/V2/V3/VV1/VV... | 1 |
  | What is the maximum allowed working hours in a week? | 1. Employees are not allowed to work more than 60 hours in a week inclusive of overtime and 1 rest day per week. Company will monitor overtime and rest day utilization and take appropriate action to address instances deemed excessive. | 1 |
  | Why the company is not allowed working hours in a week more than 60 hours? | 1. Continuous overtime causes worker strain that may lead to reduced productivity, increased turnover and increased injury and illnesses. | 1 |

* Loss: [CosineSimilarityLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 1
- `per_device_eval_batch_size`: 1
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 1
- `per_device_eval_batch_size`: 1
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

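
To make the training objective concrete, here is a minimal sketch of what a cosine-similarity loss with an MSE loss function computes for a labelled sentence pair, together with the number of optimizer updates implied by 182 samples, a batch size of 1, and 1 epoch. It is plain Python written for illustration, not the library's implementation: the function names `cosine_similarity`, `cosine_mse_loss`, and `optimizer_steps` are hypothetical, and the toy vectors are made up rather than produced by the model. The step count also assumes no samples are dropped and ignores any framework-specific rounding.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_mse_loss(pairs):
    """Mean squared error between cosine similarity and the target label.

    `pairs` is a list of (embedding_0, embedding_1, label) tuples
    (hypothetical structure, chosen for this sketch).
    """
    errors = [(cosine_similarity(u, v) - label) ** 2 for u, v, label in pairs]
    return sum(errors) / len(errors)


def optimizer_steps(num_samples, batch_size, epochs, grad_accum=1):
    """Approximate number of optimizer updates, assuming no dropped samples."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(batches_per_epoch / grad_accum) * epochs


# Toy 2-dimensional embeddings (made up for illustration, not model output).
aligned_pair = [([1.0, 0.0], [1.0, 0.0], 1.0)]     # same direction, label 1
orthogonal_pair = [([1.0, 0.0], [0.0, 1.0], 1.0)]  # unrelated, label 1

print(cosine_mse_loss(aligned_pair))     # loss is zero when similarity matches the label
print(cosine_mse_loss(orthogonal_pair))  # loss grows as similarity departs from the label
print(optimizer_steps(182, batch_size=1, epochs=1))
```

With a batch size of 1, each update is driven by a single pair, so the loss for a step is simply the squared gap between that pair's cosine similarity and its label.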
### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.2
- PyTorch: 2.5.1+cu124
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```