---
base_model: sentence-transformers/all-MiniLM-L12-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2144
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: How do I find out when I should write my examinations?
  sentences:
  - Information relating to examination timetables is available from the Examination
    Office and will be published on the official Institute Notice Board and the website.
  - If you find an error on your academic record, you should contact the Registration
    and Student Records Management Office immediately.
  - To request accommodations for a disability, you must submit documentation of the
    disability to the disability services office and meet with a disability services
    coordinator.
- source_sentence: What is the language of instruction at the Harare Institute of
    Technology?
  sentences:
  - English is the language of instruction.
  - Tracking international events and conference and strategically link them to HIT,
    internationalizing HIT programmes and activities, developing bouquet of events
    and activities for international visitors, helping affiliate, accredit HIT, staff
    and students to international bodies and associations, liaising with national
    bodies and promote Zimbabwean culture and symbols, serving as a point of contact
    for exchange students, staff and visitors, ensuring international programmes align
    to national programmes and symbols, helping affiliate HIT ethos to national art
    and culture, monitoring implementation of MoUs and MoAs, facilitation of international
    travel and visits, providing Institute departments with consular advice, ensuring
    HIT members get oriented to particular countries’ culture and services before
    departure, driving recruitment of foreign students and exchange programmes.
  - BFA 7206 is the course code for Financial Institutions Fraud, which is an elective
    course in the second semester of the program.
- source_sentence: What is the process for collecting a certificate?
  sentences:
  - The programme is designed such that on completion, graduates should be able to
    innovatively execute their professional role within prescribed and legislative
    parameters, demonstrate a critical understanding and application of quality assurance
    and radiation protection in Radiography, apply scientific knowledge and technical
    skills to perform Radiography procedures, plan, develop and apply total quality
    management appropriate to the Radiography context, apply management, entrepreneurial,
    education and research skills independently and function in a supervisory clinical
    governance and quality assurance capacity within the professional sector, demonstrate
    the ability to reflect in clinical practice, critically evaluate and adjust to
    current and new trends in Radiography, demonstrate capability to implement new
    knowledge and solve problems in varying contexts, and engage life-long learning
    and development in their profession.
  - The process involves clearing any dues to the Institute and providing valid identification
    documents.
  - A student can apply for change of programme within two weeks after commencement
    of lectures.
- source_sentence: How do I change my address or contact information?
  sentences:
  - Information Security & Assurance is a field that deals with the protection of
    information and information systems from unauthorized access, use, disclosure,
    disruption, modification, or destruction.
  - The Information and Communications Technology Services (ICTS) Department at HIT
    is responsible for providing and maintaining the Institute's IT infrastructure
    and services.
  - You can update your address or contact information through the online student
    portal or by contacting the Academic Registry.
- source_sentence: What is the difference between Cloud Computing and Information
    Security & Assurance?
  sentences:
  - The fourth semester focuses on courses such as Research Project, Clinical Practice
    IV, and Seminar.
  - Cloud Computing is focused on the design, implementation, and management of cloud
    services, while Information Security & Assurance is focused on the protection
    of information by mitigating information risks and ensuring availability, privacy,
    and integrity of data.
  - The Applied Research Methods course is designed to equip students with the skills
    and knowledge necessary to conduct research in chemical engineering process and
    plant design.
---
# SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) <!-- at revision a05860a77cef7b37e0048a7864658139bc18a854 -->
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
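The three modules correspond to: (0) the underlying BertModel producing token embeddings, (1) attention-mask-aware mean pooling, and (2) L2 normalization. As a rough sketch of what this stack computes, here is the equivalent using plain `transformers` (this assumes the repository also exposes standard BertModel weights, as sentence-transformers repositories typically do):
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Dex-X/finehit")
bert = AutoModel.from_pretrained("Dex-X/finehit")

encoded = tokenizer(
    ["How do I find out when I should write my examinations?"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over real (non-padding) tokens, guided by the attention mask
mask = encoded["attention_mask"].unsqueeze(-1).float()
mean_pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embeddings = F.normalize(mean_pooled, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 384])
```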
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Dex-X/finehit")
# Run inference
sentences = [
    'What is the difference between Cloud Computing and Information Security & Assurance?',
    'Cloud Computing is focused on the design, implementation, and management of cloud services, while Information Security & Assurance is focused on the protection of information by mitigating information risks and ensuring availability, privacy, and integrity of data.',
    'The Applied Research Methods course is designed to equip students with the skills and knowledge necessary to conduct research in chemical engineering process and plant design.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
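Because the training pairs are questions matched to answers, a natural use is ranking candidate answers for a new question. A minimal retrieval sketch (the query and candidate answers below are illustrative):
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Dex-X/finehit")

# Illustrative candidate answers; any list of passages works
answers = [
    "English is the language of instruction.",
    "The process involves clearing any dues to the Institute and providing valid identification documents.",
    "You can update your address or contact information through the online student portal or by contacting the Academic Registry.",
]
query = "How can I collect my certificate?"

# Cosine similarities between the query and every candidate answer
scores = model.similarity(model.encode(query), model.encode(answers))  # shape: [1, 3]
best = scores.argmax().item()
print(answers[best])  # prints the highest-scoring candidate
```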
<!--
### Direct Usage (Transformers)
<details><summary>Click to see the direct usage in Transformers</summary>
</details>
-->
<!--
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
<details><summary>Click to expand</summary>
</details>
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 2,144 training samples
* Columns: <code>question</code> and <code>answer</code>
* Approximate statistics based on the first 1000 samples:
| | question | answer |
|:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 6 tokens</li><li>mean: 13.94 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 30.7 tokens</li><li>max: 128 tokens</li></ul> |
* Samples:
| question | answer |
|:----------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>What is the role of the Dean of Students?</code> | <code>The Dean of Students oversees various aspects of student life, including student affairs, campus life and development, accommodation, wellness, and more.</code> |
| <code>What does the Student Affairs department do?</code> | <code>The Student Affairs department handles matters related to student life, conduct, and welfare.</code> |
| <code>What is the role of Campus Life and Student Development?</code> | <code>Campus Life and Student Development is responsible for fostering a positive campus environment and promoting student growth and development.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
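With these parameters, each (question, answer) pair in a batch is a positive, every other answer in the batch serves as an in-batch negative, and cosine similarities are multiplied by the scale of 20.0 before the softmax cross-entropy over the batch. A sketch of how the parameters above map onto the library call:
```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# scale multiplies the cosine scores before the softmax; cos_sim is the
# default similarity function, made explicit here to mirror the JSON above
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,
    similarity_fct=util.cos_sim,
)
```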
### Evaluation Dataset
#### Unnamed Dataset
* Size: 214 evaluation samples
* Columns: <code>question</code> and <code>answer</code>
* Approximate statistics based on the first 1000 samples:
| | question | answer |
|:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 7 tokens</li><li>mean: 15.12 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 31.14 tokens</li><li>max: 128 tokens</li></ul> |
* Samples:
| question | answer |
|:--------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------|
| <code>What is Student Accommodation and Catering?</code> | <code>Student Accommodation and Catering is a department that manages student housing and dining services.</code> |
| <code>What certification does Mr. Njonga have from the National Social Security Authority?</code> | <code>Safety and Health Advisor Certification</code> |
| <code>What is the duration of the B Tech (Hons) Computer Science programme?</code> | <code>The B Tech (Hons) Computer Science programme is a four-year full-time regular programme.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
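
A sketch of how these non-default values translate into a `sentence-transformers` 3.x training run (the datasets below are illustrative stand-ins for the unnamed question/answer data):
```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Illustrative stand-in for the 2,144 training / 214 evaluation pairs
train_dataset = Dataset.from_dict({
    "question": ["What does the Student Affairs department do?"],
    "answer": ["The Student Affairs department handles matters related to student life, conduct, and welfare."],
})
eval_dataset = train_dataset  # placeholder for the held-out pairs

args = SentenceTransformerTrainingArguments(
    output_dir="finehit",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,  # requires a CUDA GPU
    eval_strategy="steps",
    # no_duplicates keeps identical texts out of the same batch, which matters
    # because MultipleNegativesRankingLoss draws its negatives from the batch
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=losses.MultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()
```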
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
</details>
### Training Logs
| Epoch | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.7463 | 100 | 0.5551 | 0.0665 |
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.3.0+cu121
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->