---
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
license: mit
datasets:
- keivalya/MedQuad-MedicalQnADataset
language:
- en
metrics:
- bertscore
tags:
- medical
---
# Model Card for Tonic/mistralmed
This is a medicine-focused fine-tune of Mistral-7B, trained on the keivalya/MedQuad-MedicalQnADataset.
## Model Details
### Model Description
A LoRA fine-tune of Mistral-7B-v0.1 aimed at improving medical question answering.
- **Developed by:** [Tonic](https://huggingface.co/Tonic)
- **Shared by:** [Tonic](https://huggingface.co/Tonic)
- **Model type:** Mistral Fine-Tune
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
### Model Sources
- **Repository:** [Tonic/mistralmed](https://huggingface.co/Tonic/mistralmed)
- **Code:** [github](https://github.com/Josephrp/mistralmed/blob/main/finetuning.py)
- **Demo:** [Tonic/MistralMed_Chat](https://huggingface.co/Tonic/MistralMed_Chat)
## Uses
This model is used the same way as the base Mistral model: load `mistralai/Mistral-7B-v0.1` and apply the PEFT adapter on top.
### Direct Use
The model is intended to perform better than the base model in medical question-answering scenarios.
### Downstream Use
This model is intended to be further fine-tuned; a sketch of resuming training from the adapter is shown below.
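A minimal sketch, assuming the adapter is published at `Tonic/mistralmed` (the actual training setup in `finetuning.py` may differ):

```python
# Sketch only: reload the LoRA adapter in trainable mode for further fine-tuning
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
# is_trainable=True keeps the LoRA weights unfrozen so training can continue
model = PeftModel.from_pretrained(base, "Tonic/mistralmed", is_trainable=True)
```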
### Recommendations
- Do not use this model as-is.
- Fine-tune this model further for your use case.
- For educational purposes only.
- Benchmark your usage of the model.
- Evaluate the model before use.

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
## How to Get Started with the Model
Use the code below to get started with the model, or try the demo at [Tonic/MistralMED_Chat](https://huggingface.co/Tonic/MistralMED_Chat).
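A minimal loading sketch, assuming the LoRA adapter is published at `Tonic/mistralmed` and that `transformers`, `peft`, `bitsandbytes`, and `accelerate` are installed; the 4-bit settings mirror the quantization config listed at the end of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "Tonic/mistralmed"

# 4-bit NF4 quantization with bf16 compute, matching the training-time config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Example query (prompt format is illustrative)
prompt = "What are the symptoms of glaucoma?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```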
## Training Details
### Training Data
The model was trained on [MedQuad](https://huggingface.co/datasets/keivalya/MedQuad-MedicalQnADataset/viewer/default/train), a dataset of 16,407 medical question-answer pairs.
### Training Procedure
The dataset has the following structure:

```
Dataset({
    features: ['qtype', 'Question', 'Answer'],
    num_rows: 16407
})
```
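For reference, a short sketch of loading and inspecting the dataset with the `datasets` library; the prompt template below is hypothetical and may differ from the one used in `finetuning.py`:

```python
from datasets import load_dataset

ds = load_dataset("keivalya/MedQuad-MedicalQnADataset", split="train")
print(ds)  # 16,407 rows with 'qtype', 'Question', 'Answer' columns

# Hypothetical instruction-style prompt template for causal-LM fine-tuning
def format_example(example):
    return {"text": f"[INST] {example['Question']} [/INST] {example['Answer']}"}

ds = ds.map(format_example)
print(ds[0]["text"][:200])
```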
#### Preprocessing
The base model was loaded in 4-bit precision before the LoRA adapters were attached; the quantized architecture is shown below:
```
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
```
#### Training Hyperparameters
- **Training regime:** QLoRA-style fine-tuning (4-bit NF4 base model, bf16 compute) with the following LoRA configuration:

```python
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)
```
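A sketch of how this config is typically attached to the quantized base model with PEFT (the actual code lives in the linked `finetuning.py` and may differ):

```python
from peft import get_peft_model, prepare_model_for_kbit_training

# `model` is assumed to be the 4-bit MistralForCausalLM shown under Preprocessing
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, config)
model.print_trainable_parameters()
# -> trainable params: 21,260,288 || all params: 3,773,331,456 || trainable%: 0.56
```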
#### Speeds, Sizes, Times
- Trainable parameters: 21,260,288 of 3,773,331,456 total (0.56%)
- Steps: 1000 (0.49 epochs), final training loss 0.4723
- Runtime: 3143.4 s (~52 min), 2.545 samples/s, 0.318 steps/s
- Total FLOPs: ~1.75e17
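For context, a hedged sketch of a `transformers` Trainer setup consistent with these numbers: `max_steps=1000`, `bf16` compute, and 50-step logging match the run above, while the batch size, learning rate, and other hyperparameters are illustrative guesses rather than values taken from `finetuning.py`:

```python
import transformers

trainer = transformers.Trainer(
    model=model,          # the PEFT-wrapped model from the sketch above
    train_dataset=ds,     # the tokenized MedQuad split
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,   # hypothetical
        gradient_accumulation_steps=8,   # hypothetical (throughput implies ~8 samples/step)
        max_steps=1000,
        learning_rate=2e-4,              # hypothetical
        bf16=True,
        logging_steps=50,
        output_dir="outputs",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```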
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA A100
- **Hours used:** 1
- **Cloud Provider:** Google Cloud
- **Compute Region:** East1
- **Carbon Emitted:** 0.09
## Training Results
Training ran for 1,000 steps (52:20 elapsed, 0.49 of 1 epoch):
| Step | Training Loss |
|-------|--------------|
| 50 | 0.474200 |
| 100 | 0.523300 |
| 150 | 0.484500 |
| 200 | 0.482800 |
| 250 | 0.498800 |
| 300 | 0.451800 |
| 350 | 0.491800 |
| 400 | 0.488000 |
| 450 | 0.472800 |
| 500 | 0.460400 |
| 550 | 0.464700 |
| 600 | 0.484800 |
| 650 | 0.474600 |
| 700 | 0.477900 |
| 750 | 0.445300 |
| 800 | 0.431300 |
| 850 | 0.461500 |
| 900 | 0.451200 |
| 950 | 0.470800 |
| 1000 | 0.454900 |
### Model Architecture and Objective
The final PEFT model wraps the 4-bit base with LoRA adapters on the attention and MLP projections and on `lm_head`, trained with a causal language modeling objective:
```
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
              )
              (k_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
              )
              (v_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
              )
              (o_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
              )
              (rotary_emb): MistralRotaryEmbedding()
            )
            (mlp): MistralMLP(
              (gate_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
              )
              (up_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
              )
              (down_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=14336, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=14336, out_features=4096, bias=False)
              )
              (act_fn): SiLUActivation()
            )
            (input_layernorm): MistralRMSNorm()
            (post_attention_layernorm): MistralRMSNorm()
          )
        )
        (norm): MistralRMSNorm()
      )
      (lm_head): Linear(
        in_features=4096, out_features=32000, bias=False
        (lora_dropout): ModuleDict(
          (default): Dropout(p=0.05, inplace=False)
        )
        (lora_A): ModuleDict(
          (default): Linear(in_features=4096, out_features=8, bias=False)
        )
        (lora_B): ModuleDict(
          (default): Linear(in_features=8, out_features=32000, bias=False)
        )
        (lora_embedding_A): ParameterDict()
        (lora_embedding_B): ParameterDict()
      )
    )
  )
)
```
#### Hardware
NVIDIA A100 (Google Cloud)
## Model Card Authors
[Tonic](https://huggingface.co/Tonic)
## Model Card Contact
[Tonic](https://huggingface.co/Tonic)
## Training procedure
The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
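Expressed as a `transformers` `BitsAndBytesConfig` (the same settings used in the loading example above), this corresponds to:

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```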
### Framework versions
- PEFT 0.6.0.dev0