---
license: apache-2.0
language:
- en
tags:
- mistral
- dpo
- biology
- education
---

This model is fine-tuned from zephyr-7b-beta, using FastChat for supervised fine-tuning (SFT) and TRL for direct preference optimization (DPO). The goal is a more capable educational chatbot that helps students learn biology.

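As a sketch of the DPO stage: TRL's `DPOTrainer` consumes preference pairs, where each record holds a prompt, a preferred ("chosen") response, and a dispreferred ("rejected") response. The example below shows that record format; the field names follow TRL's convention, but the biology question and tutor responses are illustrative placeholders, not data from this model's actual training set.

```python
# Sketch of the preference-pair record format used by TRL's DPOTrainer.
# The field names ("prompt", "chosen", "rejected") follow TRL's convention;
# the content below is illustrative, not the model's real training data.

def make_dpo_record(prompt: str, chosen: str, rejected: str) -> dict:
    """Bundle one preference pair for DPO training."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

record = make_dpo_record(
    prompt="Why do plant cells have both mitochondria and chloroplasts?",
    chosen=(
        "Good question! Chloroplasts capture light energy and store it as sugar, "
        "but the plant still needs mitochondria to release that energy as ATP, "
        "especially in the dark. What organelle do animal cells share with plants?"
    ),
    rejected="Plants just have both. Next question.",
)

# A DPO dataset is simply a collection of such records.
dataset = [record]
```

A pedagogically aligned "chosen" response guides the student rather than just stating the answer, which is the behavior the DPO stage is meant to reward.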
If you use this work, please cite *Pedagogical Alignment of Large Language Models* (https://arxiv.org/abs/2402.05000):

```bibtex
@misc{sonkar2024pedagogical,
      title={Pedagogical Alignment of Large Language Models},
      author={Shashank Sonkar and Kangqi Ni and Sapana Chaudhary and Richard G. Baraniuk},
      year={2024},
      eprint={2402.05000},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```