	---
	language:
	- multilingual
	license: gemma
	library_name: transformers
	tags:
	- nlp
	- code
	base_model: google/gemma-2-2b-jpn-it
	datasets:
	- mlabonne/orpo-dpo-mix-40k
	license_link: https://ai.google.dev/gemma/terms
	pipeline_tag: text-generation
	quantized_by: ymcki
	widget:
	- messages:
	  - role: user
	    content: Can you provide ways to eat combinations of bananas and dragonfruits?
	model-index:
	- name: gemma-2-2b-jpn-it-abliterated-17-ORPO
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: IFEval (0-Shot)
	      type: HuggingFaceH4/ifeval
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: inst_level_strict_acc and prompt_level_strict_acc
	      value: 49.48
	      name: strict accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: BBH (3-Shot)
	      type: BBH
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc_norm
	      value: 14.92
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MATH Lvl 5 (4-Shot)
	      type: hendrycks/competition_math
	      args:
	        num_few_shot: 4
	    metrics:
	    - type: exact_match
	      value: 2.87
	      name: exact match
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GPQA (0-shot)
	      type: Idavidrein/gpqa
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 3.24
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MuSR (0-shot)
	      type: TAUR-Lab/MuSR
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc_norm
	      value: 5.67
	      name: acc_norm
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU-PRO (5-shot)
	      type: TIGER-Lab/MMLU-Pro
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 13.18
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO
	      name: Open LLM Leaderboard
	---

	Original model: https://huggingface.co/google/gemma-2-2b-jpn-it

	## Prompt format

	```
	<start_of_turn>user
	{prompt}<end_of_turn>
	<start_of_turn>model

	```

	Note that this model does not support a System prompt.
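
As a minimal sketch of the format above for a single user turn followed by an open model turn, the prompt string can be assembled by hand. The helper name `build_gemma_prompt` is hypothetical and only for illustration; in practice `tokenizer.apply_chat_template` produces the prompt for you.

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap one user message in Gemma turn markers and open the model turn.

    Hypothetical helper for illustration; there is no system turn because
    the model does not support a system prompt.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


if __name__ == "__main__":
    print(build_gemma_prompt("Write a hello world program"))
```

The string ends right after the `model` marker so that generation continues as the model's reply.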

	This is an abliterated version of [google/gemma-2-2b-jpn-it](https://huggingface.co/google/gemma-2-2b-jpn-it), created using the
	[method](https://medium.com/@mlabonne/uncensor-any-llm-with-abliteration-d30148b7d43e)
	described by mlabonne.

	Layer 17 of the original model was chosen for abliteration.
	I also created a layer 18 abliterated model for comparison.
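
For readers unfamiliar with abliteration, the core idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the code used for this model: it assumes the refusal direction is the normalized difference of mean activations (harmful minus harmless prompts) at the chosen layer, and that weights writing to the residual stream are orthogonalized against it. The function names and the plain-list matrices are hypothetical; a real implementation would operate on model tensors.

```python
import math


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def refusal_direction(harmful_acts, harmless_acts):
    """Normalized difference of mean activations at one layer (assumed recipe)."""
    dim = len(harmful_acts[0])
    mean_harmful = [sum(a[i] for a in harmful_acts) / len(harmful_acts) for i in range(dim)]
    mean_harmless = [sum(a[i] for a in harmless_acts) / len(harmless_acts) for i in range(dim)]
    return unit([h - s for h, s in zip(mean_harmful, mean_harmless)])


def orthogonalize(weight, direction):
    """Return W - r (r^T W): remove the component along `direction` from each column.

    `weight` is a list of rows with shape d_model x n; `direction` has length d_model.
    """
    rows, cols = len(weight), len(weight[0])
    # r^T W: how much each column points along the direction.
    coeffs = [sum(direction[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [[weight[i][j] - direction[i] * coeffs[j] for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    direction = refusal_direction([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
    print(orthogonalize([[1.0, 2.0], [3.0, 4.0]], direction))
```

After orthogonalization, the weight matrix can no longer write anything along the chosen direction, which is why the choice of layer used to measure that direction matters.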

	ORPO fine-tuning was performed for four epochs.

	| Epoch | loss | eval_loss |
	| ----- | ---- | --------- |
	| 1 | 1.20152769684791564 | 1.0501047372817993 |
	| 2 | 1.25755584239959716 | 1.0144596099853516 |
	| 3 | 0.93099724054336543 | 0.9957754611968994 |
	| 4 | 0.88664623498916623 | 0.9857067465782166 |

	The fine-tuned model is uploaded here so that the Open LLM Leaderboard can evaluate it and show whether ORPO can heal the slightly brain-damaged non-ORPO abliterated model. The fine-tuning method is again based on the one described by [mlabonne](https://towardsdatascience.com/fine-tune-llama-3-with-orpo-56cfab2f9ada), but the input model was loaded into VRAM with [unsloth](https://github.com/unslothai/unsloth) so that the full 40k dataset could be used on a single 3090.

	## Benchmark (100.0*raw scores only)

	Click on a model name to go to the raw score JSON generated by the Open LLM Leaderboard.

	| Model | Average | IFEval | BBH | Math Lv5 | GPQA | MUSR | MMLU-PRO |
	| ----- | ------- | ------ | --- | -------- | ---- | ---- | -------- |
	| [gemma-2-2b-jpn-it](https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/google/gemma-2-2b-jpn-it/results_2024-10-15T15-21-39.173019.json) | 30.82 | 54.11 | 41.43 | 0.0 | 27.52 | 37.17 | 24.67 |
	| [gemma-2-2b-jpn-it-abliterated-17-ORPO](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO/results_2024-10-20T02-46-59.069357.json) | 29.99 | 50.94 | 38.59 | 2.87 | 27.43 | 38.23 | 21.86 |
	| [gemma-2-2b-jpn-it-abliterated-17](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-jpn-it-abliterated-17/results_2024-10-18T15-18-46.821674.json) | 30.29 | 52.65 | 40.46 | 0.0 | 27.18 | 36.90 | 24.55 |
	| [gemma-2-2b-jpn-it-abliterated-18](https://huggingface.co/datasets/open-llm-leaderboard/results/raw/main/ymcki/gemma-2-2b-jpn-it-abliterated-18/results_2024-10-18T15-41-42.399571.json) | 30.61 | 53.02 | 40.96 | 0.0 | 27.35 | 37.30 | 25.05 |

	Looks like four epochs of fine-tuning are probably not enough to recover the lost scores. More epochs may be needed.
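
The Average column in the table above appears to be the plain arithmetic mean of the six benchmark columns. A small sketch that recomputes it from the table values (the dictionary layout and the `average` helper are illustrative assumptions, not part of the leaderboard tooling):

```python
# Scores copied from the Benchmark table, in the order:
# IFEval, BBH, Math Lv5, GPQA, MUSR, MMLU-PRO.
raw_scores = {
    "gemma-2-2b-jpn-it": [54.11, 41.43, 0.0, 27.52, 37.17, 24.67],
    "gemma-2-2b-jpn-it-abliterated-17-ORPO": [50.94, 38.59, 2.87, 27.43, 38.23, 21.86],
    "gemma-2-2b-jpn-it-abliterated-17": [52.65, 40.46, 0.0, 27.18, 36.90, 24.55],
    "gemma-2-2b-jpn-it-abliterated-18": [53.02, 40.96, 0.0, 27.35, 37.30, 25.05],
}


def average(scores):
    """Arithmetic mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for name, scores in raw_scores.items():
        print(f"{name}: {average(scores)}")
```

Recomputing the mean this way makes it easy to see that the ORPO model trails the base model by less than one point on average, mostly because of IFEval, BBH and MMLU-PRO.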

	## How to run this model

	```py
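	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	# Hub repo ID; a local directory holding the downloaded files also works.
	model_id = "ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO"
	dtype = torch.bfloat16

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="cuda",
	    torch_dtype=dtype,
	)

	chat = [
	    {"role": "user", "content": "Write a hello world program"},
	]
	# Render the chat with the Gemma turn markers and an open model turn.
	prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

	# The template already contains <bos>, so do not add special tokens again.
	inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
	outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
	print(tokenizer.decode(outputs[0]))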
	```

	## Downloading using huggingface-cli

	First, make sure you have huggingface-cli installed:

	```
	pip install -U "huggingface_hub[cli]"
	```

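	Then download the model files. The `--include "*"` pattern below fetches every file in the repository; narrow the pattern to target a specific file: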

	```
	huggingface-cli download ymcki/gemma-2-2b-jpn-it-abliterated-17-ORPO --include "*" --local-dir ./
	```

	## Credits

	Thank you mlabonne for describing his fine-tuning method.

	Thanks FullOf_Bad_Ideas from LocalLlama for the suggestion of using unsloth to save VRAM.

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ymcki__gemma-2-2b-jpn-it-abliterated-17-ORPO)

	| Metric            |Value|
	|-------------------|----:|
	|Avg.               |14.89|
	|IFEval (0-Shot)    |49.48|
	|BBH (3-Shot)       |14.92|
	|MATH Lvl 5 (4-Shot)| 2.87|
	|GPQA (0-shot)      | 3.24|
	|MuSR (0-shot)      | 5.67|
	|MMLU-PRO (5-shot)  |13.18|
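
These values are lower than the raw scores in the Benchmark section because the leaderboard reports normalized scores. A minimal sketch, assuming the normalization rescales a raw percentage so that the random-guess baseline maps to 0 and a perfect score stays at 100. The function name `normalize_score` and the baselines used here (25 for four-choice GPQA, 10 for ten-choice MMLU-PRO) are assumptions for illustration. BBH and MuSR appear to be normalized per subtask, so they cannot be reproduced from the aggregate raw score alone.

```python
def normalize_score(raw: float, baseline: float) -> float:
    """Rescale a raw percentage so that `baseline` maps to 0 and 100 maps to 100.

    Scores at or below the random-guess baseline are clipped to 0.
    """
    if raw <= baseline:
        return 0.0
    return (raw - baseline) / (100.0 - baseline) * 100.0


if __name__ == "__main__":
    # Raw scores taken from the Benchmark table for the ORPO model.
    print(round(normalize_score(27.43, 25.0), 2))  # GPQA: 3.24
    print(round(normalize_score(21.86, 10.0), 2))  # MMLU-PRO: 13.18
```

Under these assumptions, the GPQA and MMLU-PRO rows match the raw scores reported earlier.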