---
license: gemma
base_model:
- google/gemma-2-27b-it
pipeline_tag: text-generation
---

FP8 quantized version of google/gemma-2-27b-it, created with **compute sponsored by Arrow and Nvidia through the Danish Data Science Community**.

Quantized using this script:

```python
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from transformers import AutoTokenizer

MODEL_ID = "google/gemma-2-27b-it"

# Load the original model and tokenizer.
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Configure simple PTQ: dynamic FP8 quantization of all Linear layers,
# keeping lm_head in the original precision.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the quantization algorithm; FP8_DYNAMIC quantizes activations
# on the fly, so no calibration dataset is needed.
oneshot(model=model, recipe=recipe)

# Save the quantized model and tokenizer.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```
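
The resulting checkpoint can be loaded by inference engines that understand compressed-tensors FP8 checkpoints, such as vLLM. Below is a minimal serving sketch; it assumes vLLM is installed, an FP8-capable GPU (e.g. NVIDIA Hopper or Ada Lovelace) is available, and that the local `SAVE_DIR` written by the script above (or the Hub id of this repository) is passed as the model path:

```python
from vllm import LLM, SamplingParams

# Assumed local path: the SAVE_DIR produced by the quantization script above.
llm = LLM(model="gemma-2-27b-it-FP8-Dynamic")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What are the benefits of FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```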