Quantization made by Richard Erkhov.
[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)
PowerMoE-3b - GGUF
- Model creator: https://huggingface.co/ibm/
- Original model: https://huggingface.co/ibm/PowerMoE-3b/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [PowerMoE-3b.Q2_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q2_K.gguf) | Q2_K | 1.18GB |
| [PowerMoE-3b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_XS.gguf) | IQ3_XS | 1.32GB |
| [PowerMoE-3b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_S.gguf) | IQ3_S | 1.39GB |
| [PowerMoE-3b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_S.gguf) | Q3_K_S | 1.39GB |
| [PowerMoE-3b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ3_M.gguf) | IQ3_M | 1.41GB |
| [PowerMoE-3b.Q3_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K.gguf) | Q3_K | 1.53GB |
| [PowerMoE-3b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_M.gguf) | Q3_K_M | 1.53GB |
| [PowerMoE-3b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q3_K_L.gguf) | Q3_K_L | 1.65GB |
| [PowerMoE-3b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ4_XS.gguf) | IQ4_XS | 1.72GB |
| [PowerMoE-3b.Q4_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_0.gguf) | Q4_0 | 1.79GB |
| [PowerMoE-3b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.IQ4_NL.gguf) | IQ4_NL | 1.81GB |
| [PowerMoE-3b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K_S.gguf) | Q4_K_S | 1.81GB |
| [PowerMoE-3b.Q4_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K.gguf) | Q4_K | 1.92GB |
| [PowerMoE-3b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_K_M.gguf) | Q4_K_M | 1.92GB |
| [PowerMoE-3b.Q4_1.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q4_1.gguf) | Q4_1 | 1.99GB |
| [PowerMoE-3b.Q5_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_0.gguf) | Q5_0 | 2.18GB |
| [PowerMoE-3b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K_S.gguf) | Q5_K_S | 2.18GB |
| [PowerMoE-3b.Q5_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K.gguf) | Q5_K | 2.24GB |
| [PowerMoE-3b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_K_M.gguf) | Q5_K_M | 2.24GB |
| [PowerMoE-3b.Q5_1.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q5_1.gguf) | Q5_1 | 2.37GB |
| [PowerMoE-3b.Q6_K.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q6_K.gguf) | Q6_K | 2.59GB |
| [PowerMoE-3b.Q8_0.gguf](https://huggingface.co/RichardErkhov/ibm_-_PowerMoE-3b-gguf/blob/main/PowerMoE-3b.Q8_0.gguf) | Q8_0 | 3.35GB |
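For reference, here is a minimal, hypothetical sketch of downloading one of these quants and running it locally. It assumes `huggingface_hub` and `llama-cpp-python` are installed; the Q4_K_M file is an arbitrary pick from the table above.

```python
# Minimal sketch (assumes: pip install huggingface_hub llama-cpp-python).
# Q4_K_M is an arbitrary example; any filename from the table works.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="RichardErkhov/ibm_-_PowerMoE-3b-gguf",
    filename="PowerMoE-3b.Q4_K_M.gguf",
)
llm = Llama(model_path=gguf_path, n_ctx=2048)
out = llm("Write a function that finds the maximum value in a list.", max_tokens=100)
print(out["choices"][0]["text"])
```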
Original model description:
---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
model-index:
- name: ibm/PowerMoE-3b
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: ARC
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 58.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: BoolQ
    metrics:
    - name: accuracy
      type: accuracy
      value: 65.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Hellaswag
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 71.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: OpenBookQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 41.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: PIQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 79.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Winogrande
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 65.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: MMLU (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 42.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: GSM8k (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 25.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: math (4 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 14.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 20.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 32.4
      verified: false
---
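The results above are tagged `lm-eval-harness`. As a rough, hypothetical sketch (not IBM's exact evaluation setup), a number like the 5-shot MMLU score could be approximated with EleutherAI's lm-evaluation-harness:

```python
# Hypothetical sketch, assuming EleutherAI's lm-evaluation-harness
# (pip install lm-eval); task naming and harness version may differ
# from the setup that produced the scores above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ibm/PowerMoE-3b",
    tasks=["mmlu"],
    num_fewshot=5,  # the card reports MMLU as 5-shot
)
print(results["results"])
```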
## Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with twice the number of active parameters across various benchmarks, including natural language multiple-choice tasks, code generation, and math reasoning.
Paper: https://arxiv.org/abs/2408.13359
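To make "sparsely activates" concrete, below is a toy top-k routing sketch in PyTorch. It illustrates the general sMoE mechanism only; the expert count, layer sizes, and routing details are assumptions, not PowerMoE-3B's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTopKMoE(nn.Module):
    """Toy sparse MoE layer: each token runs through only k of n experts."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):  # sizes are illustrative
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # only k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyTopKMoE()
y = layer(torch.randn(10, 64))  # 10 tokens; each activates only 2 of 8 experts
```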
## Usage
Note: Requires installing HF transformers from source.
### Generation
This is a simple example of how to use the **PowerMoE-3b** model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)
```
Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more quants, at much higher speed, than I would otherwise be able to.