jyhong836 committed on Dec 1, 2023

Commit

5b0fb94

1 Parent(s): 42509a6

Update README.md

README.md +100 -4

@@ -1,10 +1,106 @@
 ---
 title: README
-emoji: 📈
-colorFrom: yellow
-colorTo: red
 sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.

 ---
 title: README
+emoji: 🐇
+colorFrom: pink
+colorTo: indigo
 sdk: static
 pinned: false
 ---
+# Compressed LLM Model Zone
+The models are prepared by [Visual Informatics Group @ University of Texas at Austin (VITA-group)](https://vita-group.github.io/) and LLNL.
+Credits to Ajay Jaiswal, Jinhao Duan, Zhenyu Zhang, Zhangheng Li, Lu Yin, Shiwei Liu and Junyuan Hong.
+License: [MIT License](https://opensource.org/license/mit/)
+## Set up the environment
+```shell
+pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
+pip install transformers==4.31.0
+pip install accelerate
+pip install auto-gptq  # for gptq
+```
+## How to use pruned models
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+base_model = 'llama-2-7b'
+comp_method = 'magnitude_unstructured'
+comp_degree = 0.2
+model_path = f'compressed-llm/{base_model}_{comp_method}'
+model = AutoModelForCausalLM.from_pretrained(
+        model_path,
+        revision=f's{comp_degree}',
+        torch_dtype=torch.float16,
+        low_cpu_mem_usage=True,
+        device_map="auto"
+    )
+tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
+input_ids = tokenizer('Hello! I am a compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
+outputs = model.generate(input_ids, max_new_tokens=128)
+print(tokenizer.decode(outputs[0]))
+```
+## How to use wanda+gptq models
+```python
+from transformers import AutoTokenizer
+from auto_gptq import AutoGPTQForCausalLM
+model_path = 'compressed-llm/llama-2-7b_wanda_2_4_gptq_4bit_128g'
+tokenizer_path = 'meta-llama/Llama-2-7b-hf'
+model = AutoGPTQForCausalLM.from_quantized(
+        model_path,
+        # inject_fused_attention=False,  # alternative to disable_exllama=True
+        disable_exllama=True,
+        device_map='auto',
+    )
+tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
+input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to('cuda')
+outputs = model.generate(input_ids=input_ids, max_length=128)
+print(tokenizer.decode(outputs[0]))
+```
+## How to use gptq models
+```python
+from transformers import AutoTokenizer
+from auto_gptq import AutoGPTQForCausalLM
+# model_path = 'compressed-llm/llama-2-7b_wanda_2_4_gptq_4bit_128g'
+# tokenizer_path = 'meta-llama/Llama-2-7b-hf'
+model_path = 'compressed-llm/vicuna-7b-v1.3_gptq'
+tokenizer_path = 'lmsys/vicuna-7b-v1.3'
+model = AutoGPTQForCausalLM.from_quantized(
+        model_path,
+        # inject_fused_attention=False,  # alternative to disable_exllama=True
+        disable_exllama=True,
+        device_map='auto',
+        revision='2bit_128g',
+    )
+# Load the tokenizer of the original (uncompressed) base model
+tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
+input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to('cuda')
+outputs = model.generate(input_ids=input_ids, max_length=128)
+print(tokenizer.decode(outputs[0]))
+```
+## Citations
+If you use models from this hub, please consider citing our papers.
+```bibtex
+@article{jaiswal2023emergence,
+  title={The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter},
+  author={Jaiswal, Ajay and Liu, Shiwei and Chen, Tianlong and Wang, Zhangyang},
+  journal={arXiv},
+  year={2023}
+}
+@article{jaiswal2023compressing,
+  title={Compressing LLMs: The Truth is Rarely Pure and Never Simple},
+  author={Jaiswal, Ajay and Gan, Zhe and Du, Xianzhi and Zhang, Bowen and Wang, Zhangyang and Yang, Yinfei},
+  journal={arXiv},
+  year={2023}
+}
+```
+For questions, please contact [Junyuan Hong](mailto:[email protected]).
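
The usage snippets in this README pick a checkpoint through a repository ID and a `revision` string. The following is a minimal sketch of that naming pattern, written only from the names that appear in the diff. The helper names `pruned_checkpoint` and `gptq_revision` are hypothetical, and reading `2bit_128g` as "bit width plus group size" is an assumption. Whether a particular combination actually exists on the hub is not verified here.

```python
from typing import Tuple


def pruned_checkpoint(base_model: str, comp_method: str, comp_degree: float) -> Tuple[str, str]:
    """Build (repo_id, revision) the way the pruned-model snippet does.

    Hypothetical helper: it only formats strings and does not check
    that the repository or revision exists.
    """
    repo_id = f"compressed-llm/{base_model}_{comp_method}"
    revision = f"s{comp_degree}"
    return repo_id, revision


def gptq_revision(bits: int, group_size: int) -> str:
    """Format a GPTQ revision name such as '2bit_128g'.

    Assumption: the revision encodes bit width and group size.
    """
    return f"{bits}bit_{group_size}g"


if __name__ == "__main__":
    repo_id, revision = pruned_checkpoint("llama-2-7b", "magnitude_unstructured", 0.2)
    print(repo_id, revision)
    print(gptq_revision(2, 128))
```

The returned values can be passed as the first argument and the `revision` keyword of the `from_pretrained` and `from_quantized` calls shown in the README.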