# MiniChat-2-3B - DeepSparse

This repo contains model files for [MiniChat-2-3B](https://huggingface.co/GeneZC/MiniChat-2-3B) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.

This model was pruned and quantized with SparseGPT, using [SparseML](https://github.com/neuralmagic/sparseml).
## Inference

Install DeepSparse LLM for fast inference on CPUs:

```bash
pip install deepsparse-nightly[llm]
```
Run in a Python pipeline:

```python
from deepsparse import TextGeneration

prompt = "How to get in a good university?"
formatted_prompt = f"<s> [|User|]\n{prompt}</s>[|Assistant|]\n"

model = TextGeneration(model_path="hf:nm-testing/MiniChat-2-3B-pruned50-ds")
print(model(formatted_prompt, max_new_tokens=500).generations[0].text)

"""
Getting into a good university is a complex process that involves several steps. However, here are some key factors to consider:
1. Academic performance: Your grades, test scores, and overall academic achievements are essential in demonstrating your academic abilities. Strive to maintain a high GPA and achieve strong scores in standardized tests like the SAT, ACT, or AP exams.
2. Academic preparation: Develop a strong foundation in various subjects, including English, math, science, and foreign languages. This will help you succeed academically and demonstrate your readiness for college-level courses.
3. Extracurricular activities: Participate in extracurricular activities such as clubs, sports teams, volunteering, or leadership roles. These activities can help you develop valuable skills, demonstrate your leadership abilities, and showcase your interests outside the classroom.
4. Academic preparation for college-level courses: Research and understand the curriculum of the universities you are interested in. Familiarize yourself with the coursework, coursework requirements, and any potential prerequisites.
5. Personal qualities and extracurricular activities: Showcase your unique qualities and extracurricular activities that demonstrate your leadership, teamwork, and problem-solving abilities. Universities value students who are well-rounded and have a diverse set of skills.
6. Application process: Follow the university's application process, which may include submitting an application form, paying the application fee, and submitting any required documents.
7. Interviews and assessments: If you are invited to an interview, prepare for it by researching the university, its campus, and its mission. Be confident, articulate, and demonstrate your enthusiasm for the university.
8. Post-admission process: After being accepted, follow the university's post-admission process, which may include registering for classes, paying tuition fees, and obtaining student housing.
9. Networking and mentorship: Connect with professors, professors, and fellow students to gain insights into the university culture and gain a deeper understanding of the college environment.
10. Financial support: Research financial aid options and scholarships available to help cover tuition fees and living
"""
```
## Prompt template

```
<s> [|User|]\n
{prompt}
</s>[|Assistant|]\n
```
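The template above can be applied with a small helper before passing the text to the pipeline; `format_prompt` is an illustrative name, not part of the DeepSparse API:

```python
def format_prompt(prompt: str) -> str:
    """Wrap a single user message in the MiniChat-2-3B chat template."""
    return f"<s> [|User|]\n{prompt}</s>[|Assistant|]\n"

# The formatted string is what the TextGeneration pipeline expects as input.
formatted = format_prompt("How to get in a good university?")
print(formatted)
```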
## Sparsification

For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.

```bash
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py GeneZC/MiniChat-2-3B open_platypus --recipe recipe.yaml --save True
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment
cp deployment/model.onnx deployment/model-orig.onnx
```
Run this kv-cache injection script to speed up inference by caching the key and value attention states:

```python
import os

import onnx
from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector

input_file = "deployment/model-orig.onnx"
output_file = "deployment/model.onnx"

# Load the exported graph without its external weight data, inject the
# kv-cache inputs/outputs, and save the modified model back into deployment/.
model = onnx.load(input_file, load_external_data=False)
model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
onnx.save(model, output_file)
print(f"Modified model saved to: {output_file}")
```
Follow the instructions on our One Shot With SparseML page for a step-by-step guide to performing one-shot quantization of large language models.
## Slack

For further support, and for discussion of these models and AI in general, join Neural Magic's Slack community.
Base model: [GeneZC/MiniChat-2-3B](https://huggingface.co/GeneZC/MiniChat-2-3B)