docs: Update Refact 1.6B FIM GGUF Documentation
- Add Acknowledgments section for tensor conversion
- Include example shell command for testing against Hugging Face
- Resolve llama.cpp issue #3061
This commit updates the documentation for the Refact 1.6B FIM GGUF model, adding an Acknowledgments section and an additional usage example. It also addresses llama.cpp issue #3061.
README.md
ADDED
@@ -0,0 +1,110 @@
---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
library_name: llama.cpp
tags:
- code
language:
- en
---

# Refact 1.6B FIM GGUF

## Introduction

The Refact 1.6B FIM GGUF model is a state-of-the-art AI-powered coding assistant developed by Small Magellanic Cloud AI Ltd. This model is designed to assist developers with code completion, refactoring, and chat-based interactions, excelling in code-related natural language understanding and generation tasks.

## Quantized Model Files

The model comes in various quantized versions to suit different computational needs:

- **refact-1.6B-fim-q4_0.gguf**: A 4-bit quantized model with a file size of 878 MB.
- **refact-1.6B-fim-q5_0.gguf**: A 5-bit quantized model with a file size of 1.1 GB.
- **refact-1.6B-fim-q8_0.gguf**: An 8-bit quantized model with a file size of 1.6 GB.

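As a minimal sketch, one of these files can be fetched from the Hub and run directly with llama.cpp. The repository id in the URL below is a placeholder, not a real path; substitute the id of this repo:

```sh
# Download a quantized file (replace <repo-id> with this repository's id)
# and run it with llama.cpp's main binary.
wget https://huggingface.co/<repo-id>/resolve/main/refact-1.6B-fim-q4_0.gguf
./main -m refact-1.6B-fim-q4_0.gguf -n 300 -p "write a function to multiply two integers in python"
```

Smaller quantizations load faster and use less memory, at some cost in output quality; q8_0 stays closest to the original f16 weights.
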
## Features and Usage

The model is versatile and can be employed for:

- Code completion
- Code refactoring
- Chat-based interactions

### Example Usage

Here's a sample shell command to invoke the model:

```sh
# Run the f16 model; --top-k 1 makes decoding effectively greedy,
# so the completion is deterministic.
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```

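Since this is a fill-in-the-middle (FIM) model, it can also complete code between a given prefix and suffix. A minimal sketch, assuming the StarCoder-style FIM special tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) used by the upstream Refact-1_6B-fim model; check the tokenizer config if the output looks off:

```sh
# FIM: the model generates the code that belongs between prefix and suffix.
# The special token names are an assumption carried over from the upstream model.
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 64 \
  -p '<fim_prefix>def multiply(a, b):<fim_suffix>    return result<fim_middle>'
```
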
## Performance Metrics

The model outperforms many existing models in both code completion and chat-based interactions, as evidenced by the HumanEval results.

| Model              | Size | HumanEval pass@1 | HumanEval pass@10 |
|--------------------|------|------------------|-------------------|
| **Refact-1.6-fim** | 1.6b | 32.0%            | 53.0%             |
| StableCode         | 3b   | 20.2%            | 33.8%             |
| ReplitCode v1      | 3b   | 21.9%            | N/A               |

## Installation and Setup

The model can be integrated into your IDE via the [Refact plugin](https://refact.ai/). For self-hosting, an [open-source Docker container](https://github.com/smallcloudai/refact) is available.

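A rough sketch of launching that container is shown below; the image name, port, and volume are assumptions based on the smallcloudai/refact README at the time of writing, so verify them there before use:

```sh
# Launch the self-hosted Refact server (NVIDIA GPU required).
# Image name, port, and volume path are assumptions; check the repo README.
docker run -d --rm --gpus all \
  -p 8008:8008 \
  -v perm-storage:/perm_storage \
  smallcloudai/refact_self_hosting
```

Once the server is running, the IDE plugin can be pointed at the self-hosted address instead of the cloud service.
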
## Limitations and Bias

The model was trained primarily on English text, which may result in lower performance for non-English languages.

## Technical Specifications

- **Architecture**: LLaMA-like model with multi-query attention
- **Training Tokens**: 1.2T for pretraining, 40B for fine-tuning
- **Precision**: bfloat16
- **Training Time**: 28 days

## License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.

## Citation

If you use this model in your work, please cite it by linking back to the following page for proper attribution:

[Refact 1.6B FIM Model](https://huggingface.co/smallcloudai/Refact-1_6B-fim)

## Acknowledgments

Special thanks to [ds5t5](https://github.com/ggerganov/llama.cpp/pull/3329) for implementing the conversion of this model's tensors from the Hugging Face format to GGUF (llama.cpp PR #3329). Their work has been instrumental in enhancing the model's versatility.

### Example Command for Testing

To convert the Hugging Face checkpoint to GGUF and check the converted model's output against the original, you can use the following commands:

```sh
# Convert the Hugging Face checkpoint to an f16 GGUF file
python3 convert-refact-hf-to-gguf.py ./Refact-1_6B-fim 1

# Run the converted model so its output can be compared with the original
./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```
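
The quantized variants listed earlier can then be produced from the f16 file with llama.cpp's `quantize` tool. A minimal sketch, assuming the file names used in this card:

```sh
# Produce the 4-bit variant from the f16 conversion; swap q4_0 for
# q5_0 or q8_0 to produce the other files listed above.
./quantize ./Refact-1_6B-fim/ggml-model-f16.gguf ./refact-1.6B-fim-q4_0.gguf q4_0
```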

This resolves llama.cpp issue [#3061](https://github.com/ggerganov/llama.cpp/issues/3061).