Commit 750e02a by aberrio (1 parent: b0c74f0)

docs: Update Refact 1.6B FIM GGUF Documentation


- Add Acknowledgments section for tensor conversion
- Include example shell command for testing against Hugging Face
- Resolve llama.cpp issue #3061

This commit updates the documentation for the Refact 1.6B FIM GGUF model, adding an Acknowledgments section and an additional usage example, and records the resolution of llama.cpp issue #3061.

Files changed (1): README.md ADDED (+110, -0)
---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
library_name: llama.cpp
tags:
- code
language:
- en
---

# Refact 1.6B FIM GGUF

## Introduction

The Refact 1.6B FIM GGUF model is a state-of-the-art AI-powered coding assistant developed by Small Magellanic Cloud AI Ltd. This model is designed to assist developers with code completion, refactoring, and chat-based interactions, excelling in code-related natural language understanding and generation tasks.

## Quantized Model Files

The model comes in various quantized versions to suit different computational needs; a quick run sketch follows the list:

- **refact-1.6B-fim-q4_0.gguf**: A 4-bit quantized model with a file size of 878 MB.
- **refact-1.6B-fim-q5_0.gguf**: A 5-bit quantized model with a file size of 1.1 GB.
- **refact-1.6B-fim-q8_0.gguf**: An 8-bit quantized model with a file size of 1.6 GB.

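As a quick smoke test, any of these files can be run directly with llama.cpp's `main` example. A minimal sketch, assuming the `.gguf` file has already been downloaded into the current directory (the path and quantization level are illustrative):

```sh
# Quick check that a downloaded quant loads and generates; point -m at
# wherever the .gguf file actually lives.
./main -m ./refact-1.6B-fim-q4_0.gguf -n 128 \
  -p "write a function to multiply two integers in python"
```

As a rule of thumb, q4_0 is the smallest and fastest of the three, while q8_0 stays closest to the original f16 weights at roughly twice the size.
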
## Features and Usage

The model is versatile and can be employed for:

- Code completion (via fill-in-the-middle; see the sketch after this list)
- Code refactoring
- Chat-based interactions

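For completion inside existing code, fill-in-the-middle is driven by special tokens in the prompt. The sketch below assumes StarCoder-style FIM tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`), the format shown on the upstream Refact model card; verify against the tokenizer config if completions look wrong:

```sh
# Fill-in-the-middle: the model generates the code between the prefix and
# the suffix. The -e flag tells llama.cpp to expand \n escapes in the prompt.
./main -m ./refact-1.6B-fim-q4_0.gguf -n 64 -e \
  -p '<fim_prefix>def multiply(a, b):\n<fim_suffix>\n    return a * b\n<fim_middle>'
```
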
### Example Usage

Here's a sample shell command to invoke the model:

```sh
# Sample shell command to use the model (f16 conversion).
# With --top-k 1 decoding is effectively greedy, so the remaining
# sampling flags are neutral here.
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```

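For interactive or editor-driven use, llama.cpp also ships a `server` example that exposes the model over a local HTTP endpoint. A minimal sketch; the port and context size below are arbitrary choices, not values from this card:

```sh
# Serve the model locally; llama.cpp's server example provides a simple
# completion API (and a small web UI) on the chosen port.
./server -m ./refact-1.6B-fim-q8_0.gguf -c 2048 --port 8080
```
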
## Performance Metrics

The model outperforms many existing models in both code completion and chat-based interactions, as evidenced by the HumanEval results below.

| Model              | Size | HumanEval pass@1 | HumanEval pass@10 |
|--------------------|------|------------------|-------------------|
| **Refact-1.6-fim** | 1.6B | 32.0%            | 53.0%             |
| StableCode         | 3B   | 20.2%            | 33.8%             |
| ReplitCode v1      | 3B   | 21.9%            | N/A               |

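For reference, pass@k on HumanEval is conventionally reported with the unbiased estimator from the benchmark's original paper (whether these exact figures were computed that way is an assumption): for n samples per problem of which c pass the unit tests,

$$
\text{pass@}k = \mathbb{E}_{\text{problems}}\left[ 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \right]
$$

so pass@1 reduces to the average fraction of single samples that pass.
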
## Installation and Setup

The model can be integrated into your IDE via the [Refact plugin](https://refact.ai/). For self-hosting, an [open-source Docker container](https://github.com/smallcloudai/refact) is available.

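A hedged sketch of starting the self-hosting server with Docker; the image name, port, and volume below follow the smallcloudai/refact repository's instructions at the time of writing and should be checked against its README:

```sh
# Launch the Refact self-hosting container (image name, port, and volume
# are assumptions -- verify against the smallcloudai/refact README).
docker run -d --gpus all -p 8008:8008 \
  -v refact-perm-storage:/perm_storage \
  smallcloud/refact_self_hosting
```
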
## Limitations and Bias

The model primarily focuses on English text, which may result in lower performance for non-English languages.

## Technical Specifications

- **Architecture**: LLaMA-like model with multi-query attention
- **Training Tokens**: 1.2T for pretraining, 40B for fine-tuning
- **Precision**: bfloat16
- **Training Time**: 28 days

## License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.

## Citation

If you use this model in your work, please cite it by linking back to the following page for proper attribution:

[Refact 1.6B FIM Model](https://huggingface.co/smallcloudai/Refact-1_6B-fim)

## Acknowledgments

Special thanks to [ds5t5](https://github.com/ggerganov/llama.cpp/pull/3329) for implementing the conversion of this model's tensors from the Hugging Face format to GGUF. Their work has been instrumental in enhancing the model's versatility.

### Example Command for Testing

To convert the original Hugging Face checkpoint to GGUF and verify the result, you can use the following commands:

```sh
# Convert the Hugging Face checkpoint to an f16 GGUF file
# (conversion script from llama.cpp PR #3329).
python3 convert-refact-hf-to-gguf.py ./Refact-1_6B-fim 1

# Run the converted model with llama.cpp to check its output.
./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```

This resolves llama.cpp issue [#3061](https://github.com/ggerganov/llama.cpp/issues/3061).