---
license: apache-2.0
base_model: internlm/AlchemistCoder-DS-6.7B
inference: false
tags:
- code generation
---

# AlchemistCoder-DS-6.7B-exl2

Original model: [AlchemistCoder-DS-6.7B](https://huggingface.co/internlm/AlchemistCoder-DS-6.7B)  
Model creator: [InternLM](https://huggingface.co/internlm)

## Quants
[4bpw h6 (main)](https://huggingface.co/cgus/AlchemistCoder-DS-6.7B-exl2/tree/main)  
[4.25bpw h6](https://huggingface.co/cgus/AlchemistCoder-DS-6.7B-exl2/tree/4.25bpw-h6)  
[4.65bpw h6](https://huggingface.co/cgus/AlchemistCoder-DS-6.7B-exl2/tree/4.65bpw-h6)  
[5bpw h6](https://huggingface.co/cgus/AlchemistCoder-DS-6.7B-exl2/tree/5bpw-h6)  
[6bpw h6](https://huggingface.co/cgus/AlchemistCoder-DS-6.7B-exl2/tree/6bpw-h6)  
[8bpw h8](https://huggingface.co/cgus/AlchemistCoder-DS-6.7B-exl2/tree/8bpw-h8)
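
As a rough guide for choosing between these quants, the weight footprint scales linearly with bits per weight (bpw). The sketch below is a back-of-the-envelope estimate only: it assumes a nominal 6.7 billion parameters, ignores the separately quantized head layer (the `h6`/`h8` suffix), and leaves out the KV cache and runtime overhead, so real VRAM use will be higher. The function name is hypothetical.

```python
# Rough weight-size estimate for an exl2 quant.
# Assumptions (not from the model card): nominal 6.7e9 parameters,
# head-layer bits and KV cache / runtime overhead are ignored.
NOMINAL_PARAMS = 6.7e9


def estimate_weight_gb(bpw: float, params: float = NOMINAL_PARAMS) -> float:
    """Return the approximate size of the quantized weights in GB (10^9 bytes)."""
    if bpw <= 0:
        raise ValueError("bpw must be positive")
    total_bits = params * bpw
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    for bpw in (4.0, 4.25, 4.65, 5.0, 6.0, 8.0):
        print(f"{bpw:>5} bpw: about {estimate_weight_gb(bpw):.2f} GB of weights")
```

Leave headroom on top of these figures for the context cache, which grows with the sequence length you configure in the loader.
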
## Quantization notes
Quantized with Exllamav2 0.1.3 using the default calibration dataset.
## How to run
This model is meant to be used with the Exllamav2 loader, which requires the model to be fully loaded into GPU VRAM.  
It primarily requires an Nvidia RTX card on Windows/Linux or an AMD card on Linux.  
If your system doesn't meet these requirements, look for GGUF versions of the model instead.  
It can be used with apps like:  
[Text Generation Webui](https://github.com/oobabooga/text-generation-webui)  
[KoboldAI](https://github.com/henk717/KoboldAI)  
[ExUI](https://github.com/turboderp/exui)  
[lollms-webui](https://github.com/ParisNeo/lollms-webui)  
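
Each quant lives on its own branch of this repository, so a single quant can be fetched by passing the branch name as the revision. The following is a minimal sketch using `snapshot_download` from `huggingface_hub`; the helper names and the `QUANT_BRANCHES` table are illustrative (the branch names are copied from the Quants list), and the download itself needs network access and the `huggingface_hub` package.

```python
# Map each quant label to its branch; "main" holds the 4bpw h6 quant.
QUANT_BRANCHES = {
    "4bpw-h6": "main",
    "4.25bpw-h6": "4.25bpw-h6",
    "4.65bpw-h6": "4.65bpw-h6",
    "5bpw-h6": "5bpw-h6",
    "6bpw-h6": "6bpw-h6",
    "8bpw-h8": "8bpw-h8",
}

REPO_ID = "cgus/AlchemistCoder-DS-6.7B-exl2"


def resolve_revision(quant: str) -> str:
    """Return the branch name for a quant label (hypothetical helper)."""
    try:
        return QUANT_BRANCHES[quant]
    except KeyError:
        known = ", ".join(sorted(QUANT_BRANCHES))
        raise ValueError(f"Unknown quant {quant!r}; expected one of: {known}") from None


def download_quant(quant: str, local_dir: str) -> str:
    """Download one quant branch into local_dir and return the local path."""
    # Imported lazily so the lookup helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=resolve_revision(quant),
        local_dir=local_dir,
    )
```

For example, `download_quant("6bpw-h6", "./AlchemistCoder-6bpw")` would place the 6bpw files in a folder that can then be pointed to from the loader of your choice.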

# Original model card

# AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

[[🤗 HuggingFace](https://huggingface.co/internlm/AlchemistCoder-DS-6.7B)]
[[📃 Paper](https://arxiv.org/abs/2405.19265)]
[[🌐 Project Page](https://internlm.github.io/AlchemistCoder/)]


## ✨ Highlights
> **Abstract:** *Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance. However, previous Code LLMs are typically fine-tuned on single-source data with limited quality and diversity, which may insufficiently elicit the potential of pre-trained Code LLMs. In this paper, we present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data. To achieve this, we pioneer to unveil inherent conflicts among the various styles and qualities in multi-source code corpora and introduce data-specific prompts with hindsight relabeling, termed AlchemistPrompts, to harmonize different data sources and instruction-response pairs. Additionally, we propose incorporating the data construction process into the fine-tuning data as code comprehension tasks, including instruction evolution, data filtering, and code review. Extensive experiments demonstrate that AlchemistCoder holds a clear lead among all models of the same size (6.7B/7B) and rivals or even surpasses larger models (15B/33B/70B), showcasing the efficacy of our method in refining instruction-following capabilities and advancing the boundaries of code intelligence.* 

- **AlchemistPrompts**: Designed as data-specific prompts for harmonizing inherent conflicts in multi-source data and mitigating the instruction/response misalignment at a fine-grained level.
- **Code Comprehension Tasks**: Sourced from the process of data construction, consisting of instruction evolution, data filtering, and code review.
- **Harmonized Multi-source Data**: Instruction tuned on 200M tokens, including 6 types of high-quality data.
- **Superior Model Performance**: Surpassing all the open-source models of the same size (6.7B/7B), and rivaling or even beating larger models (15B/33B/70B/ChatGPT) on 6 code benchmarks.
- **Advanced generic capabilities**: Demonstrated by the significant improvements on MMLU, BBH, and GSM8K.
  

## 🚀 Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/AlchemistCoder-DS-6.7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("internlm/AlchemistCoder-DS-6.7B", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
model = model.eval()

input_text = "Implement the Dijkstra algorithm in Python"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
# Assumed generation budget, large enough for a complete function plus explanation
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The above code generates output similar to the following:
```python
import heapq

def dijkstra(graph, start, end):
    """
    Find the shortest path between two nodes in a graph using Dijkstra's algorithm.

    Args:
        graph (dict): A dictionary representing the graph, where the keys are nodes and the values are dictionaries
                      containing the neighboring nodes and their edge weights.
        start: The starting node.
        end: The target node.

    Returns:
        list: The shortest path from the start node to the target node as a list of nodes.

    Raises:
        ValueError: If either the start or end node is not present in the graph.

    """

    # Check if start and end nodes are in the graph
    if start not in graph:
        raise ValueError("Start node is not present in the graph.")
    if end not in graph:
        raise ValueError("End node is not present in the graph.")

    # Initialize the distance dictionary with infinite distances for all nodes
    distances = {node: float('inf') for node in graph}

    # Set the distance of the start node to 0
    distances[start] = 0

    # Initialize the heap with the start node
    heap = [(0, start)]

    # Initialize the previous dictionary to keep track of the path
    previous = {}

    while heap:
        # Pop the node with the smallest distance from the heap
        current_distance, current_node = heapq.heappop(heap)

        # If the current node is the end node, we have found the shortest path
        if current_node == end:
            # Reconstruct the path by following the previous nodes
            path = []
            while current_node in previous:
                path.append(current_node)
                current_node = previous[current_node]
            path.append(start)
            # Reverse the path to get the correct order
            return path[::-1]

        # Iterate over the neighboring nodes and update their distances
        for neighbor, weight in graph[current_node].items():
            new_distance = current_distance + weight
            # If a shorter path is found, update the distance and previous node
            if new_distance < distances[neighbor]:
                distances[neighbor] = new_distance
                previous[neighbor] = current_node
                heapq.heappush(heap, (new_distance, neighbor))

    # If there is no path between the start and end nodes, return an empty list
    return []
```

> The `dijkstra` function takes three arguments: `graph`, `start`, and `end`. The `graph` argument is a dictionary representing the graph, where the keys are nodes and the values are dictionaries containing the neighboring nodes and their edge weights. The `start` argument is the starting node, and the `end` argument is the target node.
>
> The function first checks if the start and end nodes are present in the graph. If either node is not present, a `ValueError` is raised.
>
> The function then initializes a `distances` dictionary with infinite distances for all nodes. It sets the distance of the start node to 0. It also initializes a heap with the start node and a `previous` dictionary to keep track of the path.
>
> The algorithm then iterates over the nodes in the heap. For each node, it checks if it is the end node. If it is, the function reconstructs the path by following the previous nodes and returns the shortest path as a list of nodes in the correct order.
>
> If the current node is not the end node, the algorithm iterates over its neighboring nodes and updates their distances if a shorter path is found. It also updates the `previous` dictionary to keep track of the path.
>
> If there is no path between the start and end nodes, the function returns an empty list.
>
> Note that this implementation assumes that the graph is a directed graph, and it uses a heap data structure to efficiently select the node with the smallest distance at each step.
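
To see the described behaviour on a concrete input, the sketch below runs a condensed copy of the generated function (same logic, comments trimmed) on a small hypothetical graph. The graph and its node names are made up for illustration. Note that every node, including ones with no outgoing edges, needs its own key, because the function looks up `distances[neighbor]` for each neighbor.

```python
import heapq


def dijkstra(graph, start, end):
    """Condensed copy of the generated function shown above."""
    if start not in graph:
        raise ValueError("Start node is not present in the graph.")
    if end not in graph:
        raise ValueError("End node is not present in the graph.")
    distances = {node: float("inf") for node in graph}
    distances[start] = 0
    heap = [(0, start)]
    previous = {}
    while heap:
        current_distance, current_node = heapq.heappop(heap)
        if current_node == end:
            # Walk the predecessor chain back to the start, then reverse it.
            path = []
            while current_node in previous:
                path.append(current_node)
                current_node = previous[current_node]
            path.append(start)
            return path[::-1]
        for neighbor, weight in graph[current_node].items():
            new_distance = current_distance + weight
            if new_distance < distances[neighbor]:
                distances[neighbor] = new_distance
                previous[neighbor] = current_node
                heapq.heappush(heap, (new_distance, neighbor))
    return []


# Hypothetical directed graph: "D" is a sink, "E" cannot be reached from "A".
example_graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
    "E": {"A": 1},
}

if __name__ == "__main__":
    print(dijkstra(example_graph, "A", "D"))  # ['A', 'B', 'C', 'D']
    print(dijkstra(example_graph, "A", "E"))  # []
```

The route through `B` and `C` costs 1 + 2 + 1 = 4, which beats both the direct `A` to `C` edge followed by `C` to `D` (cost 5) and the `B` to `D` edge (cost 6).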


## 🧪 Evaluation and Fine-tune
Please refer to [**AlchemistCoder**](https://github.com/InternLM/AlchemistCoder) and [**InternLM**](https://github.com/InternLM/InternLM/tree/main).

## 😃 Acknowledgments
*AlchemistCoder* is built with [**InternLM**](https://github.com/InternLM) and [**OpenCompass**](https://github.com/open-compass). Thanks for their awesome work!

## 📧 Contact
If you have any questions, please create an issue on this repository or contact us at:
- [email protected]
- [email protected]

## 🌟 Citation
If you find our work useful, please consider citing:

```bibtex
@misc{song2024alchemistcoder,
      title={AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data}, 
      author={Zifan Song and Yudong Wang and Wenwei Zhang and Kuikun Liu and Chengqi Lyu and Demin Song and Qipeng Guo and Hang Yan and Dahua Lin and Kai Chen and Cairong Zhao},
      year={2024},
      eprint={2405.19265},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```