---
base_model:
- meta-llama/Llama-3.1-405B-Instruct
pipeline_tag: text-generation
---

# CALM-405B: The Largest Open-Source Agentic LLM

## Model Overview

**CALM-405B** is the **largest open-source Conversational Agentic Language Model (CALM) released to date**. It sets a new standard in **Conversational AI** by seamlessly integrating **Task-Oriented Dialogue (TOD) capabilities** with **Language Agent (LA) functionalities**.

Designed to **push the boundaries** of open-source agentic LLMs, it excels at **multi-turn dialogue, tool usage, reasoning, and API execution**, and is the **best-performing fully open-source LLM** on the **Berkeley Function Calling Leaderboard V3 (BFCL V3)**.

## Model Sources

<!-- Provide the basic links for the model. -->

- **Paper [optional]:** [More Information Needed]
- **Repository:** [More Information Needed]

---

## Model Details

- **Model Name:** CALM-405B
- **Developed by:** A collaboration between the UIUC Conversational AI Lab and Oumi
- **License:** Apache 2.0
- **Base Architecture:** Meta-Llama 3.1-405B Instruct
- **Training Data:** CALM-IT
- **Fine-tuning Framework:** Oumi
- **Training Hardware:** 8 NVIDIA H100 GPUs
- **Training Duration:** ~6.5 days
- **Evaluation Benchmarks:** MultiWOZ 2.4, BFCL V3, API-Bank
- **Release Date:** February 5, 2025

---

## Why CALM-405B is a Game-Changer

- **Largest Open-Source Agentic LLM:** A **405B**-parameter model that brings state-of-the-art agentic capabilities to the public domain.
- **Best Open-Source Performance on BFCL V3:** Outperforms leading proprietary models such as **GPT-4o, Gemini, and Claude** on function-calling tasks.
- **True Zero-Shot Function Calling:** Generalizes to unseen API tasks with high accuracy.
- **Multi-Turn Dialogue Mastery:** Excels at long conversations, **task tracking, and complex reasoning**.
- **API Tool Use and Reasoning:** Makes precise API calls, interprets responses, and synthesizes **coherent** multi-step solutions.
- **Fully Open-Source & Reproducible:** Released under **Apache 2.0**, including model weights, training logs, and datasets.

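To make the function-calling setting concrete, here is an illustrative exchange. The tool name, its signature, and the output format are hypothetical, not taken from CALM's actual schema:

```
Tools available:
  get_weather(city: str, unit: str = "celsius") -> dict

User:  What's the weather in Urbana right now?
Model: get_weather(city="Urbana", unit="celsius")
```

Given only the tool specification, the model must emit a well-formed call with correctly bound arguments, which is what BFCL V3 measures.
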
---

## Benchmark Performance

TODO: Add BFCL results

---

## Training Process

### Fine-tuning Stages

1. **TOD Fine-tuning:** Optimized for **dialogue state tracking** (e.g., augmented SNIPS reformatted as instruction-tuning data).
2. **Function-Calling Fine-tuning:** Trained to generate **highly accurate API calls** from LA datasets.
3. **ReAct-based Fine-tuning:** Enhances multi-turn conversations with structured **thought-action-observation-response reasoning**.

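The ReAct-style stage trains the model to interleave reasoning with tool use. A hypothetical single turn (the tool name and formats are illustrative, not drawn from CALM-IT) looks like:

```
User:        Book a table for two in Cambridge tonight.
Thought:     I need a restaurant with availability before I can book.
Action:      find_restaurant(area="centre", people=2, day="today")
Observation: {"name": "Midsummer House", "available": true}
Response:    Midsummer House has a table for two tonight. Shall I book it?
```
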
### Training Hyperparameters

- **Base Model:** Meta-Llama 3.1-405B Instruct
- **LoRA Config:** Rank = 16, Scaling Factor = 32
- **Batch Size:** 2
- **Learning Rate:** 1e-4
- **Optimizer:** AdamW (betas = 0.9, 0.999; epsilon = 1e-8)
- **Precision:** q4
- **Warm-up Steps:** 500
- **Gradient Accumulation Steps:** 1

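Training used the Oumi framework; its actual config file is not reproduced here, but the stated hyperparameters can be collected in one place as a plain-Python sketch (structure is ours, values are from the list above):

```python
# Stated CALM-405B fine-tuning hyperparameters (sketch only; the real
# Oumi config will differ in structure).
training_config = {
    "base_model": "meta-llama/Llama-3.1-405B-Instruct",
    "lora": {"rank": 16, "alpha": 32},
    "batch_size": 2,
    "learning_rate": 1e-4,
    "optimizer": {"name": "AdamW", "betas": (0.9, 0.999), "eps": 1e-8},
    "warmup_steps": 500,
    "gradient_accumulation_steps": 1,
}

# With no gradient accumulation, the effective batch per optimizer step
# equals the micro-batch size.
effective_batch = (
    training_config["batch_size"] * training_config["gradient_accumulation_steps"]
)
print(effective_batch)  # 2
```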
---

## How to Use CALM-405B

**Note:** Inference requires 16 NVIDIA H100 GPUs.

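A back-of-the-envelope check on that requirement: in bf16, the weights of a 405B-parameter model alone occupy roughly 810 GB, more than the 640 GB of HBM on an 8× H100 (80 GB) node, while 16 H100s provide 1280 GB and leave headroom for the KV cache and activations. The arithmetic:

```python
# Rough memory estimate for serving the bf16 weights
# (ignores KV cache, activations, and framework overhead).
params_billion = 405
bytes_per_param = 2  # bf16

weights_gb = params_billion * bytes_per_param  # 810 GB of weights

h100_hbm_gb = 80
total_hbm_8 = 8 * h100_hbm_gb    # 640 GB -- not enough for the weights
total_hbm_16 = 16 * h100_hbm_gb  # 1280 GB -- fits with headroom

print(weights_gb, total_hbm_8, total_hbm_16)  # 810 640 1280
```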
### How to Load the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CALM-405B")
model = AutoModelForCausalLM.from_pretrained(
    "uiuc-convai/CALM-405B",
    torch_dtype=torch.bfloat16,  # half-precision weights
    device_map="auto",           # shard across all available GPUs
)
```

<!-- TODO -->
### Example Inference

```python
TODO
```

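Until the official example is added, a minimal generation sketch using the standard `transformers` chat-template API; the repo id and the prompt are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uiuc-convai/CALM-405B"  # assumed repo id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across the 16 GPUs
)

# Standard chat-template flow; the prompt is illustrative.
messages = [
    {"role": "user", "content": "Find me a cheap Italian restaurant in the city centre."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```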
More fine-tuning and **community-driven** optimizations are planned to enhance real-world usability.

---

## Citation

If you use **CALM-405B** in your research, please cite:

```
@article{yourpaper2024,
  title={CALM: Conversational Agentic Language Model},
  author={Your Name and Collaborators},
  journal={Your Conference/Journal},
  year={2024}
}
```

For more details, visit the [Project Repository](https://github.com/your-repo) or contact **[email protected]**.