emrecanacikgoz committed
Commit fc98b4d · verified · 1 Parent(s): e7163e4

Update README.md

Files changed (1): README.md (+100 -0)
README.md CHANGED
@@ -8,3 +8,103 @@ base_model:
 - meta-llama/Llama-3.1-405B-Instruct
 pipeline_tag: text-generation
 ---
+
+ # CALM-405B: The Largest Open-Source Agentic LLM
+
+ ## 🌟 Model Overview
+
+ **CALM-405B** is the **largest open-source Conversational Agentic Language Model (CALM) released to date**. It sets a new standard in **conversational AI**, seamlessly integrating **Task-Oriented Dialogue (TOD) capabilities** with **Language Agent (LA) functionality**.
+ Designed to **push the boundaries** of open-source agentic LLMs, it excels at **multi-turn dialogue, tool use, reasoning, and API execution**, and it is the **best-performing fully open-source LLM** on the **Berkeley Function Calling Leaderboard V3 (BFCL V3)**, a milestone for open-source AI research.
+
+ ## Model Sources
+
+ - **Paper:** [More Information Needed]
+ - **Repository:** [More Information Needed]
+
+ ---
+ ## πŸš€ Model Details
+
+ - **Model Name:** CALM-405B
+ - **Developed by:** A collaboration between the UIUC Conversational AI Lab and Oumi
+ - **License:** Apache 2.0
+ - **Architecture:** Meta-Llama 3.1-405B Instruct
+ - **Training Data:** CALM-IT
+ - **Fine-tuning Framework:** Oumi
+ - **Training Hardware:** 8 NVIDIA H100 GPUs
+ - **Training Duration:** ~6.5 days
+ - **Evaluation Benchmarks:** MultiWOZ 2.4, BFCL V3, API-Bank
+ - **Release Date:** February 5, 2025
+
+ ---
+ ## πŸ† Why CALM-405B is a Game-Changer
+
+ - **🚨 Largest Open-Source Agentic LLM:** A **405B-parameter** model that brings state-of-the-art agentic capabilities into the public domain.
+ - **🎯 Best Open-Source Performance on BFCL V3:** Outperforms leading proprietary models such as **GPT-4o, Gemini, and Claude** on function-calling tasks.
+ - **πŸ” True Zero-Shot Function Calling:** Generalizes to unseen API tasks with **unmatched accuracy**.
+ - **πŸ€– Multi-Turn Dialogue Mastery:** Excels at long conversations, **task tracking, and complex reasoning**.
+ - **πŸ›  API Tool Use and Reasoning:** Makes precise API calls, interprets their responses, and synthesizes **coherent** multi-step solutions (illustrated below).
+ - **πŸ“œ Fully Open-Source & Reproducible:** Released under **Apache 2.0**, including model weights, training logs, and datasets.
+
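+ As a concrete (and purely **illustrative**) picture of what function calling involves, here is a hypothetical exchange. The tool-schema and call layout below are assumptions in the common JSON-schema style; this card does not yet specify CALM's exact prompt format.
+
+ ```python
+ # Hypothetical illustration only: the schema/call layout is assumed,
+ # not taken from the CALM card or paper.
+ tool_schema = {
+     "name": "get_weather",
+     "description": "Get the current weather for a city.",
+     "parameters": {
+         "type": "object",
+         "properties": {"city": {"type": "string"}},
+         "required": ["city"],
+     },
+ }
+
+ # Given the user turn "What's the weather in Chicago?", a function-calling
+ # model should emit a structured call rather than free-form text:
+ expected_call = {"name": "get_weather", "arguments": {"city": "Chicago"}}
+ ```
+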
+ ---
+ ## πŸ“Š Benchmark Performance
+
+ TODO: Add BFCL results
+
+ ---
+ ## πŸ”§ Training Process
+ ### Fine-tuning Stages
+ 1. **TOD Fine-tuning:** Optimized for **dialogue state tracking** (e.g., augmented SNIPS recast in an instruction-tuning format).
+ 2. **Function-Calling Fine-tuning:** Trained to generate **highly accurate API calls** from LA datasets.
+ 3. **ReAct-based Fine-tuning:** Enhances multi-turn conversations with structured **thought-action-observation-response reasoning** (a sketch of this format follows).
+
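+ A minimal sketch of what one ReAct-style turn might look like; the field names are hypothetical, since the exact CALM-IT schema is not documented in this card:
+
+ ```python
+ # Hypothetical single ReAct-style training turn; field names are
+ # illustrative, not the actual CALM-IT schema.
+ react_turn = {
+     "thought": "The user wants a table for two; call the booking API.",
+     "action": "book_restaurant(name='Trattoria Roma', people=2, time='19:00')",
+     "observation": "{'status': 'confirmed', 'reference': 'ABC123'}",
+     "response": "Your table for two at Trattoria Roma is confirmed for 7:00 PM (ref ABC123).",
+ }
+ ```
+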
+ ### Training Hyperparameters
+ - **Base Model:** Meta-Llama 3.1-405B Instruct
+ - **LoRA Config:** rank = 16, scaling factor (alpha) = 32
+ - **Batch Size:** 2
+ - **Learning Rate:** 1e-4
+ - **Optimizer:** AdamW (betas = 0.9, 0.999; epsilon = 1e-8)
+ - **Precision:** q4 (4-bit)
+ - **Warm-up Steps:** 500
+ - **Gradient Accumulation Steps:** 1
+
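+ For reference, here is a minimal sketch of comparable settings expressed with the Hugging Face `peft`/`transformers` stack. This is **not** the authors' actual Oumi configuration; it only mirrors the list above, and it assumes "q4" refers to QLoRA-style 4-bit quantization:
+
+ ```python
+ # Sketch only: mirrors the hyperparameters above; not the authors' Oumi config.
+ from peft import LoraConfig
+ from transformers import BitsAndBytesConfig, TrainingArguments
+
+ lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
+ bnb_config = BitsAndBytesConfig(load_in_4bit=True)  # assumes "q4" = 4-bit QLoRA
+
+ training_args = TrainingArguments(
+     output_dir="calm-405b-sft",  # hypothetical output path
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=1,
+     learning_rate=1e-4,
+     warmup_steps=500,
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-8,
+ )
+ ```
+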
+ ---
+
+ ## πŸ’‘ How to Use CALM-405B
+ 🚨 Inference requires 16 NVIDIA H100 GPUs.
+
+ ### πŸ— How to Load the Model
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # device_map="auto" shards weights across all visible GPUs; bfloat16 is a
+ # suggested dtype, not a requirement stated in this card.
+ tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CALM-405B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "uiuc-convai/CALM-405B", torch_dtype="bfloat16", device_map="auto"
+ )
+ ```
+
+ ### πŸ›  Example Inference
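+ A minimal sketch of chat-style generation, reusing the `model` and `tokenizer` loaded above and assuming CALM inherits the Llama 3.1 Instruct chat template from its base model (the card does not yet document the exact prompt format):
+
+ ```python
+ # Sketch: assumes the base Llama 3.1 Instruct chat template.
+ messages = [
+     {"role": "user", "content": "Book me a table for two in Chicago tonight."},
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ output_ids = model.generate(input_ids, max_new_tokens=256)
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```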
+
+ More fine-tuning and **community-driven** optimizations are planned to enhance real-world usability.
+
+ ---
+ ## πŸ“– Citation
+ If you use **CALM-405B** in your research, please cite:
+ ```bibtex
+ @article{yourpaper2024,
+   title={CALM: Conversational Agentic Language Model},
+   author={Your Name and Collaborators},
+   journal={Your Conference/Journal},
+   year={2024}
+ }
+ ```
+
+ For more details, visit the [Project Repository](https://github.com/your-repo) or contact **[email protected]**.