---
base_model:
- meta-llama/Llama-3.1-405B-Instruct
pipeline_tag: text-generation
---

# CALM-405B: The Largest Open-Source Agentic LLM

## Model Overview

**CALM-405B** is the **largest open-source Conversational Agentic Language Model (CALM) released to date**. It sets a new standard in **Conversational AI** by seamlessly integrating **Task-Oriented Dialogue (TOD) capabilities** with **Language Agent (LA) functionalities**.

Designed to **push the boundaries** of open-source agentic LLMs, it excels at **multi-turn dialogue, tool usage, reasoning, and API execution**, and is the **best-performing fully open-source LLM** on the **Berkeley Function Calling Leaderboard V3 (BFCL V3)**.

## Model Sources

<!-- Provide the basic links for the model. -->

- **Paper [optional]:** [More Information Needed]
- **Repository:** [More Information Needed]

---

## Model Details

- **Model Name:** CALM-405B
- **Developed by:** A collaboration between the UIUC Conversational AI Lab and Oumi
- **License:** Apache 2.0
- **Base Architecture:** Meta-Llama 3.1-405B Instruct
- **Training Data:** CALM-IT
- **Fine-tuning Framework:** Oumi
- **Training Hardware:** 8 NVIDIA H100 GPUs
- **Training Duration:** ~6.5 days
- **Evaluation Benchmarks:** MultiWOZ 2.4, BFCL V3, API-Bank
- **Release Date:** February 5, 2025

---

## Why CALM-405B is a Game-Changer

- **Largest Open-Source Agentic LLM:** A **405B**-parameter model that brings state-of-the-art agentic capabilities to the public domain.
- **Best Open-Source Performance on BFCL V3:** Outperforms leading proprietary models such as **GPT-4o, Gemini, and Claude** on function-calling tasks.
- **True Zero-Shot Function Calling:** Generalizes to unseen API tasks with high accuracy.
- **Multi-Turn Dialogue Mastery:** Excels at long conversations, **task tracking, and complex reasoning**.
- **API Tool Use and Reasoning:** Makes precise API calls, interprets responses, and synthesizes **coherent** multi-step solutions.
- **Fully Open-Source & Reproducible:** Released under **Apache 2.0**, including model weights, training logs, and datasets.

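To make the function-calling setting concrete, here is an illustrative exchange. The tool name, its signature, and the output format are hypothetical, not taken from CALM's actual schema:

```
Tools available:
  get_weather(city: str, unit: str = "celsius") -> dict

User:  What's the weather in Urbana right now?
Model: get_weather(city="Urbana", unit="celsius")
```

Given only the tool specification, the model must emit a well-formed call with correctly bound arguments, which is what BFCL V3 measures.
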
---

## Benchmark Performance

TODO: Add BFCL results

---

## Training Process

### Fine-tuning Stages

1. **TOD Fine-tuning:** Optimized for **dialogue state tracking** (e.g., augmented SNIPS reformatted as instruction-tuning data).
2. **Function-Calling Fine-tuning:** Trained to generate **highly accurate API calls** from LA datasets.
3. **ReAct-based Fine-tuning:** Enhances multi-turn conversations with structured **thought-action-observation-response reasoning**.

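The ReAct-style stage trains the model to interleave reasoning with tool use. A hypothetical single turn (the tool name and formats are illustrative, not drawn from CALM-IT) looks like:

```
User:        Book a table for two in Cambridge tonight.
Thought:     I need a restaurant with availability before I can book.
Action:      find_restaurant(area="centre", people=2, day="today")
Observation: {"name": "Midsummer House", "available": true}
Response:    Midsummer House has a table for two tonight. Shall I book it?
```
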
### Training Hyperparameters

- **Base Model:** Meta-Llama 3.1-405B Instruct
- **LoRA Config:** Rank = 16, Scaling Factor = 32
- **Batch Size:** 2
- **Learning Rate:** 1e-4
- **Optimizer:** AdamW (betas = 0.9, 0.999; epsilon = 1e-8)
- **Precision:** q4
- **Warm-up Steps:** 500
- **Gradient Accumulation Steps:** 1

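Training used the Oumi framework; its actual config file is not reproduced here, but the stated hyperparameters can be collected in one place as a plain-Python sketch (structure is ours, values are from the list above):

```python
# Stated CALM-405B fine-tuning hyperparameters (sketch only; the real
# Oumi config will differ in structure).
training_config = {
    "base_model": "meta-llama/Llama-3.1-405B-Instruct",
    "lora": {"rank": 16, "alpha": 32},
    "batch_size": 2,
    "learning_rate": 1e-4,
    "optimizer": {"name": "AdamW", "betas": (0.9, 0.999), "eps": 1e-8},
    "warmup_steps": 500,
    "gradient_accumulation_steps": 1,
}

# With no gradient accumulation, the effective batch per optimizer step
# equals the micro-batch size.
effective_batch = (
    training_config["batch_size"] * training_config["gradient_accumulation_steps"]
)
print(effective_batch)  # 2
```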
---

## How to Use CALM-405B

**Note:** Inference requires 16 NVIDIA H100 GPUs.

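A back-of-the-envelope check on that requirement: in bf16, the weights of a 405B-parameter model alone occupy roughly 810 GB, more than the 640 GB of HBM on an 8× H100 (80 GB) node, while 16 H100s provide 1280 GB and leave headroom for the KV cache and activations. The arithmetic:

```python
# Rough memory estimate for serving the bf16 weights
# (ignores KV cache, activations, and framework overhead).
params_billion = 405
bytes_per_param = 2  # bf16

weights_gb = params_billion * bytes_per_param  # 810 GB of weights

h100_hbm_gb = 80
total_hbm_8 = 8 * h100_hbm_gb    # 640 GB -- not enough for the weights
total_hbm_16 = 16 * h100_hbm_gb  # 1280 GB -- fits with headroom

print(weights_gb, total_hbm_8, total_hbm_16)  # 810 640 1280
```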
### How to Load the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CALM-405B")
model = AutoModelForCausalLM.from_pretrained(
    "uiuc-convai/CALM-405B",
    torch_dtype=torch.bfloat16,  # half-precision weights
    device_map="auto",           # shard across all available GPUs
)
```

<!-- TODO -->
### Example Inference

```python
TODO
```

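Until the official example is added, a minimal generation sketch using the standard `transformers` chat-template API; the repo id and the prompt are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uiuc-convai/CALM-405B"  # assumed repo id for this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across the 16 GPUs
)

# Standard chat-template flow; the prompt is illustrative.
messages = [
    {"role": "user", "content": "Find me a cheap Italian restaurant in the city centre."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```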
More fine-tuning and **community-driven** optimizations are planned to enhance real-world usability.

---

## Citation

If you use **CALM-405B** in your research, please cite:

```
@article{yourpaper2024,
  title={CALM: Conversational Agentic Language Model},
  author={Your Name and Collaborators},
  journal={Your Conference/Journal},
  year={2024}
}
```

For more details, visit the [Project Repository](https://github.com/your-repo) or contact **[email protected]**.