---
license: mit
base_model:
- deepseek-ai/DeepSeek-Prover-V1.5-SFT
---
# MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
* Paper link: [https://arxiv.org/abs/2503.03205](https://arxiv.org/abs/2503.03205)
* LoT-Solver (Based on DeepSeek-Prover-v1.5-SFT): [https://huggingface.co/RickyDeSkywalker/LoT-Solver](https://huggingface.co/RickyDeSkywalker/LoT-Solver)
* LoT-Solver (Based on Godel-Prover-SFT): [https://huggingface.co/RickyDeSkywalker/LoT-Solver-Godel](https://huggingface.co/RickyDeSkywalker/LoT-Solver-Godel)
* Website: [https://ma-lot.github.io/](https://ma-lot.github.io/)
This is the model under MA-LoT paper presented training pipeline trained on DeepSeek-Prover-v1.5-SFT. For usage of the model, pelase refer to our GitHub page.
## Introduction
Solving mathematical problems using computer-verifiable languages like Lean has significantly impacted mathematical and computer science communities.
State-of-the-art methods utilize single Large Language Models (LLMs) as agents or provers to either generate complete proof or perform tree searches.
However, single-agent methods inherently lack a structured way to combine high-level reasoning in Natural Language (NL) with Formal Language (FL) verification feedback.
To solve these issues, we propose **MA-LoT**: Multi-Agent Lean-based Long Chain-of-Thought framework,
(to the best of our knowledge), the first multi-agent framework for Lean4 theorem proving that balance high-level NL reasoning and FL verification in Long CoT.
Using this structured interaction, our approach enables deeper insights and long-term coherence in proof generation, with which past methods struggle.
We do this by leveraging emergent formal reasoning ability in Long CoT using our novel *LoT-Transfer Learning* training-inference pipeline.
Extensive experiment shows that our framework achieves **61.07%** accuracy rate on the Lean4 version of MiniF2F-Test dataset,
largely outperforming GPT-4 (22.95\%), single-agent tree search (InternLM-Step-Prover, 50.70%), and whole-proof generation (Godel-Prover, 55.33%) baselines.
Furthermore, our findings highlight the potential of combining Long CoT with formal verification for a more insightful generation in a broader perspective.
## Evaluation Results
## Usage example
### General setup
Load model
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
MODEL_ID = "RickyDeSkywalker/LoT-Solver"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
torch_dtype=torch.bfloat16,
device_map="auto"
)
```
### Prover Agent
#### Code-completion usage
The model can be used as a general code-completion prover as follows:
```python
# formulate prompt
prompt = """Complete the following Lean 4 code with explanatory comments preceding each line of code:
```lean4
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/--Suppose $a, b, c$ are the sides of a triangle. Prove that
$a^2(b+c-a)+b^2(c+a-b)+c^2(a+b-c)\le{3abc}.$-/
theorem imo_1964_p2
(a b c : ℝ)
(h₀ : 0 < a ∧ 0 < b ∧ 0 < c)
(h₁ : c < a + b)
(h₂ : b < a + c)
(h₃ : a < b + c) :
a^2 * (b + c - a) + b^2 * (c + a - b) + c^2 * (a + b - c) ≤ 3 * a * b * c := by"""
# tokenize and generation
input_tensor = tokenizer.encode(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=2048)
# print results
result = tokenizer.decode(outputs[0])
print(result)
```
#### Lean-based Long CoT Prover:
The model can also be used as a LoT(Lean long CoT) prover as follows:
```python
# formulate prompt
prompt = """<|begin▁of▁sentence|>You are an expert in Lean4 theorem proving with exceptional strategic reasoning abilities. When solving problems, strictly follow this process:
1. First, create a natural language proof draft explaining key insights and logical steps
2. Then analyze required Lean4 tactics, specifying exact syntax and their logical purpose
### Instruction: You will receive several Lean4 problems. For each:
- Prove the theorem **WITH** Long Chain-of-Thought in the block.
- **Do not** reveal your chain of thought outside the block.
- **Ensure** the final Lean4 code or final result is placed **only** in