Frankai123 committed (verified)
Commit 691352e · Parent(s): 856c07e

Update README.md

Files changed (1): README.md (+9 -3)
README.md CHANGED
@@ -92,11 +92,17 @@ DMind-1 exhibits advanced web3-aligned reasoning and interactive capabilities in
 
 ## 2. Evaluation Results
 
-![DMind-1 Web3 Performance](figures/dmind-1-web3-performance.jpeg)
+![DMind-1 Web3 Performance](figures/normalized-performance-with-price.jpeg)
 
-We evaluate DMind-1 using the **DMind Benchmark**, a domain-specific evaluation suite tailored to assess large language models in the Web3 context. The benchmark spans 1,917 expert-reviewed questions across nine critical categories—including Blockchain Fundamentals, Infrastructure, Smart Contracts, DeFi, DAO, NFT, Token Economics, Meme, and Security. It combines multiple-choice and subjective open-ended tasks, simulating real-world challenges and requiring deep contextual understanding, which provides a comprehensive assessment of both factual knowledge and advanced reasoning.
+We evaluate DMind-1 and DMind-1-mini using the [DMind Benchmark](https://huggingface.co/datasets/DMindAI/DMind_Benchmark), a domain-specific evaluation suite designed to assess large language models in the Web3 context. The benchmark includes 1,917 expert-reviewed questions across nine core domain categories, and it features both multiple-choice and open-ended tasks to measure factual knowledge, contextual reasoning, and other abilities.
 
-Under this rigorous evaluation, DMind-1 ranked 1st among 24 leading models, outperforming both proprietary (e.g., Grok-3) and open-source (e.g., DeepSeek-R1) LLMs. Notably, our distilled variant DMind-1-mini also performed strongly, ranking 2nd overall. This demonstrates the effectiveness of our compact distillation pipeline.
+To complement accuracy metrics, we conducted a **cost-performance analysis** by comparing benchmark scores against publicly available input token prices across 24 leading LLMs. In this evaluation:
+
+- **DMind-1** achieved the highest Web3 score while maintaining one of the lowest token input costs among top-tier models such as Grok 3 and Claude 3.5 Sonnet.
+
+- **DMind-1-mini** ranked second, retaining over 95% of DMind-1’s performance with greater efficiency in latency and compute.
+
+Both models are uniquely positioned in the most favorable region of the score vs. price curve, delivering state-of-the-art Web3 reasoning at significantly lower cost. This balance of quality and efficiency makes the DMind models highly competitive for both research and production use.
 
 
 ## 3. Use Cases
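
The cost-performance comparison the new README text describes can be sketched as a simple score-per-dollar ranking. The snippet below is a minimal illustration only: the model names, scores, and input-token prices are placeholder values, not the actual DMind Benchmark results or real pricing.

```python
# Illustrative sketch of a score-vs-price comparison.
# All names, scores, and prices are PLACEHOLDERS, not real benchmark data.
models = {
    # name: (web3_score, usd_per_1m_input_tokens) -- hypothetical values
    "model-a": (80.0, 3.00),
    "model-b": (75.0, 1.00),
    "model-c": (70.0, 0.20),
}

def score_per_dollar(score: float, price: float) -> float:
    """Benchmark points obtained per USD of input-token cost."""
    return score / price

# Rank models by cost efficiency, best value first.
ranked = sorted(
    models.items(),
    key=lambda kv: score_per_dollar(*kv[1]),
    reverse=True,
)

for name, (score, price) in ranked:
    print(f"{name}: {score_per_dollar(score, price):.1f} pts/$")
```

A model that sits in the "favorable region" of the score-vs-price curve is one with both a high raw score and a high points-per-dollar ratio under this kind of normalization.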