Update README.md
## 2. Evaluation Results
We evaluate DMind-1 and DMind-1-mini using the [DMind Benchmark](https://huggingface.co/datasets/DMindAI/DMind_Benchmark), a domain-specific evaluation suite designed to assess large language models in the Web3 context. The benchmark includes 1,917 expert-reviewed questions across nine core domain categories, and it features both multiple-choice and open-ended tasks to measure factual knowledge, contextual reasoning, and other abilities.
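
As a rough illustration of how the multiple-choice portion of such a benchmark is scored, here is a minimal sketch. The field names (`"answer"`) and example items are hypothetical, not the DMind Benchmark's actual schema:

```python
# Hypothetical sketch of multiple-choice accuracy scoring.
# Field names and items are placeholders, not the DMind Benchmark schema.

def score_multiple_choice(examples, predictions):
    """Return the fraction of predicted choice letters matching the gold answers."""
    correct = sum(
        1 for ex, pred in zip(examples, predictions)
        if pred.strip().upper() == ex["answer"].strip().upper()
    )
    return correct / len(examples)

# Toy illustration with made-up items:
examples = [
    {"question": "Which consensus does Ethereum use post-Merge?", "answer": "B"},
    {"question": "What does an AMM pool price assets against?", "answer": "A"},
]
predictions = ["B", "C"]
print(score_multiple_choice(examples, predictions))  # 0.5
```

Open-ended tasks need a separate rubric- or judge-based score, which is why the benchmark reports them as a distinct track.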
To complement accuracy metrics, we conducted a **cost-performance analysis** by comparing benchmark scores against publicly available input token prices across 24 leading LLMs. In this evaluation:
- **DMind-1** achieved the highest Web3 score while maintaining one of the lowest input token costs among top-tier models such as Grok 3 and Claude 3.5 Sonnet.
- **DMind-1-mini** ranked second, retaining over 95% of DMind-1’s performance with greater efficiency in latency and compute.
Both models are uniquely positioned in the most favorable region of the score vs. price curve, delivering state-of-the-art Web3 reasoning at significantly lower cost. This balance of quality and efficiency makes the DMind models highly competitive for both research and production use.
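
The "most favorable region of the score vs. price curve" claim can be made precise as Pareto non-domination: a model is on the frontier if no other model is at least as good on both score and price and strictly better on one. A minimal sketch, using placeholder numbers rather than the published benchmark figures:

```python
# Illustrative score-vs-price Pareto frontier. All model names, scores,
# and prices below are placeholders, NOT published benchmark results.

def pareto_frontier(models):
    """Return names of models not dominated by any cheaper-and-better alternative.

    Each entry is (name, score, price_per_million_input_tokens);
    higher score is better, lower price is better.
    """
    frontier = []
    for name, score, price in models:
        dominated = any(
            s >= score and p <= price and (s > score or p < price)
            for n, s, p in models if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

models = [
    ("model_a", 82.0, 3.00),  # best score, highest price
    ("model_b", 78.5, 0.50),  # near-top score at low price
    ("model_c", 74.0, 0.20),  # cheapest
    ("model_d", 70.0, 1.00),  # dominated by model_b
]
print(pareto_frontier(models))  # ['model_a', 'model_b', 'model_c']
```

Under this framing, the claim is that both DMind models sit on (or near) this frontier: no competitor offers a higher Web3 score at an equal or lower input token price.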
## 3. Use Cases