Lingfeng Ming committed: Update README.md
Notably, the model is fully trained end-to-end with next-token-prediction (NTP) loss throughout the pre-training stage.

- **High-quality Controllable Audio Solution.** Multimodal system prompts have been redesigned to include traditional text system prompts and **speech system prompts** for specifying model voices. This provides the flexibility to control voice style through text or speech samples at inference time, and supports advanced capabilities such as end-to-end voice cloning and timbre creation.
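The controllable-audio design above can be sketched as a request payload. This is a minimal illustration only: the helper, the `speech_system_prompt` content type, and the `audio_b64` field are hypothetical assumptions, not the model's actual API.

```python
import base64


def build_multimodal_system_prompt(text_prompt, speech_sample_bytes=None):
    """Compose a system message carrying a text prompt and, optionally,
    a speech sample that specifies the desired voice/timbre.
    Field names here are illustrative, not Baichuan's real schema."""
    content = [{"type": "text", "text": text_prompt}]
    if speech_sample_bytes is not None:
        content.append({
            "type": "speech_system_prompt",
            # Raw audio encoded for JSON transport.
            "audio_b64": base64.b64encode(speech_sample_bytes).decode("ascii"),
        })
    return {"role": "system", "content": content}


# Text-only control vs. text + reference speech sample for voice style.
text_only = build_multimodal_system_prompt("Speak in a calm, low-pitched voice.")
with_speech = build_multimodal_system_prompt(
    "Imitate the voice in the attached sample.",
    speech_sample_bytes=b"\x00\x01\x02",  # placeholder audio bytes
)
```

Either variant yields one system message; the model would consume the speech part to condition its output timbre.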
### Open-source Evaluation Datasets

**OpenMM-Medical**

To comprehensively evaluate the model's multi-modal medical capabilities, we have constructed OpenMM-Medical, which includes data from 42 publicly available medical image datasets such as ACRIMA (retinal images), BioMediTech (microscope images), and CoronaHack (X-rays), totaling 88,996 images.
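Since each OpenMM-Medical image is paired with a multiple-choice question, evaluation reduces to option accuracy. The sketch below uses made-up items and a trivial baseline predictor; it is not the official evaluation harness.

```python
def accuracy(items, predict):
    """Fraction of multiple-choice items where predict(item) matches the key."""
    correct = sum(1 for item in items if predict(item) == item["answer"])
    return correct / len(items)


# Hypothetical items in the style described above (filenames invented).
items = [
    {"image": "acrima_0001.png", "question": "Is glaucoma present?",
     "options": ["A. Yes", "B. No"], "answer": "A"},
    {"image": "coronahack_0042.png", "question": "What is the X-ray finding?",
     "options": ["A. Normal", "B. Pneumonia"], "answer": "B"},
]

# A constant-answer baseline gets one of the two items right.
acc = accuracy(items, lambda item: "A")  # -> 0.5
```

A real harness would replace the lambda with a model call that maps the image and question to an option letter.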

**OpenAudioBench**

To efficiently assess the model's end-to-end speech understanding and reasoning ability (its "IQ"), we developed OpenAudioBench, comprising five end-to-end audio understanding sub-datasets: four public benchmarks (Llama Question, WEB QA, TriviaQA, AlpacaEval) and a speech logical reasoning dataset created internally by the Baichuan team, totaling 2,701 entries.
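One way to report a single number over OpenAudioBench's five sub-datasets is a macro-average of per-dataset scores. The scores below are placeholders, not real results, and the macro-average is an illustrative aggregation rather than the benchmark's official protocol.

```python
# Placeholder per-sub-dataset scores (invented for illustration).
scores = {
    "Llama Question": 0.80,
    "WEB QA": 0.60,
    "TriviaQA": 0.70,
    "AlpacaEval": 0.90,
    "Speech Logical Reasoning": 0.50,
}

# Macro-average: each sub-dataset weighs equally, regardless of its size,
# so the smaller reasoning set counts as much as the larger QA sets.
macro_avg = sum(scores.values()) / len(scores)
```

A micro-average weighted by each sub-dataset's entry count would be the natural alternative when sizes differ widely.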
### Evaluation