THUdyh committed on
Commit aa1adf7 · verified · 1 Parent(s): b70a751

Update README.md

Files changed (1)
  1. README.md +1 -152
README.md CHANGED
@@ -321,155 +321,4 @@ title={Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive
  author={Liu, Zuyan and Dong, Yuhao and Wang, Jiahui and Liu, Ziwei and Hu, Winston and Lu, Jiwen and Rao, Yongming},
  journal={arXiv preprint arXiv:2502.04328},
  year={2025}
- }
-
- # File information
-
- The repository contains the following file information:
-
- Filename: generation_config.json
- Content: {
-   "attn_implementation": "flash_attention_2",
-   "bos_token_id": 151643,
-   "do_sample": true,
-   "eos_token_id": [
-     151645,
-     151643
-   ],
-   "pad_token_id": 151643,
-   "repetition_penalty": 1.05,
-   "temperature": 0.7,
-   "top_k": 20,
-   "top_p": 0.8,
-   "transformers_version": "4.43.4"
- }
-
- Filename: merges.txt
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: special_tokens_map.json
- Content: {
-   "additional_special_tokens": [
-     "<|im_start|>",
-     "<|im_end|>",
-     "<|object_ref_start|>",
-     "<|object_ref_end|>",
-     "<|box_start|>",
-     "<|box_end|>",
-     "<|quad_start|>",
-     "<|quad_end|>",
-     "<|vision_start|>",
-     "<|vision_end|>",
-     "<|vision_pad|>",
-     "<|image_pad|>",
-     "<|video_pad|>"
-   ],
-   "eos_token": {
-     "content": "<|im_end|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": "<|mm_pad|>"
- }
-
- Filename: model.safetensors.index.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: config.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: vocab.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: tokenizer_config.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
-
- # Project page
-
- The project page URL we found has the following URL:
-
- # Github README
-
- The Github README we found contains the following content:
-
- <div align="center">
-
- <img src="assets/logo.png" width="30%"/>
-
- # OLA: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
-
- Join our [WeChat](http://imagebind-llm.opengvlab.com/qrcode/)
- [[Project Page](https://ola-omni.github.io/)] [[Demo](http://106.14.2.150:10020/)]
-
- </div>
-
- <img src="assets/teaser.png" width="100%"/>
-
- ## 🚀 News
- * [2025/02/07] 🎉🎉🎉 Initial codebase for eval and training will be released ASAP! Thanks for your attention.
-
- ## ⚡ Model Zoo
- 1. Speech-Visual Data
- * [ ] image+text with local audio caption.
- * [ ] videos from webvid2.5m with audio caption.
- 2. Visual Tokenizer
- * [ ] Imagebind small.
- * [ ] Oryx-ViT 18B-1152.
- 3. Training Pipeline
- * [ ] image+text stage.
- * [ ] audio+image+text stage.
- * [ ] video+audio+image+text stage
-
- ## TODO
- - [ ] Multi Stage Training
-
- ## ⚙️ Installation
-
- See [INSTALL.md](docs/INSTALL.md) for detailed instructions.
-
- ## 🛴 Quick Inference Code
-
- - Check out the [quick inference script](example/inference/image_audio.ipynb) using a visual and audio data!
-
- ## 📃 Citation
- ```
- @article{liu2025ola,
- title={Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment},
- author={Liu, Zuyan and Dong, Yuhao and Wang, Jiahui and Liu, Ziwei and Hu, Winston and Lu, Jiwen and Rao, Yongming},
- journal={arXiv preprint arXiv:2502.04328},
- year={2025}
- }
- ```
-
- ## Acknowledgement
- - This project has been built using the great codebase of [Qwen](https://github.com/QwenLM/Qwen), [Video-LLaVA](https://github.com/mbai-xiao/Video-LLaVA), [OpenFlamingo](https://github.com/mlfoundations/open_flamingo). We thank the authors for their wonderful works.
-
- ## Contact
- - If you have any questions, feel free to open issues or pull requests.
-
- Format your response as markdown, like this:
-
- ## reasoning
- A reasoning section regarding which metadata is most appropriate for the given model to put in the `content` section as YAML, given the available
- context about the paper (abstract, Github README content and project page content if provided). Formatted as plain text.
-
- ## Title
- The title of your Hugging Face pull request formatted as plain text
-
- ## Comment
- The comment of your Hugging Face pull request formatted as markdown
-
- ## Metadata
- The metadata of the new/updated model card formatted as YAML.
-
- ## Content
- The content of the new/updated README.md (model card) formatted as markdown
-
- Start your answer directly with a "## Reasoning" section followed by "## Title", "## Comment", "## Metadata" and "## Content" sections
- that are filled in with relevant info for the given paper. Only format the Metadata section using ```yaml and ``` markers.
- In case there is already an Arxiv link present, there is no need to replace it with a Hugging Face paper page link.
- In case there is already a Github or project page URL present, there is no need to mention in the comment that you added it.
+ }
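
The `generation_config.json` shown in the removed section documents the checkpoint's sampling defaults: nucleus sampling with `top_p=0.8` and `top_k=20` at `temperature=0.7`, plus a light repetition penalty. Below is a minimal sketch of reproducing those defaults with the `transformers` library; the `model` and `inputs` names in the usage comment are hypothetical stand-ins for an already-loaded checkpoint and a tokenized prompt.

```python
from transformers import GenerationConfig

# Sampling defaults mirroring the generation_config.json removed in this commit.
# "attn_implementation" is a model-loading option, not a sampling field, so it
# is omitted here.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.05,
    bos_token_id=151643,
    # In the Qwen2 vocabulary, 151645 is <|im_end|> and 151643 is <|endoftext|>;
    # generation stops at either.
    eos_token_id=[151645, 151643],
    pad_token_id=151643,
)

# Hypothetical usage with an already-loaded model and tokenized inputs:
# outputs = model.generate(**inputs, generation_config=gen_config)
```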