THUdyh committed on
Commit aa1adf7 · verified · 1 Parent(s): b70a751

Update README.md

Files changed (1)
  1. README.md +1 -152
README.md CHANGED
@@ -321,155 +321,4 @@ title={Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive
  author={Liu, Zuyan and Dong, Yuhao and Wang, Jiahui and Liu, Ziwei and Hu, Winston and Lu, Jiwen and Rao, Yongming},
  journal={arXiv preprint arXiv:2502.04328},
  year={2025}
- }
-
- # File information
-
- The repository contains the following file information:
-
- Filename: generation_config.json
- Content: {
-   "attn_implementation": "flash_attention_2",
-   "bos_token_id": 151643,
-   "do_sample": true,
-   "eos_token_id": [
-     151645,
-     151643
-   ],
-   "pad_token_id": 151643,
-   "repetition_penalty": 1.05,
-   "temperature": 0.7,
-   "top_k": 20,
-   "top_p": 0.8,
-   "transformers_version": "4.43.4"
- }
-
- Filename: merges.txt
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: special_tokens_map.json
- Content: {
-   "additional_special_tokens": [
-     "<|im_start|>",
-     "<|im_end|>",
-     "<|object_ref_start|>",
-     "<|object_ref_end|>",
-     "<|box_start|>",
-     "<|box_end|>",
-     "<|quad_start|>",
-     "<|quad_end|>",
-     "<|vision_start|>",
-     "<|vision_end|>",
-     "<|vision_pad|>",
-     "<|image_pad|>",
-     "<|video_pad|>"
-   ],
-   "eos_token": {
-     "content": "<|im_end|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": "<|mm_pad|>"
- }
-
- Filename: model.safetensors.index.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: config.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: vocab.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
- Filename: tokenizer_config.json
- Content: "Content of the file is larger than 50 KB, too long to display."
-
-
- # Project page
-
- The project page URL we found has the following URL:
-
- # Github README
-
- The Github README we found contains the following content:
-
- <div align="center">
-
- <img src="assets/logo.png" width="30%"/>
-
- # OLA: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
-
- Join our [WeChat](http://imagebind-llm.opengvlab.com/qrcode/)
- [[Project Page](https://ola-omni.github.io/)] [[Demo](http://106.14.2.150:10020/)]
-
- </div>
-
- <img src="assets/teaser.png" width="100%"/>
-
- ## 🚀 News
- * [2025/02/07] 🎉🎉🎉 Initial codebase for eval and training will be released ASAP! Thanks for your attention.
-
- ## ⚡ Model Zoo
- 1. Speech-Visual Data
- * [ ] image+text with local audio caption.
- * [ ] videos from webvid2.5m with audio caption.
- 2. Visual Tokenizer
- * [ ] Imagebind small.
- * [ ] Oryx-ViT 18B-1152.
- 3. Training Pipeline
- * [ ] image+text stage.
- * [ ] audio+image+text stage.
- * [ ] video+audio+image+text stage
-
- ## TODO
- - [ ] Multi Stage Training
-
- ## ⚙️ Installation
-
- See [INSTALL.md](docs/INSTALL.md) for detailed instructions.
-
- ## 🛴 Quick Inference Code
-
- - Check out the [quick inference script](example/inference/image_audio.ipynb) using a visual and audio data!
-
- ## 📃 Citation
- ```
- @article{liu2025ola,
- title={Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment},
- author={Liu, Zuyan and Dong, Yuhao and Wang, Jiahui and Liu, Ziwei and Hu, Winston and Lu, Jiwen and Rao, Yongming},
- journal={arXiv preprint arXiv:2502.04328},
- year={2025}
- }
- ```
-
- ## Acknowledgement
- - This project has been built using the great codebase of [Qwen](https://github.com/QwenLM/Qwen), [Video-LLaVA](https://github.com/mbai-xiao/Video-LLaVA), [OpenFlamingo](https://github.com/mlfoundations/open_flamingo). We thank the authors for their wonderful works.
-
- ## Contact
- - If you have any questions, feel free to open issues or pull requests.
-
- Format your response as markdown, like this:
-
- ## reasoning
- A reasoning section regarding which metadata is most appropriate for the given model to put in the `content` section as YAML, given the available
- context about the paper (abstract, Github README content and project page content if provided). Formatted as plain text.
-
- ## Title
- The title of your Hugging Face pull request formatted as plain text
-
- ## Comment
- The comment of your Hugging Face pull request formatted as markdown
-
- ## Metadata
- The metadata of the new/updated model card formatted as YAML.
-
- ## Content
- The content of the new/updated README.md (model card) formatted as markdown
-
- Start your answer directly with a "## Reasoning" section followed by "## Title", "## Comment", "## Metadata" and "## Content" sections
- that are filled in with relevant info for the given paper. Only format the Metadata section using ```yaml and ``` markers.
- In case there is already an Arxiv link present, there is no need to replace it with a Hugging Face paper page link.
- In case there is already a Github or project page URL present, there is no need to mention in the comment that you added it.
+ }
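
The `generation_config.json` shown in the removed section documents the checkpoint's sampling defaults: nucleus sampling with `top_p=0.8` and `top_k=20` at `temperature=0.7`, plus a light repetition penalty. Below is a minimal sketch of reproducing those defaults with the `transformers` library; the `model` and `inputs` names in the usage comment are hypothetical stand-ins for an already-loaded checkpoint and a tokenized prompt.

```python
from transformers import GenerationConfig

# Sampling defaults mirroring the generation_config.json removed in this commit.
# "attn_implementation" is a model-loading option, not a sampling field, so it
# is omitted here.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.05,
    bos_token_id=151643,
    # In the Qwen2 vocabulary, 151645 is <|im_end|> and 151643 is <|endoftext|>;
    # generation stops at either.
    eos_token_id=[151645, 151643],
    pad_token_id=151643,
)

# Hypothetical usage with an already-loaded model and tokenized inputs:
# outputs = model.generate(**inputs, generation_config=gen_config)
```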