nguyenbh committed · verified
Commit 6cf9696 · 1 Parent(s): 18812f4

Update readme

Files changed (1):
1. README.md +20 -0
README.md CHANGED
@@ -145,6 +145,8 @@ With Phi-4-multimodal-instruct, a single new open model has been trained across

 It is anticipated that Phi-4-multimodal-instruct will greatly benefit app developers and various use cases. The enthusiastic support for the Phi-4 series is greatly appreciated. Feedback on Phi-4 is welcomed and crucial to the model's evolution and improvement. Thank you for being part of this journey!

 ## Model Quality
+ <details>
+ <summary>Click to view details</summary>

 To understand the capabilities, Phi-4-multimodal-instruct was compared with a set of models over a variety of benchmarks using an internal benchmark platform (See Appendix A for benchmark methodology). Users can refer to the Phi-4-Mini-Instruct model card for details of language benchmarks. A high-level overview of the model quality on representative speech and vision benchmarks:

@@ -262,6 +264,7 @@ BLINK is an aggregated benchmark with 14 visual tasks that humans can solve very

 ![alt text](./figures/multi_image.png)

+ </details>

 ## Usage

@@ -474,6 +477,23 @@ print(f'>>> Response\n{response}')

 More inference examples can be found [**here**](https://huggingface.co/microsoft/Phi-4-multimodal-instruct/blob/main/sample_inference_phi4mm.py).

+ ### vLLM inference
+
+ Users can start a server with this command:
+
+ ```bash
+ python -m vllm.entrypoints.openai.api_server --model 'microsoft/Phi-4-multimodal-instruct' --dtype auto --trust-remote-code --max-model-len 131072 --enable-lora --max-lora-rank 320 --lora-extra-vocab-size 0 --limit-mm-per-prompt audio=3,image=3 --max-loras 2 --lora-modules speech=<path to speech lora folder> vision=<path to vision lora folder>
+ ```
+
+ The speech LoRA and vision LoRA folders are inside the Phi-4-multimodal-instruct folder downloaded by vLLM; you can also use the following script to locate them:
+
+ ```python
+ from huggingface_hub import snapshot_download
+ model_path = snapshot_download(repo_id="microsoft/Phi-4-multimodal-instruct")
+ speech_lora_path = model_path + "/speech-lora"
+ vision_lora_path = model_path + "/vision-lora"
+ ```
+
 ## Training

 ### Fine-tuning
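
As a usage sketch for the vLLM inference section added above (not part of the commit): once the server is running, requests can be sent through vLLM's OpenAI-compatible endpoint. The sketch assumes the default port 8000, that the adapter names registered via `--lora-modules` (`speech`, `vision`) are selected through the `model` field (vLLM's usual LoRA-serving behavior), and that the image URL is a placeholder.

```python
# Client-side sketch: query the vLLM OpenAI-compatible server started above.
# The speech_lora_path / vision_lora_path values from the snapshot_download
# snippet are what fill the --lora-modules placeholders on the server side.
from openai import OpenAI

# Assumed endpoint; vLLM's api_server listens on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="vision",  # adapter name from --lora-modules; use "speech" for audio inputs
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

vLLM exposes each adapter passed to `--lora-modules` under its own model name, so choosing `vision` or `speech` here routes the request through the corresponding LoRA; the server command above also caps multimodal inputs at three audio clips and three images per prompt via `--limit-mm-per-prompt`.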