Update README.md
README.md
CHANGED
@@ -63,7 +63,7 @@ Since we utilizes a pre-trained Multimodal Large Language Model (MLLM) with a De
 
 The overall architecture of our system is designed to maximize the synergy between image and text modalities, ensuring a robust and coherent generation of video content from static images. This integration not only improves the fidelity of the generated videos but also enhances the model's ability to interpret and utilize complex multimodal inputs. The overall architecture is as follows.
 <p align="center">
-<img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo-I2V/refs/heads/main/assets/backbone.png" height
+<img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo-I2V/refs/heads/main/assets/backbone.png" style="max-width: 60%; height: auto;">
 </p>
 
 