The CrystalChat-7B based multi-modal large language model (MLLM) mimics the training recipe used for the Vicuna-7B based [LLaVa-v1.5](https://huggingface.co/docs/transformers/main/model_doc/llava). CrystalChat-7B based MLLMs are fully transparent: all materials, including code, data, model checkpoints, intermediate results, and more, are open-sourced at [TODO: Add paper link](). CrystalChat-7B-Web2Code is an MLLM specialized in webpage image-to-HTML code generation.
## Evaluations

| LLM Backbone | DWCG | DWU | DWCG<sub>R</sub> | DWU<sub>R</sub> | VSA ↑ | CAD ↑ | TCC ↑ | UII ↑ | Overall ↑ |
|------------------------|------|-----|------------------|-----------------|-----------|-----------|-----------|-----------|------------|
| **CrystalChat-7B**     |      |     |                  |                 | 4.714     | 4.572     | 4.865     | 5.147     | 4.825      |
|                        | ✓    |     |                  |                 | 7.900     | 8.001     | 8.204     | 8.215     | 8.080      |
|                        | ✓    | ✓   |                  |                 | 7.900     | 8.001     | 8.204     | 8.215     | 8.080      |
|                        | ✓    | ✓   | ✓                | ✓               | **8.384** | **8.287** | **8.417** | **8.488** | **8.394**  |
| **Vicuna-7B**          |      |     |                  |                 | 3.042     | 3.250     | 3.333     | 3.167     | 3.198      |
|                        | ✓    |     |                  |                 | 6.871     | 6.660     | 6.589     | 6.897     | 6.754      |
|                        |      | ✓   |                  |                 | 3.898     | 3.489     | 3.340     | 3.651     | 3.595      |
|                        | ✓    | ✓   | ✓                | ✓               | **7.876** | **7.687** | **7.267** | **7.563** | **7.598**  |

**Table:** Performance comparison of various LLM backbones trained on different combinations of the DWCG, DWU, DWCG<sub>R</sub>, and DWU<sub>R</sub> datasets. A check mark (✓) indicates that the dataset was used; the arrows (↑) indicate that higher values are better.
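The Overall column appears to be the unweighted mean of the four metric scores (VSA, CAD, TCC, UII). A short sanity check over a few rows confirms this; the row labels and the mean assumption below are ours, not from the source:

```python
# Check that Overall ≈ mean(VSA, CAD, TCC, UII) for rows taken from
# the table above. Row labels are descriptive guesses, not official names.
rows = {
    "CrystalChat-7B, no extra data":  ([4.714, 4.572, 4.865, 5.147], 4.825),
    "CrystalChat-7B, all 4 datasets": ([8.384, 8.287, 8.417, 8.488], 8.394),
    "Vicuna-7B, no extra data":       ([3.042, 3.250, 3.333, 3.167], 3.198),
    "Vicuna-7B, all 4 datasets":      ([7.876, 7.687, 7.267, 7.563], 7.598),
}

for name, (metrics, reported_overall) in rows.items():
    mean = sum(metrics) / len(metrics)
    print(f"{name}: mean={mean:.4f}, reported={reported_overall}")
    # Allow for rounding to three decimals in the reported column.
    assert abs(mean - reported_overall) < 1e-3
```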
### About CrystalChat-7B-Web2Code:
* 7 billion parameter LLM
* CLIP ViT-L/14-336px vision encoder