akhaliq HF Staff commited on
Commit
e287280
·
1 Parent(s): d142097

add ernie vl support

Browse files
Files changed (2) hide show
  1. README.md +61 -1
  2. app.py +89 -6
README.md CHANGED
@@ -93,4 +93,64 @@ The application uses:
93
  - **Hugging Face Hub**: For model inference
94
  - **ModelScope Studio**: For UI components
95
  - **OAuth Login**: Requires users to sign in with Hugging Face for code generation
96
- - **Streaming**: For real-time code generation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
93
  - **Hugging Face Hub**: For model inference
94
  - **ModelScope Studio**: For UI components
95
  - **OAuth Login**: Requires users to sign in with Hugging Face for code generation
96
+ - **Streaming**: For real-time code generation
97
+
98
+ # Hugging Face Coder
99
+
100
+ A Gradio-based application that uses Hugging Face models to generate code based on user requirements. The app supports both text-only and multimodal (text + image) code generation.
101
+
102
+ ## Features
103
+
104
+ - **Multiple Model Support**: DeepSeek V3, DeepSeek R1, and ERNIE-4.5-VL
105
+ - **Multimodal Input**: Upload images to help describe your requirements
106
+ - **Real-time Code Generation**: Stream responses from the models
107
+ - **Live Preview**: See your generated code in action with the built-in sandbox
108
+ - **History Management**: Keep track of your previous generations
109
+ - **Example Templates**: Quick-start with predefined application templates
110
+
111
+ ## Setup
112
+
113
+ 1. Install dependencies:
114
+ ```bash
115
+ pip install -r requirements.txt
116
+ ```
117
+
118
+ 2. Set your Hugging Face API token as an environment variable:
119
+ ```bash
120
+ export HF_TOKEN="your_huggingface_token_here"
121
+ ```
122
+
123
+ 3. Run the application:
124
+ ```bash
125
+ python app.py
126
+ ```
127
+
128
+ ## Usage
129
+
130
+ 1. **Text-only Generation**: Simply type your requirements in the text area
131
+ 2. **Multimodal Generation**: Upload an image and describe what you want to create
132
+ 3. **Model Selection**: Switch between different models using the model selector
133
+ 4. **Examples**: Use the provided example templates to get started quickly
134
+
135
+ ## Supported Models
136
+
137
+ - **DeepSeek V3**: General code generation
138
+ - **DeepSeek R1**: Advanced code generation
139
+ - **ERNIE-4.5-VL**: Multimodal code generation with image understanding
140
+
141
+ ## Environment Variables
142
+
143
+ - `HF_TOKEN`: Your Hugging Face API token (required)
144
+
145
+ ## Examples
146
+
147
+ - Todo App
148
+ - Calculator
149
+ - Weather Dashboard
150
+ - Chat Interface
151
+ - E-commerce Product Card
152
+ - Login Form
153
+ - Dashboard Layout
154
+ - Data Table
155
+ - Image Gallery
156
+ - UI from Image (multimodal)
app.py CHANGED
@@ -20,6 +20,8 @@ When asked to create an application, you should:
20
  4. Include necessary comments and documentation
21
  5. Ensure the code is functional and follows best practices
22
 
 
 
23
  Always respond with code that can be executed or rendered directly.
24
 
25
  Always output only the HTML code inside a ```html ... ``` code block, and do not include any explanations or extra text."""
@@ -35,6 +37,11 @@ AVAILABLE_MODELS = [
35
  "name": "DeepSeek R1",
36
  "id": "deepseek-ai/DeepSeek-R1-0528",
37
  "description": "DeepSeek R1 model for code generation"
 
 
 
 
 
38
  }
39
  ]
40
 
@@ -70,6 +77,14 @@ DEMO_LIST = [
70
  {
71
  "title": "Data Table",
72
  "description": "Build a data table with sorting and filtering capabilities"
 
 
 
 
 
 
 
 
73
  }
74
  ]
75
 
@@ -87,7 +102,17 @@ Messages = List[Dict[str, str]]
87
  def history_to_messages(history: History, system: str) -> Messages:
88
  messages = [{'role': 'system', 'content': system}]
89
  for h in history:
90
- messages.append({'role': 'user', 'content': h[0]})
 
 
 
 
 
 
 
 
 
 
91
  messages.append({'role': 'assistant', 'content': h[1]})
92
  return messages
93
 
@@ -95,7 +120,16 @@ def messages_to_history(messages: Messages) -> Tuple[str, History]:
95
  assert messages[0]['role'] == 'system'
96
  history = []
97
  for q, r in zip(messages[1::2], messages[2::2]):
98
- history.append([q['content'], r['content']])
 
 
 
 
 
 
 
 
 
99
  return history
100
 
101
  def remove_code_block(text):
@@ -121,6 +155,46 @@ def history_render(history: History):
121
  def clear_history():
122
  return []
123
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
124
  def send_to_sandbox(code):
125
  # Add a wrapper to inject necessary permissions and ensure full HTML
126
  wrapped_code = f"""
@@ -207,6 +281,7 @@ with gr.Blocks(css_paths="app.css") as demo:
207
  current_model_display = gr.Markdown("**Current Model:** DeepSeek V3", visible=False)
208
  input = antd.InputTextarea(
209
  size="large", allow_clear=True, placeholder="Please enter what kind of application you want", visible=False)
 
210
  btn = antd.Button("send", type="primary", size="large", visible=False)
211
  clear_btn = antd.Button("clear history", type="default", size="large", visible=False)
212
 
@@ -215,7 +290,7 @@ with gr.Blocks(css_paths="app.css") as demo:
215
  for i, demo_item in enumerate(DEMO_LIST):
216
  with antd.Card(hoverable=True, title=demo_item["title"]) as demoCard:
217
  antd.CardMeta(description=demo_item["description"])
218
- demoCard.click(lambda e, idx=i: DEMO_LIST[idx]['description'], outputs=[input])
219
 
220
  antd.Divider("setting", visible=False)
221
  with antd.Flex(gap="small", wrap=True, visible=False) as setting_flex:
@@ -285,6 +360,7 @@ with gr.Blocks(css_paths="app.css") as demo:
285
  gr.update(visible=False),
286
  gr.update(visible=False),
287
  gr.update(visible=False),
 
288
  )
289
  else:
290
  return (
@@ -299,9 +375,10 @@ with gr.Blocks(css_paths="app.css") as demo:
299
  gr.update(visible=True),
300
  gr.update(visible=True),
301
  gr.update(visible=True),
 
302
  )
303
 
304
- def generation_code(query: Optional[str], _setting: Dict[str, str], _history: Optional[History], profile: gr.OAuthProfile | None, _current_model: Dict):
305
  if profile is None:
306
  return (
307
  "Please sign in with Hugging Face to use this feature.",
@@ -315,7 +392,12 @@ with gr.Blocks(css_paths="app.css") as demo:
315
  if _history is None:
316
  _history = []
317
  messages = history_to_messages(_history, _setting['system'])
318
- messages.append({'role': 'user', 'content': query})
 
 
 
 
 
319
 
320
  try:
321
  completion = client.chat.completions.create(
@@ -358,7 +440,7 @@ with gr.Blocks(css_paths="app.css") as demo:
358
 
359
  btn.click(
360
  generation_code,
361
- inputs=[input, setting, history, current_model],
362
  outputs=[code_output, history, sandbox, state_tab, code_drawer]
363
  )
364
 
@@ -370,6 +452,7 @@ with gr.Blocks(css_paths="app.css") as demo:
370
  outputs=[
371
  login_message,
372
  input,
 
373
  current_model_display,
374
  btn,
375
  clear_btn,
 
20
  4. Include necessary comments and documentation
21
  5. Ensure the code is functional and follows best practices
22
 
23
+ If an image is provided, analyze it and use the visual information to better understand the user's requirements.
24
+
25
  Always respond with code that can be executed or rendered directly.
26
 
27
  Always output only the HTML code inside a ```html ... ``` code block, and do not include any explanations or extra text."""
 
37
  "name": "DeepSeek R1",
38
  "id": "deepseek-ai/DeepSeek-R1-0528",
39
  "description": "DeepSeek R1 model for code generation"
40
+ },
41
+ {
42
+ "name": "ERNIE-4.5-VL",
43
+ "id": "baidu/ERNIE-4.5-VL-424B-A47B-Base-PT",
44
+ "description": "ERNIE-4.5-VL model for multimodal code generation with image support"
45
  }
46
  ]
47
 
 
77
  {
78
  "title": "Data Table",
79
  "description": "Build a data table with sorting and filtering capabilities"
80
+ },
81
+ {
82
+ "title": "Image Gallery",
83
+ "description": "Create an image gallery with lightbox functionality and responsive grid layout"
84
+ },
85
+ {
86
+ "title": "UI from Image",
87
+ "description": "Upload an image of a UI design and I'll generate the HTML/CSS code for it"
88
  }
89
  ]
90
 
 
102
def history_to_messages(history: History, system: str) -> Messages:
    """Convert the UI chat history into an OpenAI-style message list.

    The system prompt is placed first; each history entry contributes a
    user message followed by an assistant message. Multimodal user turns
    (lists of content parts) are flattened to their text fragments.
    """
    messages = [{'role': 'system', 'content': system}]
    for user_turn, assistant_turn in history:
        if isinstance(user_turn, list):
            # Multimodal turn: keep only the text parts.
            extracted = "".join(
                part.get("text", "")
                for part in user_turn
                if isinstance(part, dict) and part.get("type") == "text"
            )
            # Fall back to the raw repr when no text part was found.
            user_turn = extracted if extracted else str(user_turn)
        messages.append({'role': 'user', 'content': user_turn})
        messages.append({'role': 'assistant', 'content': assistant_turn})
    return messages
118
 
 
120
def messages_to_history(messages: Messages) -> Tuple[str, History]:
    """Rebuild [user, assistant] pair history from an OpenAI-style message list.

    The leading system message is skipped; multimodal user contents are
    flattened to their text fragments for display.
    """
    assert messages[0]['role'] == 'system'
    history = []
    user_msgs = messages[1::2]
    assistant_msgs = messages[2::2]
    for user_msg, assistant_msg in zip(user_msgs, assistant_msgs):
        content = user_msg['content']
        if isinstance(content, list):
            # Keep only the text parts of a multimodal message.
            text_only = "".join(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
            # Fall back to the raw repr when no text part was found.
            content = text_only if text_only else str(content)
        history.append([content, assistant_msg['content']])
    return history
134
 
135
  def remove_code_block(text):
 
155
  def clear_history():
156
  return []
157
 
158
def process_image_for_model(image):
    """Encode an uploaded image as a base64 PNG data URI for the model API.

    Returns None when no image was supplied.
    """
    if image is None:
        return None

    # Local imports keep text-only requests from touching PIL/numpy.
    import io
    import base64
    import numpy as np
    from PIL import Image

    # Gradio delivers images as numpy arrays; wrap them in a PIL Image.
    if isinstance(image, np.ndarray):
        image = Image.fromarray(image)

    png_buffer = io.BytesIO()
    image.save(png_buffer, format='PNG')
    encoded = base64.b64encode(png_buffer.getvalue()).decode()
    return f"data:image/png;base64,{encoded}"
177
+
178
def create_multimodal_message(text, image=None):
    """Build an OpenAI-style user message, attaching the image when given.

    Without an image the content is the plain text string; with one, the
    content is a [text part, image_url part] list, the image encoded as a
    data URI by process_image_for_model.
    """
    if image is None:
        # Text-only turn.
        return {"role": "user", "content": text}

    text_part = {
        "type": "text",
        "text": text
    }
    image_part = {
        "type": "image_url",
        "image_url": {
            "url": process_image_for_model(image)
        }
    }
    return {"role": "user", "content": [text_part, image_part]}
197
+
198
  def send_to_sandbox(code):
199
  # Add a wrapper to inject necessary permissions and ensure full HTML
200
  wrapped_code = f"""
 
281
  current_model_display = gr.Markdown("**Current Model:** DeepSeek V3", visible=False)
282
  input = antd.InputTextarea(
283
  size="large", allow_clear=True, placeholder="Please enter what kind of application you want", visible=False)
284
+ image_input = gr.Image(label="Upload an image (optional)", visible=False)
285
  btn = antd.Button("send", type="primary", size="large", visible=False)
286
  clear_btn = antd.Button("clear history", type="default", size="large", visible=False)
287
 
 
290
  for i, demo_item in enumerate(DEMO_LIST):
291
  with antd.Card(hoverable=True, title=demo_item["title"]) as demoCard:
292
  antd.CardMeta(description=demo_item["description"])
293
+ demoCard.click(lambda e, idx=i: (DEMO_LIST[idx]['description'], None), outputs=[input, image_input])
294
 
295
  antd.Divider("setting", visible=False)
296
  with antd.Flex(gap="small", wrap=True, visible=False) as setting_flex:
 
360
  gr.update(visible=False),
361
  gr.update(visible=False),
362
  gr.update(visible=False),
363
+ gr.update(visible=False),
364
  )
365
  else:
366
  return (
 
375
  gr.update(visible=True),
376
  gr.update(visible=True),
377
  gr.update(visible=True),
378
+ gr.update(visible=True),
379
  )
380
 
381
+ def generation_code(query: Optional[str], image: Optional[gr.Image], _setting: Dict[str, str], _history: Optional[History], profile: gr.OAuthProfile | None, _current_model: Dict):
382
  if profile is None:
383
  return (
384
  "Please sign in with Hugging Face to use this feature.",
 
392
  if _history is None:
393
  _history = []
394
  messages = history_to_messages(_history, _setting['system'])
395
+
396
+ # Create multimodal message if image is provided
397
+ if image is not None:
398
+ messages.append(create_multimodal_message(query, image))
399
+ else:
400
+ messages.append({'role': 'user', 'content': query})
401
 
402
  try:
403
  completion = client.chat.completions.create(
 
440
 
441
  btn.click(
442
  generation_code,
443
+ inputs=[input, image_input, setting, history, current_model],
444
  outputs=[code_output, history, sandbox, state_tab, code_drawer]
445
  )
446
 
 
452
  outputs=[
453
  login_message,
454
  input,
455
+ image_input,
456
  current_model_display,
457
  btn,
458
  clear_btn,