Space: burhan112/Multimodal_Ask-the-Image_Mini-App

burhan112 committed on May 1

Commit 8df60f2 (verified) · 1 Parent(s): 0845f18

Update app.py

Files changed (1)

app.py CHANGED

@@ -9,7 +9,7 @@ def multimodal_qa_app(image: Image.Image, audio_path: str):
     question_text = transcribe_audio(audio_path)
     answer = get_image_answer(image, question_text)
     audio_response = text_to_speech(answer)
-    return answer, audio_response
+    return question_text, answer, audio_response
 
 interface = gr.Interface(
     fn=multimodal_qa_app,
@@ -18,6 +18,7 @@ interface = gr.Interface(
         gr.Audio(type="filepath", label="Ask a Question via Mic (10s max)")
     ],
     outputs=[
+        gr.Textbox(label="Transcribed Question"),
         gr.Textbox(label="Answer"),
         gr.Audio(label="Spoken Answer")
     ],
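
The two hunks go together: Gradio pairs the values returned by `fn` with the `outputs` components by position, so adding `question_text` as the first return value requires a matching first output component. A minimal sketch of that pairing, without Gradio itself, is shown here. The three helper functions are hypothetical stand-ins (the real ones live elsewhere in app.py and are not part of this diff), and `OUTPUT_LABELS` and `label_outputs` are names invented for illustration only.

```python
# Hypothetical stand-ins for the helpers used in app.py; the real
# implementations are not shown in this diff.
def transcribe_audio(audio_path: str) -> str:
    return f"transcript of {audio_path}"


def get_image_answer(image, question_text: str) -> str:
    return f"answer to: {question_text}"


def text_to_speech(answer: str) -> str:
    return "answer.wav"  # placeholder file path


def multimodal_qa_app(image, audio_path: str):
    # Same body as the updated function in the diff.
    question_text = transcribe_audio(audio_path)
    answer = get_image_answer(image, question_text)
    audio_response = text_to_speech(answer)
    return question_text, answer, audio_response


# Labels of the output components, in the order they are declared
# in `outputs=[...]` after this commit.
OUTPUT_LABELS = ["Transcribed Question", "Answer", "Spoken Answer"]


def label_outputs(result):
    """Pair each returned value with its output label by position."""
    if len(result) != len(OUTPUT_LABELS):
        raise ValueError(
            f"fn returned {len(result)} values, "
            f"but {len(OUTPUT_LABELS)} outputs are declared"
        )
    return dict(zip(OUTPUT_LABELS, result))


labeled = label_outputs(multimodal_qa_app(None, "question.wav"))
```

Applying only one of the two hunks would leave the number of returned values and declared outputs out of step, which is the mismatch `label_outputs` guards against in this sketch.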