Generate text based on images and prompts
Convert spoken words into text
Generate images from text descriptions
Transcribe audio from microphone, file, or YouTube link