AutoCaptioner

Troubleshooting

API errors: Ensure your Together API key is set correctly.
Unsupported formats: Only .png, .jpg, .jpeg, and .webp files are supported.
Memory issues: For very large images, try processing in smaller batches.
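
The three checks above can be run before a captioning job starts. The following is a minimal sketch, not AutoCaptioner's own code: the helper names are hypothetical, and it assumes the key is supplied through the TOGETHER_API_KEY environment variable (the variable the Together Python client reads by default).

```python
import os
from pathlib import Path

# Formats listed as supported in the troubleshooting notes.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def api_key_is_set(env=os.environ):
    """Return True if TOGETHER_API_KEY is present and not blank.

    Assumption: the key is read from this environment variable.
    """
    return bool(env.get("TOGETHER_API_KEY", "").strip())


def supported_images(paths):
    """Keep only paths whose extension is a supported image format."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, `batches(supported_images(paths), 4)` would hand the captioner at most four supported images at a time, which keeps peak memory lower when the images are very large.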

A tool to automatically caption images for LoRA training datasets.

What is AutoCaptioner?

AutoCaptioner creates detailed, principled image captions for your LoRA dataset. These captions can be used to:

Create and activate a virtual environment, then install the dependencies:

python -m venv venv
source venv/bin/activate
python -m pip install -r requirements.txt

Launch a user-friendly web interface for captioning and prompt optimization:

python demo.py

AutoCaptioner leverages the Llama-4-Maverick model through the Together AI platform to:

Original Image	Generated Caption
	"tr1gger photorealistic, mountain landscape, hiking gear and outdoor clothing, standing at viewpoint, awestruck, snow-capped peaks and valley, golden sunset light, wide-angle view"
	"tr1gger photorealistic, urban architecture, business suit with briefcase, walking on sidewalk, focused expression, downtown skyscrapers, evening city lights, street-level view"
	"tr1gger photorealistic, culinary presentation, chef's uniform and hat, plating dish, concentrated, modern restaurant kitchen, soft studio lighting, overhead camera angle"
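
The example captions above appear to share one layout: a trigger word, then eight comma-separated descriptors. The sketch below assembles a caption in that order. The field names (style, subject, attire, and so on) are inferred from the three examples and are hypothetical; they are not taken from AutoCaptioner's source.

```python
from dataclasses import dataclass


@dataclass
class CaptionParts:
    # Hypothetical field names, inferred from the example captions.
    style: str
    subject: str
    attire: str
    action: str
    expression: str
    setting: str
    lighting: str
    camera: str


def build_caption(parts, trigger="tr1gger"):
    """Join the descriptors in the order used by the example captions."""
    fields = [
        parts.style, parts.subject, parts.attire, parts.action,
        parts.expression, parts.setting, parts.lighting, parts.camera,
    ]
    # The trigger word is followed by a space, not a comma.
    return f"{trigger} " + ", ".join(fields)


example = CaptionParts(
    style="photorealistic",
    subject="mountain landscape",
    attire="hiking gear and outdoor clothing",
    action="standing at viewpoint",
    expression="awestruck",
    setting="snow-capped peaks and valley",
    lighting="golden sunset light",
    camera="wide-angle view",
)
print(build_caption(example))
```

Keeping the descriptors in a fixed order makes captions easy to compare across a dataset and easy to split back into fields when reviewing them.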