Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated about 7 hours ago • 299
ViDoRe Benchmark Collection Benchmark for document retrieval using visual features, introduced in the ColPali paper. Datasets are using the QA format. • 10 items • Updated Jan 23 • 14
view article Article Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints May 1, 2024 • 73
view article Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 74