Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2406.04325

Papers - Video - Captions

Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1, 2024 • 13
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25, 2024 • 36
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 74

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 35
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Paper • 2403.11481 • Published Mar 18, 2024 • 13
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11, 2024 • 28
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 28

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 15 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 83
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 147
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Daily paper that worth reading in details later

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20, 2024 • 95
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 71
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 88
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1, 2024 • 46

about 9 hours ago

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 17
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 31
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 28
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Paper • 2401.01974 • Published Jan 3, 2024 • 7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3, 2024 • 28

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs