Here is my selection of papers for today:
https://huggingface.co/papers
Compact Neural Graphics Primitives with Learned Hash Probing
Restoration by Generation with Constrained Priors
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks
InsActor: Instruction-driven Physics-based Characters
Unsupervised Universal Image Segmentation
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
DreamGaussian4D: Generative 4D Gaussian Splatting
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Prompt Expansion for Adaptive Text-to-Image Generation
PanGu-Draw
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models
The LLM Surgeon
MathPile: A Billion-Token-Scale Pretraining Corpus for Math
MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones