- kuleshov-group/bd3lm-owt-block_size16
  Fill-Mask • Updated • 4.25k • 2
- kuleshov-group/bd3lm-owt-block_size4
  Fill-Mask • Updated • 1.48k • 1
- kuleshov-group/bd3lm-owt-block_size8
  Fill-Mask • Updated • 1.4k • 1
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
  Paper • 2503.09573 • Published • 42
Collections
Collections including paper arxiv:2503.09573
- Making Multimodal Generation Easier: When Diffusion Models Meet LLMs
  Paper • 2310.08949 • Published • 1
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
  Paper • 2503.09573 • Published • 42
- JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
  Paper • 2308.04729 • Published • 32
- PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation
  Paper • 2411.08307 • Published • 6

- RuCCoD: Towards Automated ICD Coding in Russian
  Paper • 2502.21263 • Published • 122
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 104
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 42
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 25

- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 26
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 84
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 21
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 7

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 58
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 42
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 57

- Depth Anything V2
  Paper • 2406.09414 • Published • 97
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 51
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
  Paper • 2406.04338 • Published • 38
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 113

- RoFormer: Enhanced Transformer with Rotary Position Embedding
  Paper • 2104.09864 • Published • 12
- Attention Is All You Need
  Paper • 1706.03762 • Published • 55
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 61
- Zero-Shot Tokenizer Transfer
  Paper • 2405.07883 • Published • 5

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 6
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 21
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 13
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 13
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 56
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47