Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models Paper • 2402.14714 • Published Feb 22, 2024 • 4
FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark Paper • 2409.19014 • Published Sep 24, 2024
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge Paper • 2502.16457 • Published 18 days ago • 11
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning Paper • 2502.17407 • Published 17 days ago • 24
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31, 2024 • 77
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Paper • 2404.02575 • Published Apr 3, 2024 • 50