SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions Paper • 2506.23046 • Published Jun 29 • 1
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs Paper • 2406.11695 • Published Jun 17, 2024 • 2
Mind the Gap! Static and Interactive Evaluations of Large Audio Models Paper • 2502.15919 • Published Feb 21 • 4
Grounded Persuasive Language Generation for Automated Marketing Paper • 2502.16810 • Published Feb 24 • 13
HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions Paper • 2409.16427 • Published Sep 24, 2024 • 1
What Are Tools Anyway? A Survey from the Language Model Perspective Paper • 2403.15452 • Published Mar 18, 2024
Distilling an End-to-End Voice Assistant Without Instruction Training Data Paper • 2410.02678 • Published Oct 3, 2024 • 23
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment Paper • 2305.14463 • Published May 23, 2023
Revisiting non-English Text Simplification: A Unified Multilingual Benchmark Paper • 2305.15678 • Published May 25, 2023
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents Paper • 2403.08715 • Published Mar 13, 2024 • 21
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 25
Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue Paper • 2210.04443 • Published Oct 10, 2022
COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements Paper • 2306.01985 • Published Jun 3, 2023 • 1
Unintended Impacts of LLM Alignment on Global Representation Paper • 2402.15018 • Published Feb 22, 2024 • 1
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents Paper • 2310.11667 • Published Oct 18, 2023 • 4
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation Paper • 1810.10147 • Published Oct 24, 2018