Everyday Physics in Korean Contexts: A Culturally Grounded Physical Reasoning Benchmark Paper • 2509.17807 • Published Sep 22, 2025 • 1
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 18
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published 3 days ago • 5
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs Paper • 2508.03365 • Published Aug 5, 2025 • 4
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study Paper • 2505.15389 • Published May 21, 2025 • 8
Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering Paper • 2503.15879 • Published Mar 20, 2025 • 6
REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models Paper • 2502.13622 • Published Feb 19, 2025 • 4