Establishing Baselines for Text Classification in Low-Resource Languages Paper • 2005.02068 • Published May 5, 2020
Improving Large-scale Language Models and Resources for Filipino Paper • 2111.06053 • Published Nov 11, 2021
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 5 days ago
Simplifying Paragraph-level Question Generation via Transformer Language Models Paper • 2005.01107 • Published May 3, 2020
Evaluating Language Model Finetuning Techniques for Low-resource Languages Paper • 1907.00409 • Published Jun 30, 2019
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets Paper • 2010.11574 • Published Oct 22, 2020
Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings Paper • 2204.03251 • Published Apr 7, 2022
Multilingual Large Language Models Are Not (Yet) Code-Switchers Paper • 2305.14235 • Published May 23, 2023
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Paper • 2406.05967 • Published Jun 10, 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 14, 2024