Submitted by Henghui Ding - MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation (FudanCVL)