---
language:
- en
base_model:
- allenai/longformer-large-4096
---
This model is a version of longformer-large-4096 additionally pre-trained on the S2ORC corpus (Lo et al., 2020), a large corpus of 81.1M English-language academic papers from different disciplines. It uses the weights of the Longformer large science checkpoint that served as the starting point for training the MultiVerS model (Wadden et al., 2022) on the task of scientific claim verification.
Note that the vocabulary size of this model (50275) differs from that of the original longformer-large-4096 (50265), since 10 new tokens were added:
`<|par|>`, `</|title|>`, `</|sec|>`, `<|sec-title|>`, `<|sent|>`, `<|title|>`, `<|abs|>`, `<|sec|>`, `</|sec-title|>`, `</|abs|>`.
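The relationship between the two vocabulary sizes can be sketched in plain Python (a minimal illustration, not the actual tokenizer-extension code; the token list is the one given above):

```python
# The 10 structural tokens added on top of the base Longformer vocabulary.
NEW_TOKENS = [
    "<|par|>", "</|title|>", "</|sec|>", "<|sec-title|>", "<|sent|>",
    "<|title|>", "<|abs|>", "<|sec|>", "</|sec-title|>", "</|abs|>",
]

BASE_VOCAB_SIZE = 50265  # original longformer-large-4096
EXTENDED_VOCAB_SIZE = BASE_VOCAB_SIZE + len(NEW_TOKENS)

# The extended vocabulary matches this model's size of 50275.
print(EXTENDED_VOCAB_SIZE)  # → 50275
```

In practice, extending a tokenizer this way (e.g. with `tokenizer.add_tokens(...)` in Hugging Face Transformers) also requires resizing the model's embedding matrix with `model.resize_token_embeddings(len(tokenizer))`, so that the new token ids have corresponding embedding rows.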