metadata
language:
  - en
base_model:
  - allenai/longformer-large-4096

This is a fine-tuned version of the longformer-large-4096 model that was additionally pre-trained on the S2ORC corpus (Lo et al., 2020), a large corpus of 81.1M English-language academic papers from different disciplines. The model uses the weights of the Longformer large science checkpoint that served as the starting point for training the MultiVerS model (Wadden et al., 2022) on the task of scientific claim verification.

Note that the vocabulary size of this model (50275) differs from that of the original longformer-large-4096 (50265) because 10 new tokens were added:

<|par|>, </|title|>, </|sec|>, <|sec-title|>, <|sent|>, <|title|>, <|abs|>, <|sec|>, </|sec-title|>, </|abs|>.
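
The vocabulary arithmetic above can be checked with a short sketch. The token list and both sizes are taken from this card; the helper `lookup_added_token_ids` and its `model_id` parameter are hypothetical names introduced here for illustration, and the sketch assumes that the tokenizer files shipped with this model register the ten tokens, which this card does not state explicitly.

```python
# The ten added tokens, copied from the list in this card.
ADDED_TOKENS = [
    "<|par|>", "</|title|>", "</|sec|>", "<|sec-title|>", "<|sent|>",
    "<|title|>", "<|abs|>", "<|sec|>", "</|sec-title|>", "</|abs|>",
]

BASE_VOCAB_SIZE = 50265  # original longformer-large-4096, as stated in this card
EXPECTED_VOCAB_SIZE = BASE_VOCAB_SIZE + len(ADDED_TOKENS)


def lookup_added_token_ids(model_id: str) -> dict:
    """Hypothetical helper: map each added token to its id in the tokenizer.

    `model_id` is the Hub repository id or a local path of this model.
    Assumption: the tokenizer stored there already contains the added tokens.
    """
    # Imported lazily so the constants above can be used without `transformers`.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # len(tokenizer) counts the base vocabulary plus any added tokens.
    print("tokenizer size:", len(tokenizer), "expected:", EXPECTED_VOCAB_SIZE)
    return {tok: tokenizer.convert_tokens_to_ids(tok) for tok in ADDED_TOKENS}
```

Calling `lookup_added_token_ids` with the repository id of this model downloads the tokenizer and returns one id per token. If several tokens map to the same id (typically the unknown-token id), the tokenizer probably does not register them, and the assumption above does not hold.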