Max position embeddings cause an error when the input exceeds 512 tokens.

#3

When I run:

from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")

text = "very long text "*1000

input_ids = tokenizer(text, return_tensors="pt")
input_ids["input_ids"].shape
# truncate to 514 tokens (the max_position_embeddings value from the config)
input_ids = {k: v[:, :514] for k, v in input_ids.items()}

input_ids["input_ids"].shape

outputs = model.forward(**input_ids)

I get:

...
   2208     # remove once script supports set_grad_enabled
   2209     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

IndexError: index out of range in self

Hi Kenneth!

This runs fine if you change 514 to 512 in your example, but I'm guessing you know that.
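If it helps, the easiest way to stay under the limit is to let the tokenizer do the truncation instead of slicing the tensors by hand. A minimal sketch (it assumes 512 is the right cap for this checkpoint; if the tokenizer's model_max_length is already set correctly you can drop the explicit max_length):

from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")

text = "very long text " * 1000

# Ask the tokenizer to cut the sequence down while it adds the special tokens.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
inputs["input_ids"].shape  # torch.Size([1, 512])

outputs = model(**inputs)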

I was also confused by max_position_embeddings being set to 514, but these two issues might shed some light on it: https://github.com/huggingface/transformers/issues/1363 and https://github.com/facebookresearch/fairseq/issues/1187. The model was trained with fairseq and then ported to Hugging Face, and fairseq-style RoBERTa models start their position ids at padding_idx + 1, so two of the 514 position embeddings are reserved and only 512 positions are left for actual tokens.
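If you want to derive the real limit from the config instead of hard-coding 512, something like the following should work. This is only a sketch and assumes the usual XLM-R/fairseq convention, i.e. that pad_token_id is 1 and that positions 0..pad_token_id are reserved:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("vesteinn/DanskBERT")

# fairseq-style RoBERTa starts position ids at padding_idx + 1, so the number
# of positions available to actual tokens is max_position_embeddings - pad_token_id - 1.
usable_positions = config.max_position_embeddings - config.pad_token_id - 1
print(usable_positions)  # 514 - 1 - 1 = 512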
