Memory usage increases with the number of input samples

#18
by storm2008 - opened

I have many long texts (about 3,000 words each), and I use this model to generate text embeddings. The code is as follows:

import csv
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# load data
datapath = "data/"
filename = "data.csv"
with open(datapath + filename, "r", encoding="gbk") as file:
    csvfile = csv.reader(file)
    data = [tmp[5] for tmp in csvfile]

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained('Qwen/gte-Qwen2-1.5B-instruct')
LLMmodel = AutoModel.from_pretrained('Qwen/gte-Qwen2-1.5B-instruct')

# get the embedding of each input
res = torch.Tensor()
for index in range(len(data)):
    currdata = data[index]
    batch_dict = tokenizer(currdata, max_length=8192,
                           padding=True, truncation=True, return_tensors='pt')
    # The problem occurs in the following line:
    # as the index increases, the used memory keeps increasing until memory runs out
    outputs = LLMmodel(**batch_dict)
    torch.cuda.empty_cache()  # this call does not fix the problem
    embeddings = outputs.last_hidden_state[:, -1]
    embeddings = F.normalize(embeddings, p=2, dim=1)

    # res saves the result
    if len(res) == 0:
        res = embeddings
    else:
        res = torch.cat((res, embeddings), dim=0)

How can I solve this problem?

Solved: the problem was in res = torch.cat((res, embeddings), dim=0).
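
In case it helps others: memory growth like this usually comes from running the forward pass with autograd enabled, so every iteration's computation graph stays alive through the tensors held in res. Below is a minimal sketch of one way to avoid it (not necessarily the exact fix the author used), reusing the data, tokenizer, and LLMmodel defined above: run inference under torch.no_grad(), collect the per-sample embeddings in a list, and concatenate once at the end.

import torch
import torch.nn.functional as F

all_embeddings = []
with torch.no_grad():  # no computation graph is built, so nothing accumulates
    for currdata in data:
        batch_dict = tokenizer(currdata, max_length=8192,
                               padding=True, truncation=True, return_tensors='pt')
        outputs = LLMmodel(**batch_dict)
        embeddings = outputs.last_hidden_state[:, -1]   # last-token pooling, as above
        embeddings = F.normalize(embeddings, p=2, dim=1)
        all_embeddings.append(embeddings.cpu())          # collect results on the CPU
res = torch.cat(all_embeddings, dim=0)                   # concatenate once at the end

Alternatively, keeping the original loop structure but detaching the embeddings (embeddings.detach()) before storing them in res should also stop the per-iteration graphs from being retained.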

storm2008 changed discussion status to closed