Update README.md
README.md CHANGED

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

## Overview

This is a bi-directional version of Sheared-LLaMA-1.3B trained with masked token prediction on the Wikipedia dataset. Modern decoder models offer several advantages over classical encoders like BERT:

- They are pre-trained on more recent textual corpora
- They are trained on larger and more diverse datasets
- Modern decoders have better support for long context windows
- Flash attention support is available for these models

Considering these benefits, we are excited to release a series of decoder models tuned to work in a bi-directional setting. This approach combines the strengths of modern decoder architectures with the versatility of bi-directional context understanding, potentially opening up new possibilities for natural language processing tasks such as NER.

In contrast to the original LLM2Vec, we trained all weights of the LLaMA model, which potentially improves its bi-directional abilities.

## Installation
```bash
pip install llm2vec
```

## Usage
```python
from llm2vec.models import LlamaBiModel

import torch
from transformers import AutoTokenizer

# Load the tokenizer and the bidirectional Sheared-LLaMA encoder.
# Custom model code enables bidirectional attention in this decoder-only LLM.
tokenizer = AutoTokenizer.from_pretrained("knowledgator/Sheared-LLaMA-encoder-1.3B")
model = LlamaBiModel.from_pretrained("knowledgator/Sheared-LLaMA-encoder-1.3B")

text = "The quick brown fox jumps over the lazy dog."

inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
```
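
The model returns token-level hidden states. To get a single sentence embedding from them, one common option is attention-mask-aware mean pooling. The snippet below, continuing from the variables above, is a minimal sketch of that approach; the pooling strategy is an illustrative choice on our part, not something the model prescribes.

```python
# Masked mean pooling over token states (illustrative choice, not prescribed by the model)
mask = inputs["attention_mask"].unsqueeze(-1).to(last_hidden_states.dtype)  # (batch, seq, 1)
summed = (last_hidden_states * mask).sum(dim=1)  # sum over non-padding tokens
counts = mask.sum(dim=1).clamp(min=1e-9)         # number of non-padding tokens per sequence
sentence_embeddings = summed / counts            # (batch, hidden_size)
```
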
## Adapting for Different Discriminative Tasks

Our bi-directional LLaMA model can be easily adapted for various discriminative tasks such as text classification, question answering, and token classification.
To use these specialized versions, we provide a [fork of LLM2Vec](https://github.com/Knowledgator/llm2vec) with additional functionality.

### Installation

To get started, clone our fork of LLM2Vec and install it:

```bash
git clone https://github.com/Knowledgator/llm2vec.git
cd llm2vec
pip install -e .
```

Using the `-e` flag installs the package in editable mode, which is useful for development.

### Usage

Here's how to import and use the models for different tasks:

```python
from llm2vec import (
    AutoLLMEncoderForSequenceClassification,
    AutoLLMEncoderForQuestionAnswering,
    AutoLLMEncoderForTokenClassification
)

# Load models for different tasks
classification_model = AutoLLMEncoderForSequenceClassification.from_pretrained('knowledgator/Sheared-LLaMA-encoder-1.3B')
question_answering_model = AutoLLMEncoderForQuestionAnswering.from_pretrained('knowledgator/Sheared-LLaMA-encoder-1.3B')
token_classification_model = AutoLLMEncoderForTokenClassification.from_pretrained('knowledgator/Sheared-LLaMA-encoder-1.3B')
```
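
As a quick illustration of the question-answering variant, the sketch below assumes `AutoLLMEncoderForQuestionAnswering` follows the standard Hugging Face extractive-QA interface with `start_logits` and `end_logits`; treat that output shape as an assumption about the fork rather than documented behavior. With an un-fine-tuned head the extracted span will be arbitrary; the example only shows the call pattern.

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('knowledgator/Sheared-LLaMA-encoder-1.3B')

# Encode a (question, context) pair for extractive QA
question = "What does the fox jump over?"
context = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = question_answering_model(**inputs)

# Assumed standard QA outputs: best start/end positions, then decode the span
start = outputs.start_logits.argmax(dim=-1).item()
end = outputs.end_logits.argmax(dim=-1).item()
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```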

### Example: Text Classification

Here's a basic example of how to use the model for text classification:

```python
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('knowledgator/Sheared-LLaMA-encoder-1.3B')

# Prepare input
text = "This movie is great!"
inputs = tokenizer(text, return_tensors="pt")

# Get classification logits
outputs = classification_model(**inputs)
logits = outputs.logits

# The logits can be used with a softmax function to get probabilities,
# or with torch.argmax(logits, dim=1) to get the predicted class
```
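
Concretely, the comment above corresponds to something like:

```python
import torch

probabilities = torch.softmax(logits, dim=1)   # per-class probabilities
predicted_class = torch.argmax(logits, dim=1)  # index of the highest-scoring class
```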

### Fine-tuning

To fine-tune these models on your specific task:

1. Prepare your dataset in a format compatible with HuggingFace's `datasets` library.
2. Use the `Trainer` class from HuggingFace's `transformers` library to fine-tune the model.

Here's a basic example:

```python
from transformers import Trainer, TrainingArguments
from datasets import load_dataset

# Load your dataset
dataset = load_dataset("your_dataset")

# Tokenize the text so the Trainer receives model-ready inputs
# (assumes the dataset has a "text" column; adjust to your schema)
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
)

# Initialize Trainer
trainer = Trainer(
    model=classification_model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

# Fine-tune the model
trainer.train()
```

### Contributing

We welcome contributions! If you have suggestions for improvements or encounter any issues, please open an issue or submit a pull request on our [GitHub repository](https://github.com/Knowledgator/llm2vec).