---
license: apache-2.0
datasets:
- BEE-spoke-data/pypi_clean-deduped
source_model: BEE-spoke-data/smol_llama-101M-GQA
language:
- en
tags:
- python
- codegen
- markdown
- smol_llama
metrics:
- accuracy
inference:
  parameters:
    max_new_tokens: 64
    min_new_tokens: 8
    num_beams: 4
    early_stopping: true
    no_repeat_ngram_size: 7
    repetition_penalty: 1.05
    renormalize_logits: true
widget:
- text: |
    def add_numbers(a, b):
        return
  example_title: Add Numbers Function
- text: |
    class Car:
        def __init__(self, make, model):
            self.make = make
            self.model = model

        def display_car(self):
  example_title: Car Class
- text: |
    import pandas as pd
    data = {'Name': ['Tom', 'Nick', 'John'], 'Age': [20, 21, 19]}
    df = pd.DataFrame(data).convert_dtypes() # eda
  example_title: Pandas DataFrame
- text: |
    def factorial(n):
        if n == 0:
            return 1
        else:
  example_title: Factorial Function
- text: |
    def fibonacci(n):
        if n <= 0:
            raise ValueError("Incorrect input")
        elif n == 1:
            return 0
        elif n == 2:
            return 1
        else:
  example_title: Fibonacci Function
- text: |
    import matplotlib.pyplot as plt
    import numpy as np
    x = np.linspace(0, 10, 100)
    # simple plot
  example_title: Matplotlib Plot
- text: |
    def reverse_string(s:str) -> str:
        return
  example_title: Reverse String Function
- text: |
    def is_palindrome(word:str) -> bool:
        return
  example_title: Palindrome Function
- text: |
    def bubble_sort(lst: list):
        n = len(lst)
        for i in range(n):
            for j in range(0, n-i-1):
  example_title: Bubble Sort Function
- text: |
    def binary_search(arr, low, high, x):
        if high >= low:
            mid = (high + low) // 2
            if arr[mid] == x:
                return mid
            elif arr[mid] > x:
  example_title: Binary Search Function
---

# smol_llama-101M-GQA: python

> These are some quick notes; they will be updated a bit more over the next few days.

This is the general pre-trained checkpoint `BEE-spoke-data/smol_llama-101M-GQA`, trained further for one epoch on a deduped version of `pypi`.

- It has the same architecture as the base model; the only change is the inclusion of (_& training on_) new Python-related tokens.
- The model appears capable of generating basic Python code and README-style markdown.
- This experiment aims to test how well a model of this size can handle code generation, meaning **both** its capabilities and limitations.

Please use with caution & understand that there may still be some bugs 🐛 to work out.

## Usage

Please consider the following points before using the model:

1. The model was trained exclusively with the "slow" llama2 tokenizer, so set `use_fast=False` when loading the tokenizer to ensure tokenization matches training.
2. It is recommended to pin `transformers==4.33.3`; a known problem in `4.34.1` can prevent the model from loading.

Note: it is unclear how the API widget handles the tokenizer, so widget outputs may contain additional whitespace.

Here's how to install the necessary packages and load the model:

```python
# pip install transformers==4.33.3 accelerate sentencepiece

from transformers import AutoTokenizer, AutoModelForCausalLM

# load the slow tokenizer to match training (see point 1 above)
tokenizer = AutoTokenizer.from_pretrained(
    "BEE-spoke-data/smol_llama-101M-GQA-python",
    use_fast=False,
)
# device_map="auto" places the model on GPU if one is available
model = AutoModelForCausalLM.from_pretrained(
    "BEE-spoke-data/smol_llama-101M-GQA-python",
    device_map="auto",
)

# use as any other decoder
```

For code generation tasks, it is recommended to use beam search or similar methods instead of sampling. A minimal sketch is shown below, and a more detailed example is provided in the _longer code-gen example_ section at the end.

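As a minimal sketch, continuing from the loading snippet above and reusing the generation settings from the model card config (the prompt is an arbitrary example, not a tuned recommendation):

```python
# beam-search generation with the parameters from the card's
# inference config above; adjust to taste
prompt = "def add_numbers(a, b):\n    return"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    min_new_tokens=8,
    num_beams=4,
    early_stopping=True,
    no_repeat_ngram_size=7,
    repetition_penalty=1.05,
    renormalize_logits=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
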
### longer code-gen example

Below is a quick script that can be used as a reference/starting point for writing your own, better one :)
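A sketch of such a script, assuming the checkpoint and beam-search settings above; the `load`/`complete` helpers and the example prompts are illustrative, not part of the original:

```python
# self-contained sketch: load the model, complete a few prompts with
# beam search, and print the results with rough timings
import time

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "BEE-spoke-data/smol_llama-101M-GQA-python"


def load(model_id: str = MODEL_ID):
    # use_fast=False: the model was trained with the slow llama2 tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    model.eval()
    return tokenizer, model


@torch.inference_mode()
def complete(prompt: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        min_new_tokens=8,
        num_beams=4,
        early_stopping=True,
        no_repeat_ngram_size=7,
        repetition_penalty=1.05,
        renormalize_logits=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    tokenizer, model = load()
    # prompts borrowed from the widget examples in the card config
    prompts = [
        "def is_palindrome(word: str) -> bool:\n    return",
        "def factorial(n):\n    if n == 0:\n        return 1\n    else:",
    ]
    for prompt in prompts:
        start = time.perf_counter()
        completion = complete(prompt, tokenizer, model)
        elapsed = time.perf_counter() - start
        print(f"--- completed in {elapsed:.1f}s ---\n{completion}\n")
```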