Upload folder using huggingface_hub

Files changed (13)

README.md +196 -16
adapter_config.json +4 -4
adapter_model.safetensors +1 -1
optimizer.pt +3 -0
rng_state.pth +3 -0
scaler.pt +3 -0
scheduler.pt +3 -0
special_tokens_map.json +40 -0
tokenizer.json +0 -0
tokenizer.model +3 -0
tokenizer_config.json +89 -0
trainer_state.json +376 -0
training_args.bin +3 -0

README.md CHANGED

@@ -1,22 +1,202 @@
 ---
-base_model: unsloth/codellama-7b-bnb-4bit
-tags:
-- text-generation-inference
-- transformers
-- unsloth
-- llama
-- trl
-license: apache-2.0
-language:
-- en
 ---
-# Uploaded  model
-- **Developed by:** CodexAI
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/codellama-7b-bnb-4bit
-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
+base_model: unsloth/codellama-7b
+library_name: peft
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

adapter_config.json CHANGED

@@ -23,13 +23,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "down_proj",
     "q_proj",
-    "v_proj",
-    "up_proj",
     "gate_proj",
-    "o_proj",
-    "k_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "k_proj",
+    "up_proj",
     "down_proj",
+    "o_proj",
     "q_proj",
     "gate_proj",
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
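
Note on the hunk above: the removed and added sides of `target_modules` list the same seven projection names in a different order. The following minimal sketch checks this by comparing the two lists as sets. The variable and function names are illustrative and not part of the commit. A plausible explanation is that the list is serialized from an unordered collection, but the commit itself does not say so.

```python
# Module lists copied from the old (-) and new (+) sides of the hunk above.
old_target_modules = [
    "down_proj", "q_proj", "v_proj", "up_proj", "gate_proj", "o_proj", "k_proj",
]
new_target_modules = [
    "k_proj", "up_proj", "down_proj", "o_proj", "q_proj", "gate_proj", "v_proj",
]


def diff_modules(old, new):
    """Return (removed, added) module names, ignoring list order."""
    old_set, new_set = set(old), set(new)
    return sorted(old_set - new_set), sorted(new_set - old_set)


removed, added = diff_modules(old_target_modules, new_target_modules)
# True when both sides name exactly the same modules.
same_membership = not removed and not added
print("removed:", removed, "added:", added, "same membership:", same_membership)
```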

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:914aa42a2779a3f14f1c1a7106eada31dd3f1a0b6da3d3a0cada54169b777005
 size 159967880

 version https://git-lfs.github.com/spec/v1
+oid sha256:8e5825365e8f35ad2be0fba4551dd0bb1a89c29127006b550f42dbe48103b58c
 size 159967880
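
Note on the hunk above: the entries for large files in this commit are Git LFS pointer files, each holding a `version`, an `oid`, and a `size` line. The sketch below parses the two pointers for `adapter_model.safetensors` and compares them. The helper name `parse_lfs_pointer` is hypothetical and written only for this illustration. It shows that the byte size is unchanged while the content hash differs.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:914aa42a2779a3f14f1c1a7106eada31dd3f1a0b6da3d3a0cada54169b777005
size 159967880
""")
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:8e5825365e8f35ad2be0fba4551dd0bb1a89c29127006b550f42dbe48103b58c
size 159967880
""")

same_size = old_pointer["size"] == new_pointer["size"]
content_changed = old_pointer["digest"] != new_pointer["digest"]
print("same size:", same_size, "content changed:", content_changed)
```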

optimizer.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6572214ae96790947225891855adcce0a29f6be3999bd0eb846418c1764c449e
+size 81730644

rng_state.pth ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d4c917636c7a58af68a29056522a757e9f9b99005b776641aa157c536967817d
+size 14244

scaler.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c801982aae9be06d302403c1fff693e53dedf89c1d3b689ee29fedad84a96d23
+size 988

scheduler.pt ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5836d5feabbfa4a8a0dfb2d2daf51efcfd7a4705772f290bfa6fbba9841feacb
+size 1064

special_tokens_map.json ADDED

@@ -0,0 +1,40 @@
+{
+  "additional_special_tokens": [
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>",
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>"
+  ],
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
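
Note on the file above: `additional_special_tokens` lists each of the four infilling tokens twice. The sketch below shows one way to collapse the duplicates while keeping first-seen order. It only illustrates the redundancy. Nothing in the commit indicates that the duplicates cause a problem or that the author intends to remove them.

```python
# Copied from "additional_special_tokens" in special_tokens_map.json above.
additional_special_tokens = [
    "▁<PRE>", "▁<MID>", "▁<SUF>", "▁<EOT>",
    "▁<PRE>", "▁<MID>", "▁<SUF>", "▁<EOT>",
]


def dedupe_preserving_order(tokens):
    """Drop repeated entries, keeping the first occurrence of each token."""
    # dict keys keep insertion order, so this preserves the original sequence.
    return list(dict.fromkeys(tokens))


unique_tokens = dedupe_preserving_order(additional_special_tokens)
print(len(additional_special_tokens), "entries,", len(unique_tokens), "unique")
```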

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer.model ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45ccb9c8b6b561889acea59191d66986d314e7cbd6a78abc6e49b139ca91c1e6
+size 500058

tokenizer_config.json ADDED

@@ -0,0 +1,89 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32007": {
+      "content": "▁<PRE>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32008": {
+      "content": "▁<SUF>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32009": {
+      "content": "▁<MID>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32010": {
+      "content": "▁<EOT>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>",
+    "▁<PRE>",
+    "▁<MID>",
+    "▁<SUF>",
+    "▁<EOT>"
+  ],
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "eot_token": "▁<EOT>",
+  "extra_special_tokens": {},
+  "fill_token": "<FILL_ME>",
+  "legacy": null,
+  "middle_token": "▁<MID>",
+  "model_max_length": 16384,
+  "pad_token": "<unk>",
+  "padding_side": "right",
+  "prefix_token": "▁<PRE>",
+  "sp_model_kwargs": {},
+  "suffix_token": "▁<SUF>",
+  "tokenizer_class": "CodeLlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
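
Note on the file above: the config names a `fill_token` (`<FILL_ME>`) alongside prefix, suffix, and middle tokens, which together describe a fill-in-the-middle prompt. The sketch below is a string-level illustration only. It splits a prompt at the fill marker and lays the pieces out in a prefix, suffix, middle order. That ordering and the helper names are assumptions made for this example. The real tokenizer works on token ids and its exact assembly is not shown in this commit.

```python
# Token strings copied from tokenizer_config.json above.
FILL_TOKEN = "<FILL_ME>"
PREFIX_TOKEN = "▁<PRE>"
SUFFIX_TOKEN = "▁<SUF>"
MIDDLE_TOKEN = "▁<MID>"


def split_infill_prompt(prompt, fill_token=FILL_TOKEN):
    """Split a prompt into (prefix, suffix) around a single fill marker."""
    if prompt.count(fill_token) != 1:
        raise ValueError("expected exactly one fill marker in the prompt")
    prefix, suffix = prompt.split(fill_token)
    return prefix, suffix


def infill_pieces(prompt):
    """Return the prompt pieces in an assumed prefix-suffix-middle layout."""
    prefix, suffix = split_infill_prompt(prompt)
    # Assumption: the model generates the missing middle after MIDDLE_TOKEN.
    return [PREFIX_TOKEN, prefix, SUFFIX_TOKEN, suffix, MIDDLE_TOKEN]


pieces = infill_pieces("def add(a, b):\n    return <FILL_ME>\n")
print(pieces)
```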

trainer_state.json ADDED

@@ -0,0 +1,376 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 4849,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00020622808826562179,
+      "grad_norm": 0.23989851772785187,
+      "learning_rate": 2.061855670103093e-07,
+      "loss": 0.3507,
+      "step": 1
+    },
+    {
+      "epoch": 0.02062280882656218,
+      "grad_norm": 0.46579787135124207,
+      "learning_rate": 2.0618556701030927e-05,
+      "loss": 0.6141,
+      "step": 100
+    },
+    {
+      "epoch": 0.04124561765312436,
+      "grad_norm": 0.20079994201660156,
+      "learning_rate": 4.1237113402061855e-05,
+      "loss": 0.6044,
+      "step": 200
+    },
+    {
+      "epoch": 0.06186842647968653,
+      "grad_norm": 0.33593931794166565,
+      "learning_rate": 6.185567010309279e-05,
+      "loss": 0.5863,
+      "step": 300
+    },
+    {
+      "epoch": 0.08249123530624872,
+      "grad_norm": 0.5399625897407532,
+      "learning_rate": 8.247422680412371e-05,
+      "loss": 0.5629,
+      "step": 400
+    },
+    {
+      "epoch": 0.10311404413281089,
+      "grad_norm": 0.5116047859191895,
+      "learning_rate": 0.00010309278350515463,
+      "loss": 0.4852,
+      "step": 500
+    },
+    {
+      "epoch": 0.12373685295937306,
+      "grad_norm": 0.39057251811027527,
+      "learning_rate": 0.00012371134020618558,
+      "loss": 0.5038,
+      "step": 600
+    },
+    {
+      "epoch": 0.14435966178593523,
+      "grad_norm": 0.35319826006889343,
+      "learning_rate": 0.0001443298969072165,
+      "loss": 0.4892,
+      "step": 700
+    },
+    {
+      "epoch": 0.16498247061249743,
+      "grad_norm": 0.2662217319011688,
+      "learning_rate": 0.00016494845360824742,
+      "loss": 0.5089,
+      "step": 800
+    },
+    {
+      "epoch": 0.1856052794390596,
+      "grad_norm": 0.535358190536499,
+      "learning_rate": 0.00018556701030927837,
+      "loss": 0.4885,
+      "step": 900
+    },
+    {
+      "epoch": 0.20622808826562178,
+      "grad_norm": 0.2716768682003021,
+      "learning_rate": 0.00019997048441912246,
+      "loss": 0.5124,
+      "step": 1000
+    },
+    {
+      "epoch": 0.22685089709218395,
+      "grad_norm": 0.4777716100215912,
+      "learning_rate": 0.00019944624754044668,
+      "loss": 0.4911,
+      "step": 1100
+    },
+    {
+      "epoch": 0.24747370591874612,
+      "grad_norm": 0.2679508626461029,
+      "learning_rate": 0.00019827006532530193,
+      "loss": 0.4793,
+      "step": 1200
+    },
+    {
+      "epoch": 0.2680965147453083,
+      "grad_norm": 0.4718882739543915,
+      "learning_rate": 0.00019644964853733152,
+      "loss": 0.4712,
+      "step": 1300
+    },
+    {
+      "epoch": 0.28871932357187047,
+      "grad_norm": 0.2879483699798584,
+      "learning_rate": 0.00019399693138486107,
+      "loss": 0.5119,
+      "step": 1400
+    },
+    {
+      "epoch": 0.30934213239843267,
+      "grad_norm": 0.5847165584564209,
+      "learning_rate": 0.0001909279932831403,
+      "loss": 0.4973,
+      "step": 1500
+    },
+    {
+      "epoch": 0.32996494122499487,
+      "grad_norm": 0.3073234558105469,
+      "learning_rate": 0.0001872629534416197,
+      "loss": 0.4996,
+      "step": 1600
+    },
+    {
+      "epoch": 0.350587750051557,
+      "grad_norm": 0.2527044713497162,
+      "learning_rate": 0.00018302583896732187,
+      "loss": 0.4805,
+      "step": 1700
+    },
+    {
+      "epoch": 0.3712105588781192,
+      "grad_norm": 0.2731375992298126,
+      "learning_rate": 0.00017824442734898997,
+      "loss": 0.4934,
+      "step": 1800
+    },
+    {
+      "epoch": 0.39183336770468136,
+      "grad_norm": 0.4609294533729553,
+      "learning_rate": 0.00017295006435464848,
+      "loss": 0.4947,
+      "step": 1900
+    },
+    {
+      "epoch": 0.41245617653124356,
+      "grad_norm": 0.4430903494358063,
+      "learning_rate": 0.0001671774585363957,
+      "loss": 0.4581,
+      "step": 2000
+    },
+    {
+      "epoch": 0.43307898535780576,
+      "grad_norm": 0.39363059401512146,
+      "learning_rate": 0.00016096445368960415,
+      "loss": 0.4923,
+      "step": 2100
+    },
+    {
+      "epoch": 0.4537017941843679,
+      "grad_norm": 0.3113062083721161,
+      "learning_rate": 0.000154351780758231,
+      "loss": 0.5157,
+      "step": 2200
+    },
+    {
+      "epoch": 0.4743246030109301,
+      "grad_norm": 0.5598001480102539,
+      "learning_rate": 0.00014738279081268692,
+      "loss": 0.4948,
+      "step": 2300
+    },
+    {
+      "epoch": 0.49494741183749225,
+      "grad_norm": 0.5283440351486206,
+      "learning_rate": 0.00014010317085079503,
+      "loss": 0.4905,
+      "step": 2400
+    },
+    {
+      "epoch": 0.5155702206640544,
+      "grad_norm": 0.6192511916160583,
+      "learning_rate": 0.00013256064428497966,
+      "loss": 0.4947,
+      "step": 2500
+    },
+    {
+      "epoch": 0.5361930294906166,
+      "grad_norm": 0.385551393032074,
+      "learning_rate": 0.00012480465807921773,
+      "loss": 0.5283,
+      "step": 2600
+    },
+    {
+      "epoch": 0.5568158383171788,
+      "grad_norm": 0.40230363607406616,
+      "learning_rate": 0.00011688605858680692,
+      "loss": 0.5069,
+      "step": 2700
+    },
+    {
+      "epoch": 0.5774386471437409,
+      "grad_norm": 0.4681544005870819,
+      "learning_rate": 0.00010885675821407844,
+      "loss": 0.5028,
+      "step": 2800
+    },
+    {
+      "epoch": 0.5980614559703031,
+      "grad_norm": 0.35956326127052307,
+      "learning_rate": 0.00010076939509532679,
+      "loss": 0.5358,
+      "step": 2900
+    },
+    {
+      "epoch": 0.6186842647968653,
+      "grad_norm": 0.4002954065799713,
+      "learning_rate": 9.267698801004341e-05,
+      "loss": 0.4941,
+      "step": 3000
+    },
+    {
+      "epoch": 0.6393070736234275,
+      "grad_norm": 0.5405189990997314,
+      "learning_rate": 8.463258880473373e-05,
+      "loss": 0.5329,
+      "step": 3100
+    },
+    {
+      "epoch": 0.6599298824499897,
+      "grad_norm": 0.3197394609451294,
+      "learning_rate": 7.668893459795486e-05,
+      "loss": 0.5125,
+      "step": 3200
+    },
+    {
+      "epoch": 0.6805526912765518,
+      "grad_norm": 0.3258844017982483,
+      "learning_rate": 6.889810204863274e-05,
+      "loss": 0.5353,
+      "step": 3300
+    },
+    {
+      "epoch": 0.701175500103114,
+      "grad_norm": 0.4742829203605652,
+      "learning_rate": 6.131116595419178e-05,
+      "loss": 0.5276,
+      "step": 3400
+    },
+    {
+      "epoch": 0.7217983089296762,
+      "grad_norm": 0.2836833894252777,
+      "learning_rate": 5.397786441664373e-05,
+      "loss": 0.5076,
+      "step": 3500
+    },
+    {
+      "epoch": 0.7424211177562384,
+      "grad_norm": 0.3171875476837158,
+      "learning_rate": 4.6946272771725984e-05,
+      "loss": 0.5163,
+      "step": 3600
+    },
+    {
+      "epoch": 0.7630439265828006,
+      "grad_norm": 0.24590329825878143,
+      "learning_rate": 4.026248841872946e-05,
+      "loss": 0.5259,
+      "step": 3700
+    },
+    {
+      "epoch": 0.7836667354093627,
+      "grad_norm": 0.6508501768112183,
+      "learning_rate": 3.397032861719556e-05,
+      "loss": 0.5589,
+      "step": 3800
+    },
+    {
+      "epoch": 0.8042895442359249,
+      "grad_norm": 0.4867592751979828,
+      "learning_rate": 2.811104323165301e-05,
+      "loss": 0.5646,
+      "step": 3900
+    },
+    {
+      "epoch": 0.8249123530624871,
+      "grad_norm": 0.312364399433136,
+      "learning_rate": 2.2723044307569775e-05,
+      "loss": 0.5721,
+      "step": 4000
+    },
+    {
+      "epoch": 0.8455351618890493,
+      "grad_norm": 0.4810425341129303,
+      "learning_rate": 1.7887852500751822e-05,
+      "loss": 0.5181,
+      "step": 4100
+    },
+    {
+      "epoch": 0.8661579707156115,
+      "grad_norm": 0.3198936879634857,
+      "learning_rate": 1.3539539439376515e-05,
+      "loss": 0.5756,
+      "step": 4200
+    },
+    {
+      "epoch": 0.8867807795421736,
+      "grad_norm": 0.4786675274372101,
+      "learning_rate": 9.75804006323886e-06,
+      "loss": 0.5487,
+      "step": 4300
+    },
+    {
+      "epoch": 0.9074035883687358,
+      "grad_norm": 0.4790056347846985,
+      "learning_rate": 6.568144959657263e-06,
+      "loss": 0.5506,
+      "step": 4400
+    },
+    {
+      "epoch": 0.928026397195298,
+      "grad_norm": 0.587523877620697,
+      "learning_rate": 4.013449264074187e-06,
+      "loss": 0.5741,
+      "step": 4500
+    },
+    {
+      "epoch": 0.9486492060218602,
+      "grad_norm": 0.41147923469543457,
+      "learning_rate": 2.059119419840494e-06,
+      "loss": 0.5414,
+      "step": 4600
+    },
+    {
+      "epoch": 0.9692720148484224,
+      "grad_norm": 0.5501502752304077,
+      "learning_rate": 7.468660935561755e-07,
+      "loss": 0.6101,
+      "step": 4700
+    },
+    {
+      "epoch": 0.9898948236749845,
+      "grad_norm": 0.45654186606407166,
+      "learning_rate": 8.529209787123682e-08,
+      "loss": 0.5645,
+      "step": 4800
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 4849,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.3135703548790374e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
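
Note on the log above: the `epoch` field equals `step / 4849`, and the early `learning_rate` values rise by a constant amount per step before the rate decays. The sketch below models this as linear warmup followed by cosine decay. The peak rate of `2e-4` and the 970 warmup steps are inferred from the logged numbers and are assumptions, since neither value appears in `trainer_state.json`. Entries near the end of the run do not necessarily match this curve exactly, so treat the parameters as approximate.

```python
import math

MAX_STEPS = 4849     # "max_steps" in trainer_state.json
PEAK_LR = 2e-4       # assumption: inferred from the log, not stated in the file
WARMUP_STEPS = 970   # assumption: PEAK_LR / 970 matches the step-1 learning rate


def epoch_at(step, steps_per_epoch=MAX_STEPS):
    """Fraction of the single training epoch completed at a given step."""
    return step / steps_per_epoch


def lr_at(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, max_steps=MAX_STEPS):
    """Linear warmup to peak_lr, then cosine decay towards zero at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (1, 100, 1000):
    print(step, epoch_at(step), lr_at(step))
```

Comparing these values with the `log_history` entries for steps 1, 100, and 1000 is a quick check on the assumed parameters.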

training_args.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fcdb031000815991decb49e64c1ff52b41b328f05666ceaf823e770ffa5bde97
+size 5624