Update README.md
@@ -70,3 +70,67 @@ The following hyperparameters were used during training:
- Pytorch 1.13.0+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2

# Day 1

1. Tried the Neural Magic model "neuralmagic/oBERT-12-upstream-pruned-unstructured-97". Its macro and micro F1 scores were much lower at the start of training, and the early steps barely improved them; however, by the same epoch it came out ahead by a 0.159 difference in F1 score (see the metric sketch below).
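For reference, a minimal sketch of how the micro and macro F1 scores might be computed for this multi-label task. The sigmoid-plus-0.5-threshold decision rule and the `compute_metrics` name are illustrative assumptions, not taken verbatim from the training code:

```python
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    # Multi-label setup (assumed): sigmoid over the logits, threshold at 0.5.
    logits, labels = eval_pred
    probs = 1.0 / (1.0 + np.exp(-logits))
    preds = (probs > 0.5).astype(int)
    return {
        "micro f1": f1_score(labels, preds, average="micro"),
        "macro f1": f1_score(labels, preds, average="macro"),
    }
```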
2. The more significant change was to the code: added error handling so that if training hits a CUDA out-of-memory error, the model is moved to the CPU and the cached GPU memory is freed.
```python
import gc

import torch

"""
Wrap training in a try/except: if the model uses more memory than the GPU
has, a RuntimeError is raised. The recovery steps are:
1. Check the amount of GPU memory used.
2. Move the model to the CPU.
3. Call the garbage collector.
4. Free the cached GPU memory.
5. Check the amount of GPU memory used again to confirm it was freed.
"""

def check_gpu_memory():
    # Report the GPU memory currently allocated, in GB.
    used_gb = torch.cuda.memory_allocated() / 1e9
    print(used_gb)
    return used_gb

try:
    trainer.train()
except RuntimeError as e:
    if "CUDA out of memory" in str(e):
        print("CUDA out of memory")
        print("Let's free some GPU memory and re-allocate")
        check_gpu_memory()
        # Move the model to the CPU
        model.to("cpu")
        gc.collect()
        # Free the cached GPU memory
        torch.cuda.empty_cache()
        check_gpu_memory()
    else:
        raise
```
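One design note on the recovery path: `torch.cuda.empty_cache()` can only return cached blocks that are no longer in use, which is why the model is moved to the CPU and the garbage collector is run first; otherwise the parameters would keep their GPU allocations alive.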
3. Added checks for the runtime environment: whether the notebook is running on Colab (to decide whether to push to the Hub) and whether the GPU supports bfloat16 (to pick the mixed-precision mode).
```python
import sys

import torch
from transformers import Trainer, TrainingArguments

def is_on_colab():
    # Detect whether this notebook is running on Google Colab.
    return "google.colab" in sys.modules

training_args_fine_tune = TrainingArguments(
    output_dir="./multi-label-class-classification-on-github-issues",
    num_train_epochs=15,
    learning_rate=3e-5,
    per_device_train_batch_size=64,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="micro f1",
    save_total_limit=1,
    log_level="error",
    push_to_hub=is_on_colab(),  # only push to the Hub when on Colab
)

if torch.cuda.is_available():
    # Check whether the CUDA GPU supports bfloat16.
    if torch.cuda.is_bf16_supported():
        print("Cuda GPU can support bfloat16")
        training_args_fine_tune.bf16 = True
    else:
        print("Cuda GPU cannot support bfloat16 so instead we will use float16")
        training_args_fine_tune.fp16 = True
```
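For completeness, a hypothetical sketch of how these arguments might be wired into a `Trainer`; `model`, the tokenized `dataset` splits, and the `compute_metrics` callback are assumed to be defined earlier in the notebook:

```python
# Hypothetical wiring; model, dataset, and compute_metrics are assumed
# to be defined elsewhere in the notebook.
trainer = Trainer(
    model=model,
    args=training_args_fine_tune,
    train_dataset=dataset["train"],
    eval_dataset=dataset["valid"],
    compute_metrics=compute_metrics,
)
trainer.train()
```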