oliverdk committed (verified)
Commit 2ceedc4 · Parent(s): d0e2e6d

End of training

.hydra/hydra.yaml CHANGED
@@ -141,7 +141,7 @@ hydra:
     name: train
     chdir: null
     override_dirname: ''
-    id: '746838'
+    id: '749101'
     num: 0
     config_name: pythia_stories_slurm
     env_set: {}
@@ -165,7 +165,7 @@ hydra:
       - path: ''
         schema: structured
         provider: schema
-    output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-16/19-29-19/0
+    output_dir: /nas/ucb/oliveradk/measurement-pred/multirun/2024-12-20/08-28-21/0
     choices:
       hparams: hparams
       model: pythia_stories
README.md CHANGED
@@ -6,27 +6,27 @@ tags:
 metrics:
 - accuracy
 model-index:
-- name: pythia-1.4b-deduped-measurement_pred-generated_stories
+- name: pythia-1_4b-deduped-measurement_pred-generated_stories
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# pythia-1.4b-deduped-measurement_pred-generated_stories
+# pythia-1_4b-deduped-measurement_pred-generated_stories
 
 This model is a fine-tuned version of [EleutherAI/pythia-1.4b-deduped](https://huggingface.co/EleutherAI/pythia-1.4b-deduped) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.7419
-- Accuracy: 0.8563
-- Accuracy Sensor 0: 0.8533
-- Auroc Sensor 0: 0.9416
-- Accuracy Sensor 1: 0.8563
-- Auroc Sensor 1: 0.9319
-- Accuracy Sensor 2: 0.8593
-- Auroc Sensor 2: 0.9223
-- Accuracy Aggregated: 0.8563
-- Auroc Aggregated: 0.9306
+- Loss: 0.6829
+- Accuracy: 0.8422
+- Accuracy Sensor 0: 0.8474
+- Auroc Sensor 0: 0.9408
+- Accuracy Sensor 1: 0.8474
+- Auroc Sensor 1: 0.9248
+- Accuracy Sensor 2: 0.8370
+- Auroc Sensor 2: 0.9156
+- Accuracy Aggregated: 0.8370
+- Auroc Aggregated: 0.9304
 
 ## Model description
 
@@ -61,11 +61,11 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy | Accuracy Sensor 0 | Auroc Sensor 0 | Accuracy Sensor 1 | Auroc Sensor 1 | Accuracy Sensor 2 | Auroc Sensor 2 | Accuracy Aggregated | Auroc Aggregated |
 |:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-----------------:|:--------------:|:-------------------:|:----------------:|
-| No log        | 0.9948 | 119  | 0.4279          | 0.7922   | 0.8000            | 0.9172         | 0.7852            | 0.8960         | 0.7689            | 0.8847         | 0.8148              | 0.9045           |
-| 0.5961        | 1.9979 | 239  | 0.3688          | 0.8348   | 0.8504            | 0.9329         | 0.8474            | 0.9203         | 0.8207            | 0.9080         | 0.8207              | 0.9240           |
-| 0.3567        | 2.9927 | 358  | 0.4799          | 0.8433   | 0.8548            | 0.9418         | 0.8489            | 0.9282         | 0.8222            | 0.9088         | 0.8474              | 0.9265           |
-| 0.1467        | 3.9958 | 478  | 0.5580          | 0.8541   | 0.8533            | 0.9414         | 0.8622            | 0.9297         | 0.8519            | 0.9198         | 0.8489              | 0.9290           |
-| 0.0439        | 4.9739 | 595  | 0.7419          | 0.8563   | 0.8533            | 0.9416         | 0.8563            | 0.9319         | 0.8593            | 0.9223         | 0.8563              | 0.9306           |
+| No log        | 0.9948 | 119  | 0.5074          | 0.7130   | 0.7793            | 0.9063         | 0.6785            | 0.8889         | 0.7111            | 0.8917         | 0.6830              | 0.9080           |
+| 0.606         | 1.9979 | 239  | 0.3942          | 0.8100   | 0.8178            | 0.9165         | 0.8252            | 0.8992         | 0.7956            | 0.9077         | 0.8015              | 0.9178           |
+| 0.368         | 2.9927 | 358  | 0.4412          | 0.8304   | 0.8400            | 0.9369         | 0.8237            | 0.9237         | 0.8252            | 0.9129         | 0.8326              | 0.9235           |
+| 0.1675        | 3.9958 | 478  | 0.5589          | 0.8474   | 0.8533            | 0.9411         | 0.8474            | 0.9284         | 0.8400            | 0.9112         | 0.8489              | 0.9296           |
+| 0.0537        | 4.9739 | 595  | 0.6829          | 0.8422   | 0.8474            | 0.9408         | 0.8474            | 0.9248         | 0.8370            | 0.9156         | 0.8370              | 0.9304           |
 
 
 ### Framework versions
config.json CHANGED
@@ -25,7 +25,6 @@
   "n_sensors": 3,
   "num_attention_heads": 16,
   "num_hidden_layers": 24,
-  "pad_token_id": 50277,
   "rope_scaling": null,
   "rotary_emb_base": 10000,
   "rotary_pct": 0.25,
@@ -36,7 +35,6 @@
   "tie_word_embeddings": false,
   "torch_dtype": "float32",
   "transformers_version": "4.41.0",
-  "use_aggregated": true,
   "use_cache": false,
   "use_parallel_residual": true,
   "vocab_size": 50304
configuration_measurement_pred.py CHANGED
@@ -7,7 +7,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         sensor_token=" omit",
         sensor_loc_type="locs_from_token",
         n_sensors=3,
-        use_aggregated=True,
         sensors_weight=0.7,
         aggregate_weight=0.3,
         **kwargs
@@ -15,7 +14,6 @@ class MeasurementPredictorConfig(PretrainedConfig):
         self.sensor_token = sensor_token
         self.sensor_loc_type = sensor_loc_type
         self.n_sensors = n_sensors
-        self.use_aggregated = use_aggregated
         self.sensors_weight = sensors_weight
         self.aggregate_weight = aggregate_weight
         super().__init__(**kwargs)
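The `sensors_weight` / `aggregate_weight` defaults above (0.7 / 0.3) weight the per-sensor and aggregate BCE terms in the training loss. A minimal sketch of how the two terms combine, assuming PyTorch; the zero logits and all-ones labels are hypothetical stand-ins for real model outputs:

```python
import torch
from torch.nn import BCEWithLogitsLoss

sensors_weight, aggregate_weight = 0.7, 0.3  # defaults from the config above
loss_fct = BCEWithLogitsLoss()

sensor_logits = torch.zeros(2, 3)     # 3 per-sensor logits per example (stand-in)
aggregate_logits = torch.zeros(2, 1)  # 1 aggregate logit per example (stand-in)
labels = torch.ones(2, 4)             # last column is the aggregate label

# Weighted sum of the per-sensor loss and the aggregate loss.
loss = loss_fct(sensor_logits, labels[:, :3]) * sensors_weight
loss = loss + loss_fct(aggregate_logits, labels[:, -1:]) * aggregate_weight
```

With zero logits and positive labels each BCE term is ln 2, so the weighted sum is also ln 2, which makes the weighting easy to sanity-check by hand.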
logs/events.out.tfevents.1734712122.sac.ist.berkeley.edu.769282.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:50a749ca4e5ea066cf649566551c642485bd190d4160a0f9016acc8e00efae45
+size 10287
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:61d5e9510a7578fb80e266a7e5b0bbbfe47715526e8e790c1945a0bb172410ca
+oid sha256:3383bea28d5e40a4af2443f4c670650bbabcbacecbbc5aa59bc00c1e26c43da7
 size 4978000256
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3de1edbac73e2144cd10cdd260b51f9aec52938edd4a16e87a2523d1ff7b7fa1
+oid sha256:0d838e481da12b787afbd911a2e7670a415f79ec84cac70c2f00bd8941695a84
 size 268568360
modeling_gpt_neox_measurement_pred.py CHANGED
@@ -1,5 +1,5 @@
 from transformers.models.gpt_neox import GPTNeoXPreTrainedModel, GPTNeoXModel
-
+from transformers import PreTrainedTokenizerBase
 from .modeling_measurement_pred import MeasurementPredictorMixin
 from .configuration_gpt_neox_measurement_pred import GPTNeoXMeasurementPredictorConfig
 
@@ -9,4 +9,7 @@ class GPTNeoXMeasurementPredictor(GPTNeoXPreTrainedModel, MeasurementPredictorMixin):
     def __init__(self, config):
         super().__init__(config)
         self.gpt_neox = GPTNeoXModel(config)
-        self.post_init()
+        self.post_init()
+
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        tokenizer.add_special_tokens({"pad_token": "[PAD]"})
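The new `set_pad_token` hook pairs with the `pad_token_id` entry removed from config.json: padding is now registered on the tokenizer rather than baked into the model config. A toy sketch of the contract, where `ToyTokenizer` is a hypothetical stand-in mimicking the relevant slice of the `transformers` tokenizer API:

```python
class ToyTokenizer:
    """Hypothetical stand-in for a transformers tokenizer."""

    def __init__(self):
        self.pad_token = None
        self.vocab = {"hello": 0, "world": 1}

    def add_special_tokens(self, special_tokens):
        # Mirrors the transformers behaviour: set the named attribute,
        # grow the vocab, and return how many tokens were actually added.
        added = 0
        for attr, token in special_tokens.items():
            setattr(self, attr, token)
            if token not in self.vocab:
                self.vocab[token] = len(self.vocab)
                added += 1
        return added

tokenizer = ToyTokenizer()
num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})
```

In the real model, adding a token this way would also require resizing the input embeddings to cover the new id.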
modeling_measurement_pred.py CHANGED
@@ -1,4 +1,5 @@
 from typing import Optional, Tuple, Union
+from abc import abstractmethod
 
 import torch
 from torch.nn import BCEWithLogitsLoss
@@ -20,16 +21,18 @@ class MeasurementPredictorMixin(PreTrainedModel):
         self.sensor_probes = torch.nn.ModuleList([
             torch.nn.Linear(config.emb_dim, 1) for _ in range(config.n_sensors)
         ])
-        self.use_aggregated = config.use_aggregated
-        if config.use_aggregated:
-            self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
+        self.aggregate_probe = torch.nn.Linear(config.emb_dim, 1)
         self.sensors_weight = config.sensors_weight
         self.aggregate_weight = config.aggregate_weight
 
-        self.get_sensor_locs: SensorLocFinder = None
+        self.find_sensor_locs: SensorLocFinder = None
+
+    @abstractmethod
+    def set_pad_token(self, tokenizer: PreTrainedTokenizerBase):
+        pass
 
     def init_sensor_loc_finder(self, tokenizer: PreTrainedTokenizerBase):
-        self.get_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
+        self.find_sensor_locs = SENSOR_LOC_REGISTRY[self.sensor_loc_type](
             tokenizer, sensor_token=self.sensor_token, n_sensors=self.n_sensors
         )
@@ -67,28 +70,27 @@ class MeasurementPredictorMixin(PreTrainedModel):
             output_hidden_states=output_hidden_states,
             return_dict=return_dict,
         )
-        sensor_locs = self.get_sensor_locs(input_ids)
+        # get sensor embeddings (including aggregate)
+        sensor_locs = self.find_sensor_locs(input_ids)
         sensor_embs = base_model_output.last_hidden_state.gather(
             1, sensor_locs.unsqueeze(-1).expand(-1, -1, self.config.emb_dim)
         )
-        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors, self.config.emb_dim), f"{sensor_embs.shape} != {(input_ids.shape[0], self.n_sensors, self.config.emb_dim)}"
+        assert sensor_embs.shape == (input_ids.shape[0], self.n_sensors + 1, self.config.emb_dim), sensor_embs.shape
+
+        # get sensor and aggregate logits
         sensor_logits = torch.concat([self.sensor_probes[i](sensor_embs[:, i, :])
                                       for i in range(self.n_sensors)], dim=-1)
-        logits = sensor_logits
+        aggregate_logits = self.aggregate_probe(sensor_embs[:, -1, :])
+        logits = torch.concat([sensor_logits, aggregate_logits], dim=-1)
 
-        if self.use_aggregated:
-            last_emb = base_model_output.last_hidden_state[:, -1, :]
-            aggregate_logits = self.aggregate_probe(last_emb)
-            logits = torch.concat([logits, aggregate_logits], dim=-1)
-
+        # compute loss
         loss = None
         if labels is not None:
             loss_fct = BCEWithLogitsLoss()
-            sensor_loss = loss_fct(sensor_logits, labels[:, :self.n_sensors]) * self.sensors_weight
+            sensor_loss = loss_fct(sensor_logits[:, :self.n_sensors], labels[:, :self.n_sensors]) * self.sensors_weight
             loss = sensor_loss
-            if self.use_aggregated: #TOOD: should be use aggregate
-                aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
-                loss += aggregate_loss
+            aggregate_loss = loss_fct(aggregate_logits, labels[:, -1:]) * self.aggregate_weight
+            loss += aggregate_loss
 
         if not return_dict:
             output = (logits, ) + base_model_output[1:]
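With the aggregate probe now reading a gathered position rather than the final hidden state, the forward pass pulls `n_sensors + 1` embeddings out of `last_hidden_state` in a single `gather` call. A minimal sketch of that pattern, assuming PyTorch; `hidden` and `locs` are hypothetical stand-ins for the hidden states and the loc finder's output, whose last column duplicates the final sensor position for the aggregate probe:

```python
import torch

batch, seq_len, emb_dim, n_sensors = 2, 8, 4, 3

# Stand-in for last_hidden_state: (batch, seq_len, emb_dim)
hidden = torch.arange(batch * seq_len * emb_dim, dtype=torch.float32)
hidden = hidden.reshape(batch, seq_len, emb_dim)

# Stand-in sensor locations: n_sensors + 1 positions per example,
# with the last column duplicating the final sensor position.
locs = torch.tensor([[1, 3, 5, 5],
                     [0, 2, 7, 7]])

# Gather one embedding per location: (batch, n_sensors + 1, emb_dim)
sensor_embs = hidden.gather(1, locs.unsqueeze(-1).expand(-1, -1, emb_dim))
```

The `unsqueeze`/`expand` broadcasts each index across the embedding dimension, so row `i`, slot `j` of `sensor_embs` is exactly `hidden[i, locs[i, j]]`.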
sensor_loc_stories.py CHANGED
@@ -26,6 +26,8 @@ class StoriesSensorLocFinder(SensorLocFinder):
             torch.argmax(eqs.to(torch.uint8), dim=-2),
             input_ids.shape[-1] - 3,
         ).clamp(max=input_ids.shape[-1] - 3)
+        aggregate_sensor_loc = locs[:, -1].unsqueeze(1)
+        locs = torch.cat([locs, aggregate_sensor_loc], dim=1)
         return locs
sensor_locs_from_token.py CHANGED
@@ -13,4 +13,6 @@ class SensorLocFinderFromToken(SensorLocFinder):
     def find_sensor_locs(self, input_ids: torch.Tensor) -> torch.Tensor:
         flat_sensor_token_idxs = (input_ids == self.sensor_token_id).nonzero(as_tuple=True)[1]
         sensor_token_idxs = flat_sensor_token_idxs.view(-1, self.n_sensors)
+        aggregate_sensor_token_idx = sensor_token_idxs[:, -1].unsqueeze(1)
+        sensor_token_idxs = torch.cat([sensor_token_idxs, aggregate_sensor_token_idx], dim=1)
         return sensor_token_idxs
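Both location finders now append a duplicate of the last sensor position so the aggregate probe reads that same token's embedding. A self-contained sketch of the token-matching path, assuming PyTorch; the token id and sequences below are made up for illustration:

```python
import torch

sensor_token_id = 99  # made-up id for illustration
n_sensors = 3

# Two toy sequences, each containing exactly n_sensors sensor tokens.
input_ids = torch.tensor([
    [5, 99, 7, 99, 2, 99],
    [99, 1, 99, 3, 99, 4],
])

# Column indices of every sensor token, in row-major order, reshaped to
# one row of n_sensors positions per example.
flat_idxs = (input_ids == sensor_token_id).nonzero(as_tuple=True)[1]
sensor_token_idxs = flat_idxs.view(-1, n_sensors)

# Duplicate the last sensor position for the aggregate probe.
aggregate_idx = sensor_token_idxs[:, -1].unsqueeze(1)
sensor_token_idxs = torch.cat([sensor_token_idxs, aggregate_idx], dim=1)
```

Note the `view(-1, n_sensors)` relies on every sequence containing exactly `n_sensors` matches; a batch that violates that would fail the reshape.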
train.log CHANGED
@@ -1 +1 @@
-[2024-12-17 03:29:37,346][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
+[2024-12-20 16:28:39,973][accelerate.utils.other][WARNING] - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:719c4552b3adb3bf3b788391f0962f6f1df87a6ec616e819e0ee1114d25946a7
+oid sha256:24298721d6c9062b1c51d5bfdf6167e9932ab9eeebebbe68472c2e6d0db2e09d
 size 5112