justheuristic committed
Commit 795dc75 · 1 Parent(s): 81885f7


Files changed (1)
  static/tabs.html  +20 -19
static/tabs.html CHANGED
@@ -93,7 +93,8 @@ a:visited {
  <p>
  <b>Dataset Streaming</b>
  Usually data is stored on disk and needs to be fully or partially loaded into CPU memory to be used for training.
- Large datasets used for pre-training measure in <a href="https://arxiv.org/abs/2101.00027">hundreds of gigabytes</a> or even <a href="https://laion.ai/laion-400-open-dataset/">terabytes</a>.
+ Large datasets used for pre-training measure in <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2101.00027">hundreds of gigabytes</a>
+ or even <a target="_blank" rel="noopener noreferrer" href="https://laion.ai/laion-400-open-dataset/">terabytes</a>.
  This can pose a significant problem, as most desktop and cheap cloud instances simply do not have that much space.
  Furthermore, downloading the dataset over the internet would take hours before one can even begin training.
  <!--Changing the dataset means downloading a new dataset in full and using additional disk space.-->
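The tutorial text edited above describes dataset streaming. As a quick illustration (not part of the diff), here is a minimal sketch of streaming a large corpus with the Hugging Face datasets library; the dataset name and shuffle buffer are placeholders, not choices made by the original tutorial.

# Minimal sketch of dataset streaming with the `datasets` library (pip install datasets).
# The dataset name and shuffle buffer below are placeholders.
from datasets import load_dataset

# streaming=True reads samples over the network on the fly,
# so a multi-hundred-gigabyte corpus never has to fit on local disk.
dataset = load_dataset("c4", "en", split="train", streaming=True)

# Streaming datasets are shuffled approximately, via a fixed-size buffer.
shuffled = dataset.shuffle(buffer_size=10_000, seed=42)

for i, example in enumerate(shuffled):
    print(example["text"][:80])  # each sample arrives as a plain dict
    if i == 2:
        break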
@@ -106,7 +107,7 @@ a:visited {
  </p>
  <center>
  Here's a tutorial for using these techniques:<br>
- <a href="https://colab.research.google.com/gist/justheuristic/75f6a2a731f05a213a55cd2c8a458aaf/fine-tune-a-language-model-with-dataset-streaming-and-8-bit-optimizers.ipynb">
+ <a target="_blank" rel="noopener noreferrer" href="https://colab.research.google.com/gist/justheuristic/75f6a2a731f05a213a55cd2c8a458aaf/fine-tune-a-language-model-with-dataset-streaming-and-8-bit-optimizers.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" width=360px>
  </a>
  </center>
@@ -159,7 +160,7 @@ a:visited {
  <li>
  <p>
  Another defense is replacing the naive averaging of the peers' gradients with an <b>aggregation technique robust to outliers</b>.
- <a href="https://arxiv.org/abs/2012.10333">Karimireddy et al. (2020)</a>
+ <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2012.10333">Karimireddy et al. (2020)</a>
  suggested such a technique (named CenteredClip) and proved that it does not significantly affect the model's convergence.
  </p>

@@ -172,7 +173,7 @@ a:visited {
  </p>

  <p>
- Recently, <a href="https://arxiv.org/abs/2106.11257">Gorbunov et al. (2021)</a>
+ Recently, <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2106.11257">Gorbunov et al. (2021)</a>
  proposed a robust aggregation protocol for decentralized systems that does not require this assumption.
  This protocol uses CenteredClip as a subroutine but is able to detect and ban participants who performed it incorrectly.
  </p>
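The CenteredClip rule referenced in the two hunks above is compact enough to sketch directly. The snippet below is an illustrative re-implementation based on the description in Karimireddy et al. (2020), not the aggregation code used by hivemind; the clipping radius tau and the number of iterations are assumed values.

import torch

def centered_clip(gradients: torch.Tensor, tau: float = 1.0, n_iters: int = 5) -> torch.Tensor:
    """Robust aggregation of per-peer gradients of shape (num_peers, dim).

    Iteratively re-centers the estimate v and clips each peer's deviation
    from it to norm at most tau, which bounds the influence of outliers.
    """
    v = gradients.mean(dim=0)  # start from the naive average
    for _ in range(n_iters):
        deltas = gradients - v                          # (num_peers, dim)
        norms = deltas.norm(dim=1, keepdim=True)        # per-peer deviation norms
        scale = torch.clamp(tau / (norms + 1e-12), max=1.0)
        v = v + (deltas * scale).mean(dim=0)            # average of clipped deviations
    return v

# Toy usage: one malicious peer submits a huge gradient.
honest = torch.randn(9, 4)
outlier = 1000 * torch.ones(1, 4)
grads = torch.cat([honest, outlier])
print("naive mean:", grads.mean(dim=0))
print("CenteredClip:", centered_clip(grads, tau=1.0))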
@@ -182,54 +183,54 @@ a:visited {
  <div role="tabpanel" class="tab-pane" id="tab3">
  <p>In this section, we provide a roadmap for you to run the collaborative training yourself.</p>
  <p>
- <b>Got confused?</b> Feel free to ask any questions at our <a href="https://discord.gg/uGugx9zYvN">Discord</a>!
+ <b>Got confused?</b> Feel free to ask any questions at our <a target="_blank" rel="noopener noreferrer" href="https://discord.gg/uGugx9zYvN">Discord</a>!
  </p>
  <ol>
  <li>
  Set up dataset streaming:
  <ul>
  <li>
- <a href="https://huggingface.co/docs/datasets/share_dataset.html">Upload</a> your dataset to Hugging Face Hub
- in a streaming-friendly format (<a href="https://huggingface.co/datasets/laion/laion_100m_vqgan_f8">example</a>).
+ <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/docs/datasets/share_dataset.html">Upload</a> your dataset to Hugging Face Hub
+ in a streaming-friendly format (<a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/datasets/laion/laion_100m_vqgan_f8">example</a>).
  </li>
  <li>Set up dataset streaming (see the "Efficient Training" section).</li>
  </ul>
  </li>
  <li>
- Write code of training peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_trainer.py">example</a>):
+ Write code of training peers (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_trainer.py">example</a>):
  <ul>
  <li>Implement your model, set up dataset streaming, and write the training loop.</li>
  <li>
  Get familiar with the hivemind library
- (e.g., via the <a href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>).
+ (e.g., via the <a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>).
  </li>
  <li>
  In the training loop, wrap up your PyTorch optimizer with
- <a href="https://learning-at-home.readthedocs.io/en/latest/modules/optim.html#hivemind.optim.experimental.optimizer.Optimizer">hivemind.Optimizer</a>
- (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/task.py#L121">example</a>).
+ <a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/modules/optim.html#hivemind.optim.experimental.optimizer.Optimizer">hivemind.Optimizer</a>
+ (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/task.py#L121">example</a>).
  </li>
  </ul>
  </li>
  <li>
- <b>(optional)</b> Write code of auxiliary peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_aux_peer.py">example</a>):
+ <b>(optional)</b> Write code of auxiliary peers (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_aux_peer.py">example</a>):
  <ul>
  <li>
  Auxiliary peers are a special kind of peer responsible for
- logging loss and other metrics (e.g., to <a href="https://wandb.ai/">Weights & Biases</a>)
- and uploading model checkpoints (e.g., to <a href="https://huggingface.co/docs/transformers/model_sharing">Hugging Face Hub</a>).
+ logging loss and other metrics (e.g., to <a target="_blank" rel="noopener noreferrer" href="https://wandb.ai/">Weights & Biases</a>)
+ and uploading model checkpoints (e.g., to <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/docs/transformers/model_sharing">Hugging Face Hub</a>).
  </li>
  <li>
  Such peers don't need to calculate gradients and may be run on cheap machines without GPUs.
  </li>
  <li>
  They can serve as a convenient entry point to
- <a href="https://learning-at-home.readthedocs.io/en/latest/modules/dht.html">hivemind.DHT</a>
+ <a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/modules/dht.html">hivemind.DHT</a>
  (i.e., their address can be specified as <code>initial_peers</code>).
  </li>
  <li>
  It is useful to fix their address by providing <code>host_maddrs</code> and <code>identity_path</code>
  arguments to <code>hivemind.DHT</code>
- (these are forwarded to the underlying <a href="https://libp2p.io/">libp2p</a> daemon).
+ (these are forwarded to the underlying <a target="_blank" rel="noopener noreferrer" href="https://libp2p.io/">libp2p</a> daemon).
  </li>
  </ul>
  </li>
@@ -241,10 +242,10 @@ a:visited {
  People may run them online and/or download and run them on their own hardware.
  </li>
  <li>
- <a href="https://huggingface.co/organizations/new">Create</a> a Hugging Face organization
+ <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/organizations/new">Create</a> a Hugging Face organization
  with all resources related to the training
  (dataset, model, inference demo, links to a dashboard with loss and other metrics, etc.).
- Look at <a href="https://huggingface.co/training-transformers-together">ours</a> as an example.
+ Look at <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/training-transformers-together">ours</a> as an example.
  </li>
  <li>
  Set up an authentication system (see the "Security" section).
@@ -255,7 +256,7 @@ a:visited {
  ban accounts who behave maliciously.
  </li>
  <li>
- Set up an inference demo for your model (e.g., using <a href="https://huggingface.co/spaces">Spaces</a>) or
+ Set up an inference demo for your model (e.g., using <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/spaces">Spaces</a>) or
  a script that periodically uploads the inference results to show the training progress.
  </li>
  </ul>
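For the roadmap step in the diff above about wrapping the PyTorch optimizer, here is a hedged sketch roughly following the hivemind quickstart (not code taken from the diffed page); the model, run_id, batch sizes, and timing values are placeholders, and argument names should be checked against the hivemind version you install.

# Sketch of wrapping a local PyTorch optimizer with hivemind.Optimizer.
# Values below (run_id, batch sizes, timeouts) are placeholders.
import torch
import hivemind

model = torch.nn.Linear(512, 2)                       # stand-in for your real model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

dht = hivemind.DHT(start=True)                        # first peer; others pass initial_peers=[...]

opt = hivemind.Optimizer(
    dht=dht,                        # DHT instance used to find other peers
    run_id="my_collaborative_run",  # peers with the same run_id train together
    optimizer=opt,                  # the local optimizer being wrapped
    batch_size_per_step=32,         # samples processed per local step
    target_batch_size=10_000,       # global batch size that triggers averaging
    use_local_updates=True,         # apply local steps between averaging rounds
    matchmaking_time=3.0,           # how long to look for peers before averaging
    averaging_timeout=10.0,
    verbose=True,
)

# The training loop itself stays the usual forward / backward / opt.step().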
 
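And for the auxiliary-peer step in the roadmap, a minimal sketch of fixing a peer's address via hivemind.DHT so other peers can list it in initial_peers; the port and identity file path are placeholders, and I am assuming host_maddrs and identity_path are forwarded to the underlying libp2p daemon as the tutorial text states.

# Sketch of an auxiliary peer that only joins the DHT (no GPU, no gradients),
# with a fixed address so training peers can use it as `initial_peers`.
# The port and identity path below are placeholders.
import hivemind

dht = hivemind.DHT(
    start=True,
    host_maddrs=["/ip4/0.0.0.0/tcp/31337"],  # listen on a fixed TCP port
    identity_path="./p2p_identity.key",      # persistent libp2p identity -> stable peer ID
)

# Print the multiaddresses that training peers can pass as `initial_peers`.
for addr in dht.get_visible_maddrs():
    print(addr)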