Commit 795dc75 (parent: 81885f7)
static/tabs.html · +20 -19
@@ -93,7 +93,8 @@ a:visited {
 <p>
 <b>Dataset Streaming</b>
 Usually, data is stored on disk and needs to be fully or partially loaded into CPU memory to be used for training.
-Large datasets used for pre-training measure in <a href="https://arxiv.org/abs/2101.00027">hundreds of gigabytes</a>
+Large datasets used for pre-training measure in <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2101.00027">hundreds of gigabytes</a>
+or even <a target="_blank" rel="noopener noreferrer" href="https://laion.ai/laion-400-open-dataset/">terabytes</a>.
 This can pose a significant problem, as most desktops and cheap cloud instances simply do not have that much disk space.
 Furthermore, downloading the dataset over the internet would take hours before one can even begin training.
 <!--Changing the dataset means downloading a new dataset in full and using additional disk space.-->
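The dataset streaming described in this hunk boils down to iterating over a remote dataset without downloading it first. A minimal sketch with the Hugging Face `datasets` library (the dataset name, buffer size, and field name are illustrative):

```python
from datasets import load_dataset

# streaming=True downloads nothing up front: samples are fetched lazily
# over HTTP as the training loop consumes them, so the full dataset
# never has to fit on local disk.
dataset = load_dataset("c4", "en", split="train", streaming=True)

# Streamed datasets are shuffled approximately, over a rolling buffer.
dataset = dataset.shuffle(buffer_size=10_000, seed=42)

for i, sample in enumerate(dataset):
    print(sample["text"][:80])  # peek at a few samples
    if i == 2:
        break
```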
@@ -106,7 +107,7 @@ a:visited {
 </p>
 <center>
 Here's a tutorial for using these techniques:<br>
-<a href="https://colab.research.google.com/gist/justheuristic/75f6a2a731f05a213a55cd2c8a458aaf/fine-tune-a-language-model-with-dataset-streaming-and-8-bit-optimizers.ipynb">
+<a target="_blank" rel="noopener noreferrer" href="https://colab.research.google.com/gist/justheuristic/75f6a2a731f05a213a55cd2c8a458aaf/fine-tune-a-language-model-with-dataset-streaming-and-8-bit-optimizers.ipynb">
 <img src="https://colab.research.google.com/assets/colab-badge.svg" width=360px>
 </a>
 </center>
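The linked notebook pairs dataset streaming with 8-bit optimizers. A minimal sketch of the latter using the bitsandbytes library (the model and hyperparameters are placeholders):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(768, 768).cuda()  # placeholder model

# Adam8bit stores the optimizer statistics (first and second momenta)
# in 8-bit precision, cutting optimizer memory roughly 4x vs. 32-bit Adam.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(8, 768, device="cuda")).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```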
@@ -159,7 +160,7 @@ a:visited {
 <li>
 <p>
 Another defense is replacing the naive averaging of the peers' gradients with an <b>aggregation technique robust to outliers</b>.
-<a href="https://arxiv.org/abs/2012.10333">Karimireddy et al. (2020)</a>
+<a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2012.10333">Karimireddy et al. (2020)</a>
 suggested such a technique (named CenteredClip) and proved that it does not significantly affect the model's convergence.
 </p>
 
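For intuition about CenteredClip: instead of averaging raw gradient vectors, it iteratively re-centers the estimate while clipping each peer's offset to a ball of radius tau, so a few malicious outliers cannot drag the result arbitrarily far. A simplified single-machine sketch (tau and the iteration count are illustrative; the actual protocol runs this aggregation across peers):

```python
import torch

def centered_clip(peer_grads: torch.Tensor, tau: float = 1.0, n_iters: int = 5) -> torch.Tensor:
    """Robustly aggregate one flattened gradient per peer; peer_grads has shape (n_peers, dim)."""
    v = peer_grads.mean(dim=0)  # start from the naive average
    for _ in range(n_iters):
        diffs = peer_grads - v                               # each peer's offset from the center
        norms = diffs.norm(dim=1, keepdim=True).clamp_min(1e-12)
        clipped = diffs * torch.clamp(tau / norms, max=1.0)  # shrink offsets longer than tau
        v = v + clipped.mean(dim=0)                          # re-center on the clipped average
    return v
```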
@@ -172,7 +173,7 @@ a:visited {
 </p>
 
 <p>
-Recently, <a href="https://arxiv.org/abs/2106.11257">Gorbunov et al. (2021)</a>
+Recently, <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/abs/2106.11257">Gorbunov et al. (2021)</a>
 proposed a robust aggregation protocol for decentralized systems that does not require this assumption.
 This protocol uses CenteredClip as a subroutine but is able to detect and ban participants who perform it incorrectly.
 </p>
@@ -182,54 +183,54 @@ a:visited {
 <div role="tabpanel" class="tab-pane" id="tab3">
 <p>In this section, we provide a roadmap for you to run the collaborative training yourself.</p>
 <p>
-<b>Confused?</b> Feel free to ask questions in our <a href="https://discord.gg/uGugx9zYvN">Discord</a>!
+<b>Confused?</b> Feel free to ask questions in our <a target="_blank" rel="noopener noreferrer" href="https://discord.gg/uGugx9zYvN">Discord</a>!
 </p>
 <ol>
 <li>
 Set up dataset streaming:
 <ul>
 <li>
-<a href="https://huggingface.co/docs/datasets/share_dataset.html">Upload</a> your dataset to Hugging Face Hub
-in a streaming-friendly format (<a href="https://huggingface.co/datasets/laion/laion_100m_vqgan_f8">example</a>).
+<a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/docs/datasets/share_dataset.html">Upload</a> your dataset to Hugging Face Hub
+in a streaming-friendly format (<a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/datasets/laion/laion_100m_vqgan_f8">example</a>).
 </li>
 <li>Set up dataset streaming (see the "Efficient Training" section).</li>
 </ul>
 </li>
 <li>
-Write the code for training peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_trainer.py">example</a>):
+Write the code for training peers (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_trainer.py">example</a>):
 <ul>
 <li>Implement your model, set up dataset streaming, and write the training loop.</li>
 <li>
 Get familiar with the hivemind library
-(e.g., via the <a href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>).
+(e.g., via the <a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>).
 </li>
 <li>
 In the training loop, wrap your PyTorch optimizer with
-<a href="https://learning-at-home.readthedocs.io/en/latest/modules/optim.html#hivemind.optim.experimental.optimizer.Optimizer">hivemind.Optimizer</a>
-(<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/task.py#L121">example</a>).
+<a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/modules/optim.html#hivemind.optim.experimental.optimizer.Optimizer">hivemind.Optimizer</a>
+(<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/task.py#L121">example</a>).
 </li>
 </ul>
 </li>
 <li>
-<b>(optional)</b> Write the code for auxiliary peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_aux_peer.py">example</a>):
+<b>(optional)</b> Write the code for auxiliary peers (<a target="_blank" rel="noopener noreferrer" href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_aux_peer.py">example</a>):
 <ul>
 <li>
 Auxiliary peers are a special kind of peer responsible for
-logging loss and other metrics (e.g., to <a href="https://wandb.ai/">Weights & Biases</a>)
-and uploading model checkpoints (e.g., to <a href="https://huggingface.co/docs/transformers/model_sharing">Hugging Face Hub</a>).
+logging loss and other metrics (e.g., to <a target="_blank" rel="noopener noreferrer" href="https://wandb.ai/">Weights & Biases</a>)
+and uploading model checkpoints (e.g., to <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/docs/transformers/model_sharing">Hugging Face Hub</a>).
 </li>
 <li>
 Such peers don't need to calculate gradients and may be run on cheap machines without GPUs.
 </li>
 <li>
 They can serve as a convenient entry point to
-<a href="https://learning-at-home.readthedocs.io/en/latest/modules/dht.html">hivemind.DHT</a>
+<a target="_blank" rel="noopener noreferrer" href="https://learning-at-home.readthedocs.io/en/latest/modules/dht.html">hivemind.DHT</a>
 (i.e., their address can be specified as <code>initial_peers</code>).
 </li>
 <li>
 It is useful to fix their address by providing the <code>host_maddrs</code> and <code>identity_path</code>
 arguments to <code>hivemind.DHT</code>
-(these are forwarded to the underlying <a href="https://libp2p.io/">libp2p</a> daemon).
+(these are forwarded to the underlying <a target="_blank" rel="noopener noreferrer" href="https://libp2p.io/">libp2p</a> daemon).
 </li>
 </ul>
 </li>
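The roadmap in this hunk compresses two pieces of hivemind code. First, wrapping a PyTorch optimizer with hivemind.Optimizer (step 2). A minimal sketch following the hivemind quickstart; the run id, multiaddress, batch sizes, and model are all placeholders:

```python
import torch
import hivemind

# Join the swarm via any existing peer; the very first peer omits initial_peers.
dht = hivemind.DHT(
    initial_peers=["/ip4/203.0.113.7/tcp/31337/p2p/Qm..."],  # placeholder multiaddress
    start=True,
)

model = torch.nn.Linear(768, 2)  # placeholder model
base_optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Peers accumulate gradients until the swarm has processed target_batch_size
# samples in total, then average gradients and step together.
optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="my_run",              # peers with the same run_id train together
    batch_size_per_step=32,       # samples per local forward/backward pass
    target_batch_size=4096,       # global batch size per collaborative step
    optimizer=base_optimizer,
    use_local_updates=False,      # average gradients rather than parameters
    matchmaking_time=3.0,
    averaging_timeout=10.0,
    verbose=True,
)
# From here, use it like a regular optimizer in the training loop:
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```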
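Second, the fixed-address auxiliary peer from step 3. A sketch, with the port and key path as placeholders:

```python
import hivemind

# A fixed TCP port plus a persistent libp2p identity key keep this peer's
# multiaddress stable across restarts, so other peers can hardcode it
# as one of their initial_peers.
dht = hivemind.DHT(
    host_maddrs=["/ip4/0.0.0.0/tcp/31337"],  # listen on a fixed port (placeholder)
    identity_path="./p2p_identity.key",      # generated on first run, reused afterwards
    start=True,
)

# Print the addresses that training peers should pass as initial_peers.
for addr in dht.get_visible_maddrs():
    print(addr)
```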
@@ -241,10 +242,10 @@ a:visited {
 People may run them online and/or download and run them on their own hardware.
 </li>
 <li>
-<a href="https://huggingface.co/organizations/new">Create</a> a Hugging Face organization
+<a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/organizations/new">Create</a> a Hugging Face organization
 with all resources related to the training
 (dataset, model, inference demo, links to a dashboard with loss and other metrics, etc.).
-Look at <a href="https://huggingface.co/training-transformers-together">ours</a> as an example.
+Look at <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/training-transformers-together">ours</a> as an example.
 </li>
 <li>
 Set up an authentication system (see the "Security" section).
@@ -255,7 +256,7 @@ a:visited {
 ban accounts that behave maliciously.
 </li>
 <li>
-Set up an inference demo for your model (e.g., using <a href="https://huggingface.co/spaces">Spaces</a>) or
+Set up an inference demo for your model (e.g., using <a target="_blank" rel="noopener noreferrer" href="https://huggingface.co/spaces">Spaces</a>) or
 a script that periodically uploads the inference results to show the training progress.
 </li>
 </ul>