Commit 81885f7 · link to colab
Parent(s): b55d469

static/tabs.html CHANGED (+10 -3)
@@ -89,7 +89,6 @@ a:visited {
 is minimal.
 The combination of offloading and 8-bit optimizers means that we conserve GPU memory (0 bytes per parameter)
 and also use only a limited amount of CPU memory (2 bytes per parameter).
-
 </p>
 <p>
 <b>Dataset Streaming</b>
@@ -98,12 +97,20 @@ a:visited {
 This can pose a significant problem, as most desktops and cheap cloud instances simply do not have that much space.
 Furthermore, downloading the dataset over the internet would take hours before one can even begin training.
 <!--Changing the dataset means downloading a new dataset in full and using additional disk space.-->
-</p
+</p>
+<p>
 To circumvent these problems, we stream the training dataset in the same way as you stream online videos.
 Participants download a small random portion of the training dataset and immediately begin training on it,
 while additional data is loaded in the background. As such, we can train a model with virtually no memory
-overhead from the dataset and switching to a new dataset is as simple as changing an argument to the
+overhead from the dataset and switching to a new dataset is as simple as changing an argument to the dataset class.
 </p>
+<center>
+Here's a tutorial for using these techniques:<br>
+<a href="https://colab.research.google.com/gist/justheuristic/75f6a2a731f05a213a55cd2c8a458aaf/fine-tune-a-language-model-with-dataset-streaming-and-8-bit-optimizers.ipynb">
+<img src="https://colab.research.google.com/assets/colab-badge.svg" width=360px>
+</a>
+</center>
+
 </div>
 <div role="tabpanel" class="tab-pane" id="tab2">
 <p>In this section, we discuss common concerns related to security of the collaborative training.</p>
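
For reference, the 8-bit optimizer mentioned in the first hunk can be sketched as follows. This is a minimal illustration rather than the linked notebook's exact code: it assumes the bitsandbytes library and uses a stand-in torch.nn.Linear model. Regular Adam keeps two float32 statistics per parameter (8 bytes); the 8-bit variant keeps two int8 statistics (2 bytes), and the page's text says those statistics are additionally offloaded to CPU RAM, leaving 0 bytes of optimizer state per parameter on the GPU.

# Minimal sketch, assuming bitsandbytes; not the tutorial's exact setup.
# Adam8bit stores two int8 statistics per parameter (2 bytes) instead of
# two float32 statistics (8 bytes); the demo described above additionally
# keeps these statistics in CPU RAM via offloading.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()              # stand-in for the real model
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

batch = torch.randn(16, 1024, device="cuda")
loss = model(batch).pow(2).mean()                       # dummy objective for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()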
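Similarly, the dataset streaming described in the second hunk can be sketched with the Hugging Face datasets library's streaming mode; the corpus name below is an illustrative placeholder, not necessarily the one used in the demo.

# Minimal sketch of dataset streaming, assuming the `datasets` library.
# streaming=True yields examples on the fly instead of downloading the full
# corpus to disk, and switching corpora only means changing the arguments
# passed to load_dataset().
from datasets import load_dataset

stream = load_dataset("oscar", "unshuffled_deduplicated_en",   # illustrative corpus choice
                      split="train", streaming=True)
stream = stream.shuffle(buffer_size=10_000, seed=42)           # approximate shuffle over a small buffer

for example in stream.take(3):                                 # data keeps arriving while training proceeds
    print(example["text"][:80])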
|