nouamanetazi committed
Commit 5ee9a01 · 1 Parent(s): 5c51b0b

bullet points

Files changed (1)
src/index.html +4 -4
src/index.html CHANGED
@@ -90,8 +90,8 @@
     the template on which we based this blog post.</aside>

     <p>The book is built on the following <strong>three general foundations</strong>:</p>
-    <!-- RH: I would set what follows, up to "As you can see...," as a three-item bullet list -->
-    <p><strong>Quick intros on theory and concepts:</strong> Before diving into code and experiments, we want you to understand how each method works at a high level and what its advantages and limits are. For example, you’ll learn about which parts of a language model eat away at your memory, and when during training it happens. You’ll also learn how we can work around memory constraints by parallelizing the models and increase throughput by scaling up GPUs. As a result, you'll understand how the following widget to compute the memory breakdown of a Transformer model works: </p>
+
+    <p><strong>1. Quick intros on theory and concepts:</strong> Before diving into code and experiments, we want you to understand how each method works at a high level and what its advantages and limits are. For example, you’ll learn about which parts of a language model eat away at your memory, and when during training it happens. You’ll also learn how we can work around memory constraints by parallelizing the models and increase throughput by scaling up GPUs. As a result, you'll understand how the following widget to compute the memory breakdown of a Transformer model works: </p>
     <aside>Note that we're still missing pipeline parallelism in this widget. To be added as an exercise for the reader.</aside>

     <div class="large-image-background-transparent">
@@ -235,7 +235,7 @@
     </a>


-    <p><strong>Clear code implementations:</strong> Theory is one thing, but we discover all kinds of edge cases and important details when we implement something. That’s why we link to implementation references where possible. Depending on the case, we’ll use two code references:</p>
+    <p><strong>2. Clear code implementations:</strong> Theory is one thing, but we discover all kinds of edge cases and important details when we implement something. That’s why we link to implementation references where possible. Depending on the case, we’ll use two code references:</p>

     <ul>
     <li>
@@ -250,7 +250,7 @@
     </li>
     </ul>

-    <aside>If you want to watch a video on distributed training rather than reading the blog or picotron code, check out <a href="https://www.youtube.com/watch?v=u2VSwDDpaBM&list=PL-_armZiJvAnhcRr6yTJ0__f3Oi-LLi9S">Ferdinand's YouTube channel</a>.</aside>
+    <aside>3. If you want to watch a video on distributed training rather than reading the blog or picotron code, check out <a href="https://www.youtube.com/watch?v=u2VSwDDpaBM&list=PL-_armZiJvAnhcRr6yTJ0__f3Oi-LLi9S">Ferdinand's YouTube channel</a>.</aside>

     <!-- <p><img alt="Picotron implements each key concept in a self-contained way, such that the method can be studied separately and in isolation." src="assets/images/placeholder.png" /></p> -->

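As context for the first hunk: the memory-breakdown widget mentioned there boils down to a simple accounting of model states. Below is a minimal, hypothetical Python sketch of that kind of estimate (not the widget's actual code), assuming bf16 parameters and gradients with fp32 Adam optimizer states, and ignoring activations and any parallelism.

# Minimal sketch: rough per-GPU memory breakdown for training a Transformer
# with Adam in mixed precision. Assumptions: bf16 weights/gradients (2 bytes
# each), fp32 master weights + Adam momentum + variance (4 bytes each).
def training_memory_breakdown_gib(num_params: float) -> dict:
    bytes_params = 2 * num_params              # bf16 weights
    bytes_grads = 2 * num_params               # bf16 gradients
    bytes_optimizer = (4 + 4 + 4) * num_params # fp32 master weights + Adam m/v
    to_gib = lambda b: b / 2**30
    return {
        "parameters": to_gib(bytes_params),
        "gradients": to_gib(bytes_grads),
        "optimizer_states": to_gib(bytes_optimizer),
        "total_model_states": to_gib(bytes_params + bytes_grads + bytes_optimizer),
    }

if __name__ == "__main__":
    # Example: an 8B-parameter model
    print(training_memory_breakdown_gib(8e9))

Under these assumptions an 8B-parameter model already needs on the order of 120 GiB of model states before counting activations, which is the motivation for the parallelism and sharding techniques the post goes on to cover.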