Commit 5ee9a01 · 1 Parent(s): 5c51b0b

bullet points

src/index.html  +4 -4
src/index.html CHANGED
@@ -90,8 +90,8 @@
 the template on which we based this blog post.</aside>
 
 <p>The book is built on the following <strong>three general foundations</strong>:</p>
-
-<p><strong>Quick intros on theory and concepts:</strong> Before diving into code and experiments, we want you to understand how each method works at a high level and what its advantages and limits are. For example, you’ll learn about which parts of a language model eat away at your memory, and when during training it happens. You’ll also learn how we can work around memory constraints by parallelizing the models and increase throughput by scaling up GPUs. As a result, you'll understand how the following widget to compute the memory breakdown of a Transformer model works: </p>
+
+<p><strong>1. Quick intros on theory and concepts:</strong> Before diving into code and experiments, we want you to understand how each method works at a high level and what its advantages and limits are. For example, you’ll learn about which parts of a language model eat away at your memory, and when during training it happens. You’ll also learn how we can work around memory constraints by parallelizing the models and increase throughput by scaling up GPUs. As a result, you'll understand how the following widget to compute the memory breakdown of a Transformer model works: </p>
 <aside>Note that we're still missing pipeline parallelism in this widget. To be added as an exercise for the reader.</aside>
 
 <div class="large-image-background-transparent">
@@ -235,7 +235,7 @@
 </a>
 
 
-<p><strong>Clear code implementations:</strong> Theory is one thing, but we discover all kinds of edge cases and important details when we implement something. That’s why we link to implementation references where possible. Depending on the case, we’ll use two code references:</p>
+<p><strong>2. Clear code implementations:</strong> Theory is one thing, but we discover all kinds of edge cases and important details when we implement something. That’s why we link to implementation references where possible. Depending on the case, we’ll use two code references:</p>
 
 <ul>
 <li>
@@ -250,7 +250,7 @@
 </li>
 </ul>
 
-<aside>If you want to watch a video on distributed training rather than reading the blog or picotron code, check out <a href="https://www.youtube.com/watch?v=u2VSwDDpaBM&list=PL-_armZiJvAnhcRr6yTJ0__f3Oi-LLi9S">Ferdinand's YouTube channel</a>.</aside>
+<aside>3. If you want to watch a video on distributed training rather than reading the blog or picotron code, check out <a href="https://www.youtube.com/watch?v=u2VSwDDpaBM&list=PL-_armZiJvAnhcRr6yTJ0__f3Oi-LLi9S">Ferdinand's YouTube channel</a>.</aside>
 
 <!-- <p><img alt="Picotron implements each key concept in a self-contained way, such that the method can be studied separately and in isolation." src="assets/images/placeholder.png" /></p> -->
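For context on the first hunk: the renumbered paragraph refers to a widget that computes the memory breakdown of a Transformer model. Below is a minimal sketch of that kind of computation, assuming mixed-precision training with bf16 weights/gradients and fp32 Adam optimizer states; the function name, the example dimensions, and the crude activation estimate are illustrative assumptions, not the widget's actual implementation.

# Minimal sketch: training-memory breakdown for a Transformer.
# Assumes bf16 weights/gradients with fp32 Adam states (master weights plus
# two moments), i.e. 2 + 2 + 12 = 16 bytes per parameter before activations.

def transformer_memory_breakdown_gb(num_params, batch_size, seq_len,
                                    hidden_size, num_layers):
    GB = 1024 ** 3
    weights = 2 * num_params       # bf16 parameters, 2 bytes each
    gradients = 2 * num_params     # bf16 gradients, 2 bytes each
    optimizer = 12 * num_params    # fp32 master copy (4) + Adam m (4) + v (4)
    # Very rough activation term: one bf16 hidden-state tensor per layer;
    # real breakdowns also count attention and MLP intermediates.
    activations = 2 * batch_size * seq_len * hidden_size * num_layers
    return {
        "weights_gb": weights / GB,
        "gradients_gb": gradients / GB,
        "optimizer_states_gb": optimizer / GB,
        "activations_gb_approx": activations / GB,
    }

# Example: a 7B-parameter model, batch size 1, sequence length 4096.
print(transformer_memory_breakdown_gb(7_000_000_000, 1, 4096, 4096, 32))

Under those assumptions, parameters, gradients, and optimizer states alone cost 16 bytes per parameter (over 100 GB for a 7B model), which is what the paragraph means by parts of the model eating away at memory before activations even enter the picture.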