Update README.md
## Model description
Generative Pre-trained Transformer 2 (GPT-2), developed by OpenAI, is the second model in their foundational series of GPT models. It was pre-trained on a dataset of 8 million web pages. GPT-2 was first announced in February 2019, and the full 1.5-billion-parameter model was released on November 5, 2019.
GPT-2 is a direct scale-up of its predecessor, GPT-1, with a tenfold increase in both parameter count and training dataset size. It is a general-purpose learner: its performance across diverse tasks stems from its ability to accurately predict the next item in a sequence. This lets it translate text, answer questions about a passage, summarize long documents, and generate text that can approach human quality, although it can become repetitive or drift off topic, particularly in long passages.
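
As a quick illustration of this next-token generation in practice, here is a minimal sketch using the Hugging Face `transformers` pipeline. The public `distilgpt2` checkpoint is used as a stand-in; substitute this repository's model id to load this model instead.

```python
# Minimal text-generation sketch with the Hugging Face transformers library.
# "distilgpt2" is a stand-in identifier; replace it with this repository's id.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

# The model extends the prompt by repeatedly predicting the next token.
outputs = generator(
    "GPT-2 is a language model that",
    max_new_tokens=50,
    num_return_sequences=1,
)
print(outputs[0]["generated_text"])
```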
Like its predecessor GPT-1 and its successors GPT-3 and GPT-4, GPT-2 is a generative pre-trained transformer: a deep neural network built on the transformer architecture. In place of earlier recurrence- and convolution-based designs, the transformer relies on attention mechanisms, which let the model focus selectively on the segments of the input text it judges most relevant. This design allows far greater parallelization than the RNN-, CNN-, and LSTM-based models that preceded it.
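
To make the attention mechanism concrete, the following is a minimal NumPy sketch of scaled dot-product attention, the core operation described above. The shapes and names are illustrative, and the causal mask GPT-2 applies (each position attends only to earlier tokens) is omitted for brevity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return a weighted mix of value vectors for each query position.

    Q, K, V: arrays of shape (sequence_length, d_k) for queries, keys, values.
    """
    d_k = Q.shape[-1]
    # Similarity between every query and every key, scaled to stabilize the softmax.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension gives per-token attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted combination of the value vectors.
    return weights @ V

# Toy example: 4 tokens with 8-dimensional query/key/value vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```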
## Training
The transformer architecture allows GPT models to be trained on larger datasets than previous NLP (natural language processing) models. GPT-1 demonstrated the validity of this approach; GPT-2 aimed to further investigate the emergent properties of networks trained on extremely large datasets. Common Crawl, a large corpus previously used to train NLP systems, was considered because of its size, but closer examination revealed that much of its content was unintelligible. OpenAI therefore built a new dataset called WebText. Instead of scraping content indiscriminately from the World Wide Web, WebText collected content only from pages linked to by Reddit posts that had received at least three upvotes prior to December 2017. The dataset was then cleaned: HTML documents were parsed into plain text, duplicate pages were removed, and Wikipedia pages were excluded because their prevalence in other datasets posed a risk of overfitting.

In addition, this model was retrained by Anezatra on the OpenWebText corpus and reduced in size with the DistilGPT approach to produce a lighter, more efficient version. The distillation technique preserves the model's learning capabilities while cutting the number of parameters, which speeds up training and inference and uses resources more efficiently.
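
As a rough sketch of the distillation idea, the loss below blends a soft-target term (the student mimicking the teacher's output distribution) with the ordinary next-token cross-entropy, in the style of DistilGPT2 training. The temperature and weighting values are illustrative assumptions, not the exact recipe used for this checkpoint.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term with next-token cross-entropy.

    temperature and alpha are illustrative defaults, not the exact values
    used to train this checkpoint.
    """
    # Soften both distributions and penalize divergence from the teacher.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Standard cross-entropy against the ground-truth next tokens.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft_loss + (1 - alpha) * hard_loss
```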