BERT
----------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It is a bidirectional transformer
pre-trained using a combination of masked language modeling and next sentence prediction objectives
on a large corpus comprising the Toronto Book Corpus and Wikipedia.

The abstract from the paper is the following:

*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*

*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*

Tips:

- BERT is a model with absolute position embeddings, so it is usually advised to pad the inputs on
  the right rather than the left (see the padding sketch after this list).
- BERT was trained with a masked language modeling (MLM) objective. It is therefore efficient at predicting masked
  tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language
  modeling (CLM) objective are better in that regard (a fill-mask sketch appears under BertForMaskedLM below).
- Alongside MLM, BERT was trained using a next sentence prediction (NSP) objective, with the [CLS] token serving as
  an approximate summary of the sequence. You can use this token (the first token in a sequence built with special
  tokens) to get a sequence-level prediction rather than a token-level prediction. However, averaging over the
  sequence may yield better results than using the [CLS] token (see the pooling sketch under BertModel below).
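
As a quick sketch of the right-padding behaviour (assuming the ``bert-base-uncased``
checkpoint; the sentences are illustrative only):

.. code-block:: python

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # BertTokenizer pads on the right by default (tokenizer.padding_side == "right"),
    # which matches BERT's absolute position embeddings.
    batch = tokenizer(
        ["A short sentence.", "A noticeably longer example sentence for padding."],
        padding=True,
        return_tensors="pt",
    )
    print(batch["input_ids"])       # [PAD] ids appear at the end of the shorter row
    print(batch["attention_mask"])  # 0s mark the padded positions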

BertConfig
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertConfig
    :members:
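
A minimal configuration sketch (the default ``BertConfig`` arguments reproduce the
``bert-base-uncased`` architecture; the smaller values below are illustrative only):

.. code-block:: python

    from transformers import BertConfig, BertModel

    # A smaller-than-default configuration; hidden_size must stay divisible
    # by num_attention_heads.
    config = BertConfig(
        hidden_size=256,
        num_hidden_layers=4,
        num_attention_heads=4,
        intermediate_size=1024,
    )

    # Instantiating a model from a config yields random weights, not pre-trained ones.
    model = BertModel(config)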

BertTokenizer
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertTokenizer
    :members:
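
A small sketch of how the tokenizer builds inputs with special tokens (single
sequences become ``[CLS] A [SEP]``, pairs become ``[CLS] A [SEP] B [SEP]``):

.. code-block:: python

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    encoded = tokenizer("Hello, how are you?", "I am fine, thanks.")
    print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
    print(encoded["token_type_ids"])  # 0 for the first segment, 1 for the second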

BertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertModel
    :members:
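
A feature-extraction sketch, including the [CLS]-versus-averaging comparison from the
tips above (assuming a recent version of the library where model outputs expose named
attributes such as ``last_hidden_state``):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("BERT produces contextual token embeddings.", return_tensors="pt")
    with torch.no_grad():
        last_hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_size)

    cls_embedding = last_hidden[:, 0]  # the [CLS] token

    # Masked mean over the real tokens, which often works better than [CLS]
    # as a sequence representation.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    mean_embedding = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)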

BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForPreTraining
    :members:
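
A sketch of the two pre-training heads (attribute names as in recent versions of the
library; older releases returned plain tuples instead):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForPreTraining

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForPreTraining.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Paris is the capital of France.", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    print(outputs.prediction_logits.shape)        # (batch, seq_len, vocab_size) -- MLM head
    print(outputs.seq_relationship_logits.shape)  # (batch, 2)                   -- NSP head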

BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMaskedLM
    :members:
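
The fill-mask sketch referenced in the tips above (the example sentence is
illustrative only):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Find the [MASK] position(s) and take the highest-scoring token for each.
    mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    predicted_ids = logits[0, mask_positions].argmax(dim=-1)
    print(tokenizer.decode(predicted_ids.tolist()))  # expected: something like "paris"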

BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForNextSentencePrediction
    :members:
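
A sketch of scoring a sentence pair with the NSP head (the sentences are
illustrative only):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForNextSentencePrediction

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

    first = "The weather was perfect that morning."
    second = "So we decided to go for a hike."
    inputs = tokenizer(first, second, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Index 0: "second sentence follows the first"; index 1: "random sentence".
    print(logits.softmax(dim=-1))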

BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForSequenceClassification
    :members:
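
A fine-tuning sketch: the classification head on top of the pre-trained encoder is
randomly initialized here, so its predictions are meaningless until trained (the
label below is hypothetical):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    inputs = tokenizer("This movie was great!", return_tensors="pt")
    labels = torch.tensor([1])  # hypothetical "positive" class

    outputs = model(**inputs, labels=labels)
    print(outputs.loss)    # cross-entropy loss usable for fine-tuning
    print(outputs.logits)  # (batch, num_labels)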

BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForMultipleChoice
    :members:
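
A sketch of the expected input layout: the model takes tensors of shape
``(batch_size, num_choices, seq_len)``, so each choice is encoded against the prompt
and a batch dimension is added (prompt and choices are illustrative only):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForMultipleChoice

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMultipleChoice.from_pretrained("bert-base-uncased")

    prompt = "The bird saw the cat and"
    choices = ["flew away.", "started to swim."]

    # Pair the prompt with every choice, then add the batch dimension.
    encoding = tokenizer([prompt, prompt], choices, padding=True, return_tensors="pt")
    inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}

    with torch.no_grad():
        logits = model(**inputs).logits  # (batch_size, num_choices)
    print(logits.argmax(dim=-1))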

BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForTokenClassification
    :members:
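
A token-level sketch: the classification head is randomly initialized, and
``num_labels`` would be set to the size of your tag set (9 is illustrative, e.g. a
CoNLL-style NER scheme) before fine-tuning:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForTokenClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=9)

    inputs = tokenizer("George Washington lived in Virginia.", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (batch, seq_len, num_labels)
    print(logits.argmax(dim=-1))  # one predicted label id per (sub)token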

BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertForQuestionAnswering
    :members:
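
An extractive QA sketch using a checkpoint already fine-tuned on SQuAD (with plain
``bert-base-uncased`` the span heads would be randomly initialized; question and
context are illustrative only):

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForQuestionAnswering

    name = "bert-large-uncased-whole-word-masking-finetuned-squad"
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertForQuestionAnswering.from_pretrained(name)

    question = "Who wrote the play?"
    context = "The play was written by William Shakespeare in the 1600s."
    inputs = tokenizer(question, context, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # The answer span is read off the argmax of the start and end logits.
    start = outputs.start_logits.argmax()
    end = outputs.end_logits.argmax()
    print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))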

TFBertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertModel
    :members:
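
The TensorFlow classes mirror the PyTorch ones, so the sketches above translate
directly; a minimal example (assuming TensorFlow is installed):

.. code-block:: python

    from transformers import BertTokenizer, TFBertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = TFBertModel.from_pretrained("bert-base-uncased")

    # return_tensors="tf" yields TensorFlow tensors instead of PyTorch ones.
    inputs = tokenizer("BERT also runs under TensorFlow.", return_tensors="tf")
    outputs = model(inputs)
    print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)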

TFBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForPreTraining
    :members:

TFBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMaskedLM
    :members:

TFBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForNextSentencePrediction
    :members:

TFBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForSequenceClassification
    :members:

TFBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMultipleChoice
    :members:

TFBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForTokenClassification
    :members:

TFBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForQuestionAnswering
    :members: