ALBERT
----------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma and Radu Soricut. It presents
two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:

- Splitting the embedding matrix into two smaller matrices
- Using repeating layers split among groups
The abstract from the paper is the following:

*Increasing model size when pretraining natural language representations often results in improved performance on
downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream
tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE,
RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.*
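
A minimal sketch of how the two parameter-reduction techniques above surface in the configuration, assuming a recent
release of the library (``embedding_size``, ``hidden_size`` and ``num_hidden_groups`` are parameters of
``AlbertConfig``; the values below roughly mimic the base model):

.. code-block:: python

    from transformers import AlbertConfig, AlbertModel

    # Factorized embedding parameterization: tokens are first embedded with a small
    # embedding_size and only then projected up to hidden_size, instead of storing a
    # full vocab_size x hidden_size embedding matrix.
    # Cross-layer parameter sharing: the num_hidden_layers forward passes reuse the
    # weights of num_hidden_groups layer groups (1 group = every layer shares weights).
    config = AlbertConfig(
        vocab_size=30000,
        embedding_size=128,
        hidden_size=768,
        intermediate_size=3072,
        num_attention_heads=12,
        num_hidden_layers=12,
        num_hidden_groups=1,
    )
    model = AlbertModel(config)
    print(sum(p.numel() for p in model.parameters()))  # far fewer parameters than a comparable BERT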
Tips:

- ALBERT is a model with absolute position embeddings, so it's usually advised to pad the inputs on
  the right rather than the left.
- ALBERT uses repeating layers, which results in a small memory footprint; however, the computational cost remains
  similar to a BERT-like architecture with the same number of hidden layers, as it has to iterate through the same
  number of (repeating) layers.
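
A short illustration of the first tip, assuming a recent release of the library and the public ``albert-base-v2``
checkpoint (the sentences are made up for the example):

.. code-block:: python

    from transformers import AlbertTokenizer

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")

    # ALBERT uses absolute position embeddings, so the shorter sequence is padded
    # on the right (the tokenizer's default padding side).
    batch = tokenizer(
        ["A short sentence.", "A noticeably longer sentence that sets the padded length of the batch."],
        padding=True,
        return_tensors="pt",
    )
    print(batch["input_ids"])
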
AlbertConfig
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertConfig
    :members:

AlbertTokenizer
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertTokenizer
    :members:

AlbertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertModel
    :members:
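
A short usage sketch, assuming a recent release of the library and the ``albert-base-v2`` checkpoint:

.. code-block:: python

    import torch
    from transformers import AlbertTokenizer, AlbertModel

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = AlbertModel.from_pretrained("albert-base-v2")

    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # first output: hidden states of the last layer, (batch_size, sequence_length, hidden_size)
    last_hidden_state = outputs[0]
    print(last_hidden_state.shape)
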
AlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertForMaskedLM
    :members:
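
A sketch of masked-token prediction with the pretrained head, assuming the ``albert-base-v2`` checkpoint:

.. code-block:: python

    import torch
    from transformers import AlbertTokenizer, AlbertForMaskedLM

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = AlbertForMaskedLM.from_pretrained("albert-base-v2")

    text = f"The capital of France is {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs)[0]  # (batch_size, sequence_length, vocab_size)

    # pick the highest-scoring token at the masked position
    mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    predicted_ids = logits[0, mask_positions].argmax(dim=-1)
    print(tokenizer.decode(predicted_ids))
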
AlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertForSequenceClassification
    :members:
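
A sketch of a single training step, assuming the ``albert-base-v2`` checkpoint; ``num_labels=2`` and the label are
made-up choices for the example, and the classification head is newly initialized, so it has to be fine-tuned before
its predictions mean anything:

.. code-block:: python

    import torch
    from transformers import AlbertTokenizer, AlbertForSequenceClassification

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = AlbertForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

    inputs = tokenizer("This movie was great!", return_tensors="pt")
    labels = torch.tensor([1])  # made-up label for the example

    outputs = model(**inputs, labels=labels)
    loss, logits = outputs[0], outputs[1]  # classification loss and (1, num_labels) logits
    loss.backward()  # in a real training loop an optimizer step would follow
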
AlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertForQuestionAnswering
    :members:
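
A sketch of extractive question answering; note that the span-prediction head of ``albert-base-v2`` is newly
initialized, so in practice one would load a checkpoint fine-tuned on SQuAD instead:

.. code-block:: python

    import torch
    from transformers import AlbertTokenizer, AlbertForQuestionAnswering

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = AlbertForQuestionAnswering.from_pretrained("albert-base-v2")

    question = "What does ALBERT reduce?"
    context = "ALBERT reduces memory consumption with two parameter-reduction techniques."
    inputs = tokenizer(question, context, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
    start_logits, end_logits = outputs[0], outputs[1]

    # most likely start and end of the answer span in the input sequence
    start = start_logits.argmax().item()
    end = end_logits.argmax().item()
    print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))
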
TFAlbertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAlbertModel
    :members:
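
The TensorFlow classes mirror their PyTorch counterparts; a minimal sketch, assuming TensorFlow is installed and
using the ``albert-base-v2`` checkpoint:

.. code-block:: python

    from transformers import AlbertTokenizer, TFAlbertModel

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = TFAlbertModel.from_pretrained("albert-base-v2")

    inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
    outputs = model(inputs)  # TF models accept a dict of input tensors

    last_hidden_state = outputs[0]  # (batch_size, sequence_length, hidden_size)
    print(last_hidden_state.shape)
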
TFAlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAlbertForMaskedLM
    :members:

TFAlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAlbertForSequenceClassification
    :members: