<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# MobileBERT

## Overview
The MobileBERT model was proposed in [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. It is a bidirectional transformer based on the BERT model, compressed and accelerated using several techniques.

The abstract from the paper is the following:
*Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds
of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot
be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating
the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to
various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while
equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks.
To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE
model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that MobileBERT is
4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known benchmarks. On the
natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7 (0.6 lower than BERT_BASE), and 62 ms
latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task, MobileBERT achieves a dev F1 score of
90.0/79.2 (1.5/2.1 higher than BERT_BASE).*
Tips:

- MobileBERT is a model with absolute position embeddings, so it is usually advised to pad the inputs on the right rather
than the left.
- MobileBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective. It is therefore
efficient at predicting masked tokens (see the example below) and at NLU in general, but it is not optimal for text generation. Models trained
with a causal language modeling (CLM) objective are better in that regard.
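
The snippet below is a minimal sketch of masked-token prediction with `MobileBertForMaskedLM`, using the `google/mobilebert-uncased` checkpoint from the Hub:

```python
import torch
from transformers import AutoTokenizer, MobileBertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertForMaskedLM.from_pretrained("google/mobilebert-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring token as the prediction
mask_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same checkpoint can be loaded into the task-specific head classes documented below (for example `MobileBertForSequenceClassification`) for fine-tuning on downstream tasks.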
This model was contributed by [vshampor](https://huggingface.co/vshampor). The original code can be found [here](https://github.com/google-research/mobilebert).
## Documentation resources

- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)
## MobileBertConfig

[[autodoc]] MobileBertConfig

## MobileBertTokenizer

[[autodoc]] MobileBertTokenizer

## MobileBertTokenizerFast

[[autodoc]] MobileBertTokenizerFast

## MobileBert specific outputs

[[autodoc]] models.mobilebert.modeling_mobilebert.MobileBertForPreTrainingOutput

[[autodoc]] models.mobilebert.modeling_tf_mobilebert.TFMobileBertForPreTrainingOutput

## MobileBertModel

[[autodoc]] MobileBertModel
    - forward

## MobileBertForPreTraining

[[autodoc]] MobileBertForPreTraining
    - forward

## MobileBertForMaskedLM

[[autodoc]] MobileBertForMaskedLM
    - forward

## MobileBertForNextSentencePrediction

[[autodoc]] MobileBertForNextSentencePrediction
    - forward

## MobileBertForSequenceClassification

[[autodoc]] MobileBertForSequenceClassification
    - forward

## MobileBertForMultipleChoice

[[autodoc]] MobileBertForMultipleChoice
    - forward

## MobileBertForTokenClassification

[[autodoc]] MobileBertForTokenClassification
    - forward

## MobileBertForQuestionAnswering

[[autodoc]] MobileBertForQuestionAnswering
    - forward

## TFMobileBertModel

[[autodoc]] TFMobileBertModel
    - call

## TFMobileBertForPreTraining

[[autodoc]] TFMobileBertForPreTraining
    - call

## TFMobileBertForMaskedLM

[[autodoc]] TFMobileBertForMaskedLM
    - call

## TFMobileBertForNextSentencePrediction

[[autodoc]] TFMobileBertForNextSentencePrediction
    - call

## TFMobileBertForSequenceClassification

[[autodoc]] TFMobileBertForSequenceClassification
    - call

## TFMobileBertForMultipleChoice

[[autodoc]] TFMobileBertForMultipleChoice
    - call

## TFMobileBertForTokenClassification

[[autodoc]] TFMobileBertForTokenClassification
    - call

## TFMobileBertForQuestionAnswering

[[autodoc]] TFMobileBertForQuestionAnswering
    - call