BERT release
This collection groups the original BERT checkpoints released by the Google team. Unless noted otherwise, the checkpoints support English.
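All of these checkpoints are fill-mask models and can be tried out directly with the transformers pipeline API. A minimal sketch, using the base uncased checkpoint from this collection (the example sentence is just an illustration):

```python
from transformers import pipeline

# Load a checkpoint from this collection for masked-token prediction.
unmasker = pipeline("fill-mask", model="google-bert/bert-base-uncased")

# BERT predicts the token hidden behind [MASK]; the top candidates are
# returned with their scores.
for prediction in unmasker("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```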
google-bert/bert-base-cased
Fill-Mask • 5.94M downloads • 287 likes
Note: Base BERT model, the smaller variant. Trained on the "cased" dataset, meaning the text was not lowercased and accents were kept. 12-layer, 768-hidden, 12-heads, 110M parameters.
google-bert/bert-base-uncased
Fill-Mask • 88.2M downloads • 2.11k likes
Note: Base BERT model, the smaller variant. Trained on the "uncased" dataset, meaning the text was lowercased and accents were removed. 12-layer, 768-hidden, 12-heads, 110M parameters.
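The cased/uncased distinction is baked into each model's tokenizer, so the two base checkpoints handle the same input differently. A minimal sketch of the contrast (exact subword splits depend on each model's vocabulary):

```python
from transformers import AutoTokenizer

cased = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
uncased = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

text = "Héllo BERT"
# The cased tokenizer keeps capitalization and accents.
print(cased.tokenize(text))
# The uncased tokenizer lowercases the text and strips accents first.
print(uncased.tokenize(text))
```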
google-bert/bert-large-cased
Fill-Mask • 99.4k downloads • 33 likes
Note: Large BERT model, the larger variant. Trained on the "cased" dataset, meaning the text was not lowercased and accents were kept. 24-layer, 1024-hidden, 16-heads, 340M parameters.
google-bert/bert-large-uncased
Fill-Mask • 1.9M downloads • 126 likes
Note: Large BERT model, the larger variant. Trained on the "uncased" dataset, meaning the text was lowercased and accents were removed. 24-layer, 1024-hidden, 16-heads, 340M parameters.
google-bert/bert-base-multilingual-cased
Fill-Mask • 14.2M downloads • 476 likes
Note: Base BERT model, the smaller variant. Multilingual: the list of supported languages is available at https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages. 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters.
google-bert/bert-base-chinese
Fill-Mask • 1.58M downloads • 1.1k likes
Note: Base BERT model, the smaller variant. Chinese (Simplified and Traditional), 12-layer, 768-hidden, 12-heads, 110M parameters.
google-bert/bert-large-cased-whole-word-masking
Fill-Mask • 3.3k downloads • 16 likes
Note: Large BERT model, the larger variant. Trained on the "cased" dataset, meaning the text was not lowercased and accents were kept. Whole word masking refers to a different preprocessing in which entire words, rather than individual subword tokens, are masked; the BERT team reports better metrics with the wwm models. 24-layer, 1024-hidden, 16-heads, 340M parameters.
google-bert/bert-large-uncased-whole-word-masking
Fill-Mask • 20.7k downloads • 19 likes
Note: Large BERT model, the larger variant. Trained on the "uncased" dataset, meaning the text was lowercased and accents were removed. Whole word masking refers to a different preprocessing in which entire words, rather than individual subword tokens, are masked; the BERT team reports better metrics with the wwm models. 24-layer, 1024-hidden, 16-heads, 340M parameters.
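Whole word masking only changes how the pretraining examples are built; the wwm checkpoints are loaded and used exactly like the other BERT models. A small sketch of what the distinction means in terms of subword pieces (the example word and its split are illustrative and depend on the vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-large-cased-whole-word-masking")

# WordPiece splits rare words into pieces prefixed with "##".
pieces = tokenizer.tokenize("philammon")
print(pieces)  # something like ['phil', '##am', '##mon'], depending on the vocabulary

# Original BERT masking could replace any single piece (e.g. only '##am') with [MASK];
# whole word masking always masks all pieces belonging to the same word together
# during pretraining. Inference with the checkpoint is unchanged.
```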