---
license: apache-2.0
arxiv: 2001.00059
pipeline_tag: fill-mask
tags:
- code
- cubert
---

# CuBERT: Learning and Evaluating Contextual Embedding of Source Code

## Overview

This model is the unofficial HuggingFace version of "[CuBERT](https://github.com/google-research/google-research/tree/master/cubert)". In particular, this version comes from [gs://cubert/20210711_Python/pre_trained_model_epochs_2__length_512](https://console.cloud.google.com/storage/browser/cubert/20210711_Python/pre_trained_model_epochs_2__length_512). It was trained on 2021-07-11 for 2 epochs with a 512-token context window on the Python BigQuery dataset. I manually converted the TensorFlow checkpoint to PyTorch and the tokenizer to a HuggingFace tokenizer, and have uploaded them here. All credit goes to Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi.
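Since the pipeline tag is `fill-mask`, the checkpoint can be loaded through the standard `transformers` masked-language-modeling API. A minimal sketch, assuming the files here follow the usual BERT layout; the repository id below is a placeholder, not the real Hub id:

```python
from transformers import pipeline

# Placeholder: replace with this repository's actual id on the Hugging Face Hub.
MODEL_ID = "<this-repo-id>"

def build_fill_mask(model_id: str = MODEL_ID):
    """Create a fill-mask pipeline over the converted CuBERT checkpoint."""
    return pipeline("fill-mask", model=model_id)

if __name__ == "__main__":
    fill = build_fill_mask()
    # CuBERT was pre-trained on Python source, so mask a token inside Python
    # code. The [MASK] token is an assumption based on the BERT convention;
    # check the tokenizer's special tokens for the exact string.
    for prediction in fill("def add(a, b): return a [MASK] b"):
        print(prediction["token_str"], prediction["score"])
```

Running the script downloads the model, so the pipeline construction is kept behind the `__main__` guard.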

Citation:

```bibtex
@inproceedings{cubert,
  author    = {Aditya Kanade and
               Petros Maniatis and
               Gogul Balakrishnan and
               Kensen Shi},
  title     = {Learning and evaluating contextual embedding of source code},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning,
               {ICML} 2020, 12-18 July 2020},
  series    = {Proceedings of Machine Learning Research},
  publisher = {{PMLR}},
  year      = {2020},
}
```