OpenAI GPT
----------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~

The OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training <https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional)
transformer pre-trained using language modeling on a large corpus with long range dependencies, the Toronto Book Corpus.

The abstract from the paper is the following:

*Natural language understanding comprises a wide range of diverse tasks such
as textual entailment, question answering, semantic similarity assessment, and
document classification. Although large unlabeled text corpora are abundant,
labeled data for learning these specific tasks is scarce, making it challenging for
discriminatively trained models to perform adequately. We demonstrate that large
gains on these tasks can be realized by generative pre-training of a language model
on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each
specific task. In contrast to previous approaches, we make use of task-aware input
transformations during fine-tuning to achieve effective transfer while requiring
minimal changes to the model architecture. We demonstrate the effectiveness of
our approach on a wide range of benchmarks for natural language understanding.
Our general task-agnostic model outperforms discriminatively trained models that
use architectures specifically crafted for each task, significantly improving upon the
state of the art in 9 out of the 12 tasks studied.*

Tips:

- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
  the right rather than the left.
- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
  token in a sequence. Leveraging this feature allows GPT to generate syntactically coherent text, as
  can be observed in the `run_generation.py` example script and in the sketch after this list.
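
A minimal generation sketch using the CLM head (assuming the ``openai-gpt`` checkpoint from the model hub; the prompt and sampling parameters are illustrative only):

.. code-block:: python

    from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

    # Encode a prompt and sample a continuation
    input_ids = tokenizer.encode("The history of natural language processing", return_tensors="pt")
    output_ids = model.generate(input_ids, max_length=40, do_sample=True, top_k=50)
    print(tokenizer.decode(output_ids[0]))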

`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT is one of them.

OpenAIGPTConfig
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.OpenAIGPTConfig
    :members:
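
A short configuration sketch (building a randomly initialized model from a default ``OpenAIGPTConfig``; no pretrained weights are involved here):

.. code-block:: python

    from transformers import OpenAIGPTConfig, OpenAIGPTModel

    # A configuration with the default GPT hyper-parameters
    configuration = OpenAIGPTConfig()

    # A model with randomly initialized weights built from that configuration
    model = OpenAIGPTModel(configuration)

    # The configuration can be accessed back from the model
    configuration = model.config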

OpenAIGPTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.OpenAIGPTTokenizer
    :members:
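
A minimal tokenization sketch (assuming the ``openai-gpt`` checkpoint; GPT's tokenizer lower-cases the text and applies BPE):

.. code-block:: python

    from transformers import OpenAIGPTTokenizer

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")

    # Split the text into BPE tokens, then map them to vocabulary ids
    tokens = tokenizer.tokenize("Hello, how are you?")
    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # Or do both steps at once
    input_ids = tokenizer.encode("Hello, how are you?")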

OpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.OpenAIGPTModel
    :members:
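
A minimal forward-pass sketch for the bare model (assuming the ``openai-gpt`` checkpoint; depending on the library version the outputs are a tuple or a model output object, but the first element is the last hidden state in both cases):

.. code-block:: python

    import torch
    from transformers import OpenAIGPTTokenizer, OpenAIGPTModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTModel.from_pretrained("openai-gpt")

    input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_ids)

    # Hidden states of the last layer, shape (batch_size, sequence_length, hidden_size)
    last_hidden_state = outputs[0]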

OpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.OpenAIGPTLMHeadModel
    :members:
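
A minimal language-modeling sketch (assuming the ``openai-gpt`` checkpoint; passing the inputs as ``labels`` makes the model compute the causal LM loss, with the shift between inputs and targets handled internally):

.. code-block:: python

    from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

    input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
    outputs = model(input_ids, labels=input_ids)

    # Cross-entropy loss over next-token predictions, plus the prediction scores
    loss, logits = outputs[:2]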

OpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
    :members:
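
A multiple-choice sketch for the double-heads model (a sketch in the style of the library's docstring examples; the ``[CLS]`` token added here is new to the vocabulary and would normally be learned during fine-tuning, and the two choices are assumed to tokenize to the same length):

.. code-block:: python

    import torch
    from transformers import OpenAIGPTTokenizer, OpenAIGPTDoubleHeadsModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-gpt")

    # Add a classification token and resize the embeddings accordingly
    tokenizer.add_special_tokens({"cls_token": "[CLS]"})
    model.resize_token_embeddings(len(tokenizer))

    choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
    input_ids = torch.tensor([tokenizer.encode(s) for s in choices]).unsqueeze(0)  # batch size 1, 2 choices
    mc_token_ids = torch.tensor([input_ids.size(-1) - 1] * 2).unsqueeze(0)  # position of [CLS] in each choice

    outputs = model(input_ids, mc_token_ids=mc_token_ids)
    lm_logits, mc_logits = outputs[:2]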

TFOpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTModel
    :members:
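
A minimal TensorFlow forward-pass sketch (assuming the ``openai-gpt`` checkpoint; the TF classes mirror their PyTorch counterparts and accept TensorFlow tensors):

.. code-block:: python

    import tensorflow as tf
    from transformers import OpenAIGPTTokenizer, TFOpenAIGPTModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = TFOpenAIGPTModel.from_pretrained("openai-gpt")

    # Add a batch dimension to the encoded prompt
    input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]
    outputs = model(input_ids)

    # Hidden states of the last layer, shape (batch_size, sequence_length, hidden_size)
    last_hidden_state = outputs[0]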

TFOpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
    :members:

TFOpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
    :members: