igorktech/hierarchical-dialog-bert

metadata

license: cc-by-nc-sa-4.0
pipeline_tag: fill-mask
language: en
datasets:
  - OpenSubtitles

Model description

This model is based on the paper "An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification" by Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, and Desmond Elliott (2022, preprint, arXiv:2210.05529).

Initial weights were taken from google/bert_uncased_L-8_H-256_A-4.

The maximum input length is 512 tokens, which is enough to encode a dialog together with a few previous utterances (the average sentence length per utterance in SWDA, MAPTASK, MRDA, BT_OASIS, FRAMES, AMI, and DSTC3 is less than 11 tokens).
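The 512-token limit means that a long dialog history has to be trimmed before it is encoded. The following is a minimal sketch of one way to do that, not the preprocessing used to train this model: it keeps the most recent utterances that fit the budget. The helper names `select_context` and `whitespace_count`, and the assumption of one special token per utterance, are hypothetical and only for illustration. In practice the token count should come from the tokenizer that ships with the model.

```python
from typing import Callable, List

MAX_TOKENS = 512  # maximum input length stated in this model card


def select_context(
    utterances: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
    specials_per_utterance: int = 1,  # assumption: one special token per utterance
) -> List[str]:
    """Keep the most recent utterances whose total token cost fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk backwards from the current (last) utterance towards older ones.
    for utterance in reversed(utterances):
        cost = count_tokens(utterance) + specials_per_utterance
        if used + cost > max_tokens:
            break  # this utterance and everything older is dropped
        kept.append(utterance)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def whitespace_count(text: str) -> int:
    # Crude stand-in for a real subword tokenizer; real counts are usually higher.
    return len(text.split())


dialog = [
    "hi there",
    "how are you doing today",
    "fine thanks",
    "good to hear",
]
# A tiny budget is used here only to make the truncation visible.
context = select_context(dialog, whitespace_count, max_tokens=8)
```

With the small budget of 8 used above, only the last two utterances are kept. With the default budget of 512, the whole example dialog fits.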