PubMedBERT for biomedical extractive summarization

Description

Work done for my Bachelor's thesis.

PubMedBERT fine-tuned on MS^2 for extractive summarization.
The model architecture is similar to BERTSum.
Training code is available at biomed-ext-summ.

Usage

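from transformers import AutoTokenizer, pipeline

# Load the summarization pipeline defined in the model repository.
summarizer = pipeline("summarization",
  model = "NotXia/pubmedbert-bio-ext-summ",
  tokenizer = AutoTokenizer.from_pretrained("NotXia/pubmedbert-bio-ext-summ"),
  trust_remote_code = True,  # required: the pipeline uses custom code from the model repository
  device = 0  # first GPU
)

# The document is passed as a list of sentences.
sentences = ["sent1.", "sent2.", "sent3?"]
summarizer({"sentences": sentences}, strategy="count", strategy_args=2)
# Output: (['sent1.', 'sent2.'], [0, 1])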
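
Judging by the example output above, the call returns a pair: the selected sentences and their indices in the input list. A minimal sketch of turning that pair into a single summary string follows; `join_summary` is a hypothetical helper and not part of the model repository.

```python
from typing import List, Tuple


def join_summary(result: Tuple[List[str], List[int]]) -> str:
    """Join selected sentences into one string, ordered by original position.

    Assumes `result` has the (sentences, indices) shape shown in the
    usage example; this helper is illustrative only.
    """
    selected, indices = result
    # Pair each sentence with its index and sort by position in the document.
    ordered = sorted(zip(indices, selected), key=lambda pair: pair[0])
    return " ".join(sentence for _, sentence in ordered)


result = (["sent1.", "sent2."], [0, 1])
print(join_summary(result))
```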

Strategies

The strategy argument controls how sentences are selected for the summary:

  • length: summary with a maximum length (strategy_args is the maximum length).
  • count: summary with a fixed number of sentences (strategy_args is the number of sentences).
  • ratio: summary proportional to the length of the document (strategy_args is a ratio in [0, 1]).
  • threshold: summary containing only the sentences whose score exceeds a given value (strategy_args is the minimum score).
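
To make the four strategies concrete, here is a minimal sketch of how each could select sentences from per-sentence scores. It is not the model's actual implementation: `select_sentences` is a hypothetical function, the scores are made up, and measuring length in whitespace-separated words is an assumption, since the unit is not stated above.

```python
from typing import List, Tuple, Union


def select_sentences(
    sentences: List[str],
    scores: List[float],
    strategy: str,
    strategy_args: Union[int, float],
) -> Tuple[List[str], List[int]]:
    """Select sentences by score and return them in document order."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")

    # Sentence indices ranked from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    if strategy == "count":
        chosen = ranked[: int(strategy_args)]
    elif strategy == "ratio":
        chosen = ranked[: int(len(sentences) * strategy_args)]
    elif strategy == "threshold":
        chosen = [i for i in ranked if scores[i] > strategy_args]
    elif strategy == "length":
        # Assumption: length is counted in whitespace-separated words.
        chosen, used = [], 0
        for i in ranked:
            n_words = len(sentences[i].split())
            if used + n_words > strategy_args:
                break  # stop at the first sentence that would exceed the budget
            chosen.append(i)
            used += n_words
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    chosen.sort()  # restore document order
    return [sentences[i] for i in chosen], chosen


doc = ["a b c.", "d e.", "f g h i.", "j."]
fake_scores = [0.9, 0.2, 0.7, 0.4]  # made-up scores for illustration
print(select_sentences(doc, fake_scores, "count", 2))
```

In this sketch every strategy ranks sentences by score first and differs only in where it cuts the ranking, which is why the selected indices are sorted again before being returned.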