---
license: apache-2.0
language:
  - ar
tags:
  - Summarization
  - Arabic Headline Dataset
  - AHS
  - AraBART
---

# AraBART-AHS

## Model Description

AraBART-AHS is a version of the AraBART model fine-tuned on the Arabic Headline Dataset (AHS).

## Uses

This model is intended for generating Arabic abstractive summaries, in particular headlines for articles.

## How to Use

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_name = "benakrab/AraBART-AHS"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Wrap the model in a summarization pipeline.
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

# Example article (Arabic). Gloss: "It is advisable to practice writing daily
# and make it a routine in order to develop and improve language and writing
# skills; the more one writes, the more progress one feels, so set aside time
# to write at least a paragraph, or a full page, every day..."
text = "ينصح بممارسة الكتابة بشكل يومي ، وجعلها روتينا يوميا ؛ و ذلك لتطوير المهارات اللغوية و الكتابية و تحسينها ، إذ إنه كلما كتب الشخص أكثر كلما شعر بتطور أكبر ، و لذلك يجب تخصيص وقت لكتابة فقرة على الأقل ، أو صفحة كاملة يوميا ، و يمكن الاستيقاظ باكرا للكتابة ، أو النوم متأخرا بعد كتابة فقرة ما ، و لو لفترة قصيرة لا تتعدى خمس عشرة دقيقة عند عدم وجود وقت كاف أثناء النهار"

# Generate the headline-style summary.
summary = summarizer(text)[0]["summary_text"]
print(summary)
```
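
For finer control over decoding than the `summarization` pipeline exposes, you can call `generate` on the model directly. The sketch below reuses `tokenizer`, `model`, and `text` from the snippet above; the beam count and length limits are illustrative assumptions, not the decoding parameters used in the original experiments.

```python
# Minimal sketch: direct generation with beam search.
# Decoding settings here are assumptions, not the authors' configuration.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(
    **inputs,
    num_beams=4,        # assumed beam width; tune as needed
    max_length=64,      # headlines are short; assumed upper bound
    early_stopping=True,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```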

## Citation

```bibtex
@InProceedings{10.1007/978-3-031-79164-2_15,
    author="Benbarka, Mustapha
    and Kassimi, Moulay Abdellah",
    editor="Hdioud, Boutaina
    and Aouragh, Si Lhoussain",
    title="Fine-Tuning AraBART on AHS Dataset for Arabic Abstractive Summarization",
    booktitle="Arabic Language Processing: From Theory to Practice",
    year="2025",
    publisher="Springer Nature Switzerland",
    address="Cham",
    pages="170--182",
    abstract="Recent studies dealing with Abstractive Summarization are dominated by the use of Pre-trained Language Models based on Transformers. While the main contributions are applied to English, a review of the literature highlights the existence of a trend towards applying this framework on Arabic. This paper describes the full pipeline of Fine-tuning a Pre-trained Language Model based on Transformers for Arabic Abstractive Summarization. The model used is AraBART. The experiments are conducted on AHS dataset. Our work also challenges the quality of this dataset regarding the effects of repetitive summaries on the performances of the model. We found that their effect is substantial pointing out the need of a thorough study to be conducted on this dataset. A score of 54.69 ROUGE-1 is obtained on the test dataset. This score drops to 46.32 when the repetitive summaries are removed. A detailed analysis is provided discussing this issue.",
    isbn="978-3-031-79164-2"
}
```