---
language:
- fa
metrics:
- bertscore
- rouge
base_model:
- Ahmad/parsT5-base
pipeline_tag: text2text-generation
tags:
- legal
- simplification
- text-to-text
---

# Persian Simplification Model (parsT5 Base)

---

## Overview

This model is a fine-tuned version of ParsT5 (base) built specifically for Persian text simplification. It was trained on Persian legal texts with supervised fine-tuning and uses the **Unlimiformer** algorithm to handle long inputs effectively.

- **Architecture**: Ahmad/parsT5-base
- **Language**: Persian
- **Task**: Text Simplification
- **Training Setup**:
  - **Long-input handling**: Unlimiformer
  - **Epochs**: 12
  - **Hardware**: NVIDIA RTX 4070 GPU
  - **Trainable blocks**: last encoder and decoder blocks
  - **Optimizer**: AdamW with a learning-rate scheduler
  - **Max input tokens**: 4096
  - **Max output tokens**: 512

---

## Readability Scores

The following table summarizes the readability scores of the original texts and of the predictions generated by the fine-tuned model. Lower scores indicate easier text for Gunning Fog, ARI, and Dale-Chall; higher scores indicate easier text for Flesch-Dayani. (A readability sketch appears at the end of this card.)

| Metric        | Original Texts | Predictions  |
|---------------|----------------|--------------|
| Gunning Fog   | 14.9676        | **7.5891**   |
| ARI           | 11.8796        | **6.7869**   |
| Dale-Chall    | 2.6473         | **1.2679**   |
| Flesch-Dayani | 228.2377       | **244.0153** |

---

## Evaluation Results

The fine-tuned model was evaluated with **ROUGE** and **BERTScore (mBERT)**. For comparison, the performance of two other Persian LLMs based on LLaMA is also reported. (A sketch of the evaluation setup appears at the end of this card.)

| Prediction Model                           | Rouge1     | Rouge2     | RougeL     | BERTScore P | BERTScore R | BERTScore F1 |
|--------------------------------------------|------------|------------|------------|-------------|-------------|--------------|
| Fine-Tuned Model                           | **38.08%** | **15.83%** | **19.41%** | **76.75%**  | 71.06%      | **73.71%**   |
| ViraIntelligentDataMining/PersianLLaMA-13B | 28.64%     | 9.81%      | 13.67%     | 68.36%      | 73.44%      | 70.80%       |
| MehdiHosseiniMoghadam_AVA_Llama_3_V2       | 30.07%     | 10.33%     | 16.39%     | 68.47%      | **73.47%**  | 70.87%       |

---

## How to Use

You can load and use this model with the Hugging Face `transformers` library as follows:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")
model = AutoModelForSeq2SeqLM.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")

# Example usage; the input means "complex Persian text"
input_text = "متن پیچیده فارسی"
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)

# Allow up to the model's 512-token output budget; the default generation
# length would truncate the simplification.
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode the output
simplified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(simplified_text)
```

### Contact Information

For inquiries or feedback, please contact:

- **Author**: Mohammadreza Joneidi Jafari
- **Email**: m.r.joneidi.02@gmail.com
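---

## Appendix: Example Sketches

For quick experiments, the same checkpoint can also be driven through the high-level `pipeline` API. This is a convenience sketch rather than the card's documented interface; the generation settings simply mirror the token limits listed above.

```python
from transformers import pipeline

# Convenience wrapper around the same checkpoint. `truncation=True` guards
# against inputs longer than the model's 4096-token input limit.
simplifier = pipeline(
    "text2text-generation",
    model="Moryjj/FineTuned-parsT5-Simplification",
)

result = simplifier("متن پیچیده فارسی", max_new_tokens=512, truncation=True)
print(result[0]["generated_text"])
```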
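The readability scores above depend on language-specific preprocessing that this card does not document. As a minimal illustration, the sketch below computes ARI, which needs only character, word, and sentence counts; the regex-based sentence and word splitting is an assumption and may differ from the preprocessing behind the reported numbers.

```python
import re

def ari(text: str) -> float:
    """Automated Readability Index:
    4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
    """
    # Split on Latin and Persian sentence terminators (".", "!", "?", "؟");
    # this splitting rule is an assumption, not the card's documented setup.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    words = text.split()
    chars = sum(len(word) for word in words)  # characters excluding whitespace
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

# Toy example: "This is a simple sentence. This is the second sentence."
print(round(ari("این یک جمله ساده است. این جمله دوم است."), 2))
```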
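The ROUGE and BERTScore numbers above can be reproduced along the following lines with the Hugging Face `evaluate` library. The example pairs are placeholders, the whitespace tokenizer for ROUGE is an assumption (the default ROUGE tokenizer is English-oriented and would drop Persian tokens), and `bert-base-multilingual-cased` is chosen to match the "BERTScore (mBERT)" setup named above.

```python
import evaluate

# Placeholder pairs; replace with real model outputs and gold simplifications.
predictions = ["متن ساده شده"]  # "simplified text"
references = ["متن ساده"]       # "simple text"

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

# Whitespace tokenization keeps Persian tokens intact.
rouge_scores = rouge.compute(
    predictions=predictions,
    references=references,
    tokenizer=lambda text: text.split(),
)

# mBERT embeddings, matching the reported BERTScore (mBERT) setup.
bert_scores = bertscore.compute(
    predictions=predictions,
    references=references,
    model_type="bert-base-multilingual-cased",
)

print(rouge_scores)  # rouge1 / rouge2 / rougeL / rougeLsum F-measures
print(sum(bert_scores["f1"]) / len(bert_scores["f1"]))  # mean BERTScore F1
```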