---
language:
- fa
metrics:
- bertscore
- rouge
base_model:
- Ahmad/parsT5-base
pipeline_tag: text2text-generation
tags:
- legal
- simplification
- text-to-text
---
# Persian Simplification Model (parsT5 Base)
---
## Overview
This model is a fine-tuned version of ParsT5 (base), trained specifically for Persian text simplification. The training data consists of Persian legal texts. The model was trained with supervised fine-tuning and uses the **Unlimiformer** algorithm to handle long inputs effectively (a minimal sketch of the idea follows the setup list below).
- **Architecture**: Ahmad/parsT5-base
- **Language**: Persian
- **Task**: Text Simplification
- **Training Setup**:
  - **Long-input handling**: Unlimiformer
  - **Epochs**: 12
  - **Hardware**: NVIDIA RTX 4070 GPU
  - **Trainable blocks**: last encoder and decoder blocks only
  - **Optimizer**: AdamW with a learning-rate scheduler
  - **Max input tokens**: 4096
  - **Max output tokens**: 512
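
Unlimiformer offloads long-range cross-attention to a k-nearest-neighbor search over the encoder's hidden states, so the decoder attends to only the top-k most relevant positions regardless of input length. Below is a minimal, hypothetical PyTorch sketch of that core retrieval-attention idea; it illustrates the technique and is not the actual training code for this model.

```python
import torch

def knn_cross_attention(query, encoder_states, k=16):
    """Attend over only the top-k encoder states retrieved for this query.

    query: (d,) decoder query vector; encoder_states: (n, d) with n >> k.
    """
    d = encoder_states.size(1)
    # Retrieval step: score every encoder state and keep the k best.
    # (Unlimiformer does this with a kNN index instead of a full matmul.)
    scores = encoder_states @ query                      # (n,)
    topk_scores, topk_idx = torch.topk(scores, k)
    # Standard scaled softmax attention, restricted to the retrieved states.
    weights = torch.softmax(topk_scores / d ** 0.5, dim=0)
    return weights @ encoder_states[topk_idx]            # (d,)

# Toy check: a 10,000-position "document", attended with only k=16 states.
states = torch.randn(10_000, 64)
context = knn_cross_attention(torch.randn(64), states)
print(context.shape)  # torch.Size([64])
```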
---
## Readability Scores
The following table compares readability scores for the original texts and the model's simplified outputs. Lower is simpler for Gunning Fog, ARI, and Dale-Chall; higher indicates easier reading for Flesch-Dayani:
| Metric | Original Texts | Predictions |
|----------------|----------------|-------------|
| Gunning Fog | 14.9676 | **7.5891** |
| ARI | 11.8796 | **6.7869** |
| Dale-Chall | 2.6473 | **1.2679** |
| Flesch-Dayani | 228.2377 | **244.0153**|
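
For reference, most of these indices have simple closed forms. The sketch below shows a rough ARI computation (a hypothetical helper, not the evaluation script used here); the Persian-specific Flesch-Dayani variant additionally requires a Persian syllable counter and is omitted.

```python
def ari(text: str) -> float:
    """Automated Readability Index: 4.71*(chars/word) + 0.5*(words/sentence) - 21.43."""
    # Rough sentence split on sentence-final punctuation (incl. the Persian "؟").
    sentences = [s for s in text.replace("!", ".").replace("؟", ".").split(".") if s.strip()]
    words = text.split()
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

print(round(ari("The model simplifies long legal sentences. Short sentences read more easily."), 2))
```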
---
## Evaluation Results
The fine-tuned model was evaluated with **ROUGE** and **BERTScore** (using mBERT). For comparison, the scores of two LLaMA-based Persian LLMs are also reported (a reproduction sketch follows the table):
| Prediction Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore P | BERTScore R | BERTScore F1 |
|-----------------------------------------------|---------|---------|---------|-----------|---------|---------|
| Fine-Tuned Model | **38.08%** | **15.83%** | **19.41%** | **76.75%** | 71.06% | **73.71%** |
| ViraIntelligentDataMining/PersianLLaMA-13B | 28.64% | 9.81% | 13.67% | 68.36% | 73.44% | 70.80% |
| MehdiHosseiniMoghadam/AVA-Llama-3-V2          | 30.07% | 10.33% | 16.39% | 68.47% | **73.47%** | 70.87% |
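
The exact evaluation script is not included in this card; the sketch below shows one plausible way to reproduce these metrics with the Hugging Face `evaluate` library (the placeholder texts and the mBERT checkpoint choice are assumptions).

```python
# pip install evaluate rouge_score bert_score
import evaluate

predictions = ["متن ساده‌شده"]   # model outputs (simplified texts)
references = ["متن مرجع"]        # gold simplifications

rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

# BERTScore with a multilingual BERT backbone (mBERT), as the card reports.
bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=predictions, references=references,
                        model_type="bert-base-multilingual-cased"))
```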
---
## How to Use
You can load and use this model with the Hugging Face `transformers` library as follows:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")
model = AutoModelForSeq2SeqLM.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")

# Example usage ("complex Persian text")
input_text = "متن پیچیده فارسی"

# Truncate to the 4096-token input limit used during training
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)

# Cap generation at the 512-token output limit used during training
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode the output
simplified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(simplified_text)
```
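
Equivalently, the high-level `pipeline` API wraps the same steps (the generation setting here is illustrative):

```python
from transformers import pipeline

simplifier = pipeline("text2text-generation",
                      model="Moryjj/FineTuned-parsT5-Simplification")
print(simplifier("متن پیچیده فارسی", max_new_tokens=512)[0]["generated_text"])
```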
### Contact Information
For inquiries or feedback, please contact:
Author: Mohammadreza Joneidi Jafari
Email: [email protected]