---
language:
- fa
metrics:
- bertscore
- rouge
base_model:
- Ahmad/parsT5-base
pipeline_tag: text2text-generation
tags:
- legal
- Simplification
- text-to-text
---
# Persian Simplification Model (parsT5 Base)
---
## Overview
This model is a fine-tuned version of ParsT5 (base) built specifically for Persian text simplification. The training data consists of Persian legal texts. The model was trained with supervised fine-tuning and uses the **Unlimiformer** algorithm to handle long inputs efficiently.
- **Architecture**: Ahmad/parsT5-base
- **Language**: Persian
- **Task**: Text Simplification
- **Training Setup** (an illustrative setup sketch follows this list):
  - **Long-input / efficiency algorithm**: Unlimiformer
  - **Epochs**: 12
  - **Hardware**: NVIDIA RTX 4070 GPU
  - **Trainable blocks**: last encoder and decoder blocks
  - **Optimizer**: AdamW with a learning-rate scheduler
  - **Max input tokens**: 4096
  - **Max output tokens**: 512
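The authors' training script is not included in this card. The snippet below is a minimal sketch, in plain PyTorch/Transformers, of what the setup above could look like; it assumes standard T5 parameter names, omits the Unlimiformer wrapper, and uses placeholder values for the learning rate and step counts rather than values from the actual run.

```python
# Minimal sketch of the fine-tuning setup listed above (assumptions: T5 parameter
# naming, linear LR schedule, placeholder learning rate; Unlimiformer wrapper omitted).
import torch
from transformers import AutoModelForSeq2SeqLM, get_linear_schedule_with_warmup

model = AutoModelForSeq2SeqLM.from_pretrained("Ahmad/parsT5-base")

# Train only the last encoder and decoder blocks (T5-base has 12 blocks; index 11 is the last).
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("encoder.block.11", "decoder.block.11"))

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=5e-5,  # placeholder; the actual learning rate is not reported in this card
)

epochs = 12
steps_per_epoch = 1000  # placeholder; depends on dataset size and batch size
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=epochs * steps_per_epoch
)
```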
---
## Readability Scores
The following table compares readability scores for the original legal texts and the simplified outputs produced by the fine-tuned model. Lower is better for Gunning Fog, ARI, and Dale-Chall; higher is better for Flesch-Dayani. An illustrative scoring snippet follows the table:
| Metric | Original Texts | Predictions |
|----------------|----------------|-------------|
| Gunning Fog    | 14.9676        | **7.5891**  |
| ARI | 11.8796 | **6.7869** |
| Dale-Chall | 2.6473 | **1.2679** |
| Flesch-Dayani | 228.2377 | **244.0153**|
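Formulas such as Gunning Fog, ARI, and Dale-Chall can be computed with the `textstat` package, as in the sketch below. Note that `textstat` implements these formulas for English text, while the scores above (including the Persian Flesch-Dayani variant) were presumably computed with Persian-adapted tooling, so this snippet is only a structural illustration of the comparison.

```python
# Illustrative only: textstat targets English text; the Persian scores in the table
# above require Persian-adapted readability implementations.
import textstat

def readability_report(text: str) -> dict:
    return {
        "gunning_fog": textstat.gunning_fog(text),
        "ari": textstat.automated_readability_index(text),
        "dale_chall": textstat.dale_chall_readability_score(text),
    }

original = "The original, more complex sentence goes here."
simplified = "The simpler rewritten sentence goes here."
print(readability_report(original))
print(readability_report(simplified))
```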
---
## Evaluation Results
The fine-tuned model was evaluated with **ROUGE** and **BERTScore (mBERT)**. For comparison, results for two other Persian LLaMA-based LLMs are also reported. An illustrative evaluation snippet follows the table:
| Prediction Model                               | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore P | BERTScore R | BERTScore F1 |
|------------------------------------------------|---------|---------|---------|-------------|-------------|--------------|
| Fine-Tuned Model | **38.08%** | **15.83%** | **19.41%** | **76.75%** | 71.06% | **73.71%** |
| ViraIntelligentDataMining/PersianLLaMA-13B | 28.64% | 9.81% | 13.67% | 68.36% | 73.44% | 70.80% |
| MehdiHosseiniMoghadam_AVA_Llama_3_V2 | 30.07% | 10.33% | 16.39% | 68.47% | **73.47%** | 70.87% |
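The exact evaluation script is not part of this card. The snippet below is a minimal sketch of how ROUGE and BERTScore numbers like those above could be computed with the Hugging Face `evaluate` library; `lang="fa"` selects a multilingual BERT backbone for BERTScore.

```python
# Sketch of the evaluation metrics using the Hugging Face `evaluate` library;
# the authors' exact evaluation setup may differ.
import evaluate

predictions = ["..."]  # simplified outputs from the model
references = ["..."]   # gold (reference) simplifications

rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))  # rouge1 / rouge2 / rougeL

bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=predictions, references=references, lang="fa"))
```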
---
## How to Use
You can load and use this model with the Hugging Face Transformers library as follows:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")
model = AutoModelForSeq2SeqLM.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")

# Example usage with a complex Persian sentence ("متن پیچیده فارسی" = "complex Persian text")
input_text = "متن پیچیده فارسی"
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=512)  # 512 matches the model's output limit

# Decode the simplified output
simplified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(simplified_text)
```
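Alternatively, the model can be used through the `pipeline` API. The generation settings below (beam search, up to 512 new tokens) are illustrative defaults, not values prescribed by the authors.

```python
from transformers import pipeline

# text2text-generation pipeline for seq2seq (T5-style) models
simplifier = pipeline("text2text-generation", model="Moryjj/FineTuned-parsT5-Simplification")

# Generation parameters are illustrative; tune them for your texts
result = simplifier("متن پیچیده فارسی", max_new_tokens=512, num_beams=4)
print(result[0]["generated_text"])
```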
### Contact Information
For inquiries or feedback, please contact:
- Author: Mohammadreza Joneidi Jafari
- Email: [email protected]