---
language:
- fa
metrics:
- bertscore
- rouge
base_model:
- Ahmad/parsT5-base
pipeline_tag: text2text-generation
tags:
- legal
- simplification
- text-to-text
---

# Persian Simplification Model (parsT5 Base)

---

## Overview

This model is a fine-tuned version of ParsT5 (base) built specifically for Persian text simplification. It was trained on Persian legal texts with supervised fine-tuning and uses the **Unlimiformer** algorithm to handle long inputs effectively.

- **Architecture**: Ahmad/parsT5-base
- **Language**: Persian
- **Task**: Text Simplification
- **Training Setup**:
  - **Long-input handling**: Unlimiformer
  - **Epochs**: 12
  - **Hardware**: NVIDIA RTX 4070 GPU
  - **Trainable blocks**: last encoder and decoder blocks
  - **Optimizer**: AdamW with a learning-rate scheduler
  - **Max input tokens**: 4096
  - **Max output tokens**: 512

---

## Readability Scores

The following table summarizes the readability scores of the original texts and of the predictions generated by the fine-tuned model. Lower scores indicate easier text for Gunning Fog, ARI, and Dale-Chall; higher scores indicate easier text for Flesch-Dayani. (A readability sketch appears at the end of this card.)

| Metric        | Original Texts | Predictions  |
|---------------|----------------|--------------|
| Gunning Fog   | 14.9676        | **7.5891**   |
| ARI           | 11.8796        | **6.7869**   |
| Dale-Chall    | 2.6473         | **1.2679**   |
| Flesch-Dayani | 228.2377       | **244.0153** |

---

## Evaluation Results

The fine-tuned model was evaluated with **ROUGE** and **BERTScore (mBERT)**. For comparison, the performance of two other Persian LLMs based on LLaMA is also reported. (A sketch of the evaluation setup appears at the end of this card.)

| Prediction Model                           | Rouge1     | Rouge2     | RougeL     | BERTScore P | BERTScore R | BERTScore F1 |
|--------------------------------------------|------------|------------|------------|-------------|-------------|--------------|
| Fine-Tuned Model                           | **38.08%** | **15.83%** | **19.41%** | **76.75%**  | 71.06%      | **73.71%**   |
| ViraIntelligentDataMining/PersianLLaMA-13B | 28.64%     | 9.81%      | 13.67%     | 68.36%      | 73.44%      | 70.80%       |
| MehdiHosseiniMoghadam_AVA_Llama_3_V2       | 30.07%     | 10.33%     | 16.39%     | 68.47%      | **73.47%**  | 70.87%       |

---

## How to Use

You can load and use this model with the Hugging Face `transformers` library as follows:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")
model = AutoModelForSeq2SeqLM.from_pretrained("Moryjj/FineTuned-parsT5-Simplification")

# Example usage; the input means "complex Persian text"
input_text = "متن پیچیده فارسی"
inputs = tokenizer(input_text, return_tensors="pt", max_length=4096, truncation=True)

# Allow up to the model's 512-token output budget; the default generation
# length would truncate the simplification.
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode the output
simplified_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(simplified_text)
```

### Contact Information

For inquiries or feedback, please contact:

- **Author**: Mohammadreza Joneidi Jafari
- **Email**: m.r.joneidi.02@gmail.com
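---

## Appendix: Example Sketches

For quick experiments, the same checkpoint can also be driven through the high-level `pipeline` API. This is a convenience sketch rather than the card's documented interface; the generation settings simply mirror the token limits listed above.

```python
from transformers import pipeline

# Convenience wrapper around the same checkpoint. `truncation=True` guards
# against inputs longer than the model's 4096-token input limit.
simplifier = pipeline(
    "text2text-generation",
    model="Moryjj/FineTuned-parsT5-Simplification",
)

result = simplifier("متن پیچیده فارسی", max_new_tokens=512, truncation=True)
print(result[0]["generated_text"])
```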
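The readability scores above depend on language-specific preprocessing that this card does not document. As a minimal illustration, the sketch below computes ARI, which needs only character, word, and sentence counts; the regex-based sentence and word splitting is an assumption and may differ from the preprocessing behind the reported numbers.

```python
import re

def ari(text: str) -> float:
    """Automated Readability Index:
    4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
    """
    # Split on Latin and Persian sentence terminators (".", "!", "?", "؟");
    # this splitting rule is an assumption, not the card's documented setup.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    words = text.split()
    chars = sum(len(word) for word in words)  # characters excluding whitespace
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

# Toy example: "This is a simple sentence. This is the second sentence."
print(round(ari("این یک جمله ساده است. این جمله دوم است."), 2))
```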
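The ROUGE and BERTScore numbers above can be reproduced along the following lines with the Hugging Face `evaluate` library. The example pairs are placeholders, the whitespace tokenizer for ROUGE is an assumption (the default ROUGE tokenizer is English-oriented and would drop Persian tokens), and `bert-base-multilingual-cased` is chosen to match the "BERTScore (mBERT)" setup named above.

```python
import evaluate

# Placeholder pairs; replace with real model outputs and gold simplifications.
predictions = ["متن ساده شده"]  # "simplified text"
references = ["متن ساده"]       # "simple text"

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

# Whitespace tokenization keeps Persian tokens intact.
rouge_scores = rouge.compute(
    predictions=predictions,
    references=references,
    tokenizer=lambda text: text.split(),
)

# mBERT embeddings, matching the reported BERTScore (mBERT) setup.
bert_scores = bertscore.compute(
    predictions=predictions,
    references=references,
    model_type="bert-base-multilingual-cased",
)

print(rouge_scores)  # rouge1 / rouge2 / rougeL / rougeLsum F-measures
print(sum(bert_scores["f1"]) / len(bert_scores["f1"]))  # mean BERTScore F1
```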