ManshoorAI

Overview

This project fine-tunes GPT-2 to generate Persian neo-poetry inspired by the works of Sohrab Sepehri and Forough Farokhzad.
The model is a work in progress, and I look forward to hearing your thoughts.

LSTM Model

I also trained a simple LSTM model on the same data, available on my GitHub page here. You can compare the results to see the power of Transformers!

Model Details

  • Base Model: GPT-2 (pretrained by OpenAI)
  • Intermediate Model: HooshvareLab/gpt2-fa
  • Dataset: Curated poems from Sohrab Sepehri and Forough Farokhzad
  • Fine-Tuning: PEFT/LoRA
  • Language: Persian (Farsi)
  • Output: Generates poetry with free verse and metaphorical depth
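
To give a rough sense of why LoRA keeps fine-tuning cheap, the sketch below counts the extra trainable weights that low-rank adapters would add. The numbers are assumptions, not the settings used for this model: it takes the standard GPT-2 small dimensions (hidden size 768, 12 layers), assumes adapters only on the fused attention projection (768 to 2304), and uses a hypothetical rank of 8.

```python
# Rough LoRA parameter count. All values below are illustrative assumptions,
# not the actual training configuration of this model.

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

HIDDEN = 768          # GPT-2 small hidden size
QKV_OUT = 3 * HIDDEN  # width of the fused query/key/value projection
LAYERS = 12           # GPT-2 small transformer blocks
RANK = 8              # hypothetical LoRA rank

per_layer = lora_param_count(HIDDEN, QKV_OUT, RANK)
total = per_layer * LAYERS
full_layer = HIDDEN * QKV_OUT  # weights in the frozen projection itself

print(f"adapter weights per layer: {per_layer}")
print(f"adapter weights in total:  {total}")
print(f"share of one frozen projection: {per_layer / full_layer:.2%}")
```

Under these assumptions the adapters add about 295 thousand trainable weights, a small fraction of the model's size, which is why the base weights can stay frozen during fine-tuning.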

Installation & Usage

You can load the model using the Hugging Face transformers library (the example also uses hazm to normalize the Persian prompt):

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from hazm import Normalizer

model_name = "rahiminia/manshoorai"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build the normalizer and the pipeline once instead of on every call
normalizer = Normalizer()
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

def generate_poetry(prompt, max_length=30):
    # Normalize the Persian prompt (spacing, character variants) before generation
    prompt = normalizer.normalize(prompt)
    output = generator(prompt, max_length=max_length)
    # The pipeline returns a list with one dict per generated sequence
    return output[0]['generated_text']

print(generate_poetry("شب آرام و خاموش"))

You can also run the ONNX checkpoint of the model with Optimum's ONNX Runtime integration:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("rahiminia/manshoorai", use_cache=False, use_io_binding=False)
tokenizer = AutoTokenizer.from_pretrained("rahiminia/manshoorai")

onnx = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = 'در این شب سیاه'
pred = onnx(prompt)
print(pred[0]['generated_text'])

Training Details

  • Tokenizer: Byte Pair Encoding (BPE) tokenizer from HooshvareLab/gpt2-fa
  • Training: Fine-tuned using PyTorch and the transformers library
  • Hyperparameters: Adjusted learning rate and weight decay
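
For readers unfamiliar with BPE, the toy sketch below shows a single merge step: count adjacent symbol pairs in a small corpus, then fuse the most frequent pair into one symbol. It is a simplified, character-level illustration with made-up word frequencies. The helper names are hypothetical, and the real HooshvareLab/gpt2-fa tokenizer may differ in details such as pre-tokenization and how merges are stored.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words and return the most common one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy corpus: each word is a tuple of characters with a made-up frequency.
corpus = {tuple("باران"): 3, tuple("باغ"): 2, tuple("برگ"): 1}

pair = most_frequent_pair(corpus)
corpus = merge_pair(corpus, pair)
print(pair, list(corpus))
```

Repeating this step many times builds the merge table that a trained BPE tokenizer applies to new text.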

Sample Outputs

Prompt: "باران که می‌بارد"

Generated Text:

  • ManshoorAI
    باران که می‌بارد من، به باغ راه یافته بودم
    من این دشت را دیدم
    که پر از درخت است
    و در آن برگ هایم هیچ گونه سبز نیست
    
  • Base Model (GPT2-fa)
    باران که می‌بارد با خود بگوید که دیگر چه شده بود؟ اگر آن جوان از پشت نرده‌ها به پایین میرفت؛
    

Prompt: "در این شب سیاه"

Generated Text:

در این شب سیاه
چشم‌های سیاه اتاق‌ها
همه دیده‌های من هستند
از هر پلک چه می‌بینم. 
و هر چهره روشن دیگر
من را در سکوت خانه فرو برده

Limitations & Biases

  • This is a work in progress, with many improvements yet to be made.
  • The model may occasionally generate repetitive or incoherent lines.
  • It does not strictly follow classical Persian poetry rules but leans towards free verse.
  • Biases in the training dataset might influence stylistic preferences.
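
As one possible mitigation for repetitive output, the generated text can be cleaned up after generation. The sketch below is a hypothetical post-processing helper, not part of this project: it drops a line only when it is identical to the line directly before it, so deliberate refrains that return later in a poem are kept.

```python
def drop_consecutive_repeats(text: str) -> str:
    """Remove blank lines and lines that exactly repeat the previous kept line."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if kept and kept[-1] == line:
            continue  # skip an immediate repeat
        kept.append(line)
    return "\n".join(kept)

# Example with a looped line, using lines from the sample output above
sample = "من این دشت را دیدم\nمن این دشت را دیدم\nکه پر از درخت است"
print(drop_consecutive_repeats(sample))
```

This only addresses exact line-level loops; incoherent or near-duplicate lines would need a different approach.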

Contributions & Feedback

If you use this model or have suggestions for improvement, feel free to open an issue or contribute via Hugging Face Spaces.

License

This model is released under the MIT License. Please ensure ethical use and proper attribution when sharing generated works.
