Update README.md
README.md CHANGED
@@ -12,9 +12,40 @@ This model is a fine-tuning of paust/pko-t5-large model using AIHUB "summary and
This model is a fine-tuning of the paust/pko-t5-large model using the AIHUB "summary and report generation" dataset. It summarizes long Korean text into a short summary.
-##
-More information needed
+## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import nltk

nltk.download('punkt')  # sentence tokenizer, used below to split the generated summary

model_dir = "lcw99/t5-base-korean-text-summary"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

max_input_length = 512

# Sample input: excerpt from a Korean news article about the drama "수리남" (Narco-Saints).
text = """
주인공 강인구(하정우)는 ‘수리남에서 홍어가 많이 나는데 다 갖다버린다’는 친구
박응수(현봉식)의 얘기를 듣고 수리남산 홍어를 한국에 수출하기 위해 수리남으로 간다.
국립수산과학원 측은 “실제로 남대서양에서 홍어가 많이 살고 아르헨티나를 비롯한 남미 국가에서 홍어가 많이 잡힌다”며
“수리남 연안에도 홍어가 많이 서식할 것”이라고 설명했다.

그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다.
일각에선 “돈을 벌기 위해 수리남산 홍어를 구하러 간 설정의 개연성이 떨어진다”는 지적도 한다.
드라마 배경이 된 2008~2010년에는 이미 국내에 아르헨티나, 칠레, 미국 등 아메리카산 홍어가 수입되고 있었기 때문이다.
실제 조봉행 체포 작전에 협조했던 ‘협력자 K씨’는 홍어 사업이 아니라 수리남에 선박용 특수용접봉을 납품하는 사업을 하러 수리남에 갔었다.
"""

# T5-style "summarize: " task prefix, then tokenize with truncation to the 512-token budget.
inputs = ["summarize: " + text]
inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, return_tensors="pt")

output = model.generate(**inputs, num_beams=8, do_sample=True, min_length=10, max_length=100)
decoded_output = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
predicted_title = nltk.sent_tokenize(decoded_output.strip())[0]  # keep only the first sentence

print(predicted_title)
```
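A note on the decoding settings above: `num_beams=8` together with `do_sample=True` makes `generate` use beam-sample decoding, so the printed summary can vary from run to run; dropping `do_sample=True` gives deterministic beam search.

The same checkpoint can also be driven through the high-level `pipeline` wrapper. The following is a minimal sketch, not part of this commit: the `summarization` pipeline task and the manually prepended `"summarize: "` prefix are assumptions carried over from the snippet above.

```python
from transformers import pipeline

# Sketch (assumption): the generic summarization pipeline wrapping the same checkpoint.
summarizer = pipeline("summarization", model="lcw99/t5-base-korean-text-summary")

# Short Korean sample; prepend the same "summarize: " task prefix as in the snippet above.
text = "그러나 관세청에 따르면 한국에 수리남산 홍어가 수입된 적은 없다."

result = summarizer("summarize: " + text, num_beams=8, min_length=10, max_length=100)
print(result[0]["summary_text"])
```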
## Intended uses & limitations