Leonard Püttmann
committed on
Update README.md
README.md
CHANGED
@@ -39,6 +39,36 @@ text_to_translate = "Vorrei una tazza di tè nero, per favore."
 response = generate_response(text_to_translate)
 print(response)
 ```
+As this model is trained on translating sentence pairs, it is best to split longer text into individual sentences, ideally using SpaCy. You can then translate the sentences and join the translations at the end like this:
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import spacy
+# First, install spaCy and the Italian language model if you haven't already
+# !pip install spacy
+# !python -m spacy download it_core_news_sm
+
+nlp = spacy.load("it_core_news_sm")
+
+tokenizer = AutoTokenizer.from_pretrained("LeonardPuettmann/mt0-Quadrifoglio-mt-it-en")
+model = AutoModelForSeq2SeqLM.from_pretrained("LeonardPuettmann/mt0-Quadrifoglio-mt-it-en")
+
+def generate_response(input_text):
+    input_ids = tokenizer("translate Italian to English: " + input_text, return_tensors="pt").input_ids
+    output = model.generate(input_ids, max_new_tokens=256)
+    return tokenizer.decode(output[0], skip_special_tokens=True)
+
+text = "Ciao, come stai? Oggi è una bella giornata. Spero che tu stia bene."
+doc = nlp(text)
+sentences = [sent.text for sent in doc.sents]
+
+sentence_translations = []
+for i, sentence in enumerate(sentences):
+    sentence_translation = generate_response(sentence)
+    sentence_translations.append(sentence_translation)
+
+full_translation = " ".join(sentence_translations)
+print(full_translation)
+```
 
 ## Evaluation
 Done on the Opus 100 test set.
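The split, translate, and join pattern this commit adds to the README can also be wrapped in a small reusable helper. The sketch below is not part of the commit: `split_sentences` and `translate_text` are hypothetical names, and the regex splitter is only a naive stand-in for spaCy's `doc.sents` so the example runs without downloading a model.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Assumption: a rough stand-in for spaCy's `doc.sents`. It will
    mis-split abbreviations such as "Sig. Rossi".
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_text(
    text: str,
    translate: Callable[[str], str],
    split: Callable[[str], List[str]] = split_sentences,
) -> str:
    """Translate `text` one sentence at a time and join the results with spaces."""
    return " ".join(translate(sentence) for sentence in split(text))
```

With the names from the README snippet, this would be called as `translate_text(text, generate_response)`. To keep spaCy's segmentation, pass `split=lambda t: [s.text for s in nlp(t).sents]` instead of relying on the regex default.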