Leonard Püttmann committed · Commit d74cc3f · verified · 1 Parent(s): 09cba0f

Update README.md

Files changed (1)
  1. README.md +30 -0
README.md CHANGED
@@ -39,6 +39,36 @@ text_to_translate = "Vorrei una tazza di tè nero, per favore."
  response = generate_response(text_to_translate)
  print(response)
  ```
+ Because this model was trained on sentence pairs, it works best when longer text is split into individual sentences first, ideally with spaCy. You can then translate each sentence and join the translations at the end:
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+ import spacy
+ # First, install spaCy and the Italian language model if you haven't already:
+ # !pip install spacy
+ # !python -m spacy download it_core_news_sm
+
+ nlp = spacy.load("it_core_news_sm")
+
+ tokenizer = AutoTokenizer.from_pretrained("LeonardPuettmann/mt0-Quadrifoglio-mt-it-en")
+ model = AutoModelForSeq2SeqLM.from_pretrained("LeonardPuettmann/mt0-Quadrifoglio-mt-it-en")
+
+ def generate_response(input_text):
+     input_ids = tokenizer("translate Italian to English: " + input_text, return_tensors="pt").input_ids
+     output = model.generate(input_ids, max_new_tokens=256)
+     return tokenizer.decode(output[0], skip_special_tokens=True)
+
+ text = "Ciao, come stai? Oggi è una bella giornata. Spero che tu stia bene."
+ doc = nlp(text)
+ sentences = [sent.text for sent in doc.sents]
+
+ sentence_translations = []
+ for sentence in sentences:
+     sentence_translation = generate_response(sentence)
+     sentence_translations.append(sentence_translation)
+
+ full_translation = " ".join(sentence_translations)
+ print(full_translation)
+ ```
 
  ## Evaluation
  Done on the Opus 100 test set.
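
The split, translate, and join steps added in this commit could also be wrapped in a small reusable helper. The following is only a sketch and not part of the README: `split_sentences` and `translate_long_text` are hypothetical names, and the regex splitter is a naive stand-in for spaCy's sentence segmentation so that the example runs without any downloads.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive fallback splitter (assumption, not spaCy): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long_text(
    text: str,
    translate: Callable[[str], str],
    splitter: Callable[[str], List[str]] = split_sentences,
) -> str:
    """Split text into sentences, translate each one, and join the results with spaces."""
    return " ".join(translate(sentence) for sentence in splitter(text))


# Stand-in translator so the sketch runs without loading the model;
# in practice, generate_response from the README would be passed instead.
demo = translate_long_text("Ciao, come stai? Oggi è una bella giornata.", str.upper)
print(demo)
```

With the objects from the README loaded, the same helper would presumably be called as `translate_long_text(text, generate_response, lambda t: [s.text for s in nlp(t).sents])`, which keeps spaCy as the sentence splitter and uses the regex only as a fallback.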