Update README.md
Authors: *Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel*

## Details of the downstream task (Question Answering) - Dataset 📚

[TweetQA](https://huggingface.co/datasets/tweet_qa)

With social media becoming increasingly popular and much of the news and many real-time events being reported there first, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have concentrated on formal text like news and Wikipedia, TweetQA is the first large-scale dataset for QA over social media data. To ensure the tweets are meaningful and contain interesting information, the authors gathered tweets used by journalists to write news articles and then asked human annotators to write questions and answers about them. Unlike QA datasets such as SQuAD, in which the answers are extractive, TweetQA allows the answers to be abstractive: the task requires the model to read a short tweet and a question and to output a text phrase (which does not need to appear in the tweet) as the answer.

- Data Instances:

Sample:

```json
{
  "Question": "who is the tallest host?",
  "Answer": ["sam bee", "sam bee"],
  "Tweet": "Don't believe @ConanOBrien's height lies. Sam Bee is the tallest host in late night. #alternativefacts\u2014 Full Frontal (@FullFrontalSamB) January 22, 2017",
  "qid": "3554ee17d86b678be34c4dc2c04e334f"
}
```

- Data Fields:

  - **Question**: a question based on information from a tweet
  - **Answer**: list of possible answers from the tweet
  - **Tweet**: source tweet
  - **qid**: question id
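
For reference, here is a minimal sketch of loading the dataset and inspecting these fields with the 🤗 `datasets` library (it assumes a `datasets` version that still resolves the `tweet_qa` loading script and a standard `train` split):

```python
from datasets import load_dataset

# Load TweetQA from the Hugging Face Hub.
dataset = load_dataset('tweet_qa')

# Each instance carries the four fields described above.
sample = dataset['train'][0]
print(sample['Question'], sample['Answer'], sample['Tweet'], sample['qid'])
```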

## Model in Action 🚀

First, install the library: `pip install -q ./transformers` (the `./transformers` path assumes a local clone of the transformers repo).

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

ckpt = 'Narrativa/byt5-base-finetuned-tweet-qa'

tokenizer = AutoTokenizer.from_pretrained(ckpt)
# Move the model to GPU; this snippet assumes a CUDA device is available.
model = T5ForConditionalGeneration.from_pretrained(ckpt).to('cuda')

def get_answer(question, context):
    # Build the input in the usual T5 text-to-text QA format.
    input_text = 'question: %s context: %s' % (question, context)
    inputs = tokenizer([input_text], return_tensors='pt')
    input_ids = inputs.input_ids.to('cuda')
    attention_mask = inputs.attention_mask.to('cuda')
    output = model.generate(input_ids, attention_mask=attention_mask)
    return tokenizer.decode(output[0], skip_special_tokens=True)

get_answer('here goes your question', 'And here the context/tweet...')
```
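
For example, with the sample instance shown above (illustrative only: the exact generated string depends on the checkpoint and decoding settings, though the gold answer for this instance is "sam bee"):

```python
tweet = ("Don't believe @ConanOBrien's height lies. Sam Bee is the tallest host "
         "in late night. #alternativefacts\u2014 Full Frontal (@FullFrontalSamB) January 22, 2017")
print(get_answer('who is the tallest host?', tweet))
```
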
Created by: [Narrativa](https://www.narrativa.com/)