Commit 2e1dca3
Parent: 83312d5

Update README.md

README.md CHANGED
@@ -23,7 +23,7 @@ Authors: *Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang,
 
 ## Details of the downstream task (Question Answering) - Dataset 📚
 
-[TweetQA](
+[TweetQA](https://huggingface.co/datasets/tweet_qa)
 
 
 With social media becoming increasingly more popular, lots of news and real-time events are being covered. Developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous question answering (QA) datasets have focused on formal text such as news and Wikipedia, we present the first large-scale dataset for QA over social media data. To make sure that the tweets are meaningful and contain interesting information, we gather tweets used by journalists to write news articles. We then ask human annotators to write questions and answers upon these tweets. Unlike other QA datasets like SQuAD (in which the answers are extractive), we allow the answers to be abstractive. The task requires the model to read a short tweet and a question and outputs a text phrase (does not need to be in the tweet) as the answer.
@@ -36,7 +36,7 @@ Sample
 {
   "Question": "who is the tallest host?",
   "Answer": ["sam bee","sam bee"],
-  "Tweet": "Don't believe @ConanOBrien's height lies. Sam Bee is the tallest host in late night. #alternativefacts
+  "Tweet": "Don't believe @ConanOBrien's height lies. Sam Bee is the tallest host in late night. #alternativefacts\u2014 Full Frontal (@FullFrontalSamB) January 22, 2017",
   "qid": "3554ee17d86b678be34c4dc2c04e334f"
 }
 ```
@@ -77,7 +77,7 @@ def get_answer(question, context):
     return tokenizer.decode(output[0], skip_special_tokens=True)
 
 
-context = "Don't believe @ConanOBrien's height lies. Sam Bee is the tallest host in late night. #alternativefacts
+context = "Don't believe @ConanOBrien's height lies. Sam Bee is the tallest host in late night. #alternativefacts\u2014 Full Frontal (@FullFrontalSamB) January 22, 2017"
 question = "who is the tallest host?"
 
 get_answer(question, context)
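The point of the `"Tweet"` hunk is that the old value was a truncated, unterminated JSON string; the new value is complete, with `\u2014` (the JSON escape for an em dash) separating the tweet body from its attribution. A minimal check of the fixed sample record, using only the field names shown in the diff:

```python
import json

# The corrected sample record from the README diff. The \\u2014 in the
# Python source reaches the JSON parser as \u2014, which decodes to an
# em dash between the tweet text and "Full Frontal (@FullFrontalSamB)".
record = json.loads('''
{
  "Question": "who is the tallest host?",
  "Answer": ["sam bee", "sam bee"],
  "Tweet": "Don't believe @ConanOBrien's height lies. Sam Bee is the tallest host in late night. #alternativefacts\\u2014 Full Frontal (@FullFrontalSamB) January 22, 2017",
  "qid": "3554ee17d86b678be34c4dc2c04e334f"
}
''')

print(record["Question"])           # who is the tallest host?
print("\u2014" in record["Tweet"])  # True: the escape decoded to an em dash
```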
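The last hunk ends by calling `get_answer(question, context)`, whose definition lies mostly outside this diff. As a hedged sketch of the step the corrected `context` string feeds into, here is a hypothetical input-building stage; the `question: ... context: ...` prompt layout is an assumption (common for T5-style QA fine-tunes), not something this diff shows:

```python
def build_input(question: str, context: str) -> str:
    # Hypothetical prompt layout for a T5-style QA model; the exact
    # format used by this checkpoint is not visible in the diff.
    return f"question: {question}  context: {context}"

# The corrected context string from the diff (with a real em dash
# before the attribution, written as \u2014 in the README's sample).
context = ("Don't believe @ConanOBrien's height lies. Sam Bee is the "
           "tallest host in late night. #alternativefacts\u2014 Full "
           "Frontal (@FullFrontalSamB) January 22, 2017")
question = "who is the tallest host?"

prompt = build_input(question, context)
print(prompt.startswith("question: who is the tallest host?"))  # True
```

In the actual model card, `prompt` would then be tokenized and passed to the model's `generate` call before `tokenizer.decode(output[0], skip_special_tokens=True)` returns the answer text.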