Commit 1a411ea · Parent(s): de3b367
change base datasets links to the dataset original paper

tasks_config/pt_config.yaml CHANGED (+18 -18)
@@ -62,8 +62,8 @@ tasks:
       level exam widely applied every year by the Brazilian government to students that
       wish to undertake a University degree. This dataset contains 1,430 questions that don't require
       image understanding of the exams from 2010 to 2018, 2022 and 2023."
-    link: https://
-    sources: ["https://www.ime.usp.br/~ddm/project/enem/", "https://github.com/piresramon/gpt-4-enem", "https://huggingface.co/datasets/maritaca-ai/enem"]
+    link: https://www.ime.usp.br/~ddm/project/enem/ENEM-GuidingTest.pdf
+    sources: ["https://huggingface.co/datasets/eduagarcia/enem_challenge", "https://www.ime.usp.br/~ddm/project/enem/", "https://github.com/piresramon/gpt-4-enem", "https://huggingface.co/datasets/maritaca-ai/enem"]
     baseline_sources: ["https://www.sejalguem.com/enem", "https://vestibular.brasilescola.uol.com.br/enem/confira-as-medias-e-notas-maximas-e-minimas-do-enem-2020/349732.html"]
   bluex:
     benchmark: bluex
@@ -81,8 +81,8 @@ tasks:
     description: "BLUEX is a multimodal dataset consisting of the two leading
       university entrance exams conducted in Brazil: Convest (Unicamp) and Fuvest (USP),
       spanning from 2018 to 2024. The benchmark comprises of 724 questions that do not have accompanying images"
-    link: https://
-    sources: ["https://github.com/portuguese-benchmark-datasets/bluex", "https://huggingface.co/datasets/portuguese-benchmark-datasets/BLUEX"]
+    link: https://arxiv.org/abs/2307.05410
+    sources: ["https://huggingface.co/datasets/eduagarcia-temp/BLUEX_without_images", "https://github.com/portuguese-benchmark-datasets/bluex", "https://huggingface.co/datasets/portuguese-benchmark-datasets/BLUEX"]
     baseline_sources: ["https://www.comvest.unicamp.br/wp-content/uploads/2023/08/Relatorio_F1_2023.pdf", "https://acervo.fuvest.br/fuvest/2018/FUVEST_2018_indice_discriminacao_1_fase_ins.pdf"]
   oab_exams:
     benchmark: oab_exams
@@ -104,8 +104,8 @@ tasks:
     expert_human_baseline: 75.0
     description: OAB Exams is a dataset of more than 2,000 questions from the Brazilian Bar
       Association's exams, from 2010 to 2018.
-    link: https://
-    sources: ["https://github.com/legal-nlp/oab-exams"]
+    link: https://arxiv.org/abs/1712.05128
+    sources: ["https://huggingface.co/datasets/eduagarcia/oab_exams", "https://github.com/legal-nlp/oab-exams"]
     baseline_sources: ["http://fgvprojetos.fgv.br/publicacao/exame-de-ordem-em-numeros", "http://fgvprojetos.fgv.br/publicacao/exame-de-ordem-em-numeros-vol2", "http://fgvprojetos.fgv.br/publicacao/exame-de-ordem-em-numeros-vol3"]
   assin2_rte:
     benchmark: assin2_rte
@@ -124,8 +124,8 @@ tasks:
       of Portuguese. Recognising Textual Entailment (RTE), also called Natural Language
       Inference (NLI), is the task of predicting if a given text (premise) entails (implies) in
       other text (hypothesis)."
-    link: https://
-    sources: ["https://sites.google.com/view/assin2/", "https://huggingface.co/datasets/assin2"]
+    link: https://dl.acm.org/doi/abs/10.1007/978-3-030-41505-1_39
+    sources: ["https://huggingface.co/datasets/eduagarcia/portuguese_benchmark", "https://sites.google.com/view/assin2/", "https://huggingface.co/datasets/assin2"]
   assin2_sts:
     benchmark: assin2_sts
     col_name: ASSIN2 STS
@@ -139,8 +139,8 @@ tasks:
     expert_human_baseline: null
     description: "Same as dataset as above. Semantic Textual Similarity (STS)
       ‘measures the degree of semantic equivalence between two sentences’."
-    link: https://
-    sources: ["https://sites.google.com/view/assin2/", "https://huggingface.co/datasets/assin2"]
+    link: https://dl.acm.org/doi/abs/10.1007/978-3-030-41505-1_39
+    sources: ["https://huggingface.co/datasets/eduagarcia/portuguese_benchmark", "https://sites.google.com/view/assin2/", "https://huggingface.co/datasets/assin2"]
   faquad_nli:
     benchmark: faquad_nli
     col_name: FAQUAD NLI
@@ -161,8 +161,8 @@ tasks:
       Brazilian higher education system. FaQuAD-NLI is a modified version of the
       FaQuAD dataset that repurposes the question answering task as a textual
       entailment task between a question and its possible answers."
-    link: https://
-    sources: ["https://github.com/liafacom/faquad/"]
+    link: https://ieeexplore.ieee.org/abstract/document/8923668
+    sources: ["https://github.com/liafacom/faquad/", "https://huggingface.co/datasets/ruanchaves/faquad-nli"]
   hatebr_offensive:
     benchmark: hatebr_offensive
     col_name: HateBR Offensive
@@ -178,8 +178,8 @@ tasks:
       on the web and social media. The HateBR was collected from Brazilian Instagram comments of politicians and manually annotated
       by specialists. It is composed of 7,000 documents annotated with a binary classification (offensive
       versus non-offensive comments)."
-    link: https://
-    sources: ["https://github.com/franciellevargas/HateBR", "https://huggingface.co/datasets/ruanchaves/hatebr"]
+    link: https://arxiv.org/abs/2103.14972
+    sources: ["https://huggingface.co/datasets/eduagarcia/portuguese_benchmark", "https://github.com/franciellevargas/HateBR", "https://huggingface.co/datasets/ruanchaves/hatebr"]
   portuguese_hate_speech:
     benchmark: portuguese_hate_speech
     col_name: PT Hate Speech
@@ -192,8 +192,8 @@ tasks:
     human_baseline: null
     expert_human_baseline: null
     description: "Portuguese dataset for hate speech detection composed of 5,668 tweets with binary annotations (i.e. 'hate' vs. 'no-hate')"
-    link: https://
-    sources: ["https://github.com/paulafortuna/Portuguese-Hate-Speech-Dataset", "https://huggingface.co/datasets/hate_speech_portuguese"]
+    link: https://aclanthology.org/W19-3510/
+    sources: ["https://huggingface.co/datasets/eduagarcia/portuguese_benchmark", "https://github.com/paulafortuna/Portuguese-Hate-Speech-Dataset", "https://huggingface.co/datasets/hate_speech_portuguese"]
   tweetsentbr:
     benchmark: tweetsentbr
     col_name: tweetSentBR
@@ -209,6 +209,6 @@ tasks:
       It was labeled by several annotators following steps stablished on the literature for
       improving reliability on the task of Sentiment Analysis. Each Tweet was annotated
       in one of the three following classes: Positive, Negative, Neutral."
-    link: https://
-    sources: ["https://bitbucket.org/HBrum/tweetsentbr"]
+    link: https://arxiv.org/abs/1712.08917
+    sources: ["https://bitbucket.org/HBrum/tweetsentbr"]

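Every edited pair follows the same pattern: a placeholder "link: https://" is replaced by the dataset's original paper, and for most tasks a leaderboard-hosted Hugging Face mirror is prepended to "sources". A quick script can confirm no placeholder survived the change. The following is a minimal sketch, not part of the commit: it assumes PyYAML is installed and that the file keeps the tasks / link / sources layout shown in the diff.

# Hypothetical sanity check (not in the repo): verify that every task in
# tasks_config/pt_config.yaml has a real link and a non-empty sources list.
import yaml

with open("tasks_config/pt_config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

for name, task in config["tasks"].items():
    link = task.get("link", "")
    # A bare "https://" was the old placeholder value removed by this commit.
    assert link and link != "https://", f"{name}: link is still a placeholder"
    assert task.get("sources"), f"{name}: sources list is empty or missing"
    print(f"{name}: OK ({link})")

Running it after this commit should print one OK line per task; before the commit, the first assertion would fail on every task that still carried the bare "https://" placeholder.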