unb-lamfo-nlp-mcti
/

NLP-Classification-MCTI

English

Clsssification

science


MarcosDib committed on Dec 12, 2022

Commit 2522575

1 Parent(s): a854db5

Update README.md

Files changed (1)

README.md +7 -7

@@ -83,7 +83,7 @@ Other 24 smaller models are released afterward.
 The detailed release history can be found [here](https://huggingface.co/unb-lamfo-nlp-mcti) on Hugging Face.
 | Model                        | #params | Language |
-|:----------------------------:|:-------:|:--------:|
 | [`mcti-base-uncased`]        | 110M    | English  |
 | [`mcti-large-uncased`]       | 340M    | English  | sub
 | [`mcti-base-cased`]          | 110M    | English  |
@@ -91,7 +91,7 @@ The detailed release history can be found on the [here](https://huggingface.co/u
 | [`-base-multilingual-cased`] | 110M    | Multiple |
 | Dataset                              | Compatibility to base* |
-|:------------------------------------:|:----------------------:|
 | Labeled MCTI                         | 100%                   |
 | Full MCTI                            | 100%                   |
 | BBC News Articles                    | 56.77%                 |
@@ -202,13 +202,13 @@ The following assumptions were considered:
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
-From the Database obtained in Meta 4, stored in the project's [GitHub](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](colab.research.google.com)
-to implement the [pre-processing code](github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
 Several Python packages were used to develop the preprocessing code:
 |                         Objective                      |   Package    |
-|:------------------------------------------------------:|:------------:|
 | Resolve contractions and slang usage in text           | [contractions](https://pypi.org/project/contractions) |
 | Natural Language Processing                            | [nltk](https://pypi.org/project/nltk)         |
 | Other data manipulation and calculation modules included in Python 3.10: io, json, math, re (regular expressions), shutil, time, unicodedata    | [numpy](https://pypi.org/project/numpy)        |
@@ -224,7 +224,7 @@ As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/mcti-sefip
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
 |  Base  |                   Original texts                             |
-|:------:|:------------------------------------------------------------:|
 | xp1    | Expand contractions                                          |
 | xp2    | Expand contractions + Convert text to lowercase              |
 | xp3    | Expand contractions + Remove punctuation                     |
@@ -233,7 +233,7 @@ bases, derived from the base of goal 4, with the application of the methods show
 | xp6    | xp4 + Lemmatization                                          |
 | xp7    | xp4 + Stemming + Stopword removal                            |
 | xp8    | xp4 + Lemmatization + Stopword removal                       |
-               Table 2 – Pre-processing methods evaluated
 ### Pretraining

 The detailed release history can be found [here](https://huggingface.co/unb-lamfo-nlp-mcti) on Hugging Face.
 | Model                        | #params | Language |
+|------------------------------|:-------:|:--------:|
 | [`mcti-base-uncased`]        | 110M    | English  |
 | [`mcti-large-uncased`]       | 340M    | English  | sub
 | [`mcti-base-cased`]          | 110M    | English  |
 | [`-base-multilingual-cased`] | 110M    | Multiple |
 | Dataset                              | Compatibility to base* |
+|--------------------------------------|:----------------------:|
 | Labeled MCTI                         | 100%                   |
 | Full MCTI                            | 100%                   |
 | BBC News Articles                    | 56.77%                 |
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
+From the Database obtained in Meta 4, stored in the project's [GitHub](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/scraps-desenvolvimento/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](https://colab.research.google.com)
+to implement the [pre-processing code](https://github.com/mcti-sefip/mcti-sefip-ppfcd2020/blob/pre-processamento/Pre_Processamento/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
 Several Python packages were used to develop the preprocessing code:
 |                         Objective                      |   Package    |
+|--------------------------------------------------------|--------------|
 | Resolve contractions and slang usage in text           | [contractions](https://pypi.org/project/contractions) |
 | Natural Language Processing                            | [nltk](https://pypi.org/project/nltk)         |
 | Other data manipulation and calculation modules included in Python 3.10: io, json, math, re (regular expressions), shutil, time, unicodedata    | [numpy](https://pypi.org/project/numpy)        |
 bases, derived from the base of goal 4, with the application of the methods shown in Figure 2.
 |  Base  |                   Original texts                             |
+|--------|--------------------------------------------------------------|
 | xp1    | Expand contractions                                          |
 | xp2    | Expand contractions + Convert text to lowercase              |
 | xp3    | Expand contractions + Remove punctuation                     |
 | xp6    | xp4 + Lemmatization                                          |
 | xp7    | xp4 + Stemming + Stopword removal                            |
 | xp8    | xp4 + Lemmatization + Stopword removal                       |
+Table 2 – Pre-processing methods evaluated
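
The first three variants in the table can be illustrated with a minimal sketch. It is not the project's notebook code: the `CONTRACTIONS` map is a tiny hypothetical stand-in for the `contractions` package listed above, and the function names `xp1`, `xp2`, and `xp3` simply mirror the table labels. Variants that build on xp4 are omitted because its definition is not shown here.

```python
import re
import string

# Hypothetical, deliberately tiny contraction map for illustration only.
# The README lists the `contractions` package for this step.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "don't": "do not",
}

# One case-insensitive pattern that matches any key of the map as a whole word.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions with their lowercase expanded forms."""
    return _CONTRACTION_RE.sub(
        lambda match: CONTRACTIONS[match.group(0).lower()], text
    )


def remove_punctuation(text: str) -> str:
    """Drop every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def xp1(text: str) -> str:
    """xp1: expand contractions."""
    return expand_contractions(text)


def xp2(text: str) -> str:
    """xp2: expand contractions, then convert to lowercase."""
    return xp1(text).lower()


def xp3(text: str) -> str:
    """xp3: expand contractions, then remove punctuation."""
    return remove_punctuation(xp1(text))


if __name__ == "__main__":
    sample = "We can't miss it, it's a GRANT!"
    for name, variant in (("xp1", xp1), ("xp2", xp2), ("xp3", xp3)):
        print(name, "->", variant(sample))
```

Expanding contractions before removing punctuation matters in this sketch: stripping the apostrophe first would turn "can't" into "cant", which the map no longer recognizes.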
 ### Pretraining