Commit 273581b
Parent(s): 53e3dbf

update links to nlp-mcti-ppf

README.md CHANGED
```diff
@@ -170,8 +170,8 @@ The following assumptions were considered:
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
 
-From the Database obtained in Goal 4, stored in the project's [GitHub](https://github.com/mcti-sefip/
-to implement the [preprocessing code](https://github.com/mcti-sefip/
+From the Database obtained in Goal 4, stored in the project's [GitHub](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Data/scrapy/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](https://colab.research.google.com)
+to implement the [preprocessing code](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
 
 Several Python packages were used to develop the preprocessing code:
 
@@ -189,7 +189,7 @@ Table 3: Python packages used
 | Translation from multiple languages to English | [translators](https://pypi.org/project/translators) |
 
 
-As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/
+As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/MCTI_PPF_Pr%C3%A9_processamento.ipynb), in the pre-processing, code was created to build and evaluate 8 (eight) different
 bases, derived from the base of goal 4, with the application of the methods shown in table 4.
 
 Table 4: Preprocessing methods evaluated
@@ -234,7 +234,7 @@ was the computational cost required to train the vector representation models (w
 document-embedding). The training time is so close that it did not have such a large weight for the analysis.
 
 As the last step, a spreadsheet was generated for the model (xp8) with the fields opo_pre and opo_pre_tkn, containing the
-preprocessed text in sentence format and tokens, respectively. This [database](https://github.com/mcti-sefip/
+preprocessed text in sentence format and tokens, respectively. This [database](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/oportunidades_final_pre_processado.xlsx) was made
 available on the project's GitHub with the inclusion of columns opo_pre (text) and opo_pre_tkn (tokenized).
 
 ### Pretraining
```
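The README text in this commit describes building and evaluating 8 (eight) preprocessed bases derived from the goal-4 base. The actual methods are those listed in the README's table 4, which this diff does not show; purely as an illustrative sketch, eight variants arise naturally from toggling three binary preprocessing choices (the method names below are hypothetical, not taken from the project):

```python
from itertools import product

# Hypothetical method names; the real ones are listed in table 4 of the README.
METHODS = ("lowercase", "remove_stopwords", "lemmatize")

def make_variants(methods=METHODS):
    """Enumerate every on/off combination of the methods: 2**3 == 8 configurations."""
    return [dict(zip(methods, flags))
            for flags in product((False, True), repeat=len(methods))]

variants = make_variants()
print(len(variants))  # 8 distinct preprocessing configurations
```

Each configuration would then produce one derived base, and the accuracies of the shallow neural network trained on each base can be compared, as the assumptions at the top of the changed section describe.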
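The last changed paragraph describes a spreadsheet in which each record carries the preprocessed text twice: opo_pre as a plain sentence and opo_pre_tkn as a token list. A minimal sketch of producing such a dual representation (the regex cleaning and stopword list here are illustrative stand-ins, not the linked notebook's actual steps):

```python
import re

# Illustrative stopword list; the notebook's real cleaning steps differ.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "with"}

def preprocess(text: str) -> tuple[str, list[str]]:
    """Return (opo_pre, opo_pre_tkn)-style outputs: cleaned sentence and tokens."""
    # Lowercase and keep only alphabetic runs, then drop stopwords.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return " ".join(tokens), tokens

sentence, tokens = preprocess("Call for Proposals: funding IN the area of NLP.")
# sentence -> "call proposals funding area nlp"
# tokens   -> ["call", "proposals", "funding", "area", "nlp"]
```

Keeping both fields lets downstream models choose their input granularity: sentence-level embeddings consume opo_pre, while token-based representations consume opo_pre_tkn.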