Commit 273581b
Parent(s): 53e3dbf

update links to nlp-mcti-ppf

README.md CHANGED
```diff
@@ -170,8 +170,8 @@ The following assumptions were considered:
 - Preprocessing experiments compare accuracy in a shallow neural network (SNN);
 - Pre-processing was investigated for the classification goal.
 
-From the Database obtained in Goal 4, stored in the project's [GitHub](https://github.com/mcti-sefip/
-to implement the [preprocessing code](https://github.com/mcti-sefip/
+From the Database obtained in Goal 4, stored in the project's [GitHub](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Data/scrapy/Rotulagem/db_PPF_validacao_para%20UNB_%20FINAL.xlsx), a Notebook was developed in [Google Colab](https://colab.research.google.com)
+to implement the [preprocessing code](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/MCTI_PPF_Pr%C3%A9_processamento.ipynb), which also can be found on the project's GitHub.
 
 Several Python packages were used to develop the preprocessing code:
 
@@ -189,7 +189,7 @@ Table 3: Python packages used
 | Translation from multiple languages to English | [translators](https://pypi.org/project/translators) |
 
 
-As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/
+As detailed in the notebook on [GitHub](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/MCTI_PPF_Pr%C3%A9_processamento.ipynb), in the pre-processing, code was created to build and evaluate 8 (eight) different
 bases, derived from the base of goal 4, with the application of the methods shown in table 4.
 
 Table 4: Preprocessing methods evaluated
@@ -234,7 +234,7 @@ was the computational cost required to train the vector representation models (w
 document-embedding). The training time is so close that it did not have such a large weight for the analysis.
 
 As the last step, a spreadsheet was generated for the model (xp8) with the fields opo_pre and opo_pre_tkn, containing the
-preprocessed text in sentence format and tokens, respectively. This [database](https://github.com/mcti-sefip/
+preprocessed text in sentence format and tokens, respectively. This [database](https://github.com/mcti-sefip/NLP-MCTI-PPF/blob/main/Pre_Processing/oportunidades_final_pre_processado.xlsx) was made
 available on the project's GitHub with the inclusion of columns opo_pre (text) and opo_pre_tkn (tokenized).
 
 ### Pretraining
```
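The README text in this commit describes building and evaluating 8 (eight) preprocessed bases derived from the goal-4 base. The actual methods are those listed in the README's table 4, which this diff does not show; purely as an illustrative sketch, eight variants arise naturally from toggling three binary preprocessing choices (the method names below are hypothetical, not taken from the project):

```python
from itertools import product

# Hypothetical method names; the real ones are listed in table 4 of the README.
METHODS = ("lowercase", "remove_stopwords", "lemmatize")

def make_variants(methods=METHODS):
    """Enumerate every on/off combination of the methods: 2**3 == 8 configurations."""
    return [dict(zip(methods, flags))
            for flags in product((False, True), repeat=len(methods))]

variants = make_variants()
print(len(variants))  # 8 distinct preprocessing configurations
```

Each configuration would then produce one derived base, and the accuracies of the shallow neural network trained on each base can be compared, as the assumptions at the top of the changed section describe.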
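The last changed paragraph describes a spreadsheet in which each record carries the preprocessed text twice: opo_pre as a plain sentence and opo_pre_tkn as a token list. A minimal sketch of producing such a dual representation (the regex cleaning and stopword list here are illustrative stand-ins, not the linked notebook's actual steps):

```python
import re

# Illustrative stopword list; the notebook's real cleaning steps differ.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "with"}

def preprocess(text: str) -> tuple[str, list[str]]:
    """Return (opo_pre, opo_pre_tkn)-style outputs: cleaned sentence and tokens."""
    # Lowercase and keep only alphabetic runs, then drop stopwords.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return " ".join(tokens), tokens

sentence, tokens = preprocess("Call for Proposals: funding IN the area of NLP.")
# sentence -> "call proposals funding area nlp"
# tokens   -> ["call", "proposals", "funding", "area", "nlp"]
```

Keeping both fields lets downstream models choose their input granularity: sentence-level embeddings consume opo_pre, while token-based representations consume opo_pre_tkn.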