INC Plastic Treaty App
Set up Development

Install the poetry environment (please install poetry first before executing the following command):

poetry install

Install the pre-commit hooks:

pre-commit install
pre-commit
Start app

- Execute the following command with the respective version:

streamlit run src/app/v1/app.py
Data Overview

- taxonomy related data
  - data/authors_taxonomy.json: raw countries taxonomy
  - data/draft_cat_taxonomy.json: draft cat taxonomy
  - data/authors_filter.json: processed taxonomy for frontend filtering
  - data/draftcat_taxonomy_filter.json: processed taxonomy for frontend filtering
  - data/inc_df_v6_small.csv: processed data from the scraping
  - data/inc_df.csv: data for the document store
  - data/taxonomies.txt: raw taxonomies collection
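To illustrate how a raw taxonomy file might be flattened into the filter lists used by the frontend, here is a minimal sketch; the dictionary layout and group names are assumptions for illustration, not the actual contents of the repository's files:

```python
import json

# Assumed shape of a raw taxonomy such as data/authors_taxonomy.json:
# groups mapping to lists of members. The real file layout may differ.
raw_taxonomy = {
    "Africa": ["Kenya", "Nigeria"],
    "Europe": ["France", "Germany"],
}

def build_filter_options(taxonomy: dict) -> list:
    """Flatten a {group: [members]} taxonomy into a sorted filter list."""
    return sorted({member for members in taxonomy.values() for member in members})

filter_options = build_filter_options(raw_taxonomy)
print(json.dumps(filter_options))
```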
- application related data
  - database/document_store.pkl: document store
  - database/meta_data.csv: metadata of the document store with countries and draft labs as columns
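As a hedged sketch of how the metadata file could be read back, assuming meta_data.csv is a plain CSV with one row per document (the actual columns may differ; document_store.pkl itself would be loaded with pickle):

```python
import csv
import io

# Illustrative stand-in for database/meta_data.csv; the real file has
# countries and draft labs as columns, as described above.
meta_data_csv = io.StringIO(
    "doc_id,country,draft_lab\n"
    "1,Germany,Part I\n"
    "2,Kenya,Part II\n"
)

rows = list(csv.DictReader(meta_data_csv))
countries = sorted({row["country"] for row in rows})
print(countries)
```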
Code Structure

Data Preprocessing

- src/data_processing/document_store_data.py: generates the data for the document store with the last processing steps
- src/data_processing/get_meta_data_filter.py: generates the metadata from the document store for the filtering in the frontend
- src/data_processing/taxonomy_processing.py: transforms the taxonomies for the filters in the frontend
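The metadata-to-filter step can be sketched as follows; the row structure and column names are illustrative assumptions, not the repository's actual schema:

```python
# Hypothetical metadata rows as the document store might hold them.
meta_rows = [
    {"country": "Germany", "draft_lab": "Part I"},
    {"country": "Kenya", "draft_lab": "Part I"},
    {"country": "Germany", "draft_lab": "Part II"},
]

def meta_data_filters(rows: list) -> dict:
    """Collect the unique values per column, as a source for frontend filters."""
    filters = {}
    for row in rows:
        for column, value in row.items():
            filters.setdefault(column, set()).add(value)
    return {column: sorted(values) for column, values in filters.items()}

print(meta_data_filters(meta_rows))
```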
Frontend

- src/app/: versions of the app
- src/utils: functions used in the frontend application; please check the imports to see which functions are used
- styles: CSS styles for the apps
- .streamlit: basic theme of the frontend
- Some settings and changes related to the Spaces App have to be made directly in the respective Streamlit App.
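The .streamlit directory typically holds a config.toml that defines the theme. A minimal example is shown below; the color values are placeholders, not the app's actual theme:

```toml
# .streamlit/config.toml — example theme only; replace with the app's values.
[theme]
primaryColor = "#1f77b4"
backgroundColor = "#ffffff"
secondaryBackgroundColor = "#f0f2f6"
textColor = "#262730"
```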
Backend

- src/document_store/document_store.py: generates the document store
- src/rag/pipeline.py: RAG pipeline
- src/rag/prompt/prompt_template.yaml: prompt template for RAG
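As a rough, dependency-free sketch of the prompt-assembly step in a RAG pipeline (the template text and placeholder names are invented here for illustration; the real template lives in src/rag/prompt/prompt_template.yaml):

```python
# Illustrative template; the real one is loaded from prompt_template.yaml.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
)

def build_prompt(question: str, retrieved_docs: list) -> str:
    """Join the retrieved documents and fill the prompt template."""
    context = "\n---\n".join(retrieved_docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt("What does draft Part I cover?", ["Doc A text", "Doc B text"])
print(prompt)
```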
Changes Document Storage

- Step 1: Delete the database/document_store.pkl and database/meta_data.csv files, or rename them if you want to keep them until the change of the document storage has succeeded.
- Step 2: Run the script data_processing/document_store_data.py. This updates the data/inc_df.csv data. If you need to make changes to data/inc_df_small_v6.csv, they have to be made either manually or via a new script implementing the changes.
- Step 3: Run the script data_processing/get_meta_data_filter.py. This will save a new document store and the metadata from the document store.
- Step 4: If the changes also affect the taxonomy, update the taxonomies as well. To do so, first update data/authors_taxonomy.json and data/draftcat_taxonomy.json manually, then run the script src/data_processing/taxonomy_processing.py.
- Frequent bugs after changes of the data:
  - A new country or draft lab category, or changes in the naming. Solution: check the taxonomy files and update src/data_processing/taxonomy_processing.py.
  - Countries with special characters. Solution: check data/inc_df_small_v6.csv if data_processing/document_store_data.py fails. Check database/meta_data.csv if the frontend application fails after changes to the taxonomies and filters.
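For the special-character bug, one way to spot problematic country names before the scripts fail is to normalize them and compare; this helper is a suggestion, not code from the repository:

```python
import unicodedata

def ascii_fold(name: str) -> str:
    """Strip combining accents so names like 'Türkiye' become plain ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Flag names that change under folding — candidates for encoding trouble.
countries = ["Germany", "Türkiye", "Côte d'Ivoire"]
suspicious = [c for c in countries if ascii_fold(c) != c]
print(suspicious)
```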
Changes App

- Please always check changes to the app locally before pushing to the respective Spaces App.
- To check the Spaces App locally, you can clone it like a Git repository. Avoid making changes directly in the interface of the Spaces App.
- If you copy changes from git to Spaces, copy only the files you have changed. You need to make some adjustments before you copy the files:
  - Remove all OPENAI_KEYS from the app.py and pipeline.py files.
  - Make sure you added OPENAI_API_KEY = os.environ.get("OPEN_API_KEY") in the pipeline.py file.
  - Remove src from the imports in app.py and pipeline.py; otherwise you will get a ModuleNotFoundError.
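A small sketch of the key lookup expected in pipeline.py on Spaces; note that the environment variable name here follows this README ("OPEN_API_KEY"), so make sure it matches the secret configured in the Space:

```python
import os

def get_openai_key() -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get("OPEN_API_KEY")  # variable name as given in this README
    if key is None:
        raise RuntimeError(
            "OPEN_API_KEY is not set; add it as a secret in the Spaces settings."
        )
    return key
```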