INC Plastic Treaty App
Set up Development

Install the poetry environment (please install poetry first before executing the following command):

poetry install

Install the pre-commit hooks:

pre-commit install
pre-commit
Start app

- Execute the following command with the respective version:

streamlit run src/app/v1/app.py
Data Overview

- taxonomy related data
  - data/authors_taxonomy.json: raw countries taxonomy
  - data/draft_cat_taxonomy.json: draft cat taxonomy
  - data/authors_filter.json: processed taxonomy for frontend filtering
  - data/draftcat_taxonomy_filter.json: processed taxonomy for frontend filtering
  - data/inc_df_v6_small.csv: processed data from the scraping
  - data/inc_df.csv: data for the document store
  - data/taxonomies.txt: raw taxonomies collection
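To illustrate how a raw taxonomy file might be flattened into the filter lists used by the frontend, here is a minimal sketch; the dictionary layout and group names are assumptions for illustration, not the actual contents of the repository's files:

```python
import json

# Assumed shape of a raw taxonomy such as data/authors_taxonomy.json:
# groups mapping to lists of members. The real file layout may differ.
raw_taxonomy = {
    "Africa": ["Kenya", "Nigeria"],
    "Europe": ["France", "Germany"],
}

def build_filter_options(taxonomy: dict) -> list:
    """Flatten a {group: [members]} taxonomy into a sorted filter list."""
    return sorted({member for members in taxonomy.values() for member in members})

filter_options = build_filter_options(raw_taxonomy)
print(json.dumps(filter_options))
```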
- application related data
  - database/document_store.pkl: document store
  - database/meta_data.csv: metadata of the document store with countries and draft labs as columns
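As a hedged sketch of how the metadata file could be read back, assuming meta_data.csv is a plain CSV with one row per document (the actual columns may differ; document_store.pkl itself would be loaded with pickle):

```python
import csv
import io

# Illustrative stand-in for database/meta_data.csv; the real file has
# countries and draft labs as columns, as described above.
meta_data_csv = io.StringIO(
    "doc_id,country,draft_lab\n"
    "1,Germany,Part I\n"
    "2,Kenya,Part II\n"
)

rows = list(csv.DictReader(meta_data_csv))
countries = sorted({row["country"] for row in rows})
print(countries)
```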
Code Structure

Data Preprocessing

- src/data_processing/document_store_data.py: generates the data for the document store with the last processing steps
- src/data_processing/get_meta_data_filter.py: generates the metadata from the document store for the filtering in the frontend
- src/data_processing/taxonomy_processing.py: transforms the taxonomies for the filters in the frontend
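The metadata-to-filter step can be sketched as follows; the row structure and column names are illustrative assumptions, not the repository's actual schema:

```python
# Hypothetical metadata rows as the document store might hold them.
meta_rows = [
    {"country": "Germany", "draft_lab": "Part I"},
    {"country": "Kenya", "draft_lab": "Part I"},
    {"country": "Germany", "draft_lab": "Part II"},
]

def meta_data_filters(rows: list) -> dict:
    """Collect the unique values per column, as a source for frontend filters."""
    filters = {}
    for row in rows:
        for column, value in row.items():
            filters.setdefault(column, set()).add(value)
    return {column: sorted(values) for column, values in filters.items()}

print(meta_data_filters(meta_rows))
```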
Frontend

- src/app/: versions of the app
- src/utils: functions used in the frontend application; please check the imports to see which functions are used
- styles: CSS styles for the apps
- .streamlit: basic theme of the frontend
- Some settings and changes related to the Spaces App have to be made directly in the respective Streamlit App.
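The .streamlit directory typically holds a config.toml that defines the theme. A minimal example is shown below; the color values are placeholders, not the app's actual theme:

```toml
# .streamlit/config.toml — example theme only; replace with the app's values.
[theme]
primaryColor = "#1f77b4"
backgroundColor = "#ffffff"
secondaryBackgroundColor = "#f0f2f6"
textColor = "#262730"
```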
Backend

- src/document_store/document_store.py: generates the document store
- src/rag/pipeline.py: RAG pipeline
- src/rag/prompt/prompt_template.yaml: prompt template for RAG
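As a rough, dependency-free sketch of the prompt-assembly step in a RAG pipeline (the template text and placeholder names are invented here for illustration; the real template lives in src/rag/prompt/prompt_template.yaml):

```python
# Illustrative template; the real one is loaded from prompt_template.yaml.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
)

def build_prompt(question: str, retrieved_docs: list) -> str:
    """Join the retrieved documents and fill the prompt template."""
    context = "\n---\n".join(retrieved_docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt("What does draft Part I cover?", ["Doc A text", "Doc B text"])
print(prompt)
```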
Changes Document Storage

- Step 1: Delete the database/document_store.pkl and database/meta_data.csv files, or rename them if you want to keep them until the change of the document storage has succeeded.
- Step 2: Run the script data_processing/document_store_data.py. This updates the data/inc_df.csv data. If you need to make changes to data/inc_df_small_v6.csv, they have to be made either manually or via a new script implementing the changes.
- Step 3: Run the script data_processing/get_meta_data_filter.py. This will save a new document store and the metadata from the document store.
- Step 4: If the changes also affect the taxonomy, update the taxonomies as well. To do so, first update data/authors_taxonomy.json and data/draftcat_taxonomy.json manually, then run the script src/data_processing/taxonomy_processing.py.
- Frequent bugs after changes of the data:
  - A new country or draft lab category, or changes in the naming. Solution: check the taxonomy files and update src/data_processing/taxonomy_processing.py.
  - Countries with special characters. Solution: check data/inc_df_small_v6.csv if data_processing/document_store_data.py fails. Check database/meta_data.csv if the frontend application fails after changes to the taxonomies and filters.
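For the special-character bug, one way to spot problematic country names before the scripts fail is to normalize them and compare; this helper is a suggestion, not code from the repository:

```python
import unicodedata

def ascii_fold(name: str) -> str:
    """Strip combining accents so names like 'Türkiye' become plain ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Flag names that change under folding — candidates for encoding trouble.
countries = ["Germany", "Türkiye", "Côte d'Ivoire"]
suspicious = [c for c in countries if ascii_fold(c) != c]
print(suspicious)
```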
Changes App

- Please always check changes to the app locally before pushing to the respective Spaces App.
- To check the Spaces App locally, you can clone it like a Git repository. Avoid making changes directly in the interface of the Spaces App.
- If you copy changes from git to Spaces, copy only the files you have changed. You need to make some adjustments before you copy the files:
  - Remove all OPENAI_KEYS from the app.py and pipeline.py files.
  - Make sure you added OPENAI_API_KEY = os.environ.get("OPEN_API_KEY") in the pipeline.py file.
  - Remove src from the imports in app.py and pipeline.py; otherwise you will get a ModuleNotFoundError.
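A small sketch of the key lookup expected in pipeline.py on Spaces; note that the environment variable name here follows this README ("OPEN_API_KEY"), so make sure it matches the secret configured in the Space:

```python
import os

def get_openai_key() -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get("OPEN_API_KEY")  # variable name as given in this README
    if key is None:
        raise RuntimeError(
            "OPEN_API_KEY is not set; add it as a secret in the Spaces settings."
        )
    return key
```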