
INC Plastic Treaty App

Set up Development

Install poetry environment

Please install Poetry before executing the following commands

poetry install

Install pre-commit hooks

pre-commit install
pre-commit

Start app

  • Execute the following command with the respective app version:
streamlit run src/app/v1/app.py

Data Overview

  • taxonomy_related data
    • data/authors_taxonomy.json: raw countries taxonomy
    • data/draft_cat_taxonomy.json: raw draft category taxonomy
    • data/authors_filter.json: processed taxonomy for frontend filtering
    • data/draftcat_taxonomy_filter.json: processed taxonomy for frontend filtering
    • data/inc_df_v6_small.csv: processed data of scraping
    • data/inc_df.csv: data for document Storage
    • data/taxonomies.txt: raw taxonomies collection
  • application related data
    • database/document_store.pkl: Document Store
    • database/meta_data.csv: meta data of the document store, with countries and draft labels as columns
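The two files under database/ work together: the pickle holds the documents, the CSV holds the per-document attributes the frontend filters on. A minimal sketch of that relationship (the document shape and the column names are assumptions for illustration, not the repository's actual schema):

```python
import csv
import io
import pickle

# Hypothetical documents; the real store's entries may look different.
documents = [
    {"id": "1", "text": "Statement on plastic production caps."},
    {"id": "2", "text": "Comments on financing mechanisms."},
]

# database/document_store.pkl: the serialized document store.
restored = pickle.loads(pickle.dumps(documents))

# database/meta_data.csv: one row per document with the filterable
# attributes, keyed by the same document ids (column names assumed).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "country", "draft_label"])
writer.writeheader()
writer.writerow({"id": "1", "country": "Norway", "draft_label": "Option 1"})
writer.writerow({"id": "2", "country": "Ghana", "draft_label": "Option 2"})

buffer.seek(0)
meta_rows = list(csv.DictReader(buffer))
ids_for_country = {row["country"]: row["id"] for row in meta_rows}
```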

Code Structure

Data Preprocessing

  • src/data_processing/document_store_data.py: Generates the data for the document store with last processing steps
  • src/data_processing/get_meta_data_filter.py: Generates the meta data from document store for the filtering in the frontend
  • src/data_processing/taxonomy_processing.py: Transforms the taxonomies for the filters in the frontend.
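The taxonomy transformation can be pictured roughly as follows. The raw taxonomy shape ({group: [members]}) and the flat output format are assumptions about what the frontend filter expects, not the contents of the actual JSON files:

```python
import json

# Hypothetical raw taxonomy: groups mapped to member lists. The real
# authors_taxonomy.json / draft_cat_taxonomy.json may be structured
# differently.
raw_taxonomy = json.loads('{"Africa": ["Ghana", "Kenya"], "Europe": ["Norway"]}')

def to_filter(taxonomy: dict) -> list:
    """Flatten a {group: [members]} taxonomy into a sorted filter list."""
    return sorted({member for members in taxonomy.values() for member in members})

filters = to_filter(raw_taxonomy)
```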

Frontend

  • src/app/: versions of app
  • src/utils: helper functions used by the frontend application. Please check the imports to see which functions are used where.
  • styles: css styles for apps
  • .streamlit: Basic Theme of Frontend
  • Some Settings and Changes related to the Spaces App have to be done directly in the respective Streamlit App.

Backend

  • src/document_store/document_store.py: Generates the document store
  • src/rag/pipeline.py: RAG Pipeline
  • src/rag/prompt/prompt_template.yaml: Prompt Template for RAG
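How the prompt template feeds the RAG pipeline can be sketched like this. The template text and placeholder names below are invented for illustration; they are not the contents of src/rag/prompt/prompt_template.yaml:

```python
# Placeholder template; the real one lives in src/rag/prompt/prompt_template.yaml.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
)

def build_prompt(question: str, retrieved_docs: list) -> str:
    """Join the retrieved documents and substitute them into the template."""
    context = "\n---\n".join(retrieved_docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```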

Changes Document Storage

  • Step 1: Delete the database/document_store.pkl and database/meta_data.csv files, or rename them if you want to keep them until the change of the document storage has succeeded.
  • Step 2: Run the script data_processing/document_store_data.py. This updates data/inc_df.csv. If you need to make changes in data/inc_df_small_v6.csv, they have to be made either manually or via a new script that implements the changes.
  • Step 3: Run the script data_processing/get_meta_data_filter.py. This will save a new document store and the meta data from the document store.
  • Step 4: If the changes also affect the taxonomy, you need to update the taxonomies as well. To do so, first manually update data/authors_taxonomy.json and data/draftcat_taxonomy.json, then run the script src/data_processing/taxonomy_processing.py.
  • Frequent Bugs after changes of the data:
    • A new country or draft label category, or changes in naming. Solution: check the taxonomy files and update src/data_processing/taxonomy_processing.py.
    • Countries with special characters. Solution: check data/inc_df_small_v6.csv if data_processing/document_store_data.py fails. Check database/meta_data.csv, the taxonomies, and the filters if the frontend application fails after changes.
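Step 1 of the sequence above can be done non-destructively; renaming the old store files instead of deleting them leaves a rollback path if the rebuild fails. In this sketch a temp directory stands in for database/ so the snippet is self-contained:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp)  # stand-in for the repository's database/ directory
    (db / "document_store.pkl").write_bytes(b"old store")
    (db / "meta_data.csv").write_text("id,country\n")

    # Rename instead of deleting, so the old store can be restored if the
    # rebuild scripts fail partway through.
    for name in ("document_store.pkl", "meta_data.csv"):
        path = db / name
        if path.exists():
            path.rename(db / (name + ".bak"))

    backups = sorted(p.name for p in db.iterdir())
```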

Changes App

  • Please always check changes in the app locally before pushing to the respective Spaces App.
  • To check the Spaces App locally, you can clone it like a Git repository. Avoid making changes directly in the web interface of the Spaces App.
  • If you copy changes from Git to Spaces, copy only the files you have changed. You need to make some adjustments before copying the files:
    • Please remove all OPENAI_KEYS from the app.py and pipeline.py files.
    • Make sure pipeline.py reads the key from the environment: OPENAI_API_KEY = os.environ.get("OPEN_API_KEY")
    • Remove the src prefix from the imports in app.py and pipeline.py; otherwise you will get a ModuleNotFoundError.
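The environment lookup above can be wrapped in a small fail-fast helper so a missing secret surfaces immediately instead of as a None key deep in the pipeline. The function name and error message here are illustrative; only the os.environ.get call mirrors the line above:

```python
import os

def load_api_key(var: str = "OPEN_API_KEY") -> str:
    """Read the OpenAI key from the environment; fail fast if it is missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set; add it as a secret in the Spaces settings."
        )
    return key
```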