---
title: CTP Slack Bot
emoji: 🦥
colorFrom: red
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Spring 2025 CTP Slack Bot RAG system
app_port: 8080
---
# CTP Slack Bot
## _Modus Operandi_ in a Nutshell
* Intelligently responds to Slack messages (when mentioned) based on a repository of data.
* Periodically checks for new content to add to its repository.
## How to Run the Application
Configure the application first, either through environment variables or through a `.env` file based on the template, `.env.template`.
Obtaining the values requires setting up API tokens/secrets with:
* Slack: for `slack_bot_token` and `slack_app_token`
* MongoDB: for `mongodb_uri`
* OpenAI: for `openai_api_key`
* Google Drive: for `google_project_id`, `google_client_id`, `google_client_email`, `google_private_key_id`, and `google_private_key`
  * For Google Drive, set up a service account. It’s the only supported authentication type.
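Before starting the application, it can help to confirm that every value listed above is actually defined. The following is a minimal sketch using only the standard library, not the project's own configuration code. `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, and treating every listed setting as required and matching names case-insensitively are both assumptions.

```python
import os
from typing import Mapping

# Setting names as listed above; treating all of them as required is an assumption.
REQUIRED_SETTINGS = (
    "slack_bot_token",
    "slack_app_token",
    "mongodb_uri",
    "openai_api_key",
    "google_project_id",
    "google_client_id",
    "google_client_email",
    "google_private_key_id",
    "google_private_key",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required setting names that are unset or blank in env."""
    # Names are compared case-insensitively (an assumption), so SLACK_BOT_TOKEN also counts.
    present = {key.lower() for key, value in env.items() if value.strip()}
    return [name for name in REQUIRED_SETTINGS if name not in present]
```

Calling `missing_settings()` with no arguments inspects the current process environment. An empty list means every listed setting has a non-blank value.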
### Normally
Just run the Docker image. 😉
Build it with:
```sh
docker build . -t ctp-slack-bot
```
Run it with:
```sh
docker run --volume ./logs:/data --env-file=.env -p 8080:8080 --name my-ctp-slack-bot-instance ctp-slack-bot
```
### For Development
Development usually requires rapid iteration. That means a change in the code ought to be reflected as soon as possible in the behavior of the application.
First, make sure you are set up with a Python virtual environment created by the Python `venv` module and that it’s activated. Then install dependencies from `pyproject.toml` within the environment using:
```sh
pip3 install -e .
```
Make a copy of `.env.template` as `.env` and define the environment variables. (You can also define them by other means, but this has the least friction.) This file should not be committed and is excluded by `.gitignore`!
If `localhost` port `8080` is free, running the following will make the application available on that port:
```sh
scripts/run-dev.sh
```
If everything is working, visiting http://localhost:8080/health returns HTTP status 200 (OK) with a payload containing the health status of individual components.
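The same check can be scripted. This is a minimal sketch using only the standard library. `check_health` and its `opener` parameter are hypothetical names, and parsing the payload as JSON is an assumption, since the payload format is not specified here.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def check_health(url="http://localhost:8080/health", timeout=5.0, opener=urlopen):
    """Return (status_code, payload) for the health endpoint."""
    try:
        with opener(url, timeout=timeout) as response:
            status, body = response.status, response.read()
    except HTTPError as error:
        # Non-2xx responses arrive as exceptions but still carry a status and a body.
        status, body = error.code, error.read()
    try:
        payload = json.loads(body)
    except ValueError:
        # Fall back to raw text if the body is not JSON (the format is an assumption).
        payload = body.decode("utf-8", errors="replace")
    return status, payload
```

With the development server running, `status, payload = check_health()` should yield `200` when the application is healthy. If nothing is listening on the port, `urlopen` raises `URLError`, which this sketch deliberately lets propagate.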
## Tech Stack
* Hugging Face Spaces for hosting
* OpenAI for embeddings and language models
* Google Drive for reference data (i.e., the material to be incorporated into the bot’s knowledge base)
* MongoDB for data persistence
* Docker for containerization
* Python
  * Slack Bolt client for interfacing with Slack
  * See `pyproject.toml` for additional Python packages.
## General Project Structure
Not every file or folder is listed, but the important stuff is here.
* `src/`
  * `ctp_slack_bot/`
    * `core/`: fundamental components like configuration (using pydantic), logging setup (loguru), and custom exceptions
      * `config.py`: application settings model
    * `db/`: data connection and interface logic
      * `repositories/`: data collection/table interface logic
        * `mongo_db_vectorized_repository_base.py`: base implementation of a repository corresponding to a MongoDB collection with a search index
        * `vectorized_chunk_repository.py`: repository interface for `VectorizedChunk`s
    * `models/`: data models
    * `mime_type_handlers`: parsers for converting bytes of different MIME types to `Chunk`s
    * `services/`: business logic
      * `answer_retrieval_service.py`: obtains an answer to a question from a language model using relevant context
      * `application_health_service.py`: collects the health status of the application components
      * `content_ingestion_service.py`: converts content into chunks and stores them into the database
      * `context_retrieval_service.py`: queries for relevant context from the database to answer a question
      * `embeddings_model_service.py`: converts text to embeddings
      * `event_brokerage_service.py`: brokers events between decoupled components
      * `google_drive_service.py`: interfaces with Google Drive
      * `language_model_service.py`: answers questions using relevant context
      * `question_dispatch_service.py`: listens for questions and retrieves relevant context to get answers
      * `task_service.py`: runs periodic background tasks
      * `slack_service.py`: handles events from Slack and sends back responses
      * `vectorization_service.py`: converts chunks into chunks with embeddings
    * `tasks/`: scheduled tasks to run in the background
    * `utils/`: reusable utilities
    * `app.py`: application entry point
    * `containers.py`: the dependency injection container
* `tests/`: unit tests
* `scripts/`: utility scripts for development, deployment, etc.
  * `run-dev.sh`: script to run the application locally
* `notebooks/`: Jupyter notebooks for exploration and model development
* `.env`: local environment variables for development purposes (to be created for local use only from `.env.template`)
* `Dockerfile`: Docker container build definition
* `pyproject.toml`: project definition and dependencies
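
To make the division of labor among the services listed above more concrete, here is a toy, self-contained sketch of the ingest, vectorize, and retrieve flow. It is not the project's code: every function name is hypothetical, the bag-of-words "embedding" merely stands in for OpenAI embeddings, and the in-memory list stands in for the MongoDB collection with a search index.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into chunks of at most `size` words (stand-in for content ingestion)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for a real embeddings model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the question (stand-in for context retrieval)."""
    query = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Ingest: chunk each document and store every chunk alongside its vector.
documents = [
    "Office hours are held on Tuesdays at noon.",
    "Submit the final project through the course portal.",
]
store = [(piece, embed(piece)) for document in documents for piece in chunk(document)]

# Retrieve: the best-matching chunks become the context handed to the language model.
context = retrieve("When are office hours?", store, k=1)
```

In the real application, the retrieved context and the question would then go to the language model to produce the reply sent back to Slack. That step is omitted here because it requires API credentials.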