---
title: CTP Slack Bot
emoji: 🦥
colorFrom: red
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Spring 2025 CTP Slack Bot RAG system
app_port: 8080
---

CTP Slack Bot

Modus Operandi in a Nutshell

  • Intelligently responds to Slack messages (when mentioned) based on a repository of data.
  • Periodically checks for new content to add to its repository.
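The first bullet can be illustrated with a standard-library-only sketch. It decides whether a message mentions the bot and, if so, extracts the question from it. It assumes Slack's `<@USER_ID>` mention encoding in message text, optionally with a `|label` suffix. `extract_question` and the user ID `U0BOT` are hypothetical. The bot itself talks to Slack through the Slack Bolt client, so its real handling may differ.

```python
import re
from typing import Optional


def extract_question(text: str, bot_user_id: str) -> Optional[str]:
    """Return the message text without the bot mention, or None if the bot isn't mentioned."""
    # Assumed encoding: Slack writes a user mention as <@USER_ID>, sometimes <@USER_ID|label>.
    pattern = re.compile(rf"<@{re.escape(bot_user_id)}(?:\|[^>]*)?>")
    if not pattern.search(text):
        return None  # Not addressed to the bot: ignore the message.
    # Remove the mention(s) and collapse leftover whitespace to leave just the question.
    return re.sub(r"\s+", " ", pattern.sub(" ", text)).strip()


print(extract_question("<@U0BOT> What is RAG?", "U0BOT"))  # What is RAG?
print(extract_question("hello everyone", "U0BOT"))  # None
```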

How to Run the Application

You need to configure it first, either through environment variables or through an .env file based on the template .env.template.

Obtaining the values requires setting up API tokens/secrets with:

  • Slack: for slack_bot_token and slack_app_token
  • MongoDB: for mongodb_uri
  • OpenAI: for openai_api_key
  • Google Drive: for google_project_id, google_client_id, google_client_email, google_private_key_id, and google_private_key
    • For Google Drive, set up a service account. It’s the only supported authentication type.
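Before you start the app, it helps to confirm that every setting listed above is defined. The helper below is hypothetical and not part of the project; the real settings model lives in core/config.py. It assumes that names match case-insensitively, so `SLACK_BOT_TOKEN` counts as `slack_bot_token`. It also treats an empty value as missing.

```python
import os
from typing import List, Mapping

# Setting names taken from the list above.
REQUIRED_SETTINGS = (
    "slack_bot_token", "slack_app_token",
    "mongodb_uri",
    "openai_api_key",
    "google_project_id", "google_client_id", "google_client_email",
    "google_private_key_id", "google_private_key",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required settings that are absent or empty in `env`."""
    # Assumption: names are compared case-insensitively.
    present = {key.lower() for key, value in env.items() if value}
    return [name for name in REQUIRED_SETTINGS if name not in present]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    print("Missing settings:", ", ".join(missing) if missing else "none")
```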

Normally

Just run the Docker image. 😉

Build it with:

docker build . -t ctp-slack-bot

Run it with:

docker run --volume ./logs:/data --env-file=.env -p 8080:8080 --name my-ctp-slack-bot-instance ctp-slack-bot

For Development

Development usually requires rapid iteration: a change in the code should be reflected in the application's behavior as soon as possible.

First, create a Python virtual environment with the Python venv module and activate it. Then install the dependencies from pyproject.toml into the environment in editable mode, so that code changes take effect without reinstalling:

pip3 install -e .
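The install should land in the activated virtual environment. A quick sanity check is to compare `sys.prefix` with `sys.base_prefix`, because the two differ inside an environment created by venv. The `in_virtualenv` helper below is only an illustration, not a project script.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment at {sys.prefix}")
    else:
        print("Warning: no virtual environment is active; activate it before running pip3 install -e .")
```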

Make a copy of .env.template named .env and define the environment variables in it. (You can also define them by other means, but this has the least friction.) This file is excluded by .gitignore and should not be committed!
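An .env file consists of simple `KEY=value` lines. The sketch below parses a minimal subset of that syntax into a dictionary: comments, blank lines, an optional `export` prefix, and one pair of surrounding quotes. `parse_env_file` is hypothetical. It does not handle escape sequences or multi-line values, which may matter for a value like google_private_key. The application's own configuration loading in core/config.py is what actually reads the file.

```python
from typing import Dict


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=value lines; a deliberately minimal subset of .env syntax."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and comments.
        if line.startswith("export "):
            line = line[len("export "):]  # Tolerate shell-style "export KEY=value".
        key, sep, value = line.partition("=")
        if not sep:
            continue  # Ignore lines without "=".
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```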

If localhost port 8080 is free, running the following will make the application available on that port:

scripts/run-dev.sh
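You can check whether port 8080 is free by trying to bind it yourself. This sketch checks only the IPv4 loopback address. Treat the result as a hint rather than a guarantee, because scripts/run-dev.sh may bind a different address. `is_port_free` is a hypothetical helper.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # Another process already holds the port.
    return True


if __name__ == "__main__":
    print("Port 8080 is free" if is_port_free(8080) else "Port 8080 is in use")
```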

If everything is working, visiting http://localhost:8080/health returns HTTP status 200 (OK) and a payload containing the health status of individual components.
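You can also script the health check. The sketch below fetches the endpoint with the standard library and returns the status code together with the payload, parsed as JSON when possible. The README doesn't specify the payload format, so JSON is an assumption here, and `check_health` is a hypothetical helper. It bypasses any configured HTTP proxy because the default target is local.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Tuple


def check_health(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> Tuple[int, Any]:
    """Return (HTTP status, payload); the payload is parsed as JSON when possible (an assumption)."""
    # The default target is local, so skip any proxy configured via environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            status, body = response.status, response.read()
    except urllib.error.HTTPError as error:
        # Non-2xx responses raise HTTPError, which still carries the status code and body.
        status, body = error.code, error.read()
    text = body.decode("utf-8", errors="replace")
    try:
        return status, json.loads(text)
    except json.JSONDecodeError:
        return status, text
```

For example, once scripts/run-dev.sh is running, `status, payload = check_health()` should yield a status of 200 when all components are healthy.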

Tech Stack

  • Hugging Face Spaces for hosting
  • OpenAI for embeddings and language models
  • Google Drive for reference data (i.e., the material to be incorporated into the bot’s knowledge base)
  • MongoDB for data persistence
  • Docker for containerization
  • Python
    • Slack Bolt client for interfacing with Slack
    • See pyproject.toml for additional Python packages.
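OpenAI embeddings and MongoDB together form the retrieval half of a RAG system. In a typical RAG flow, and presumably here given the services listed below, the question is embedded and the most similar stored chunks are found. Those chunks are then passed to the language model as context. The pure-Python sketch below stands in for that ranking step, using cosine similarity over made-up two-dimensional vectors. Cosine similarity is one common choice; the README doesn't say which metric the search index uses. In the project, the search itself is delegated to a MongoDB collection with a search index (mongo_db_vectorized_repository_base.py). `top_k` and the prompt wording in `build_prompt` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # Treat zero vectors as dissimilar to everything.


def top_k(query: Sequence[float], chunks: List[Tuple[str, Sequence[float]]], k: int = 3) -> List[str]:
    # Rank stored (text, embedding) pairs by similarity to the query embedding; keep the best k texts.
    ranked = sorted(chunks, key=lambda chunk: cosine_similarity(query, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    # Hypothetical prompt layout: retrieved context first, then the question.
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


chunks = [
    ("Office hours are on Tuesdays.", [1.0, 0.0]),
    ("Project deadlines are in May.", [0.0, 1.0]),
    ("Office hours cover project questions.", [0.7, 0.7]),
]
question_vector = [1.0, 0.1]  # Made-up embedding of "When are office hours?"
context = top_k(question_vector, chunks, k=2)
print(context)  # ['Office hours are on Tuesdays.', 'Office hours cover project questions.']
```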

General Project Structure

Not every file or folder is listed, but the important stuff is here.

  • src/
    • ctp_slack_bot/
      • core/: fundamental components like configuration (using pydantic), logging setup (loguru), and custom exceptions
        • config.py: application settings model
      • db/: data connection and interface logic
        • repositories/: data collection/table interface logic
          • mongo_db_vectorized_repository_base.py: base implementation of a repository corresponding to a MongoDB collection with a search index
          • vectorized_chunk_repository.py: repository interface for VectorizedChunks
      • models/: data models
      • mime_type_handlers/: parsers for converting bytes of different MIME types to Chunks
      • services/: business logic
        • answer_retrieval_service.py: obtains an answer to a question from a language model using relevant context
        • application_health_service.py: collects the health status of the application components
        • content_ingestion_service.py: converts content into chunks and stores them into the database
        • context_retrieval_service.py: queries for relevant context from the database to answer a question
        • embeddings_model_service.py: converts text to embeddings
        • event_brokerage_service.py: brokers events between decoupled components
        • google_drive_service.py: interfaces with Google Drive
        • language_model_service.py: answers questions using relevant context
        • question_dispatch_service.py: listens for questions and retrieves relevant context to get answers
        • slack_service.py: handles events from Slack and sends back responses
        • task_service.py: runs periodic background tasks
        • vectorization_service.py: converts chunks into vectorized chunks (chunks with embeddings)
      • tasks/: scheduled tasks to run in the background
      • utils/: reusable utilities
      • app.py: application entry point
      • containers.py: the dependency injection container
  • tests/: unit tests
  • scripts/: utility scripts for development, deployment, etc.
    • run-dev.sh: script to run the application locally
  • notebooks/: Jupyter notebooks for exploration and model development
  • .env: local environment variables for development (create it from .env.template; for local use only)
  • Dockerfile: Docker container build definition
  • pyproject.toml: project definition and dependencies
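According to the list above, event_brokerage_service.py brokers events between decoupled components. The usual idea behind such a broker is publish/subscribe: publishers emit events by type without knowing who handles them. Below is a minimal synchronous sketch. `EventBroker` and the event names `question_posted` and `answer_ready` are hypothetical, and the real service may well be asynchronous or use different event types.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventBroker:
    """Minimal synchronous publish/subscribe broker (a hypothetical stand-in)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: Any) -> None:
        # Deliver to every handler registered for this event type, in subscription order.
        # .get() avoids creating an empty entry for event types nobody listens to.
        for handler in list(self._subscribers.get(event_type, [])):
            handler(payload)


# Usage: one component turns questions into answers; another collects the answers.
broker = EventBroker()
answers: List[str] = []
broker.subscribe("question_posted", lambda question: broker.publish("answer_ready", f"answer to: {question}"))
broker.subscribe("answer_ready", answers.append)
broker.publish("question_posted", "When is the deadline?")
print(answers)  # ['answer to: When is the deadline?']
```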
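The list above says that content_ingestion_service.py converts content into chunks before storing them. The README doesn't say how text is split. This sketch therefore assumes a common baseline: fixed-size character windows that overlap, so that a sentence cut at one boundary still appears whole in a neighboring chunk. `chunk_text`, `size`, and `overlap` are hypothetical, and the project's MIME type handlers may split content differently.

```python
from typing import List


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters, each overlapping the previous by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # This window reached the end; don't emit a redundant tail chunk.
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```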
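According to the list above, task_service.py runs periodic background tasks, such as the bot's periodic check for new content. The README doesn't say which scheduling mechanism it uses. This sketch uses a plain asyncio loop that runs a task, waits an interval, and stops promptly when signaled. `run_periodically` and `check_for_new_content` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodically(
    task: Callable[[], Awaitable[None]],
    interval_seconds: float,
    stop: asyncio.Event,
) -> None:
    """Run `task` repeatedly, waiting `interval_seconds` between runs, until `stop` is set."""
    while not stop.is_set():
        try:
            await task()
        except Exception as error:  # Report and keep going so one failed run doesn't end the loop.
            print(f"periodic task failed: {error!r}")
        try:
            # Wait out the interval, but return early as soon as stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass  # Interval elapsed; run the task again.


async def main() -> int:
    stop = asyncio.Event()
    runs = 0

    async def check_for_new_content() -> None:
        # Hypothetical stand-in for polling the reference data for new material.
        nonlocal runs
        runs += 1
        if runs == 3:
            stop.set()

    await run_periodically(check_for_new_content, interval_seconds=0.01, stop=stop)
    return runs


if __name__ == "__main__":
    print(asyncio.run(main()))  # 3
```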