---
title: CTP Slack Bot
emoji: 🦥
colorFrom: red
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: Spring 2025 CTP Slack Bot RAG system
app_port: 8080
---


# CTP Slack Bot

## _Modus Operandi_ in a Nutshell

* Intelligently responds to Slack messages (when mentioned) based on a repository of data.
* Periodically checks for new content to add to its repository.

## How to Run the Application

Configure the application first, either through environment variables or through a `.env` file based on the template, `.env.template`.

Obtaining the values requires setting up API tokens/secrets with:

* Slack: for `slack_bot_token` and `slack_app_token`
* MongoDB: for `mongodb_uri`
* OpenAI: for `openai_api_key`
* Google Drive: for `google_project_id`, `google_client_id`, `google_client_email`, `google_private_key_id`, and `google_private_key`
    * For Google Drive, set up a service account. It’s the only supported authentication type.
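
Before starting the bot, it can help to confirm that every value listed above is defined. The following is a minimal sketch, not part of the project: `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, the sketch assumes that every listed value is required, and it assumes that names are matched case-insensitively (so `SLACK_BOT_TOKEN` and `slack_bot_token` both count).

```python
import os

# Setting names taken from the list above. Treating all of them as required
# is an assumption made for this sketch.
REQUIRED_SETTINGS = (
    "slack_bot_token",
    "slack_app_token",
    "mongodb_uri",
    "openai_api_key",
    "google_project_id",
    "google_client_id",
    "google_client_email",
    "google_private_key_id",
    "google_private_key",
)


def missing_settings(environment=None):
    """Return the listed setting names that are absent or empty.

    `environment` is any mapping of names to values; it defaults to the
    process environment. Names are compared case-insensitively (assumption).
    """
    env = os.environ if environment is None else environment
    defined = {name.lower() for name, value in env.items() if value}
    return [name for name in REQUIRED_SETTINGS if name not in defined]
```

Calling `missing_settings()` with no arguments inspects the current process environment and returns an empty list when nothing is missing.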

### Normally

Just run the Docker image. 😉

Build it with:

```sh
docker build . -t ctp-slack-bot
```

Run it with:

```sh
docker run --volume ./logs:/data --env-file=.env -p 8080:8080 --name my-ctp-slack-bot-instance ctp-slack-bot
```

### For Development

Development usually requires rapid iteration. That means a change in the code ought to be reflected as soon as possible in the behavior of the application.

First, create a Python virtual environment with the `venv` module and activate it. Then install the dependencies declared in `pyproject.toml` into that environment:

```sh
pip3 install -e .
```

Make a copy of `.env.template` as `.env` and define the environment variables. (You can also define them by other means, but this has the least friction.) This file should not be committed and is excluded by `.gitignore`!

If `localhost` port `8080` is free, running the following will make the application available on that port:

```sh
scripts/run-dev.sh
```

If everything is working, visiting http://localhost:8080/health returns HTTP status 200 (OK) and a payload containing the health status of the individual components.
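
The health check can also be scripted. The sketch below uses only the Python standard library. `fetch_health` is a hypothetical helper, not part of the project, and the payload format is an assumption: the function tries to parse the body as JSON and falls back to the raw text if that fails.

```python
import json
import urllib.request


def fetch_health(url="http://localhost:8080/health", timeout=5.0):
    """Request the health endpoint and return (status_code, payload).

    The payload is parsed as JSON when possible (assumption about the
    format); otherwise the raw response text is returned. `urlopen` raises
    `urllib.error.HTTPError` for error statuses and `urllib.error.URLError`
    when the application is unreachable.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = body
        return response.status, payload
```

With the application running locally, `status, payload = fetch_health()` should yield a status of `200` along with the per-component health information.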

## Tech Stack

* Hugging Face Spaces for hosting
* OpenAI for embeddings and language models
* Google Drive for reference data (i.e., the material to be incorporated into the bot’s knowledge base)
* MongoDB for data persistence
* Docker for containerization
* Python
    * Slack Bolt client for interfacing with Slack
    * See `pyproject.toml` for additional Python packages.

## General Project Structure

Not every file or folder is listed, but the important stuff is here.

* `src/`
    * `ctp_slack_bot/`
        * `core/`: fundamental components like configuration (using pydantic), logging setup (loguru), and custom exceptions
            * `config.py`: application settings model
        * `db/`: data connection and interface logic
            * `repositories/`: data collection/table interface logic
                * `mongo_db_vectorized_repository_base.py`: base implementation of a repository corresponding to a MongoDB collection with a search index
                * `vectorized_chunk_repository.py`: repository interface for `VectorizedChunk`s
        * `models/`: data models
        * `mime_type_handlers/`: parsers for converting bytes of different MIME types to `Chunk`s
        * `services/`: business logic
            * `answer_retrieval_service.py`: obtains an answer to a question from a language model using relevant context
            * `application_health_service.py`: collects the health status of the application components
            * `content_ingestion_service.py`: converts content into chunks and stores them into the database
            * `context_retrieval_service.py`: queries for relevant context from the database to answer a question
            * `embeddings_model_service.py`: converts text to embeddings
            * `event_brokerage_service.py`: brokers events between decoupled components
            * `google_drive_service.py`: interfaces with Google Drive
            * `language_model_service.py`: answers questions using relevant context
            * `question_dispatch_service.py`: listens for questions and retrieves relevant context to get answers
            * `slack_service.py`: handles events from Slack and sends back responses
            * `task_service.py`: runs periodic background tasks
            * `vectorization_service.py`: converts chunks into chunks with embeddings
        * `tasks/`: scheduled tasks to run in the background
        * `utils/`: reusable utilities
        * `app.py`: application entry point
        * `containers.py`: the dependency injection container
* `tests/`: unit tests
* `scripts/`: utility scripts for development, deployment, etc.
    * `run-dev.sh`: script to run the application locally
* `notebooks/`: Jupyter notebooks for exploration and model development
* `.env`: local environment variables for development purposes (to be created for local use only from `.env.template`)
* `Dockerfile`: Docker container build definition
* `pyproject.toml`: project definition and dependencies
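
The service descriptions above suggest a flow in which chunks are vectorized, stored, and later ranked against a question to supply context. The toy sketch below illustrates that idea only; it is not the project's implementation. Every name in it is hypothetical (the `Chunk` and `VectorizedChunk` classes here merely borrow the names mentioned above), a bag-of-words count stands in for the OpenAI embeddings model, and a Python list stands in for the MongoDB search index.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of ingested content (hypothetical, simplified model)."""
    text: str


@dataclass
class VectorizedChunk:
    """A chunk paired with its embedding (hypothetical, simplified model)."""
    text: str
    embedding: Counter


def embed(text):
    """Toy stand-in for an embeddings model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vectorize(chunks):
    """Convert chunks into chunks with embeddings."""
    return [VectorizedChunk(chunk.text, embed(chunk.text)) for chunk in chunks]


def retrieve_context(question, store, top_k=2):
    """Return the top_k stored chunks most similar to the question."""
    question_embedding = embed(question)
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(question_embedding, item.embedding),
        reverse=True,
    )
    return ranked[:top_k]
```

In a real retrieval-augmented setup, the chunks returned by a function like `retrieve_context` would be passed to a language model together with the question.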