Update README.md
README.md CHANGED
@@ -1,104 +1,89 @@
- …
- python add_course_workflow.py --course [COURSE_NAME]
- ```
- …
- 2. Prompt you to manually add URLs to the course content
- 3. Merge the course data into the main dataset
- 4. Add contextual information to document nodes
- 5. Create vector stores
- 6. Upload databases to HuggingFace
- 7. Update UI configuration
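For illustration, a concrete run of the course workflow removed above might look like the following; the course name is a hypothetical placeholder, not one of the project's actual courses.

```bash
# Hypothetical example -- substitute a real course identifier.
python add_course_workflow.py --course advanced_rag_course
```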
- …
- python update_docs_workflow.py
- ```
- …
- python update_docs_workflow.py --sources transformers peft
- ```
- …
- 2. Processing markdown files to create JSONL data
- 3. Adding contextual information to document nodes
- 4. Creating vector stores
- 5. Uploading databases to HuggingFace
- …
- python upload_jsonl_to_hf.py
- ```
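The removed README also mentions a `--new-context-only` flag that limits processing to new content when the contextual-nodes PKL file already exists. A sketch of combining it with `--sources` follows; whether the two flags can be combined this way is an assumption, not something the removed text confirms.

```bash
# Sketch only: refresh the transformers and peft docs and process just the new
# content. Combining --sources with --new-context-only is assumed, not
# documented here -- check the script's --help.
python update_docs_workflow.py --sources transformers peft --new-context-only
```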
- …
- - **Process Markdown**: `process_md_files.py`
- - **Add Context**: `add_context_to_nodes.py`
- - **Create Vector Stores**: `create_vector_stores.py`
- - **Upload to HuggingFace**: `upload_dbs_to_hf.py`
- …
- - For new courses, use `add_course_workflow.py`
- - For updated documentation, use `update_docs_workflow.py`
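Run by hand, the individual scripts listed above would follow roughly the order below. The removed text does not show their arguments, so none are assumed here.

```bash
# Rough manual pipeline mirroring the workflow scripts listed above.
# Exact arguments are not documented here; check each script's --help.
python process_md_files.py        # process markdown into JSONL data
python add_context_to_nodes.py    # add contextual information to document nodes
python create_vector_stores.py    # build the vector stores
python upload_dbs_to_hf.py        # upload the databases to HuggingFace
```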
- …
- 4. When adding a new course, verify that it appears in the Gradio UI:
-    - The workflow automatically updates `main.py` and `setup.py` to include the new source
-    - Check that the new source appears in the dropdown menu in the UI
-    - Make sure it's properly included in the default selected sources
-    - Restart the Gradio app to see the changes
- 5. First time setup or missing files:
-    - Both workflows automatically check for and download required data files:
-      - `all_sources_data.jsonl` - Contains the raw document data
-      - `all_sources_contextual_nodes.pkl` - Contains the processed nodes with added context
-    - If the PKL file exists, the `--new-context-only` flag will only process new content
-    - You must have proper HuggingFace credentials with access to the private repository
- 6. Make sure you have the required environment variables set:
-    - `OPENAI_API_KEY` for LLM processing
-    - `COHERE_API_KEY` for embeddings
-    - `HF_TOKEN` for HuggingFace uploads
-    - `GITHUB_TOKEN` for accessing documentation via the GitHub API
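Taken together, the environment-variable checklist from the removed README translates into a shell setup roughly like this on macOS/Linux (placeholder values):

```bash
# Placeholder values; the variable names and purposes come from the removed README.
export OPENAI_API_KEY=your_openai_api_key_here   # LLM processing
export COHERE_API_KEY=your_cohere_api_key_here   # embeddings
export HF_TOKEN=your_hf_token_here               # HuggingFace uploads
export GITHUB_TOKEN=your_github_token_here       # GitHub API access for documentation
```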
+ ---
+ title: AI Tutor Chatbot
+ emoji: 🧑🏻‍🏫
+ colorFrom: gray
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 5.20.1
+ app_file: scripts/main.py
+ pinned: false
+ ---
+ ### Gradio UI Chatbot
+
+ A Gradio UI for the chatbot is available in [scripts/main.py](./scripts/main.py).
+
+ The Gradio demo is deployed on Hugging Face Spaces at: [AI Tutor Chatbot on Hugging Face](https://huggingface.co/spaces/towardsai-tutors/ai-tutor-chatbot).
+
+ **Note:** A GitHub Action automatically deploys the Gradio demo when changes are pushed to the main branch (excluding documentation and scripts in the `data/scraping_scripts` directory).
+
+ ### Installation (for Gradio UI)
+
+ 1. **Create a new Python environment:**
+
+    ```bash
+    python -m venv .venv
+    ```
+
+ 2. **Activate the environment:**
+
+    For macOS and Linux:
+
+    ```bash
+    source .venv/bin/activate
+    ```
+
+    For Windows:
+
+    ```bash
+    .venv\Scripts\activate
+    ```
+
+ 3. **Install the dependencies:**
+
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ ### Usage (for Gradio UI)
+
+ 1. **Set environment variables:**
+
+    Before running the application, set up the required API keys:
+
+    For macOS and Linux:
+
+    ```bash
+    export OPENAI_API_KEY=your_openai_api_key_here
+    export COHERE_API_KEY=your_cohere_api_key_here
+    ```
+
+    For Windows:
+
+    ```bash
+    set OPENAI_API_KEY=your_openai_api_key_here
+    set COHERE_API_KEY=your_cohere_api_key_here
+    ```
+
+ 2. **Run the application:**
+
+    ```bash
+    python scripts/main.py
+    ```
+
+    This command starts the Gradio interface for the AI Tutor chatbot.
+
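As a usage sketch, assuming Gradio's default settings, the two steps can also be combined into a single command on macOS/Linux:

```bash
# One-off run with the API keys supplied inline (macOS/Linux shells).
# Gradio usually prints a local URL such as http://127.0.0.1:7860 once started.
OPENAI_API_KEY=your_openai_api_key_here \
COHERE_API_KEY=your_cohere_api_key_here \
python scripts/main.py
```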
+ ### Updating Data Sources
+
+ This application uses a RAG (Retrieval Augmented Generation) system with multiple data sources, including documentation and courses. To update these sources:
+
+ 1. **For adding new courses or updating documentation:**
+    - See the detailed instructions in [data/scraping_scripts/README.md](./data/scraping_scripts/README.md)
+    - Automated workflows are available for both course addition and documentation updates
+
+ 2. **Available workflows:**
+    - `add_course_workflow.py` - For adding new course content
+    - `update_docs_workflow.py` - For updating documentation from GitHub repositories
+    - `upload_data_to_hf.py` - For uploading data files to HuggingFace
+
+ These scripts streamline the process of adding new content to the AI Tutor and ensure consistency across team members.
+
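For quick reference, typical invocations of these workflows, based on the commands documented in the previous `data/scraping_scripts` README, look roughly like this; the course and source names are placeholders, and running the scripts from inside `data/scraping_scripts` is an assumption:

```bash
# Illustrative invocations; course and source names are placeholders.
python add_course_workflow.py --course [COURSE_NAME]
python update_docs_workflow.py --sources transformers peft
python upload_data_to_hf.py
```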