Upload folder using huggingface_hub
- .cursorignore +6 -0
- .gitattributes +1 -0
- .gitignore +2 -0
- Dockerfile +17 -0
- Dockerfile.ollama +10 -0
- README.md +123 -10
- create-db.sql +2 -0
- describer.html +146 -0
- docker-backup/Dockerfile +18 -0
- docker-backup/docker-compose.yml +19 -0
- docker-backup/requirements.txt +2 -0
- docker-compose.yml +51 -0
- docmanager.html +134 -0
- docs/docs.go +44 -0
- docs/swagger.json +20 -0
- docs/swagger.yaml +15 -0
- example.env +4 -0
- go.mod +43 -0
- go.sum +152 -0
- index.html +227 -0
- init-db.sql +8 -0
- main.go +698 -0
- ollama-entrypoint.sh +17 -0
- ragtag4 +3 -0
- screenplays/downloader.py +63 -0
- screenplays/scripts/AmericanBeauty_final.txt +0 -0
- screenplays/scripts/thefifthelement.txt +0 -0
- screenplays/scripts/twelve_monkeys.txt +0 -0
- screenplays/send_screenplay.py +48 -0
.cursorignore
ADDED
@@ -0,0 +1,6 @@
+screenplays/scripts/*
+screenplays/scripts/twelve_monkeys.txt
+screenplays/scripts/thefifthelement.txt
+screenplays/scripts/AmericanBeauty_final.txt
+screenplays/scripts/
+
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ragtag4 filter=lfs diff=lfs merge=lfs -text
.gitignore
ADDED
@@ -0,0 +1,2 @@
+*.pdf
+.env
Dockerfile
ADDED
@@ -0,0 +1,17 @@
+# golang image and setup for the gin server
+FROM golang:1.24.1
+
+# Set the working directory in the container
+WORKDIR /app
+
+# Copy the current directory contents into the container at /app
+COPY . /app
+
+RUN go mod download
+
+RUN go build -o ragtag
+
+EXPOSE 8080
+
+CMD ["./ragtag"]
+
Dockerfile.ollama
ADDED
@@ -0,0 +1,10 @@
+FROM ollama/ollama:latest
+
+# Install curl for healthcheck
+RUN apt-get update && apt-get install -y curl
+
+# Copy the entrypoint script
+COPY ollama-entrypoint.sh /ollama-entrypoint.sh
+RUN chmod +x /ollama-entrypoint.sh
+
+ENTRYPOINT ["/ollama-entrypoint.sh"]
README.md
CHANGED
@@ -1,10 +1,123 @@
-
-
-
-
-
-
-
-
-
-
+# what
+
+RAG is easy! Run ollama llama3.1 in golang with a postgres database.
+
+This is a simple example of how to use ollama with a postgres database to create a RAG system. It should be considered a starting point and not a full-featured system. It can be adapted to any data-related use case that uses LLMs to answer questions about data.
+
+- Cool feature:
+You can use the title as a category filter: if you add an '@' to the query, it filters vector docs by the matched title. For example:
+`Is @antibionic a good company?` restricts the rest of the chat session to docs and vectors with antibionic in the title field. You can see this in the screenshots below. This means you can have different types of documents and not be forced to chat with all of them at once.
+
+It is set up with docker compose and is ready to go if you skip to that section below.
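The `@title` filter described in the README lines above could work roughly as follows. This is a minimal Go sketch under stated assumptions, not the code in main.go (which this view does not show): `extractTitleFilter`, `sessionFilters`, and the tag pattern are hypothetical, and stripping only the `@` so the question still reads naturally is a guess at the intended behavior.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// titleTag matches '@' followed by letters, digits, '_' or '-'.
// Assumption: the real pattern may differ (this one also matches e-mail domains).
var titleTag = regexp.MustCompile(`@([A-Za-z0-9_-]+)`)

// extractTitleFilter returns the first @title in query (without the '@')
// and the query with only the '@' removed. With no tag, title is empty.
func extractTitleFilter(query string) (title, cleaned string) {
	m := titleTag.FindStringSubmatchIndex(query)
	if m == nil {
		return "", query
	}
	title = query[m[2]:m[3]]
	return title, query[:m[0]] + title + query[m[1]:]
}

// sessionFilters remembers the last title filter seen per chat session.
type sessionFilters struct {
	mu   sync.Mutex
	byID map[string]string
}

func newSessionFilters() *sessionFilters {
	return &sessionFilters{byID: make(map[string]string)}
}

// apply stores a new filter when the query carries one and otherwise
// returns the filter already stored for that session (possibly empty).
func (s *sessionFilters) apply(sessionID, query string) (title, cleaned string) {
	title, cleaned = extractTitleFilter(query)
	s.mu.Lock()
	defer s.mu.Unlock()
	if title != "" {
		s.byID[sessionID] = title
		return title, cleaned
	}
	return s.byID[sessionID], cleaned
}

func main() {
	filters := newSessionFilters()
	for _, q := range []string{"Is @antibionic a good company?", "What do they sell?"} {
		title, cleaned := filters.apply("1234567890", q)
		fmt.Printf("filter=%q query=%q\n", title, cleaned)
	}
}
```

Keeping the filter in a per-session map is one way to get the "entire chat session moving forward" behavior; a real server would also need to expire old sessions.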
+
+## gui
+
+The gui includes a document manager to add and remove documents from a database and a chat interface to interact with the system. It is a single page application built with html, css, and javascript, styled as an emulated terminal with a black background and white / green text.
+
+![gui](https://i.imgur.com/wvKYzx5.png)
+
+![docmanager](https://i.imgur.com/p8V31uY.png)
+
+## key files
+
+- [index.html](index.html) - a simple html gui to interact with the system
+- [main.go](main.go) - the main go file to interact with the system
+
+## what I did & learned
+
+- created a table with a vector column and learned it had to be 4096 size to match llama3.1
+- created a function to generate an embedding for a given text including splitting at 4096-bit chunks
+- created a function to query the table with an embedding and return the most similar texts within the chat
+- created a script to download screenplays
+- created a script to send screenplays to the database and auto-embed them with the llm
+- created a web interface and a document manager interface to do CRUD on docs and vectors and stream tokens back from the chat prompt
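The chunking step mentioned in the list above can be sketched in Go as below. This is only an illustration: `chunkText` is a hypothetical helper, and counting the chunk size in runes (Unicode code points) is an assumption, since the README does not say which unit main.go actually uses.

```go
package main

import "fmt"

// chunkText splits text into pieces of at most size runes each.
// Assumption: size is measured in runes so multi-byte characters are
// never cut in half; the project itself may count bytes or tokens.
func chunkText(text string, size int) []string {
	if size <= 0 {
		return nil
	}
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += size {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	// Each chunk would then be embedded and stored as its own row.
	for i, c := range chunkText("INT. COFFEE SHOP - DAY", 8) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Fixed-size splitting is the simplest option; splitting on paragraph or scene boundaries usually gives more coherent retrieval results, at the cost of uneven chunk sizes.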
+
+## tools
+
+Used https://cursor.sh and claude sonnet to help with the codebase.
+
+## curl add embedding example with title and doc
+
+```bash
+curl -X POST http://localhost:8080/add_document \
+  -H "Content-Type: application/json" \
+  -d '{"title": "Screenplay Title", "doc_text": "INT. COFFEE SHOP - DAY\n\nJANE, 30s, sits at a corner table, typing furiously on her laptop. The cafe buzzes with quiet conversation.\n\nJOHN, 40s, enters, scanning the room. He spots Jane and approaches.\n\nJOHN\nMind if I join you?\n\nJane looks up, startled."}'
+```
+
+## curl upload document example
+
+```bash
+curl -X POST http://localhost:8080/upload_document \
+  -H "Content-Type: multipart/form-data" \
+  -F "title=Example Document" \
+  -F "file=@/path/to/your/document.pdf"
+```
+
+## curl query example
+
+```bash
+curl -X POST http://localhost:8080/query \
+  -H "Content-Type: application/json" \
+  -d '{"query": "What are the main characters in the screenplays that are in the coffeeshop?"}'
+```
+
+## filter query example by title
+
+```bash
+curl -X POST http://localhost:8080/query \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "@screenplay Tell me about the main characters",
+    "sessionId": "1234567890"
+  }'
+```
+
+## sql table creation
+
+```sql
+-- PostgreSQL has no CREATE DATABASE IF NOT EXISTS; create-db.sql shows a psql workaround.
+CREATE DATABASE ragtag;
+CREATE EXTENSION IF NOT EXISTS vector;
+CREATE TABLE IF NOT EXISTS items (
+    id SERIAL PRIMARY KEY,
+    title TEXT,
+    doc TEXT,
+    embedding vector(4096)
+);
+```
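Given the `items` table above, a nearest-neighbor lookup with an optional title filter might be assembled like this. It is a hedged Go sketch, not the query main.go runs: `buildSimilarityQuery` and `vectorLiteral` are hypothetical helpers, and using pgvector's `<=>` (cosine distance) operator together with a case-insensitive substring match on `title` are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vectorLiteral renders an embedding in pgvector's text form, e.g. "[1,0.5,-2]".
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// buildSimilarityQuery returns SQL plus positional arguments that select the
// rows closest to embedding. A non-empty titleFilter restricts the search to
// rows whose title contains it (assumed matching rule).
func buildSimilarityQuery(embedding []float32, titleFilter string, limit int) (string, []any) {
	args := []any{vectorLiteral(embedding)}
	var sb strings.Builder
	sb.WriteString("SELECT title, doc FROM items")
	if titleFilter != "" {
		args = append(args, titleFilter)
		sb.WriteString(" WHERE title ILIKE '%' || $2 || '%'")
	}
	args = append(args, limit)
	// <=> is pgvector's cosine distance operator; smaller means more similar.
	fmt.Fprintf(&sb, " ORDER BY embedding <=> $1::vector LIMIT $%d", len(args))
	return sb.String(), args
}

func main() {
	sql, args := buildSimilarityQuery([]float32{0.1, 0.2, 0.3}, "antibionic", 5)
	fmt.Println(sql)
	fmt.Println(args...)
}
```

The returned string and arguments can be handed to any PostgreSQL driver's query call; passing the vector as text and casting with `::vector` avoids needing a driver-specific vector type in this sketch.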
+
+## docker
+
+Do this first:
+`docker build -t ragtag .`
+
+## docker compose
+
+`docker-compose up --build`
+This will pull the llama3.1 model and start the ollama server. It will also start the go server and the gui, connect to postgres, and tie it all together.
+
+## ports and versions
+
+ollama - 11434
+go server - 8080
+postgres - 5432
+
+go version - 1.24.1
+ollama version - 1.10.0
+postgres version - 16.1 as the pgvector:pg16 docker image
+
+This has not been tested on other versions, but it should work on them if you know what you are doing.
+
+## curl test no stream
+
+```bash
+curl http://localhost:11434/api/generate -d '{
+  "model": "llama3.1",
+  "prompt":"Why is the sky blue?",
+  "stream": false
+}'
+```
+
+### Helpers
+
+- [downloader.py](screenplays/downloader.py) - downloads a screenplay from a given URL
+- [send_screenplay.py](screenplays/send_screenplay.py) - sends a screenplay to the database
create-db.sql
ADDED
@@ -0,0 +1,2 @@
+SELECT 'CREATE DATABASE ragtag'
+WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'ragtag')\gexec
describer.html
ADDED
@@ -0,0 +1,146 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Image Describer</title>
+    <style>
+        body {
+            font-family: 'Courier New', monospace;
+            background-color: #000;
+            color: #00ff00;
+            margin: 0;
+            padding: 20px;
+            display: flex;
+            flex-direction: column;
+            align-items: center;
+        }
+        h1 {
+            color: #00ff00;
+        }
+        #imageContainer {
+            width: 500px;
+            height: 500px;
+            border: 1px solid #00ff00;
+            display: flex;
+            justify-content: center;
+            align-items: center;
+            margin-bottom: 20px;
+        }
+        #uploadedImage {
+            max-width: 100%;
+            max-height: 100%;
+        }
+        #descriptionContainer {
+            width: 500px;
+            min-height: 100px;
+            border: 1px solid #00ff00;
+            padding: 10px;
+            margin-bottom: 20px;
+        }
+        #uploadForm {
+            margin-bottom: 20px;
+        }
+        input[type="file"] {
+            display: none;
+        }
+        label, button {
+            background-color: #003300;
+            color: #00ff00;
+            border: 1px solid #00ff00;
+            padding: 5px 10px;
+            cursor: pointer;
+        }
+        label:hover, button:hover {
+            background-color: #004400;
+        }
+        a {
+            color: #00ff00;
+            text-decoration: none;
+            margin-bottom: 20px;
+        }
+        a:hover {
+            text-decoration: underline;
+        }
+    </style>
+</head>
+<body>
+    <h1>Image Describer</h1>
+    <a href="/">Back to Chat</a>
+    <div id="imageContainer">
+        <img id="uploadedImage" src="" alt="Uploaded image will appear here">
+    </div>
+    <form id="uploadForm" enctype="multipart/form-data">
+        <label for="imageFile">Choose Image</label>
+        <input type="file" id="imageFile" name="file" accept="image/*" required>
+        <button type="submit">Describe Image</button>
+    </form>
+    <div id="descriptionContainer">
+        Description will appear here...
+    </div>
+
+    <script>
+        const uploadForm = document.getElementById('uploadForm');
+        const imageFile = document.getElementById('imageFile');
+        const uploadedImage = document.getElementById('uploadedImage');
+        const descriptionContainer = document.getElementById('descriptionContainer');
+
+        uploadForm.addEventListener('submit', async (e) => {
+            e.preventDefault();
+            const formData = new FormData(uploadForm);
+
+            try {
+                const response = await fetch('/describe_image', {
+                    method: 'POST',
+                    body: formData
+                });
+
+                if (!response.ok) {
+                    throw new Error('Network response was not ok');
+                }
+
+                const reader = response.body.getReader();
+                descriptionContainer.textContent = '';
+
+                let buffer = '';
+                while (true) {
+                    const { value, done } = await reader.read();
+                    if (done) break;
+
+                    buffer += new TextDecoder().decode(value);
+                    const lines = buffer.split('\n');
+                    buffer = lines.pop() || '';
+
+                    for (const line of lines) {
+                        if (line.startsWith('data:')) {
+                            const content = line.slice(5);
+                            if (content) {
+                                appendToDescription(content);
+                            }
+                        }
+                    }
+                }
+            } catch (error) {
+                console.error('Error:', error);
+                descriptionContainer.textContent = 'Error describing image';
+            }
+        });
+
+        function appendToDescription(content) {
+            descriptionContainer.textContent += content + '';
+            descriptionContainer.scrollTop = descriptionContainer.scrollHeight;
+        }
+
+        imageFile.addEventListener('change', (e) => {
+            const file = e.target.files[0];
+            if (file) {
+                const reader = new FileReader();
+                reader.onload = (e) => {
+                    uploadedImage.src = e.target.result;
+                };
+                reader.readAsDataURL(file);
+            }
+        });
+    </script>
+</body>
+</html>
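The streaming loop in describer.html buffers incoming bytes, splits on newlines, keeps the trailing partial line for the next read, and forwards the payload of every complete `data:` line. The same buffering idea is sketched in Go below for anyone consuming `/describe_image` from a Go client. `sseParser` and `feed` are hypothetical names, and the sketch mirrors the page's simplified parsing rather than the full Server-Sent Events format.

```go
package main

import (
	"fmt"
	"strings"
)

// sseParser accumulates stream chunks and emits the payloads of complete
// "data:" lines, holding back any unfinished trailing line.
type sseParser struct {
	buf string
}

// feed appends chunk to the buffer and returns the payloads found in the
// lines completed so far. Empty payloads are skipped, as in the page script.
func (p *sseParser) feed(chunk string) []string {
	p.buf += chunk
	lines := strings.Split(p.buf, "\n")
	// The last element is either "" or a partial line; keep it for later.
	p.buf = lines[len(lines)-1]

	var payloads []string
	for _, line := range lines[:len(lines)-1] {
		if rest, ok := strings.CutPrefix(line, "data:"); ok && rest != "" {
			payloads = append(payloads, rest)
		}
	}
	return payloads
}

func main() {
	var p sseParser
	// Chunk boundaries deliberately fall in the middle of lines.
	for _, chunk := range []string{"data:Hel", "lo\ndata: wor", "ld\n\n"} {
		for _, payload := range p.feed(chunk) {
			fmt.Printf("%q\n", payload)
		}
	}
}
```

Holding back the last split element is what makes the parser safe against chunk boundaries that fall mid-line, which is the reason the page script uses `lines.pop()`.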
docker-backup/Dockerfile
ADDED
@@ -0,0 +1,18 @@
+# golang image and setup for the gin server
+FROM golang:1.23.0
+
+# Set the working directory in the container
+WORKDIR /app
+
+# Copy the current directory contents into the container at /app
+COPY . /app
+
+# Install dependencies
+RUN go mod download
+
+# Build with verbose output
+RUN go build -v -o ragtag
+
+EXPOSE 8080
+
+CMD ["./ragtag"]
docker-backup/docker-compose.yml
ADDED
@@ -0,0 +1,19 @@
+version: '3.8'
+
+services:
+  ollama:
+    image: ollama/ollama:latest
+    container_name: ollama
+    ports:
+      - "11434:11434" # Expose the default Ollama API port
+    environment:
+      - OLLAMA_PORT=11434
+      - OLLAMA_HOST=0.0.0.0
+      - OLLAMA_MODEL=llama3.1
+    restart: unless-stopped
+  api:
+    image: ragtag:latest
+    container_name: api
+    ports:
+      - "8080:8080"
+    restart: unless-stopped
docker-backup/requirements.txt
ADDED
@@ -0,0 +1,2 @@
+flask
+ollama
docker-compose.yml
ADDED
@@ -0,0 +1,51 @@
+services:
+  ollama:
+    build:
+      context: .
+      dockerfile: Dockerfile.ollama
+    container_name: ollama
+    ports:
+      - "11434:11434"
+    environment:
+      - OLLAMA_PORT=11434
+      - OLLAMA_HOST=0.0.0.0
+      - OLLAMA_MODEL=llama3.1
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+      interval: 30s
+      timeout: 10s
+      retries: 5
+
+  db:
+    image: pgvector/pgvector:pg16
+    container_name: db
+    ports:
+      - "5432:5432"
+    environment:
+      - POSTGRES_DB=ragtag
+      - POSTGRES_USER=${DB_USER}
+      - POSTGRES_PASSWORD=${DB_PASSWORD}
+    volumes:
+      - ./init-db.sql:/docker-entrypoint-initdb.d/init-db.sql
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d ragtag"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  api:
+    image: ragtag:latest
+    container_name: api
+    ports:
+      - "8080:8080"
+    environment:
+      - DB_URL=${DB_URL_DOCKER}
+      - OLLAMA_HOST=ollama
+    restart: unless-stopped
+    depends_on:
+      db:
+        condition: service_healthy
+      ollama:
+        condition: service_healthy
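The compose file makes `api` wait until `db` and `ollama` report healthy. When the Go server is run outside compose, a similar guard can live in the application. The following is a minimal sketch under stated assumptions: `waitHealthy` and `httpCheck` are hypothetical helpers that do not come from main.go, and the retry count simply echoes the `retries: 5` setting above.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

var errNotReady = errors.New("service not ready")

// httpCheck returns a probe that succeeds when url answers with a status
// below 400, roughly what `curl -f` does in the compose healthcheck.
func httpCheck(url string) func() error {
	return func() error {
		resp, err := http.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 400 {
			return fmt.Errorf("status %d: %w", resp.StatusCode, errNotReady)
		}
		return nil
	}
}

// waitHealthy runs check up to retries times, sleeping interval between
// failed attempts, and returns nil as soon as one attempt succeeds.
func waitHealthy(check func() error, retries int, interval time.Duration) error {
	if retries < 1 {
		retries = 1
	}
	var err error
	for i := 0; i < retries; i++ {
		if err = check(); err == nil {
			return nil
		}
		if i < retries-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("unhealthy after %d attempts: %w", retries, err)
}

func main() {
	// A fake probe keeps the demo deterministic; a real caller could pass
	// httpCheck("http://localhost:11434/api/tags") instead.
	attempts := 0
	probe := func() error {
		attempts++
		if attempts < 3 {
			return errNotReady
		}
		return nil
	}
	err := waitHealthy(probe, 5, 10*time.Millisecond)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Passing the probe as a function keeps the retry loop testable without a running Ollama or Postgres instance.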
docmanager.html
ADDED
@@ -0,0 +1,134 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Document Manager</title>
+    <style>
+        body {
+            font-family: 'Courier New', monospace;
+            background-color: #000;
+            color: #00ff00;
+            margin: 0;
+            padding: 20px;
+        }
+        h1 {
+            color: #00ff00;
+        }
+        table {
+            width: 100%;
+            border-collapse: collapse;
+        }
+        th, td {
+            border: 1px solid #00ff00;
+            padding: 8px;
+            text-align: left;
+        }
+        th {
+            background-color: #003300;
+        }
+        button {
+            background-color: #003300;
+            color: #00ff00;
+            border: 1px solid #00ff00;
+            padding: 5px 10px;
+            cursor: pointer;
+        }
+        button:hover {
+            background-color: #004400;
+        }
+        a {
+            color: #00ff00;
+            text-decoration: none;
+        }
+        a:hover {
+            text-decoration: underline;
+        }
+    </style>
+</head>
+<body>
+    <h1>Document Manager</h1>
+    <a href="/" style="margin-bottom: 20px; display: block;">Back to Chat</a>
+    <table id="docTable">
+        <thead>
+            <tr>
+                <th>Title</th>
+                <th>Count</th>
+                <th>Action</th>
+            </tr>
+        </thead>
+        <tbody>
+            <!-- Table rows will be dynamically added here -->
+        </tbody>
+    </table>
+
+    <h2>Upload New Document</h2>
+    <form id="uploadForm" enctype="multipart/form-data">
+        <input type="text" id="docTitle" name="title" placeholder="Document Title" required>
+        <input type="file" id="docFile" name="file" accept=".txt,.pdf,.jpg,.jpeg,.png" required>
+        <button type="submit">Upload</button>
+    </form>
+
+    <script>
+        async function fetchDocuments() {
+            let documents = [];
+            try {
+                documents = await fetch('/documents').then(r => r.json());
+                if (!Array.isArray(documents)) documents = [];
+            } catch {}
+            const tableBody = document.querySelector('#docTable tbody');
+            tableBody.innerHTML = '';
+            documents.forEach(doc => {
+                const row = tableBody.insertRow();
+                row.insertCell(0).textContent = doc.title;
+                row.insertCell(1).textContent = doc.count;
+                const deleteButton = document.createElement('button');
+                deleteButton.textContent = 'Delete';
+                deleteButton.onclick = () => deleteDocument(doc.title);
+                row.insertCell(2).appendChild(deleteButton);
+            });
+        }
+
+        async function deleteDocument(title) {
+            const response = await fetch('/delete_document', {
+                method: 'POST',
+                headers: {
+                    'Content-Type': 'application/json',
+                },
+                body: JSON.stringify({ title: title }),
+            });
+            if (response.ok) {
+                fetchDocuments();
+            } else {
+                alert('Error deleting document');
+            }
+        }
+
+        async function uploadDocument(formData) {
+            const response = await fetch('/upload_document', {
+                method: 'POST',
+                body: formData
+            });
+            if (response.ok) {
+                alert('Document uploaded successfully');
+                fetchDocuments();
+            } else {
+                let msg = 'Error uploading document';
+                try {
+                    const data = await response.json();
+                    if (data && data.error) msg = data.error;
+                } catch {}
+                alert(msg);
+            }
+        }
+
+        document.getElementById('uploadForm').addEventListener('submit', function(e) {
+            e.preventDefault();
+            const formData = new FormData(this);
+            uploadDocument(formData);
+        });
+
+        fetchDocuments();
+    </script>
+</body>
+</html>
docs/docs.go
ADDED
@@ -0,0 +1,44 @@
+// Package docs Code generated by swaggo/swag. DO NOT EDIT
+package docs
+
+import "github.com/swaggo/swag"
+
+const docTemplate = `{
+    "schemes": {{ marshal .Schemes }},
+    "swagger": "2.0",
+    "info": {
+        "description": "{{escape .Description}}",
+        "title": "{{.Title}}",
+        "termsOfService": "http://swagger.io/terms/",
+        "contact": {
+            "name": "James Campbell",
+            "email": "[email protected]"
+        },
+        "license": {
+            "name": "Apache 2.0",
+            "url": "http://www.apache.org/licenses/LICENSE-2.0.html"
+        },
+        "version": "{{.Version}}"
+    },
+    "host": "{{.Host}}",
+    "basePath": "{{.BasePath}}",
+    "paths": {}
+}`
+
+// SwaggerInfo holds exported Swagger Info so clients can modify it
+var SwaggerInfo = &swag.Spec{
+	Version:          "1.0",
+	Host:             "localhost:8080",
+	BasePath:         "/",
+	Schemes:          []string{},
+	Title:            "RAGTAG API",
+	Description:      "This is the API for the RAGTAG system.",
+	InfoInstanceName: "swagger",
+	SwaggerTemplate:  docTemplate,
+	LeftDelim:        "{{",
+	RightDelim:       "}}",
+}
+
+func init() {
+	swag.Register(SwaggerInfo.InstanceName(), SwaggerInfo)
+}
docs/swagger.json
ADDED
@@ -0,0 +1,20 @@
+{
+    "swagger": "2.0",
+    "info": {
+        "description": "This is the API for the RAGTAG system.",
+        "title": "RAGTAG API",
+        "termsOfService": "http://swagger.io/terms/",
+        "contact": {
+            "name": "James Campbell",
+            "email": "[email protected]"
+        },
+        "license": {
+            "name": "Apache 2.0",
+            "url": "http://www.apache.org/licenses/LICENSE-2.0.html"
+        },
+        "version": "1.0"
+    },
+    "host": "localhost:8080",
+    "basePath": "/",
+    "paths": {}
+}
docs/swagger.yaml
ADDED
@@ -0,0 +1,15 @@
+basePath: /
+host: localhost:8080
+info:
+  contact:
+    email: [email protected]
+    name: James Campbell
+  description: This is the API for the RAGTAG system.
+  license:
+    name: Apache 2.0
+    url: http://www.apache.org/licenses/LICENSE-2.0.html
+  termsOfService: http://swagger.io/terms/
+  title: RAGTAG API
+  version: "1.0"
+paths: {}
+swagger: "2.0"
example.env
ADDED
@@ -0,0 +1,4 @@
+DB_URL=postgres://postgres:password@localhost:5432/ragtag
+DB_URL_DOCKER=postgres://ragtag:ragtag@db:5432/ragtag
+DB_USER=ragtag
+DB_PASSWORD=ragtag
go.mod
ADDED
@@ -0,0 +1,43 @@
+module github.com/james-see/ragtag4
+
+go 1.24.1
+
+require (
+	github.com/gin-gonic/gin v1.10.0
+	github.com/jackc/pgx/v5 v5.6.0
+	github.com/joho/godotenv v1.5.1
+	github.com/ledongthuc/pdf v0.0.0-20250511090121-5959a4027728
+	github.com/ollama/ollama v0.3.6
+	github.com/pgvector/pgvector-go v0.2.2
+)
+
+require (
+	github.com/bytedance/sonic v1.11.6 // indirect
+	github.com/bytedance/sonic/loader v0.1.1 // indirect
+	github.com/cloudwego/base64x v0.1.4 // indirect
+	github.com/cloudwego/iasm v0.2.0 // indirect
+	github.com/gabriel-vasile/mimetype v1.4.3 // indirect
+	github.com/gin-contrib/sse v0.1.0 // indirect
+	github.com/go-playground/locales v0.14.1 // indirect
+	github.com/go-playground/universal-translator v0.18.1 // indirect
+	github.com/go-playground/validator/v10 v10.20.0 // indirect
+	github.com/goccy/go-json v0.10.2 // indirect
+	github.com/jackc/pgpassfile v1.0.0 // indirect
+	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
+	github.com/json-iterator/go v1.1.12 // indirect
+	github.com/klauspost/cpuid/v2 v2.2.7 // indirect
+	github.com/leodido/go-urn v1.4.0 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
+	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
+	github.com/modern-go/reflect2 v1.0.2 // indirect
+	github.com/pelletier/go-toml/v2 v2.2.2 // indirect
+	github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
+	github.com/ugorji/go/codec v1.2.12 // indirect
+	golang.org/x/arch v0.8.0 // indirect
+	golang.org/x/crypto v0.25.0 // indirect
+	golang.org/x/net v0.25.0 // indirect
+	golang.org/x/sys v0.22.0 // indirect
+	golang.org/x/text v0.16.0 // indirect
+	google.golang.org/protobuf v1.34.1 // indirect
+	gopkg.in/yaml.v3 v3.0.1 // indirect
+)
go.sum
ADDED
@@ -0,0 +1,152 @@
1 |
+
entgo.io/ent v0.13.1 h1:uD8QwN1h6SNphdCCzmkMN3feSUzNnVvV/WIkHKMbzOE=
|
2 |
+
entgo.io/ent v0.13.1/go.mod h1:qCEmo+biw3ccBn9OyL4ZK5dfpwg++l1Gxwac5B1206A=
|
3 |
+
github.com/bytedance/sonic v1.11.6 h1:oUp34TzMlL+OY1OUWxHqsdkgC/Zfc85zGqw9siXjrc0=
|
4 |
+
github.com/bytedance/sonic v1.11.6/go.mod h1:LysEHSvpvDySVdC2f87zGWf6CIKJcAvqab1ZaiQtds4=
|
5 |
+
github.com/bytedance/sonic/loader v0.1.1 h1:c+e5Pt1k/cy5wMveRDyk2X4B9hF4g7an8N3zCYjJFNM=
|
6 |
+
github.com/bytedance/sonic/loader v0.1.1/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU=
|
7 |
+
github.com/cloudwego/base64x v0.1.4 h1:jwCgWpFanWmN8xoIUHa2rtzmkd5J2plF/dnLS6Xd/0Y=
|
8 |
+
github.com/cloudwego/base64x v0.1.4/go.mod h1:0zlkT4Wn5C6NdauXdJRhSKRlJvmclQ1hhJgA0rcu/8w=
|
9 |
+
github.com/cloudwego/iasm v0.2.0 h1:1KNIy1I1H9hNNFEEH3DVnI4UujN+1zjpuk6gwHLTssg=
|
10 |
+
github.com/cloudwego/iasm v0.2.0/go.mod h1:8rXZaNYT2n95jn+zTI1sDr+IgcD2GVs0nlbbQPiEFhY=
|
11 |
+
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
12 |
+
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
|
13 |
+
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
|
14 |
+
github.com/gabriel-vasile/mimetype v1.4.3 h1:in2uUcidCuFcDKtdcBxlR0rJ1+fsokWf+uqxgUFjbI0=
|
15 |
+
github.com/gabriel-vasile/mimetype v1.4.3/go.mod h1:d8uq/6HKRL6CGdk+aubisF/M5GcPfT7nKyLpA0lbSSk=
|
16 |
+
github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
|
17 |
+
github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI=
|
18 |
+
github.com/gin-gonic/gin v1.10.0 h1:nTuyha1TYqgedzytsKYqna+DfLos46nTv2ygFy86HFU=
|
19 |
+
github.com/gin-gonic/gin v1.10.0/go.mod h1:4PMNQiOhvDRa013RKVbsiNwoyezlm2rm0uX/T7kzp5Y=
|
20 |
+
github.com/go-pg/pg/v10 v10.11.0 h1:CMKJqLgTrfpE/aOVeLdybezR2om071Vh38OLZjsyMI0=
|
21 |
+
github.com/go-pg/pg/v10 v10.11.0/go.mod h1:4BpHRoxE61y4Onpof3x1a2SQvi9c+q1dJnrNdMjsroA=
|
22 |
+
github.com/go-pg/zerochecker v0.2.0 h1:pp7f72c3DobMWOb2ErtZsnrPaSvHd2W4o9//8HtF4mU=
|
23 |
+
github.com/go-pg/zerochecker v0.2.0/go.mod h1:NJZ4wKL0NmTtz0GKCoJ8kym6Xn/EQzXRl2OnAe7MmDo=
|
24 |
+
github.com/go-playground/assert/v2 v2.2.0 h1:JvknZsQTYeFEAhQwI4qEt9cyV5ONwRHC+lYKSsYSR8s=
|
25 |
+
github.com/go-playground/assert/v2 v2.2.0/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
|
26 |
+
github.com/go-playground/locales v0.14.1 h1:EWaQ/wswjilfKLTECiXz7Rh+3BjFhfDFKv/oXslEjJA=
|
27 |
+
github.com/go-playground/locales v0.14.1/go.mod h1:hxrqLVvrK65+Rwrd5Fc6F2O76J/NuW9t0sjnWqG1slY=
|
28 |
+
github.com/go-playground/universal-translator v0.18.1 h1:Bcnm0ZwsGyWbCzImXv+pAJnYK9S473LQFuzCbDbfSFY=
|
29 |
+
github.com/go-playground/universal-translator v0.18.1/go.mod h1:xekY+UJKNuX9WP91TpwSH2VMlDf28Uj24BCp08ZFTUY=
|
30 |
+
github.com/go-playground/validator/v10 v10.20.0 h1:K9ISHbSaI0lyB2eWMPJo+kOS/FBExVwjEviJTixqxL8=
|
31 |
+
github.com/go-playground/validator/v10 v10.20.0/go.mod h1:dbuPbCMFw/DrkbEynArYaCwl3amGuJotoKCe95atGMM=
|
32 |
+
github.com/goccy/go-json v0.10.2 h1:CrxCmQqYDkv1z7lO7Wbh2HN93uovUHgrECaO5ZrCXAU=
|
33 |
+
github.com/goccy/go-json v0.10.2/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
|
34 |
+
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
|
35 |
+
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
|
36 |
+
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
|
37 |
+
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
|
38 |
+
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
|
39 |
+
github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM=
|
40 |
+
github.com/jackc/pgpassfile v1.0.0/go.mod h1:CEx0iS5ambNFdcRtxPj5JhEz+xB6uRky5eyVu/W2HEg=
|
41 |
+
github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 h1:iCEnooe7UlwOQYpKFhBabPMi4aNAfoODPEFNiAnClxo=
|
42 |
+
github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761/go.mod h1:5TJZWKEWniPve33vlWYSoGYefn3gLQRzjfDlhSJ9ZKM=
|
43 |
+
github.com/jackc/pgx/v5 v5.6.0 h1:SWJzexBzPL5jb0GEsrPMLIsi/3jOo7RHlzTjcAeDrPY=
|
44 |
+
github.com/jackc/pgx/v5 v5.6.0/go.mod h1:DNZ/vlrUnhWCoFGxHAG8U2ljioxukquj7utPDgtQdTw=
|
45 |
+
github.com/jackc/puddle/v2 v2.2.1 h1:RhxXJtFG022u4ibrCSMSiu5aOq1i77R3OHKNJj77OAk=
|
46 |
+
github.com/jackc/puddle/v2 v2.2.1/go.mod h1:vriiEXHvEE654aYKXXjOvZM39qJ0q+azkZFrfEOc3H4=
|
47 |
+
github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD/E=
|
48 |
+
github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc=
|
49 |
+
github.com/jinzhu/now v1.1.5 h1:/o9tlHleP7gOFmsnYNz3RGnqzefHA47wQpKrrdTIwXQ=
|
50 |
+
github.com/jinzhu/now v1.1.5/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
|
51 |
+
github.com/jmoiron/sqlx v1.3.5 h1:vFFPA71p1o5gAeqtEAwLU4dnX2napprKtHr7PYIcN3g=
|
52 |
+
github.com/jmoiron/sqlx v1.3.5/go.mod h1:nRVWtLre0KfCLJvgxzCsLVMogSvQ1zNJtpYr2Ccp0mQ=
|
53 |
+
github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
|
54 |
+
github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
|
55 |
+
github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
|
56 |
+
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
|
57 |
+
github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
|
58 |
+
github.com/klauspost/cpuid/v2 v2.2.7 h1:ZWSB3igEs+d0qvnxR/ZBzXVmxkgt8DdzP6m9pfuVLDM=
|
59 |
+
github.com/klauspost/cpuid/v2 v2.2.7/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
|
60 |
+
github.com/knz/go-libedit v1.10.1/go.mod h1:MZTVkCWyz0oBc7JOWP3wNAzd002ZbM/5hgShxwh4x8M=
|
61 |
+
github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
|
62 |
+
github.com/kr/pretty v0.3.0/go.mod h1:640gp4NfQd8pI5XOwp5fnNeVWj67G7CFk/SaSQn7NBk=
|
63 |
+
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
|
64 |
+
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
|
65 |
+
github.com/ledongthuc/pdf v0.0.0-20240201131950-da5b75280b06 h1:kacRlPN7EN++tVpGUorNGPn/4DnB7/DfTY82AOn6ccU=
|
66 |
+
github.com/ledongthuc/pdf v0.0.0-20240201131950-da5b75280b06/go.mod h1:imJHygn/1yfhB7XSJJKlFZKl/J+dCPAknuiaGOshXAs=
|
67 |
+
github.com/ledongthuc/pdf v0.0.0-20250511090121-5959a4027728 h1:QwWKgMY28TAXaDl+ExRDqGQltzXqN/xypdKP86niVn8=
|
68 |
+
github.com/ledongthuc/pdf v0.0.0-20250511090121-5959a4027728/go.mod h1:1fEHWurg7pvf5SG6XNE5Q8UZmOwex51Mkx3SLhrW5B4=
|
69 |
+
github.com/leodido/go-urn v1.4.0 h1:WT9HwE9SGECu3lg4d/dIA+jxlljEa1/ffXKmRjqdmIQ=
|
70 |
+
github.com/leodido/go-urn v1.4.0/go.mod h1:bvxc+MVxLKB4z00jd1z+Dvzr47oO32F/QSNjSBOlFxI=
|
71 |
+
github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
|
72 |
+
github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
|
73 |
+
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
|
74 |
+
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
|
75 |
+
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
|
76 |
+
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
|
77 |
+
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
|
78 |
+
github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=
|
79 |
+
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
|
80 |
+
github.com/ollama/ollama v0.3.6 h1:nA/N0AmjP327po5cZDGLqI40nl+aeei0pD0dLa92ypE=
|
81 |
+
github.com/ollama/ollama v0.3.6/go.mod h1:YrWoNkFnPOYsnDvsf/Ztb1wxU9/IXrNsQHqcxbY2r94=
|
82 |
+
github.com/pelletier/go-toml/v2 v2.2.2 h1:aYUidT7k73Pcl9nb2gScu7NSrKCSHIDE89b3+6Wq+LM=
|
83 |
+
github.com/pelletier/go-toml/v2 v2.2.2/go.mod h1:1t835xjRzz80PqgE6HHgN2JOsmgYu/h4qDAS4n929Rs=
|
84 |
+
github.com/pgvector/pgvector-go v0.2.2 h1:Q/oArmzgbEcio88q0tWQksv/u9Gnb1c3F1K2TnalxR0=
|
85 |
+
github.com/pgvector/pgvector-go v0.2.2/go.mod h1:u5sg3z9bnqVEdpe1pkTij8/rFhTaMCMNyQagPDLK8gQ=
|
86 |
+
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
|
87 |
+
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
|
88 |
+
github.com/rogpeppe/go-internal v1.9.0 h1:73kH8U+JUqXU8lRuOHeVHaa/SZPifC7BkcraZVejAe8=
|
89 |
+
github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
|
90 |
+
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
|
91 |
+
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
|
92 |
+
github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
|
93 |
+
github.com/stretchr/objx v0.5.2/go.mod h1:FRsXN1f5AsAjCGJKqEizvkpNtU+EGNCLh3NxZ/8L+MA=
|
94 |
+
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
|
95 |
+
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
96 |
+
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
|
97 |
+
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
|
98 |
+
github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
|
99 |
+
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
|
100 |
+
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
|
101 |
+
github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
|
102 |
+
github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc h1:9lRDQMhESg+zvGYmW5DyG0UqvY96Bu5QYsTLvCHdrgo=
|
103 |
+
github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc/go.mod h1:bciPuU6GHm1iF1pBvUfxfsH0Wmnc2VbpgvbI9ZWuIRs=
|
104 |
+
github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
|
105 |
+
github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
|
106 |
+
github.com/ugorji/go/codec v1.2.12 h1:9LC83zGrHhuUA9l16C9AHXAqEV/2wBQ4nkvumAE65EE=
|
107 |
+
github.com/ugorji/go/codec v1.2.12/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
|
108 |
+
github.com/uptrace/bun v1.1.12 h1:sOjDVHxNTuM6dNGaba0wUuz7KvDE1BmNu9Gqs2gJSXQ=
|
109 |
+
github.com/uptrace/bun v1.1.12/go.mod h1:NPG6JGULBeQ9IU6yHp7YGELRa5Agmd7ATZdz4tGZ6z0=
|
110 |
+
github.com/uptrace/bun/dialect/pgdialect v1.1.12 h1:m/CM1UfOkoBTglGO5CUTKnIKKOApOYxkcP2qn0F9tJk=
|
111 |
+
github.com/uptrace/bun/dialect/pgdialect v1.1.12/go.mod h1:Ij6WIxQILxLlL2frUBxUBOZJtLElD2QQNDcu/PWDHTc=
|
112 |
+
github.com/uptrace/bun/driver/pgdriver v1.1.12 h1:3rRWB1GK0psTJrHwxzNfEij2MLibggiLdTqjTtfHc1w=
|
113 |
+
github.com/uptrace/bun/driver/pgdriver v1.1.12/go.mod h1:ssYUP+qwSEgeDDS1xm2XBip9el1y9Mi5mTAvLoiADLM=
|
114 |
+
github.com/vmihailenco/bufpool v0.1.11 h1:gOq2WmBrq0i2yW5QJ16ykccQ4wH9UyEsgLm6czKAd94=
|
115 |
+
github.com/vmihailenco/bufpool v0.1.11/go.mod h1:AFf/MOy3l2CFTKbxwt0mp2MwnqjNEs5H/UxrkA5jxTQ=
|
116 |
+
github.com/vmihailenco/msgpack/v5 v5.3.5 h1:5gO0H1iULLWGhs2H5tbAHIZTV8/cYafcFOr9znI5mJU=
|
117 |
+
github.com/vmihailenco/msgpack/v5 v5.3.5/go.mod h1:7xyJ9e+0+9SaZT0Wt1RGleJXzli6Q/V5KbhBonMG9jc=
|
118 |
+
github.com/vmihailenco/tagparser v0.1.2 h1:gnjoVuB/kljJ5wICEEOpx98oXMWPLj22G67Vbd1qPqc=
|
119 |
+
github.com/vmihailenco/tagparser v0.1.2/go.mod h1:OeAg3pn3UbLjkWt+rN9oFYB6u/cQgqMEUPoW2WPyhdI=
|
120 |
+
github.com/vmihailenco/tagparser/v2 v2.0.0 h1:y09buUbR+b5aycVFQs/g70pqKVZNBmxwAhO7/IwNM9g=
|
121 |
+
github.com/vmihailenco/tagparser/v2 v2.0.0/go.mod h1:Wri+At7QHww0WTrCBeu4J6bNtoV6mEfg5OIWRZA9qds=
|
122 |
+
golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
|
123 |
+
golang.org/x/arch v0.8.0 h1:3wRIsP3pM4yUptoR96otTUOXI367OS0+c9eeRi9doIc=
|
124 |
+
golang.org/x/arch v0.8.0/go.mod h1:FEVrYAQjsQXMVJ1nsMoVVXPZg6p2JE2mx8psSWTDQys=
|
125 |
+
golang.org/x/crypto v0.25.0 h1:ypSNr+bnYL2YhwoMt2zPxHFmbAN1KZs/njMG3hxUp30=
|
126 |
+
golang.org/x/crypto v0.25.0/go.mod h1:T+wALwcMOSE0kXgUAnPAHqTLW+XHgcELELW8VaDgm/M=
|
127 |
+
golang.org/x/net v0.25.0 h1:d/OCCoBEUq33pjydKrGQhw7IlUPI2Oylr+8qLx49kac=
|
128 |
+
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
|
129 |
+
golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
|
130 |
+
golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
|
131 |
+
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
132 |
+
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
133 |
+
golang.org/x/sys v0.22.0 h1:RI27ohtqKCnwULzJLqkv897zojh5/DwS/ENaMzUOaWI=
|
134 |
+
golang.org/x/sys v0.22.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
|
135 |
+
golang.org/x/text v0.16.0 h1:a94ExnEXNtEwYLGJSIUxnWoxoRz/ZcCsV63ROupILh4=
|
136 |
+
golang.org/x/text v0.16.0/go.mod h1:GhwF1Be+LQoKShO3cGOHzqOgRrGaYc9AvblQOmPVHnI=
|
137 |
+
google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
|
138 |
+
google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
|
139 |
+
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
140 |
+
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
|
141 |
+
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
|
142 |
+
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
143 |
+
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
144 |
+
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
145 |
+
gorm.io/driver/postgres v1.5.4 h1:Iyrp9Meh3GmbSuyIAGyjkN+n9K+GHX9b9MqsTL4EJCo=
|
146 |
+
gorm.io/driver/postgres v1.5.4/go.mod h1:Bgo89+h0CRcdA33Y6frlaHHVuTdOf87pmyzwW9C/BH0=
|
147 |
+
gorm.io/gorm v1.25.5 h1:zR9lOiiYf09VNh5Q1gphfyia1JpiClIWG9hQaxB/mls=
|
148 |
+
gorm.io/gorm v1.25.5/go.mod h1:hbnx/Oo0ChWMn1BIhpy1oYozzpM15i4YPuHDmfYtwg8=
|
149 |
+
mellium.im/sasl v0.3.1 h1:wE0LW6g7U83vhvxjC1IY8DnXM+EU095yeo8XClvCdfo=
|
150 |
+
mellium.im/sasl v0.3.1/go.mod h1:xm59PUYpZHhgQ9ZqoJ5QaCqzWMi8IeS49dhp6plPCzw=
|
151 |
+
nullprogram.com/x/optparse v1.0.0/go.mod h1:KdyPE+Igbe0jQUrVfMqDMeJQIJZEuyV7pjYmp6pbG50=
|
152 |
+
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
|
index.html
ADDED
@@ -0,0 +1,227 @@
1 |
+
<!DOCTYPE html>
|
2 |
+
<html lang="en">
|
3 |
+
<head>
|
4 |
+
<meta charset="UTF-8">
|
5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
6 |
+
<title>RAGTAG MVP</title>
|
7 |
+
<link rel="icon" type="image/x-icon" href="/favicon.ico">
|
8 |
+
<style>
|
9 |
+
body {
|
10 |
+
font-family: 'Courier New', monospace;
|
11 |
+
background-color: #000;
|
12 |
+
color: #00ff00;
|
13 |
+
margin: 0;
|
14 |
+
padding: 0;
|
15 |
+
height: 100vh;
|
16 |
+
display: flex;
|
17 |
+
flex-direction: column;
|
18 |
+
}
|
19 |
+
#content {
|
20 |
+
flex-grow: 1;
|
21 |
+
overflow-y: auto;
|
22 |
+
padding: 20px;
|
23 |
+
}
|
24 |
+
#chat-container {
|
25 |
+
border: 1px solid #00ff00;
|
26 |
+
padding: 10px;
|
27 |
+
margin-bottom: 10px;
|
28 |
+
min-height: 200px;
|
29 |
+
}
|
30 |
+
#input-container {
|
31 |
+
display: flex;
|
32 |
+
padding: 10px 20px;
|
33 |
+
background-color: #000;
|
34 |
+
border-top: 1px solid #00ff00;
|
35 |
+
}
|
36 |
+
#prompt {
|
37 |
+
color: #00ff00;
|
38 |
+
margin-right: 5px;
|
39 |
+
}
|
40 |
+
#user-input {
|
41 |
+
flex-grow: 1;
|
42 |
+
background-color: #000;
|
43 |
+
border: none;
|
44 |
+
color: #00ff00;
|
45 |
+
font-family: 'Courier New', monospace;
|
46 |
+
font-size: 16px;
|
47 |
+
}
|
48 |
+
#user-input:focus {
|
49 |
+
outline: none;
|
50 |
+
}
|
51 |
+
.message {
|
52 |
+
margin-bottom: 10px;
|
53 |
+
}
|
54 |
+
.user-message {
|
55 |
+
color: #ffffff;
|
56 |
+
}
|
57 |
+
.ai-message {
|
58 |
+
color: #00ff00;
|
59 |
+
}
|
60 |
+
</style>
|
61 |
+
</head>
|
62 |
+
<body>
|
63 |
+
<div id="content">
|
64 |
+
<a href="/docmanager" style="color: #00ff00; margin-bottom: 10px; display: block;">Document Manager</a>
|
65 |
+
<a href="/describer" style="color: #00ff00; margin-bottom: 10px; display: block;">Image Describer</a>
|
66 |
+
<div id="sessionInfo" style="color: #00ff00; margin-bottom: 10px;">
|
67 |
+
<div>Session ID: <span id="sessionIdDisplay"></span></div>
|
68 |
+
<div>Title Filter: <span id="titleFilterDisplay">None</span></div>
|
69 |
+
</div>
|
70 |
+
<button id="clearButton" style="background-color: #003300; color: #00ff00; border: 1px solid #00ff00; padding: 5px 10px; cursor: pointer; margin-bottom: 10px;">Clear Chat</button>
|
71 |
+
<div id="chat-container"></div>
|
72 |
+
</div>
|
73 |
+
<div id="input-container">
|
74 |
+
<span id="prompt">$</span>
|
75 |
+
<input type="text" id="user-input" placeholder="Enter your query...">
|
76 |
+
</div>
|
77 |
+
|
78 |
+
<script>
|
79 |
+
const chatContainer = document.getElementById('chat-container');
|
80 |
+
const userInput = document.getElementById('user-input');
|
81 |
+
const clearButton = document.getElementById('clearButton');
|
82 |
+
const sessionIdDisplay = document.getElementById('sessionIdDisplay');
|
83 |
+
const titleFilterDisplay = document.getElementById('titleFilterDisplay');
|
84 |
+
let queryHistory = [];
|
85 |
+
let historyIndex = -1;
|
86 |
+
let sessionId = Date.now().toString();
|
87 |
+
let titleFilter = '';
|
88 |
+
|
89 |
+
// Display initial session ID
|
90 |
+
sessionIdDisplay.textContent = sessionId;
|
91 |
+
|
92 |
+
function addMessage(sender, message) {
|
93 |
+
const messageElement = document.createElement('div');
|
94 |
+
messageElement.classList.add('message');
|
95 |
+
messageElement.classList.add(sender === 'You' ? 'user-message' : 'ai-message');
|
96 |
+
messageElement.textContent = `${sender === 'You' ? '$ ' : ''}${message}`;
|
97 |
+
chatContainer.appendChild(messageElement);
|
98 |
+
chatContainer.scrollTop = chatContainer.scrollHeight;
|
99 |
+
}
|
100 |
+
|
101 |
+
async function sendQuery(query) {
|
102 |
+
try {
|
103 |
+
const response = await fetch('/query', {
|
104 |
+
method: 'POST',
|
105 |
+
headers: {
|
106 |
+
'Content-Type': 'application/json',
|
107 |
+
},
|
108 |
+
body: JSON.stringify({ query: query, sessionId: sessionId, titleFilter: titleFilter }),
|
109 |
+
});
|
110 |
+
|
111 |
+
if (!response.ok) {
|
112 |
+
throw new Error('Network response was not ok');
|
113 |
+
}
|
114 |
+
|
115 |
+
const reader = response.body.pipeThrough(new TextDecoderStream()).getReader(); // Decode as a stream so multi-byte characters split across chunks stay intact
|
116 |
+
addMessage('AI', ''); // Add an empty AI message to start
|
117 |
+
|
118 |
+
let buffer = '';
|
119 |
+
while (true) {
|
120 |
+
const { value, done } = await reader.read();
|
121 |
+
if (done) break;
|
122 |
+
|
123 |
+
buffer += value;
|
124 |
+
const lines = buffer.split('\n');
|
125 |
+
buffer = lines.pop() || '';
|
126 |
+
|
127 |
+
for (const line of lines) {
|
128 |
+
if (line.startsWith('data:')) {
|
129 |
+
const content = line.slice(5);
|
130 |
+
if (content) {
|
131 |
+
appendToLastAIMessage(content);
|
132 |
+
} else {
|
133 |
+
appendToLastAIMessage('\n');
|
134 |
+
}
|
135 |
+
}
|
136 |
+
}
|
137 |
+
}
|
138 |
+
} catch (error) {
|
139 |
+
console.error('Error:', error);
|
140 |
+
addMessage('System', 'An error occurred while processing your request.');
|
141 |
+
}
|
142 |
+
}
|
143 |
+
|
144 |
+
function appendToLastAIMessage(content) {
|
145 |
+
const messages = document.querySelectorAll('.message');
|
146 |
+
const lastAIMessage = Array.from(messages).reverse().find(msg => msg.classList.contains('ai-message'));
|
147 |
+
if (lastAIMessage) {
|
148 |
+
if (content === '\n') {
|
149 |
+
lastAIMessage.appendChild(document.createElement('br'));
|
150 |
+
} else {
|
151 |
+
lastAIMessage.appendChild(document.createTextNode(content));
|
152 |
+
}
|
153 |
+
lastAIMessage.scrollIntoView({ behavior: 'smooth', block: 'end' });
|
154 |
+
} else {
|
155 |
+
addMessage('AI', content);
|
156 |
+
}
|
157 |
+
}
|
158 |
+
|
159 |
+
async function clearSession() {
|
160 |
+
try {
|
161 |
+
const response = await fetch('/clear_session', {
|
162 |
+
method: 'POST',
|
163 |
+
headers: {
|
164 |
+
'Content-Type': 'application/json',
|
165 |
+
},
|
166 |
+
body: JSON.stringify({ sessionId: sessionId }),
|
167 |
+
});
|
168 |
+
|
169 |
+
if (!response.ok) {
|
170 |
+
throw new Error('Network response was not ok');
|
171 |
+
}
|
172 |
+
|
173 |
+
chatContainer.innerHTML = '';
|
174 |
+
queryHistory = [];
|
175 |
+
historyIndex = -1;
|
176 |
+
sessionId = Date.now().toString();
|
177 |
+
titleFilter = '';
|
178 |
+
sessionIdDisplay.textContent = sessionId;
|
179 |
+
titleFilterDisplay.textContent = 'None';
|
180 |
+
addMessage('System', 'Chat session cleared.');
|
181 |
+
} catch (error) {
|
182 |
+
console.error('Error:', error);
|
183 |
+
addMessage('System', 'An error occurred while clearing the session.');
|
184 |
+
}
|
185 |
+
}
|
186 |
+
|
187 |
+
function processQuery(query) {
|
188 |
+
if (query.includes('@')) {
|
189 |
+
const parts = query.split('@');
|
190 |
+
if (parts.length > 1) {
|
191 |
+
titleFilter = parts[1].split(' ')[0];
|
192 |
+
titleFilterDisplay.textContent = titleFilter;
|
193 |
+
}
|
194 |
+
}
|
195 |
+
return query;
|
196 |
+
}
|
197 |
+
|
198 |
+
userInput.addEventListener('keydown', (e) => {
|
199 |
+
if (e.key === 'Enter') {
|
200 |
+
const query = userInput.value.trim();
|
201 |
+
if (query) {
|
202 |
+
addMessage('You', query);
|
203 |
+
const processedQuery = processQuery(query);
|
204 |
+
sendQuery(processedQuery);
|
205 |
+
queryHistory.unshift(query);
|
206 |
+
historyIndex = -1;
|
207 |
+
userInput.value = '';
|
208 |
+
}
|
209 |
+
} else if (e.key === 'ArrowUp') {
|
210 |
+
e.preventDefault();
|
211 |
+
if (historyIndex < queryHistory.length - 1) {
|
212 |
+
historyIndex++;
|
213 |
+
userInput.value = queryHistory[historyIndex];
|
214 |
+
}
|
215 |
+
} else if (e.key === 'ArrowDown') {
|
216 |
+
e.preventDefault();
|
217 |
+
if (historyIndex > -1) {
|
218 |
+
historyIndex--;
|
219 |
+
userInput.value = historyIndex === -1 ? '' : queryHistory[historyIndex];
|
220 |
+
}
|
221 |
+
}
|
222 |
+
});
|
223 |
+
|
224 |
+
clearButton.addEventListener('click', clearSession);
|
225 |
+
</script>
|
226 |
+
</body>
|
227 |
+
</html>
|
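The streaming loop in index.html above splits the response on newlines, keeps the trailing partial line in a buffer, and treats the payload of each `data:` line as a piece of the answer, with an empty payload standing in for a line break. A minimal Go sketch of that same framing logic follows, for example as a starting point for a command-line client. The `sseParser` type and its `Feed` method are hypothetical names introduced for illustration; they are not part of the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// sseParser is a hypothetical helper that mirrors the buffering done by
// sendQuery in index.html: it accumulates raw chunks and emits the payload
// of every complete "data:" line.
type sseParser struct {
	buf string // unterminated tail of the stream seen so far
}

// Feed appends a chunk and returns the payloads of all complete data lines.
// An empty payload is reported as "\n", as the browser client does.
func (p *sseParser) Feed(chunk string) []string {
	p.buf += chunk
	lines := strings.Split(p.buf, "\n")
	// The last element is either "" or a partial line; keep it for later.
	p.buf = lines[len(lines)-1]

	var out []string
	for _, line := range lines[:len(lines)-1] {
		if !strings.HasPrefix(line, "data:") {
			continue // skips "event:" lines and blank separators
		}
		payload := strings.TrimPrefix(line, "data:")
		if payload == "" {
			payload = "\n"
		}
		out = append(out, payload)
	}
	return out
}

func main() {
	p := &sseParser{}
	// The second chunk completes a line that the first chunk left unfinished.
	fmt.Printf("%q\n", p.Feed("event:message\ndata:Hel"))
	fmt.Printf("%q\n", p.Feed("lo\n\nevent:message\ndata:\n\n"))
}
```

Keeping the unterminated tail in `buf` is what makes the parser safe against chunk boundaries that fall in the middle of a line.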
init-db.sql
ADDED
@@ -0,0 +1,8 @@
1 |
+
CREATE EXTENSION IF NOT EXISTS vector;
|
2 |
+
|
3 |
+
CREATE TABLE IF NOT EXISTS items (
|
4 |
+
id SERIAL PRIMARY KEY,
|
5 |
+
title TEXT,
|
6 |
+
doc TEXT,
|
7 |
+
embedding vector(4096)
|
8 |
+
);
|
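The `embedding vector(4096)` column above stores one embedding per row, and a nearest-neighbour query over it ranks rows by their distance to a query vector. As an illustration only, the sketch below performs the same kind of ranking in memory using Euclidean (L2) distance, which is the distance pgvector's `<->` operator computes. The `item` type and the `l2` and `nearest` functions are hypothetical names, and the example uses tiny 2-dimensional vectors instead of 4096 dimensions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// item is a hypothetical in-memory stand-in for one row of the items table.
type item struct {
	Title     string
	Embedding []float32
}

// l2 returns the Euclidean distance between two vectors.
// It assumes both vectors have the same length.
func l2(a, b []float32) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// nearest returns up to k items ordered by increasing distance to query,
// similar in spirit to "ORDER BY embedding <-> $1 LIMIT k".
func nearest(items []item, query []float32, k int) []item {
	ranked := make([]item, len(items))
	copy(ranked, items) // leave the caller's slice order untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		return l2(ranked[i].Embedding, query) < l2(ranked[j].Embedding, query)
	})
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	items := []item{
		{"a", []float32{0, 0}},
		{"b", []float32{1, 1}},
		{"c", []float32{5, 5}},
	}
	for _, it := range nearest(items, []float32{0.9, 1.2}, 2) {
		fmt.Println(it.Title)
	}
}
```

A brute-force scan like this is linear in the number of rows; the database performs the equivalent ordering server-side.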
main.go
ADDED
@@ -0,0 +1,698 @@
1 |
+
package main
|
2 |
+
|
3 |
+
import (
|
4 |
+
"context"
|
5 |
+
"fmt"
|
6 |
+
"io"
|
7 |
+
"log"
|
8 |
+
"net/http"
|
9 |
+
"net/url"
|
10 |
+
"os"
|
11 |
+
"path/filepath"
|
12 |
+
"strings"
|
13 |
+
|
14 |
+
got "github.com/joho/godotenv"
|
15 |
+
|
16 |
+
"bytes"
|
17 |
+
"encoding/base64"
|
18 |
+
"encoding/json"
|
19 |
+
|
20 |
+
"unicode/utf8"
|
21 |
+
|
22 |
+
"github.com/gin-gonic/gin"
|
23 |
+
"github.com/jackc/pgx/v5"
|
24 |
+
"github.com/ledongthuc/pdf"
|
25 |
+
"github.com/ollama/ollama/api"
|
26 |
+
"github.com/pgvector/pgvector-go"
|
27 |
+
)
|
28 |
+
|
29 |
+
// @title RAGTAG API
|
30 |
+
// @version 1.0
|
31 |
+
// @description This is the API for the RAGTAG system.
|
32 |
+
// @termsOfService http://swagger.io/terms/
|
33 |
+
|
34 |
+
// @contact.name James Campbell
|
35 |
+
// @contact.email [email protected]
|
36 |
+
|
37 |
+
// @license.name Apache 2.0
|
38 |
+
// @license.url http://www.apache.org/licenses/LICENSE-2.0.html
|
39 |
+
|
40 |
+
// @host localhost:8080
|
41 |
+
// @BasePath /
|
42 |
+
|
43 |
+
type Session struct {
|
44 |
+
Messages []api.Message
|
45 |
+
TitleFilter string
|
46 |
+
}
|
47 |
+
|
48 |
+
var sessions = make(map[string]*Session)
|
49 |
+
|
50 |
+
func generateEmbedding(input string) ([]float32, error) {
|
51 |
+
ollamaHost := os.Getenv("OLLAMA_HOST")
|
52 |
+
if ollamaHost == "" {
|
53 |
+
ollamaHost = "localhost" // fallback to localhost if not set
|
54 |
+
}
|
55 |
+
ollamaURL, err := url.Parse(fmt.Sprintf("http://%s:11434", ollamaHost))
|
56 |
+
if err != nil {
|
57 |
+
return nil, err
|
58 |
+
}
|
59 |
+
client := api.NewClient(ollamaURL, http.DefaultClient)
|
60 |
+
|
61 |
+
// Create an embedding request
|
62 |
+
req := &api.EmbedRequest{
|
63 |
+
Model: "llama3.1", // Ensure this is an embedding-capable model
|
64 |
+
Input: input,
|
65 |
+
}
|
66 |
+
|
67 |
+
// Call the Embed function
|
68 |
+
resp, err := client.Embed(context.Background(), req)
|
69 |
+
if err != nil {
|
70 |
+
return nil, err
|
71 |
+
}
|
72 |
+
|
73 |
+
return resp.Embeddings[0], nil
|
74 |
+
}
|
75 |
+
|
76 |
+
func insertItem(conn *pgx.Conn, title string, docText string, embedding []float32) error {
|
77 |
+
// Combine title and docText for embedding
|
78 |
+
combinedText := title + " " + docText
|
79 |
+
|
80 |
+
_, err := conn.Exec(context.Background(),
|
81 |
+
"INSERT INTO items (title, doc, embedding) VALUES ($1, $2, $3)",
|
82 |
+
title, combinedText, pgvector.NewVector(embedding))
|
83 |
+
|
84 |
+
return err
|
85 |
+
}
|
86 |
+
|
87 |
+
func queryEmbeddings(conn *pgx.Conn, query string, session *Session, c *gin.Context) error {
|
88 |
+
// Generate embedding for the query
|
89 |
+
queryEmbedding, err := generateEmbedding(query)
|
90 |
+
if err != nil {
|
91 |
+
return err
|
92 |
+
}
|
93 |
+
|
94 |
+
// Prepare the SQL query
|
95 |
+
sqlQuery := "SELECT doc, COALESCE(title, 'Untitled') FROM items"
|
96 |
+
// Bind the title filter as parameter $2 instead of formatting it into the SQL text,
|
97 |
+
// which would let a crafted filter inject SQL; an empty filter matches every row.
|
98 |
+
sqlQuery += " WHERE ($2::text = '' OR title LIKE '%' || $2::text || '%')"
|
99 |
+
sqlQuery += " ORDER BY embedding <-> $1 LIMIT 5"
|
100 |
+
|
101 |
+
// Query the database for similar documents
|
102 |
+
rows, err := conn.Query(context.Background(), sqlQuery, pgvector.NewVector(queryEmbedding), session.TitleFilter)
|
103 |
+
if err != nil {
|
104 |
+
return err
|
105 |
+
}
|
106 |
+
defer rows.Close()
|
107 |
+
|
108 |
+
var docs []string
|
109 |
+
var sources []string
|
110 |
+
for rows.Next() {
|
111 |
+
var doc, title string
|
112 |
+
if err := rows.Scan(&doc, &title); err != nil {
|
113 |
+
return err
|
114 |
+
}
|
115 |
+
docs = append(docs, doc)
|
116 |
+
sources = append(sources, fmt.Sprintf("Source: %s", title))
|
117 |
+
}
|
118 |
+
|
119 |
+
// Combine the retrieved documents
|
120 |
+
contextText := strings.Join(docs, "\n\n")
|
121 |
+
|
122 |
+
// Create a chat request
|
123 |
+
ollamaHost := os.Getenv("OLLAMA_HOST")
|
124 |
+
if ollamaHost == "" {
|
125 |
+
ollamaHost = "localhost" // fallback to localhost if not set
|
126 |
+
}
|
127 |
+
ollamaURL, err := url.Parse(fmt.Sprintf("http://%s:11434", ollamaHost))
|
128 |
+
if err != nil {
|
129 |
+
return err
|
130 |
+
}
|
131 |
+
client := api.NewClient(ollamaURL, http.DefaultClient)
|
132 |
+
|
133 |
+
// Add the new query to the session
|
134 |
+
session.Messages = append(session.Messages, api.Message{Role: "user", Content: query})
|
135 |
+
|
136 |
+
// Prepare the messages for the chat request
|
137 |
+
messages := []api.Message{
|
138 |
+
{Role: "system", Content: "You are an assistant that answers questions based on the given context."},
|
139 |
+
{Role: "user", Content: "Here's the context:\n" + contextText},
|
140 |
+
}
|
141 |
+
messages = append(messages, session.Messages...)
|
142 |
+
|
143 |
+
req := &api.ChatRequest{
|
144 |
+
Model: "llama3.1",
|
145 |
+
Messages: messages,
|
146 |
+
Stream: new(bool), // Use new(bool) to create a pointer to a boolean
|
147 |
+
}
|
148 |
+
*req.Stream = true // Set the value to true
|
149 |
+
|
150 |
+
var answer strings.Builder // Accumulates the streamed reply so it can be stored in the session
|
151 |
+
err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
|
152 |
+
answer.WriteString(resp.Message.Content)
|
153 |
+
if resp.Message.Content != "" {
|
154 |
+
c.SSEvent("message", resp.Message.Content)
|
155 |
+
c.Writer.Flush() // Ensure the content is sent immediately
|
156 |
+
}
|
157 |
+
return nil
|
158 |
+
})
|
159 |
+
if err != nil {
|
160 |
+
return err
|
161 |
+
}
|
162 |
+
|
163 |
+
// Add the AI response to the session
|
164 |
+
session.Messages = append(session.Messages, api.Message{Role: "assistant", Content: answer.String()})
|
165 |
+
|
166 |
+
return nil
|
167 |
+
}
|
168 |
+
|
169 |
+
func getDocuments(conn *pgx.Conn) ([]map[string]interface{}, error) {
|
170 |
+
rows, err := conn.Query(context.Background(), "SELECT DISTINCT ON (SPLIT_PART(title, '_chunk_', 1)) SPLIT_PART(title, '_chunk_', 1) as title, COUNT(*) as count FROM items GROUP BY SPLIT_PART(title, '_chunk_', 1)")
|
171 |
+
if err != nil {
|
172 |
+
return nil, err
|
173 |
+
}
|
174 |
+
defer rows.Close()
|
175 |
+
|
176 |
+
var documents []map[string]interface{}
|
177 |
+
for rows.Next() {
|
178 |
+
var title string
|
179 |
+
var count int
|
180 |
+
if err := rows.Scan(&title, &count); err != nil {
|
181 |
+
return nil, err
|
182 |
+
}
|
183 |
+
documents = append(documents, map[string]interface{}{
|
184 |
+
"title": title,
|
185 |
+
"count": count,
|
186 |
+
})
|
187 |
+
}
|
188 |
+
|
189 |
+
return documents, nil
|
190 |
+
}
|
191 |
+
|
192 |
+
func deleteDocument(conn *pgx.Conn, title string) error {
|
193 |
+
_, err := conn.Exec(context.Background(), "DELETE FROM items WHERE title LIKE $1 || '%'", title)
|
194 |
+
return err
|
195 |
+
}
|
196 |
+
|
197 |
+
func uploadDocument(c *gin.Context, conn *pgx.Conn) {
|
198 |
+
title := c.PostForm("title")
|
199 |
+
file, header, err := c.Request.FormFile("file")
|
200 |
+
if err != nil {
|
201 |
+
log.Printf("Error getting file: %v", err)
|
202 |
+
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
203 |
+
return
|
204 |
+
}
|
205 |
+
defer file.Close()
|
206 |
+
|
207 |
+
// Create uploads directory if it doesn't exist
|
208 |
+
uploadsDir := "uploads"
|
209 |
+
if err := os.MkdirAll(uploadsDir, 0755); err != nil {
|
210 |
+
log.Printf("Error creating uploads directory: %v", err)
|
211 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create uploads directory"})
|
212 |
+
return
|
213 |
+
}
|
214 |
+
|
215 |
+
filename := filepath.Join(uploadsDir, header.Filename)
|
216 |
+
out, err := os.Create(filename)
|
217 |
+
if err != nil {
|
218 |
+
log.Printf("Error creating file: %v", err)
|
219 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
220 |
+
return
|
221 |
+
}
|
222 |
+
defer out.Close()
|
223 |
+
|
224 |
+
_, err = io.Copy(out, file)
|
225 |
+
if err != nil {
|
226 |
+
log.Printf("Error copying file: %v", err)
|
227 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
228 |
+
return
|
229 |
+
}
|
230 |
+
|
231 |
+
// Debug: log file size and first 16 bytes
|
232 |
+
stat, statErr := os.Stat(filename)
|
233 |
+
if statErr == nil {
|
234 |
+
log.Printf("Uploaded file size: %d bytes", stat.Size())
|
235 |
+
fcheck, ferr := os.Open(filename)
|
236 |
+
if ferr == nil {
|
237 |
+
buf := make([]byte, 16)
|
238 |
+
n, _ := fcheck.Read(buf)
|
239 |
+
log.Printf("First 16 bytes: % x", buf[:n])
|
240 |
+
fcheck.Close()
|
241 |
+
}
|
242 |
+
}
|
243 |
+
|
244 |
+
var textContent string
|
245 |
+
ext := strings.ToLower(filepath.Ext(filename))
|
246 |
+
if ext == ".jpg" || ext == ".jpeg" || ext == ".png" {
|
247 |
+
// Generate image summary using the llava model
|
248 |
+
summary, err := generateImageSummary(filename)
|
249 |
+
if err != nil {
|
250 |
+
log.Printf("Error generating image summary: %v", err)
|
251 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
252 |
+
return
|
253 |
+
}
|
254 |
+
textContent = summary
|
255 |
+
} else if ext == ".pdf" {
|
256 |
+
// Check PDF signature before parsing
|
257 |
+
f, rErr := os.Open(filename)
|
258 |
+
if rErr != nil {
|
259 |
+
log.Printf("Error opening PDF: %v", rErr)
|
260 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": rErr.Error()})
|
261 |
+
return
|
262 |
+
}
|
263 |
+
defer f.Close()
|
264 |
+
buf := make([]byte, 5)
|
265 |
+
_, err := f.Read(buf)
|
266 |
+
if err != nil || string(buf) != "%PDF-" {
|
267 |
+
log.Printf("Uploaded file is not a valid PDF (missing %%PDF- header): %s", filename)
|
268 |
+
c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded file is not a valid PDF (missing %PDF- header)"})
|
269 |
+
return
|
270 |
+
}
|
271 |
+
stat, _ := f.Stat()
|
272 |
+
// Loosen EOF check: search last 1KB for %%EOF
|
273 |
+
eofCheckSize := int64(1024)
|
274 |
+
if stat.Size() < eofCheckSize {
|
275 |
+
eofCheckSize = stat.Size()
|
276 |
+
}
|
277 |
+
endBuf := make([]byte, eofCheckSize)
|
278 |
+
_, err = f.ReadAt(endBuf, stat.Size()-eofCheckSize)
|
279 |
+
if err != nil || !strings.Contains(string(endBuf), "%%EOF") {
|
280 |
+
log.Printf("Uploaded file is not a valid PDF (missing %%EOF): %s", filename)
|
281 |
+
c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded file is not a valid PDF (missing %%EOF)"})
|
282 |
+
return
|
283 |
+
}
|
284 |
+
// Reset file pointer for pdf.NewReader
|
285 |
+
f.Seek(0, 0)
|
286 |
+
reader, pdfErr := pdf.NewReader(f, stat.Size())
|
287 |
+
if pdfErr != nil {
|
288 |
+
log.Printf("Error reading PDF: %v", pdfErr)
|
289 |
+
c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded file is not a valid PDF or is corrupted"})
|
290 |
+
return
|
291 |
+
}
|
292 |
+
var sb strings.Builder
|
293 |
+
for i := 1; i <= reader.NumPage(); i++ {
|
294 |
+
page := reader.Page(i)
|
295 |
+
if page.V.IsNull() {
|
296 |
+
continue
|
297 |
+
}
|
298 |
+
content, err := page.GetPlainText(nil)
|
299 |
+
if err != nil {
|
300 |
+
log.Printf("Error extracting text from page %d: %v", i, err)
|
301 |
+
continue
|
302 |
+
}
|
303 |
+
sb.WriteString(content)
|
304 |
+
}
|
305 |
+
textContent = sb.String()
|
306 |
+
log.Printf("Extracted text length: %d", len(textContent))
|
307 |
+
} else {
|
308 |
+
content, err := os.ReadFile(filename)
|
309 |
+
if err != nil {
|
310 |
+
log.Printf("Error reading file: %v", err)
|
311 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
312 |
+
return
|
313 |
+
}
|
314 |
+
textContent = string(content)
|
315 |
+
}
|
316 |
+
|
317 |
+
// Remove null bytes (Postgres TEXT cannot contain 0x00)
|
318 |
+
textContent = strings.ReplaceAll(textContent, "\x00", "")
|
319 |
+
|
320 |
+
// Validate UTF-8
|
321 |
+
if !utf8.ValidString(textContent) {
|
322 |
+
log.Printf("Invalid UTF-8 detected in document: %s", filename)
|
323 |
+
c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded document is not valid UTF-8"})
|
324 |
+
return
|
325 |
+
}
|
326 |
+
|
327 |
+
// Generate embedding for the text content using llama3.1
|
328 |
+
embedding, err := generateEmbedding(textContent)
|
329 |
+
if err != nil {
|
330 |
+
log.Printf("Error generating embedding: %v", err)
|
331 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
332 |
+
return
|
333 |
+
}
|
334 |
+
|
335 |
+
// Insert the document into the database
|
336 |
+
err = insertItem(conn, title, textContent, embedding)
|
337 |
+
if err != nil {
|
338 |
+
log.Printf("Error inserting item: %v", err)
|
339 |
+
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
340 |
+
return
|
341 |
+
}
|
342 |
+
|
343 |
+
c.JSON(http.StatusOK, gin.H{"message": "Document uploaded and processed successfully"})
|
344 |
+
}
|
345 |
+
|
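The PDF branch of the upload handler accepts a file only if it starts with `%PDF-` and contains `%%EOF` somewhere in its last 1 KB. The following is a minimal standalone sketch of that check, not code from this commit: `looksLikePDF` and `looksLikePDFBytes` are hypothetical names, and the function works on any `io.ReaderAt` so it can be exercised without touching the filesystem.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// looksLikePDF is a hypothetical helper (not part of main.go). It reports
// whether r begins with "%PDF-" and has "%%EOF" within its last tailSize bytes.
func looksLikePDF(r io.ReaderAt, size, tailSize int64) bool {
	header := []byte("%PDF-")
	if size < int64(len(header)) {
		return false
	}
	head := make([]byte, len(header))
	if _, err := r.ReadAt(head, 0); err != nil && err != io.EOF {
		return false
	}
	if !bytes.Equal(head, header) {
		return false
	}
	// Clamp the tail window to the file size, as the handler does.
	if tailSize > size {
		tailSize = size
	}
	tail := make([]byte, tailSize)
	if _, err := r.ReadAt(tail, size-tailSize); err != nil && err != io.EOF {
		return false
	}
	return bytes.Contains(tail, []byte("%%EOF"))
}

// looksLikePDFBytes wraps looksLikePDF for in-memory data with a 1 KB tail window.
func looksLikePDFBytes(data []byte) bool {
	return looksLikePDF(bytes.NewReader(data), int64(len(data)), 1024)
}

func main() {
	good := []byte("%PDF-1.7\n...objects...\n%%EOF\n")
	bad := []byte("<html>not a pdf</html>")
	fmt.Println(looksLikePDFBytes(good), looksLikePDFBytes(bad))
}
```

This is only a cheap pre-filter: a file can pass both checks and still fail in `pdf.NewReader`, which is why the handler keeps its later error path.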
func chunkText(text string, chunkSize int) []string {
	words := strings.Fields(text)
	var chunks []string
	for i := 0; i < len(words); i += chunkSize {
		end := i + chunkSize
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[i:end], " "))
	}
	return chunks
}

func generateImageSummary(imagePath string) (string, error) {
	imageData, err := os.ReadFile(imagePath)
	if err != nil {
		return "", fmt.Errorf("failed to read image file: %w", err)
	}

	base64Image := base64.StdEncoding.EncodeToString(imageData)

	payload := map[string]interface{}{
		"model":  "llava",
		"prompt": "Describe this image in detail:",
		"images": []string{base64Image},
		"stream": true,
	}

	jsonPayload, err := json.Marshal(payload)
	if err != nil {
		return "", fmt.Errorf("failed to marshal JSON payload: %w", err)
	}

	ollamaHost := os.Getenv("OLLAMA_HOST")
	if ollamaHost == "" {
		ollamaHost = "localhost"
	}
	url := fmt.Sprintf("http://%s:11434/api/generate", ollamaHost)

	resp, err := http.Post(url, "application/json", bytes.NewBuffer(jsonPayload))
	if err != nil {
		return "", fmt.Errorf("failed to send POST request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return "", fmt.Errorf("unexpected response status: %d, body: %s", resp.StatusCode, string(body))
	}

	var summary strings.Builder
	decoder := json.NewDecoder(resp.Body)
	for {
		var result struct {
			Response string `json:"response"`
			Done     bool   `json:"done"`
		}
		if err := decoder.Decode(&result); err != nil {
			if err == io.EOF {
				break
			}
			return "", fmt.Errorf("failed to decode JSON response: %w", err)
		}
		summary.WriteString(result.Response)
		if result.Done {
			break
		}
	}

	if summary.Len() == 0 {
		return "", fmt.Errorf("empty response from llava model")
	}

	fmt.Println("The summary of the image is: ", summary.String())

	return summary.String(), nil
}
|
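`generateImageSummary` reads the model's reply as a stream of JSON objects, appending each `response` fragment until an object arrives with `done` set. A minimal sketch of that decoding loop, run against an in-memory string rather than a live HTTP response, could look like the following. The names `generateChunk`, `collectStream`, and `collectString` are hypothetical, and the sample payload is invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// generateChunk holds the two fields main.go reads from each streamed object.
type generateChunk struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

// collectStream is a hypothetical helper: it concatenates the "response"
// fragments from a stream of JSON objects until one has done=true or the
// stream ends cleanly.
func collectStream(r io.Reader) (string, error) {
	var sb strings.Builder
	dec := json.NewDecoder(r)
	for {
		var chunk generateChunk
		if err := dec.Decode(&chunk); err != nil {
			if err == io.EOF {
				break // clean end of stream
			}
			return "", fmt.Errorf("decode stream: %w", err)
		}
		sb.WriteString(chunk.Response)
		if chunk.Done {
			break
		}
	}
	return sb.String(), nil
}

// collectString is a convenience wrapper for in-memory input.
func collectString(s string) (string, error) {
	return collectStream(strings.NewReader(s))
}

func main() {
	// Invented sample stream: one JSON object per line.
	stream := `{"response":"A cat","done":false}
{"response":" on a mat.","done":false}
{"response":"","done":true}
`
	text, err := collectString(stream)
	fmt.Println(text, err)
}
```

A single `json.Decoder` reused across the loop is what makes this work: each `Decode` call consumes exactly one object and leaves the rest of the stream buffered for the next call.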
func main() {
	// Set up the database connection
	// load env variables
	got.Load()
	conn, err := pgx.Connect(context.Background(), os.Getenv("DB_URL"))
	if err != nil {
		log.Fatalf("Unable to connect to database: %v", err)
	}
	defer conn.Close(context.Background())

	// Set up the Gin router
	r := gin.Default()

	// Define the /add_document endpoint
	r.POST("/add_document", func(c *gin.Context) {
		var request struct {
			Title   string `json:"title"`
			DocText string `json:"doc_text"`
		}
		if err := c.BindJSON(&request); err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}

		// Generate the embedding
		embedding, err := generateEmbedding(request.DocText)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}

		// Insert the document and its embedding into the items table
		err = insertItem(conn, request.Title, request.DocText, embedding)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}

		c.JSON(http.StatusOK, gin.H{"message": "Document chunk embedded and stored successfully!"})
	})

	// Add the new /query endpoint
	r.POST("/query", func(c *gin.Context) {
		var request struct {
			Query     string `json:"query"`
			SessionID string `json:"sessionId"`
		}
		if err := c.BindJSON(&request); err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}

		session, ok := sessions[request.SessionID]
		if !ok {
			session = &Session{
				Messages: []api.Message{
					{Role: "system", Content: "You are an assistant that answers questions based on the given context."},
				},
				TitleFilter: "",
			}
			sessions[request.SessionID] = session
		}

		// Check for @title in the query
		if strings.Contains(request.Query, "@") {
			parts := strings.Split(request.Query, "@")
			if len(parts) > 1 {
				session.TitleFilter = strings.Split(parts[1], " ")[0]
				request.Query = strings.Replace(request.Query, "@"+session.TitleFilter, "", 1)
			}
		}

		c.Header("Content-Type", "text/event-stream")
		c.Header("Cache-Control", "no-cache")
		c.Header("Connection", "keep-alive")
		c.Header("Access-Control-Allow-Origin", "*")
		c.Header("Access-Control-Allow-Credentials", "true")
		c.Header("Access-Control-Allow-Headers", "Content-Type")
		c.Header("Access-Control-Allow-Methods", "POST")
		c.Header("encoding", "chunked")

		err := queryEmbeddings(conn, request.Query, session, c)
		if err != nil {
			c.SSEvent("error", err.Error())
		}
		c.SSEvent("done", "")
	})

	// Serve the index.html file
	r.GET("/", func(c *gin.Context) {
		c.File("index.html")
	})

	// Serve the docmanager.html file
	r.GET("/docmanager", func(c *gin.Context) {
		c.File("docmanager.html")
	})

	// Add a new endpoint to fetch documents
	r.GET("/documents", func(c *gin.Context) {
		documents, err := getDocuments(conn)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		c.JSON(http.StatusOK, documents)
	})

	// Add a new endpoint to delete documents
	r.POST("/delete_document", func(c *gin.Context) {
		var request struct {
			Title string `json:"title"`
		}
		if err := c.BindJSON(&request); err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}

		err := deleteDocument(conn, request.Title)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}

		c.JSON(http.StatusOK, gin.H{"message": "Document deleted successfully"})
	})

	// Add a new endpoint to upload documents
	r.POST("/upload_document", func(c *gin.Context) {
		uploadDocument(c, conn)
	})

	// Add a new endpoint to clear the chat session
	r.POST("/clear_session", func(c *gin.Context) {
		var request struct {
			SessionID string `json:"sessionId"`
		}
		if err := c.BindJSON(&request); err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}

		delete(sessions, request.SessionID)
		c.JSON(http.StatusOK, gin.H{"message": "Chat session cleared successfully"})
	})

	// Add a new endpoint to check if Twitter data exists
	r.GET("/check_data", func(c *gin.Context) {
		rows, err := conn.Query(context.Background(), "SELECT DISTINCT title FROM items WHERE title LIKE '%twitter%'")
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		defer rows.Close()

		var titles []string
		for rows.Next() {
			var title string
			if err := rows.Scan(&title); err != nil {
				c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
				return
			}
			titles = append(titles, title)
		}

		c.JSON(http.StatusOK, gin.H{"twitter_titles": titles})
	})

	// Serve the describer.html file
	r.GET("/describer", func(c *gin.Context) {
		c.File("describer.html")
	})

	// Handle image description
	r.POST("/describe_image", func(c *gin.Context) {
		file, _, err := c.Request.FormFile("file")
		if err != nil {
			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
			return
		}
		defer file.Close()

		// Create a temporary file to store the uploaded image
		tempFile, err := os.CreateTemp("", "uploaded-*.jpg")
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		defer os.Remove(tempFile.Name())
		defer tempFile.Close()

		// Copy the uploaded file to the temporary file
		_, err = io.Copy(tempFile, file)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}

		c.Header("Content-Type", "text/event-stream")
		c.Header("Cache-Control", "no-cache")
		c.Header("Connection", "keep-alive")
		c.Header("Access-Control-Allow-Origin", "*")
		c.Header("Access-Control-Allow-Credentials", "true")
		c.Header("Access-Control-Allow-Headers", "Content-Type")
		c.Header("Access-Control-Allow-Methods", "POST")
		c.Header("encoding", "chunked")

		imageData, err := os.ReadFile(tempFile.Name())
		if err != nil {
			c.SSEvent("error", err.Error())
			return
		}

		base64Image := base64.StdEncoding.EncodeToString(imageData)

		payload := map[string]interface{}{
			"model":  "llava",
			"prompt": "Describe this image in detail:",
			"images": []string{base64Image},
			"stream": true,
		}

		jsonPayload, err := json.Marshal(payload)
		if err != nil {
			c.SSEvent("error", err.Error())
			return
		}

		ollamaHost := os.Getenv("OLLAMA_HOST")
		if ollamaHost == "" {
			ollamaHost = "localhost"
		}
		url := fmt.Sprintf("http://%s:11434/api/generate", ollamaHost)

		resp, err := http.Post(url, "application/json", bytes.NewBuffer(jsonPayload))
		if err != nil {
			c.SSEvent("error", err.Error())
			return
		}
		defer resp.Body.Close()

		if resp.StatusCode != http.StatusOK {
			body, _ := io.ReadAll(resp.Body)
			c.SSEvent("error", fmt.Sprintf("Unexpected response status: %d, body: %s", resp.StatusCode, string(body)))
			return
		}

		decoder := json.NewDecoder(resp.Body)
		for {
			var result struct {
				Response string `json:"response"`
				Done     bool   `json:"done"`
			}
			if err := decoder.Decode(&result); err != nil {
				if err == io.EOF {
					break
				}
				c.SSEvent("error", err.Error())
				return
			}
			if result.Response != "" {
				c.SSEvent("message", result.Response)
				c.Writer.Flush() // Ensure the content is sent immediately
			}
			if result.Done {
				break
			}
		}

		c.SSEvent("done", "")
	})

	// Run the Gin server
	r.Run(":8080")
}
|
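The `/query` handler in main.go pulls a title filter out of the query text: the first word after an `@` becomes the session's `TitleFilter`, and that `@word` token is removed from the query. A rough sketch of the same parsing as a pure function is shown here, assuming a hypothetical name `splitTitleFilter`. The whitespace tidy-up at the end is an addition for readability and is not in the handler.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTitleFilter is a hypothetical helper mirroring the handler's logic.
// It returns the word following the first "@" (or "" if there is none) and
// the query with that "@word" token removed.
func splitTitleFilter(query string) (filter, rest string) {
	rest = query
	parts := strings.Split(query, "@")
	if len(parts) > 1 {
		// First space-delimited token after the first "@".
		filter = strings.Split(parts[1], " ")[0]
		rest = strings.Replace(query, "@"+filter, "", 1)
	}
	// Assumption: collapse the double space left behind by the removal.
	rest = strings.Join(strings.Fields(rest), " ")
	return filter, rest
}

func main() {
	filter, rest := splitTitleFilter("how does @twelve_monkeys end?")
	fmt.Printf("filter=%q rest=%q\n", filter, rest)
}
```

Two properties of this approach are worth keeping in mind. Any `@` triggers it, so an email address in the query would also be treated as a filter. And when no `@` is present the function returns an empty filter, whereas the handler leaves the session's previous `TitleFilter` in place, so a caller would need to preserve the old value itself.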
ollama-entrypoint.sh
ADDED
@@ -0,0 +1,17 @@
#!/bin/bash
set -e

# Start Ollama in the background
ollama serve &

# Wait for Ollama to be ready
until curl -s -o /dev/null -w "%{http_code}" http://localhost:11434/api/tags | grep -q "200"; do
  echo "Waiting for Ollama to be ready..."
  sleep 5
done

# Pull the llama3.1 and llava models
ollama pull llama3.1
ollama pull llava
# Keep the container running
wait
ragtag4
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:85c0b559c6b2545fe5174963fd47313c490dda984c16bbd4db0d2b8c9eaa56dd
size 18022930
screenplays/downloader.py
ADDED
@@ -0,0 +1,63 @@
import os
import sys
import requests
from bs4 import BeautifulSoup
import argparse

# Default URL of the script to download
DEFAULT_SCRIPT_URL = "https://www.dailyscript.com/scripts/twelve_monkeys.html"

# Directory to save the downloaded script
save_dir = "scripts"
os.makedirs(save_dir, exist_ok=True)

# Function to download a script
def download_script(script_url, save_dir):
    response = requests.get(script_url, timeout=30)
    response.raise_for_status()

    # Parse the HTML content
    soup = BeautifulSoup(response.text, "html.parser")

    # Try to find the <pre> tag
    pre_tag = soup.find("pre")

    if pre_tag:
        # Extract the text content from the <pre> tag
        text_content = pre_tag.get_text()
        # Remove the line numbers (everything up to the first "|" on each line)
        lines = text_content.split("\n")
        cleaned_lines = [line.split("|", 1)[-1] for line in lines]
        cleaned_text = "\n".join(cleaned_lines)
    else:
        # If no <pre> tag, get the text from the body
        body = soup.find("body")
        if body:
            cleaned_text = body.get_text()
        else:
            raise ValueError("Could not find script content in the HTML")

    # Extract the filename from the URL and change the extension to .txt
    filename = os.path.basename(script_url).split(".")[0] + ".txt"
    save_path = os.path.join(save_dir, filename)

    with open(save_path, "w", encoding="utf-8") as file:
        file.write(cleaned_text)

    print(f"Downloaded: {save_path}")
    return save_path

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Download a screenplay from a given URL.")
    parser.add_argument("--url", type=str, default=DEFAULT_SCRIPT_URL,
                        help="URL of the screenplay to download (default: Twelve Monkeys)")

    args = parser.parse_args()
    script_url = args.url

    # Extract the script name from the URL
    script_name = os.path.basename(script_url).split(".")[0] + ".txt"
    save_path = os.path.join(save_dir, script_name)

    # Download the script
    download_script(script_url, save_dir)
screenplays/scripts/AmericanBeauty_final.txt
ADDED
The diff for this file is too large to render.
See raw diff
screenplays/scripts/thefifthelement.txt
ADDED
The diff for this file is too large to render.
See raw diff
screenplays/scripts/twelve_monkeys.txt
ADDED
The diff for this file is too large to render.
See raw diff
screenplays/send_screenplay.py
ADDED
@@ -0,0 +1,48 @@
import os
import requests
import psycopg2
import math

def chunk_text(text, chunk_size=4096):
    words = text.split()
    return [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

def is_screenplay_in_db(cursor, title):
    cursor.execute("SELECT COUNT(*) FROM items WHERE title LIKE %s", (f"{title}%",))
    return cursor.fetchone()[0] > 0

def send_screenplay(file_path, cursor):
    with open(file_path, 'r', encoding='utf-8') as file:
        content = file.read()

    title = os.path.splitext(os.path.basename(file_path))[0]

    if is_screenplay_in_db(cursor, title):
        print(f"Screenplay '{title}' already exists in the database. Skipping.")
        return

    chunks = chunk_text(content)

    for i, chunk in enumerate(chunks):
        payload = {
            "title": f"{title}_chunk_{i+1}",
            "doc_text": chunk
        }
        response = requests.post("http://localhost:8080/add_document", json=payload)
        print(f"Chunk {i+1} response: {response.status_code}")

def process_scripts_folder():
    conn = psycopg2.connect("dbname=ragtag user=jc password=!1newmedia host=localhost")
    cursor = conn.cursor()

    scripts_folder = "scripts"
    for filename in os.listdir(scripts_folder):
        if filename.endswith(".txt"):
            file_path = os.path.join(scripts_folder, filename)
            send_screenplay(file_path, cursor)

    cursor.close()
    conn.close()

if __name__ == "__main__":
    process_scripts_folder()