hugging2021 committed on
Commit 79c7b05 · verified · 1 Parent(s): 8a96b21

Upload folder using huggingface_hub

.cursorignore ADDED
@@ -0,0 +1,6 @@
+ screenplays/scripts/*
+ screenplays/scripts/twelve_monkeys.txt
+ screenplays/scripts/thefifthelement.txt
+ screenplays/scripts/AmericanBeauty_final.txt
+ screenplays/scripts/
+
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ragtag4 filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,2 @@
+ *.pdf
+ .env
Dockerfile ADDED
@@ -0,0 +1,17 @@
+ # golang image and setup for the gin server
+ FROM golang:1.24.1
+
+ # Set the working directory in the container
+ WORKDIR /app
+
+ # Copy the current directory contents into the container at /app
+ COPY . /app
+
+ RUN go mod download
+
+ RUN go build -o ragtag
+
+ EXPOSE 8080
+
+ CMD ["./ragtag"]
+
Dockerfile.ollama ADDED
@@ -0,0 +1,10 @@
+ FROM ollama/ollama:latest
+
+ # Install curl for healthcheck
+ RUN apt-get update && apt-get install -y curl
+
+ # Copy the entrypoint script
+ COPY ollama-entrypoint.sh /ollama-entrypoint.sh
+ RUN chmod +x /ollama-entrypoint.sh
+
+ ENTRYPOINT ["/ollama-entrypoint.sh"]
README.md CHANGED
@@ -1,10 +1,123 @@
1
- ---
2
- title: Ragtag4
3
- emoji: 👁
4
- colorFrom: indigo
5
- colorTo: gray
6
- sdk: docker
7
- pinned: false
8
- ---
9
-
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # what
2
+
3
+ RAG is easy! Run ollama llama3.1 in golang with a postgres database.
4
+
5
+ This is a simple example of how to use ollama with a postgres database to create a RAG system. Treat it as a starting point, not a full-featured system. It can be adapted to any data-related use case where LLMs answer questions about your data.
6
+
7
+ - Cool feature:
8
+ You can use the title as a category filter: if you add an '@' tag to the query, it filters vector docs by the matching title. For example:
9
+ `Is @antibionic a good company?` restricts the chat to docs and vectors with antibionic in the title field for the rest of the chat session. You can see this in the screenshots below. This means you can keep different types of documents without being forced to chat with all of them at once.
10
+
11
+ It is set up with docker compose and is ready to go if you skip to that section below.
12
+
13
+ ## gui
14
+
15
+ The gui includes a document manager to add and remove documents from the database and a chat interface to interact with the system. It is a single-page application built with HTML, CSS, and JavaScript, styled as an emulated terminal with a black background and white/green text.
16
+
17
+ ![Screenshot 2024-08-18 at 5 04 10 PM](https://github.com/user-attachments/assets/ea0b8b04-2dba-4e5c-88fd-037fe296be87)
18
+
19
+ ![PNG image](https://github.com/user-attachments/assets/da8a5c78-7365-459d-9f69-76956dc276df)
20
+
21
+ ## key files
22
+
23
+ - [index.html](index.html) - a simple html gui to interact with the system
24
+ - [main.go](main.go) - the main go file to interact with the system
25
+
26
+ ## what I did & learned
27
+
28
+ - created a table with a vector column and learned its dimension had to be 4096 to match llama3.1
29
+ - created a function to generate an embedding for a given text including splitting at 4096-bit chunks
30
+ - created a function to query the table with an embedding and return the most similar texts within the chat
31
+ - created a script to download screenplays
32
+ - created a script to send screenplays to the database and auto-embed them with the llm
33
+ - created web interface and document manager interface to do CRUD on docs and vectors and stream tokens back from chat prompt
34
+
35
+ ## tools
36
+
37
+ Used https://cursor.sh and Claude Sonnet to help with the codebase.
38
+
39
+ ## curl add embedding example with title and doc
40
+
41
+ ```bash
42
+ curl -X POST http://localhost:8080/add_document \
43
+ -H "Content-Type: application/json" \
44
+ -d '{"title": "Screenplay Title", "doc_text": "INT. COFFEE SHOP - DAY\n\nJANE, 30s, sits at a corner table, typing furiously on her laptop. The cafe buzzes with quiet conversation.\n\nJOHN, 40s, enters, scanning the room. He spots Jane and approaches.\n\nJOHN\nMind if I join you?\n\nJane looks up, startled."}'
45
+ ```
46
+
47
+ ## curl upload document example
48
+
49
+ ```bash
50
+ curl -X POST http://localhost:8080/upload_document \
51
+ -H "Content-Type: multipart/form-data" \
52
+ -F "title=Example Document" \
53
+ -F "file=@/path/to/your/document.pdf"
54
+ ```
55
+
56
+ ## curl query example
57
+
58
+ ```bash
59
+ curl -X POST http://localhost:8080/query \
60
+ -H "Content-Type: application/json" \
61
+ -d '{"query": "What are the main characters in the screenplays that are in the coffeeshop?"}'
62
+ ```
63
+
64
+ ## filter query example by title
65
+
66
+ ```bash
67
+ curl -X POST http://localhost:8080/query \
68
+ -H "Content-Type: application/json" \
69
+ -d '{
70
+ "query": "@screenplay Tell me about the main characters",
71
+ "sessionId": "1234567890"
72
+ }'
73
+ ```
74
+
75
+ ## sql table creation
76
+
77
+ ```sql
78
+ SELECT 'CREATE DATABASE ragtag' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'ragtag')\gexec
79
+ CREATE EXTENSION IF NOT EXISTS vector;
80
+ CREATE TABLE IF NOT EXISTS items (
81
+ id SERIAL PRIMARY KEY,
82
+ title TEXT,
83
+ doc TEXT,
84
+ embedding vector(4096)
85
+ );
86
+ ```
87
+
88
+ ## docker
89
+
90
+ Do this first:
91
+ `docker build -t ragtag .`
92
+
93
+ ## docker compose
94
+
95
+ `docker-compose up --build`
96
+ This pulls the llama3.1 model and starts the ollama server, then starts the go server and the gui and connects everything to postgres.
97
+
98
+ ## ports and versions
99
+
100
+ ollama - 11434
101
+ go server - 8080
102
+ postgres - 5432
103
+
104
+ go version - 1.24.1
105
+ ollama version - 1.10.0
106
+ postgres version - 16.1, via the pgvector/pgvector:pg16 docker image
107
+
108
+ Other versions have not been tested, but they should work if you know what you are doing.
109
+
110
+ ## curl test no stream
111
+
112
+ ```bash
113
+ curl http://localhost:11434/api/generate -d '{
114
+ "model": "llama3.1",
115
+ "prompt":"Why is the sky blue?",
116
+ "stream": false
117
+ }'
118
+ ```
119
+
120
+ ### Helpers
121
+
122
+ - [downloader.py](screenplays/downloader.py) - downloads a screenplay from a given URL
123
+ - [send_screenplay.py](screenplays/send_screenplay.py) - sends a screenplay to the database
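Note on the `@title` filter described in the README above: the server-side parsing lives in main.go, which this excerpt of the diff does not show. The Go sketch below is therefore only one plausible way to pull the tag out of a query. The function name `extractTitleFilter`, the regular expression, and the decision to lowercase the tag are assumptions made for illustration, not the repository's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// titleTag matches an '@' followed by letters, digits, underscores, or hyphens.
// Assumption: titles used as filters contain no spaces.
var titleTag = regexp.MustCompile(`@([\w-]+)`)

// extractTitleFilter returns the first @tag in the query (lowercased) and the
// query with that tag's '@' removed. ok is false when the query has no tag.
func extractTitleFilter(query string) (title, cleaned string, ok bool) {
	m := titleTag.FindStringSubmatchIndex(query)
	if m == nil {
		return "", query, false
	}
	// m[0]:m[1] is the whole match, m[2]:m[3] is the tag without the '@'.
	title = strings.ToLower(query[m[2]:m[3]])
	cleaned = query[:m[0]] + query[m[2]:]
	return title, cleaned, true
}

func main() {
	title, cleaned, ok := extractTitleFilter("Is @antibionic a good company?")
	fmt.Println(title, ok)
	fmt.Println(cleaned)
}
```

A handler could store the returned title against the session ID and add a title condition to the vector query for later messages in that session; how the real server does this is not visible here.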
create-db.sql ADDED
@@ -0,0 +1,2 @@
+ SELECT 'CREATE DATABASE ragtag'
+ WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'ragtag')\gexec
describer.html ADDED
@@ -0,0 +1,146 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Image Describer</title>
7
+ <style>
8
+ body {
9
+ font-family: 'Courier New', monospace;
10
+ background-color: #000;
11
+ color: #00ff00;
12
+ margin: 0;
13
+ padding: 20px;
14
+ display: flex;
15
+ flex-direction: column;
16
+ align-items: center;
17
+ }
18
+ h1 {
19
+ color: #00ff00;
20
+ }
21
+ #imageContainer {
22
+ width: 500px;
23
+ height: 500px;
24
+ border: 1px solid #00ff00;
25
+ display: flex;
26
+ justify-content: center;
27
+ align-items: center;
28
+ margin-bottom: 20px;
29
+ }
30
+ #uploadedImage {
31
+ max-width: 100%;
32
+ max-height: 100%;
33
+ }
34
+ #descriptionContainer {
35
+ width: 500px;
36
+ min-height: 100px;
37
+ border: 1px solid #00ff00;
38
+ padding: 10px;
39
+ margin-bottom: 20px;
40
+ }
41
+ #uploadForm {
42
+ margin-bottom: 20px;
43
+ }
44
+ input[type="file"] {
45
+ display: none;
46
+ }
47
+ label, button {
48
+ background-color: #003300;
49
+ color: #00ff00;
50
+ border: 1px solid #00ff00;
51
+ padding: 5px 10px;
52
+ cursor: pointer;
53
+ }
54
+ label:hover, button:hover {
55
+ background-color: #004400;
56
+ }
57
+ a {
58
+ color: #00ff00;
59
+ text-decoration: none;
60
+ margin-bottom: 20px;
61
+ }
62
+ a:hover {
63
+ text-decoration: underline;
64
+ }
65
+ </style>
66
+ </head>
67
+ <body>
68
+ <h1>Image Describer</h1>
69
+ <a href="/">Back to Chat</a>
70
+ <div id="imageContainer">
71
+ <img id="uploadedImage" src="" alt="Uploaded image will appear here">
72
+ </div>
73
+ <form id="uploadForm" enctype="multipart/form-data">
74
+ <label for="imageFile">Choose Image</label>
75
+ <input type="file" id="imageFile" name="file" accept="image/*" required>
76
+ <button type="submit">Describe Image</button>
77
+ </form>
78
+ <div id="descriptionContainer">
79
+ Description will appear here...
80
+ </div>
81
+
82
+ <script>
83
+ const uploadForm = document.getElementById('uploadForm');
84
+ const imageFile = document.getElementById('imageFile');
85
+ const uploadedImage = document.getElementById('uploadedImage');
86
+ const descriptionContainer = document.getElementById('descriptionContainer');
87
+
88
+ uploadForm.addEventListener('submit', async (e) => {
89
+ e.preventDefault();
90
+ const formData = new FormData(uploadForm);
91
+
92
+ try {
93
+ const response = await fetch('/describe_image', {
94
+ method: 'POST',
95
+ body: formData
96
+ });
97
+
98
+ if (!response.ok) {
99
+ throw new Error('Network response was not ok');
100
+ }
101
+
102
+ const reader = response.body.getReader();
103
+ descriptionContainer.textContent = '';
104
+
105
+ let buffer = '';
106
+ while (true) {
107
+ const { value, done } = await reader.read();
108
+ if (done) break;
109
+
110
+ buffer += new TextDecoder().decode(value);
111
+ const lines = buffer.split('\n');
112
+ buffer = lines.pop() || '';
113
+
114
+ for (const line of lines) {
115
+ if (line.startsWith('data:')) {
116
+ const content = line.slice(5);
117
+ if (content) {
118
+ appendToDescription(content);
119
+ }
120
+ }
121
+ }
122
+ }
123
+ } catch (error) {
124
+ console.error('Error:', error);
125
+ descriptionContainer.textContent = 'Error describing image';
126
+ }
127
+ });
128
+
129
+ function appendToDescription(content) {
130
+ descriptionContainer.textContent += content;
131
+ descriptionContainer.scrollTop = descriptionContainer.scrollHeight;
132
+ }
133
+
134
+ imageFile.addEventListener('change', (e) => {
135
+ const file = e.target.files[0];
136
+ if (file) {
137
+ const reader = new FileReader();
138
+ reader.onload = (e) => {
139
+ uploadedImage.src = e.target.result;
140
+ };
141
+ reader.readAsDataURL(file);
142
+ }
143
+ });
144
+ </script>
145
+ </body>
146
+ </html>
docker-backup/Dockerfile ADDED
@@ -0,0 +1,18 @@
+ # golang image and setup for the gin server
+ FROM golang:1.23.0
+
+ # Set the working directory in the container
+ WORKDIR /app
+
+ # Copy the current directory contents into the container at /app
+ COPY . /app
+
+ # Install dependencies
+ RUN go mod download
+
+ # Build with verbose output
+ RUN go build -v -o ragtag
+
+ EXPOSE 8080
+
+ CMD ["./ragtag"]
docker-backup/docker-compose.yml ADDED
@@ -0,0 +1,19 @@
+ version: '3.8'
+
+ services:
+   ollama:
+     image: ollama/ollama:latest
+     container_name: ollama
+     ports:
+       - "11434:11434" # Expose the default Ollama API port
+     environment:
+       - OLLAMA_PORT=11434
+       - OLLAMA_HOST=0.0.0.0
+       - OLLAMA_MODEL=llama3.1
+     restart: unless-stopped
+   api:
+     image: ragtag:latest
+     container_name: api
+     ports:
+       - "8080:8080"
+     restart: unless-stopped
docker-backup/requirements.txt ADDED
@@ -0,0 +1,2 @@
+ flask
+ ollama
docker-compose.yml ADDED
@@ -0,0 +1,51 @@
+ services:
+   ollama:
+     build:
+       context: .
+       dockerfile: Dockerfile.ollama
+     container_name: ollama
+     ports:
+       - "11434:11434"
+     environment:
+       - OLLAMA_PORT=11434
+       - OLLAMA_HOST=0.0.0.0
+       - OLLAMA_MODEL=llama3.1
+     restart: unless-stopped
+     healthcheck:
+       test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
+       interval: 30s
+       timeout: 10s
+       retries: 5
+
+   db:
+     image: pgvector/pgvector:pg16
+     container_name: db
+     ports:
+       - "5432:5432"
+     environment:
+       - POSTGRES_DB=ragtag
+       - POSTGRES_USER=${DB_USER}
+       - POSTGRES_PASSWORD=${DB_PASSWORD}
+     volumes:
+       - ./init-db.sql:/docker-entrypoint-initdb.d/init-db.sql
+     restart: unless-stopped
+     healthcheck:
+       test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d ragtag"]
+       interval: 10s
+       timeout: 5s
+       retries: 5
+
+   api:
+     image: ragtag:latest
+     container_name: api
+     ports:
+       - "8080:8080"
+     environment:
+       - DB_URL=${DB_URL_DOCKER}
+       - OLLAMA_HOST=ollama
+     restart: unless-stopped
+     depends_on:
+       db:
+         condition: service_healthy
+       ollama:
+         condition: service_healthy
docmanager.html ADDED
@@ -0,0 +1,134 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Document Manager</title>
7
+ <style>
8
+ body {
9
+ font-family: 'Courier New', monospace;
10
+ background-color: #000;
11
+ color: #00ff00;
12
+ margin: 0;
13
+ padding: 20px;
14
+ }
15
+ h1 {
16
+ color: #00ff00;
17
+ }
18
+ table {
19
+ width: 100%;
20
+ border-collapse: collapse;
21
+ }
22
+ th, td {
23
+ border: 1px solid #00ff00;
24
+ padding: 8px;
25
+ text-align: left;
26
+ }
27
+ th {
28
+ background-color: #003300;
29
+ }
30
+ button {
31
+ background-color: #003300;
32
+ color: #00ff00;
33
+ border: 1px solid #00ff00;
34
+ padding: 5px 10px;
35
+ cursor: pointer;
36
+ }
37
+ button:hover {
38
+ background-color: #004400;
39
+ }
40
+ a {
41
+ color: #00ff00;
42
+ text-decoration: none;
43
+ }
44
+ a:hover {
45
+ text-decoration: underline;
46
+ }
47
+ </style>
48
+ </head>
49
+ <body>
50
+ <h1>Document Manager</h1>
51
+ <a href="/" style="margin-bottom: 20px; display: block;">Back to Chat</a>
52
+ <table id="docTable">
53
+ <thead>
54
+ <tr>
55
+ <th>Title</th>
56
+ <th>Count</th>
57
+ <th>Action</th>
58
+ </tr>
59
+ </thead>
60
+ <tbody>
61
+ <!-- Table rows will be dynamically added here -->
62
+ </tbody>
63
+ </table>
64
+
65
+ <h2>Upload New Document</h2>
66
+ <form id="uploadForm" enctype="multipart/form-data">
67
+ <input type="text" id="docTitle" name="title" placeholder="Document Title" required>
68
+ <input type="file" id="docFile" name="file" accept=".txt,.pdf,.jpg,.jpeg,.png" required>
69
+ <button type="submit">Upload</button>
70
+ </form>
71
+
72
+ <script>
73
+ async function fetchDocuments() {
74
+ let documents = [];
75
+ try {
76
+ documents = await fetch('/documents').then(r => r.json());
77
+ if (!Array.isArray(documents)) documents = [];
78
+ } catch {}
79
+ const tableBody = document.querySelector('#docTable tbody');
80
+ tableBody.innerHTML = '';
81
+ documents.forEach(doc => {
82
+ const row = tableBody.insertRow();
83
+ row.insertCell(0).textContent = doc.title;
84
+ row.insertCell(1).textContent = doc.count;
85
+ const deleteButton = document.createElement('button');
86
+ deleteButton.textContent = 'Delete';
87
+ deleteButton.onclick = () => deleteDocument(doc.title);
88
+ row.insertCell(2).appendChild(deleteButton);
89
+ });
90
+ }
91
+
92
+ async function deleteDocument(title) {
93
+ const response = await fetch('/delete_document', {
94
+ method: 'POST',
95
+ headers: {
96
+ 'Content-Type': 'application/json',
97
+ },
98
+ body: JSON.stringify({ title: title }),
99
+ });
100
+ if (response.ok) {
101
+ fetchDocuments();
102
+ } else {
103
+ alert('Error deleting document');
104
+ }
105
+ }
106
+
107
+ async function uploadDocument(formData) {
108
+ const response = await fetch('/upload_document', {
109
+ method: 'POST',
110
+ body: formData
111
+ });
112
+ if (response.ok) {
113
+ alert('Document uploaded successfully');
114
+ fetchDocuments();
115
+ } else {
116
+ let msg = 'Error uploading document';
117
+ try {
118
+ const data = await response.json();
119
+ if (data && data.error) msg = data.error;
120
+ } catch {}
121
+ alert(msg);
122
+ }
123
+ }
124
+
125
+ document.getElementById('uploadForm').addEventListener('submit', function(e) {
126
+ e.preventDefault();
127
+ const formData = new FormData(this);
128
+ uploadDocument(formData);
129
+ });
130
+
131
+ fetchDocuments();
132
+ </script>
133
+ </body>
134
+ </html>
docs/docs.go ADDED
@@ -0,0 +1,44 @@
+ // Package docs Code generated by swaggo/swag. DO NOT EDIT
+ package docs
+
+ import "github.com/swaggo/swag"
+
+ const docTemplate = `{
+     "schemes": {{ marshal .Schemes }},
+     "swagger": "2.0",
+     "info": {
+         "description": "{{escape .Description}}",
+         "title": "{{.Title}}",
+         "termsOfService": "http://swagger.io/terms/",
+         "contact": {
+             "name": "James Campbell",
+             "email": "[email protected]"
+         },
+         "license": {
+             "name": "Apache 2.0",
+             "url": "http://www.apache.org/licenses/LICENSE-2.0.html"
+         },
+         "version": "{{.Version}}"
+     },
+     "host": "{{.Host}}",
+     "basePath": "{{.BasePath}}",
+     "paths": {}
+ }`
+
+ // SwaggerInfo holds exported Swagger Info so clients can modify it
+ var SwaggerInfo = &swag.Spec{
+ 	Version:          "1.0",
+ 	Host:             "localhost:8080",
+ 	BasePath:         "/",
+ 	Schemes:          []string{},
+ 	Title:            "RAGTAG API",
+ 	Description:      "This is the API for the RAGTAG system.",
+ 	InfoInstanceName: "swagger",
+ 	SwaggerTemplate:  docTemplate,
+ 	LeftDelim:        "{{",
+ 	RightDelim:       "}}",
+ }
+
+ func init() {
+ 	swag.Register(SwaggerInfo.InstanceName(), SwaggerInfo)
+ }
docs/swagger.json ADDED
@@ -0,0 +1,20 @@
+ {
+     "swagger": "2.0",
+     "info": {
+         "description": "This is the API for the RAGTAG system.",
+         "title": "RAGTAG API",
+         "termsOfService": "http://swagger.io/terms/",
+         "contact": {
+             "name": "James Campbell",
+             "email": "[email protected]"
+         },
+         "license": {
+             "name": "Apache 2.0",
+             "url": "http://www.apache.org/licenses/LICENSE-2.0.html"
+         },
+         "version": "1.0"
+     },
+     "host": "localhost:8080",
+     "basePath": "/",
+     "paths": {}
+ }
docs/swagger.yaml ADDED
@@ -0,0 +1,15 @@
+ basePath: /
+ host: localhost:8080
+ info:
+   contact:
+     email: [email protected]
+     name: James Campbell
+   description: This is the API for the RAGTAG system.
+   license:
+     name: Apache 2.0
+     url: http://www.apache.org/licenses/LICENSE-2.0.html
+   termsOfService: http://swagger.io/terms/
+   title: RAGTAG API
+   version: "1.0"
+ paths: {}
+ swagger: "2.0"
example.env ADDED
@@ -0,0 +1,4 @@
+ DB_URL=postgres://postgres:password@localhost:5432/ragtag
+ DB_URL_DOCKER=postgres://ragtag:ragtag@db:5432/ragtag
+ DB_USER=ragtag
+ DB_PASSWORD=ragtag
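Note on the two connection strings above: `DB_URL` targets a Postgres on `localhost`, while `DB_URL_DOCKER` targets the compose service named `db`. The Go sketch below splits such a URL into its parts with the standard `net/url` package, which can help when checking which target a given environment will hit. The `dbTarget` type and `parseDBURL` function are hypothetical names for illustration and are not part of the repository.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// dbTarget holds the parts of a Postgres connection URL that differ
// between the local and the docker configuration.
type dbTarget struct {
	User, Host, Port, Database string
}

// parseDBURL splits a postgres:// URL into user, host, port, and database.
func parseDBURL(raw string) (dbTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return dbTarget{}, err
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return dbTarget{}, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	return dbTarget{
		User:     u.User.Username(),
		Host:     u.Hostname(),
		Port:     u.Port(),
		Database: strings.TrimPrefix(u.Path, "/"),
	}, nil
}

func main() {
	// The two values from example.env.
	for _, raw := range []string{
		"postgres://postgres:password@localhost:5432/ragtag",
		"postgres://ragtag:ragtag@db:5432/ragtag",
	} {
		t, err := parseDBURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", t)
	}
}
```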
go.mod ADDED
@@ -0,0 +1,43 @@
+ module github.com/james-see/ragtag4
+
+ go 1.24.1
+
+ require (
+ 	github.com/gin-gonic/gin v1.10.0
+ 	github.com/jackc/pgx/v5 v5.6.0
+ 	github.com/joho/godotenv v1.5.1
+ 	github.com/ledongthuc/pdf v0.0.0-20250511090121-5959a4027728
+ 	github.com/ollama/ollama v0.3.6
+ 	github.com/pgvector/pgvector-go v0.2.2
+ )
+
+ require (
+ 	github.com/bytedance/sonic v1.11.6 // indirect
+ 	github.com/bytedance/sonic/loader v0.1.1 // indirect
+ 	github.com/cloudwego/base64x v0.1.4 // indirect
+ 	github.com/cloudwego/iasm v0.2.0 // indirect
+ 	github.com/gabriel-vasile/mimetype v1.4.3 // indirect
+ 	github.com/gin-contrib/sse v0.1.0 // indirect
+ 	github.com/go-playground/locales v0.14.1 // indirect
+ 	github.com/go-playground/universal-translator v0.18.1 // indirect
+ 	github.com/go-playground/validator/v10 v10.20.0 // indirect
+ 	github.com/goccy/go-json v0.10.2 // indirect
+ 	github.com/jackc/pgpassfile v1.0.0 // indirect
+ 	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
+ 	github.com/json-iterator/go v1.1.12 // indirect
+ 	github.com/klauspost/cpuid/v2 v2.2.7 // indirect
+ 	github.com/leodido/go-urn v1.4.0 // indirect
+ 	github.com/mattn/go-isatty v0.0.20 // indirect
+ 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
+ 	github.com/modern-go/reflect2 v1.0.2 // indirect
+ 	github.com/pelletier/go-toml/v2 v2.2.2 // indirect
+ 	github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
+ 	github.com/ugorji/go/codec v1.2.12 // indirect
+ 	golang.org/x/arch v0.8.0 // indirect
+ 	golang.org/x/crypto v0.25.0 // indirect
+ 	golang.org/x/net v0.25.0 // indirect
+ 	golang.org/x/sys v0.22.0 // indirect
+ 	golang.org/x/text v0.16.0 // indirect
+ 	google.golang.org/protobuf v1.34.1 // indirect
+ 	gopkg.in/yaml.v3 v3.0.1 // indirect
+ )
go.sum ADDED
@@ -0,0 +1,152 @@
1
+ entgo.io/ent v0.13.1 h1:uD8QwN1h6SNphdCCzmkMN3feSUzNnVvV/WIkHKMbzOE=
2
+ entgo.io/ent v0.13.1/go.mod h1:qCEmo+biw3ccBn9OyL4ZK5dfpwg++l1Gxwac5B1206A=
3
+ github.com/bytedance/sonic v1.11.6 h1:oUp34TzMlL+OY1OUWxHqsdkgC/Zfc85zGqw9siXjrc0=
4
+ github.com/bytedance/sonic v1.11.6/go.mod h1:LysEHSvpvDySVdC2f87zGWf6CIKJcAvqab1ZaiQtds4=
5
+ github.com/bytedance/sonic/loader v0.1.1 h1:c+e5Pt1k/cy5wMveRDyk2X4B9hF4g7an8N3zCYjJFNM=
6
+ github.com/bytedance/sonic/loader v0.1.1/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU=
7
+ github.com/cloudwego/base64x v0.1.4 h1:jwCgWpFanWmN8xoIUHa2rtzmkd5J2plF/dnLS6Xd/0Y=
8
+ github.com/cloudwego/base64x v0.1.4/go.mod h1:0zlkT4Wn5C6NdauXdJRhSKRlJvmclQ1hhJgA0rcu/8w=
9
+ github.com/cloudwego/iasm v0.2.0 h1:1KNIy1I1H9hNNFEEH3DVnI4UujN+1zjpuk6gwHLTssg=
10
+ github.com/cloudwego/iasm v0.2.0/go.mod h1:8rXZaNYT2n95jn+zTI1sDr+IgcD2GVs0nlbbQPiEFhY=
11
+ github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
12
+ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
13
+ github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
14
+ github.com/gabriel-vasile/mimetype v1.4.3 h1:in2uUcidCuFcDKtdcBxlR0rJ1+fsokWf+uqxgUFjbI0=
15
+ github.com/gabriel-vasile/mimetype v1.4.3/go.mod h1:d8uq/6HKRL6CGdk+aubisF/M5GcPfT7nKyLpA0lbSSk=
16
+ github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
17
+ github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI=
18
+ github.com/gin-gonic/gin v1.10.0 h1:nTuyha1TYqgedzytsKYqna+DfLos46nTv2ygFy86HFU=
19
+ github.com/gin-gonic/gin v1.10.0/go.mod h1:4PMNQiOhvDRa013RKVbsiNwoyezlm2rm0uX/T7kzp5Y=
20
+ github.com/go-pg/pg/v10 v10.11.0 h1:CMKJqLgTrfpE/aOVeLdybezR2om071Vh38OLZjsyMI0=
21
+ github.com/go-pg/pg/v10 v10.11.0/go.mod h1:4BpHRoxE61y4Onpof3x1a2SQvi9c+q1dJnrNdMjsroA=
22
+ github.com/go-pg/zerochecker v0.2.0 h1:pp7f72c3DobMWOb2ErtZsnrPaSvHd2W4o9//8HtF4mU=
23
+ github.com/go-pg/zerochecker v0.2.0/go.mod h1:NJZ4wKL0NmTtz0GKCoJ8kym6Xn/EQzXRl2OnAe7MmDo=
24
+ github.com/go-playground/assert/v2 v2.2.0 h1:JvknZsQTYeFEAhQwI4qEt9cyV5ONwRHC+lYKSsYSR8s=
25
+ github.com/go-playground/assert/v2 v2.2.0/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
26
+ github.com/go-playground/locales v0.14.1 h1:EWaQ/wswjilfKLTECiXz7Rh+3BjFhfDFKv/oXslEjJA=
27
+ github.com/go-playground/locales v0.14.1/go.mod h1:hxrqLVvrK65+Rwrd5Fc6F2O76J/NuW9t0sjnWqG1slY=
28
+ github.com/go-playground/universal-translator v0.18.1 h1:Bcnm0ZwsGyWbCzImXv+pAJnYK9S473LQFuzCbDbfSFY=
29
+ github.com/go-playground/universal-translator v0.18.1/go.mod h1:xekY+UJKNuX9WP91TpwSH2VMlDf28Uj24BCp08ZFTUY=
30
+ github.com/go-playground/validator/v10 v10.20.0 h1:K9ISHbSaI0lyB2eWMPJo+kOS/FBExVwjEviJTixqxL8=
31
+ github.com/go-playground/validator/v10 v10.20.0/go.mod h1:dbuPbCMFw/DrkbEynArYaCwl3amGuJotoKCe95atGMM=
32
+ github.com/goccy/go-json v0.10.2 h1:CrxCmQqYDkv1z7lO7Wbh2HN93uovUHgrECaO5ZrCXAU=
33
+ github.com/goccy/go-json v0.10.2/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
34
+ github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
35
+ github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
36
+ github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
37
+ github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
38
+ github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
39
+ github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM=
40
+ github.com/jackc/pgpassfile v1.0.0/go.mod h1:CEx0iS5ambNFdcRtxPj5JhEz+xB6uRky5eyVu/W2HEg=
41
+ github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 h1:iCEnooe7UlwOQYpKFhBabPMi4aNAfoODPEFNiAnClxo=
42
+ github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761/go.mod h1:5TJZWKEWniPve33vlWYSoGYefn3gLQRzjfDlhSJ9ZKM=
43
+ github.com/jackc/pgx/v5 v5.6.0 h1:SWJzexBzPL5jb0GEsrPMLIsi/3jOo7RHlzTjcAeDrPY=
44
+ github.com/jackc/pgx/v5 v5.6.0/go.mod h1:DNZ/vlrUnhWCoFGxHAG8U2ljioxukquj7utPDgtQdTw=
45
+ github.com/jackc/puddle/v2 v2.2.1 h1:RhxXJtFG022u4ibrCSMSiu5aOq1i77R3OHKNJj77OAk=
46
+ github.com/jackc/puddle/v2 v2.2.1/go.mod h1:vriiEXHvEE654aYKXXjOvZM39qJ0q+azkZFrfEOc3H4=
47
+ github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD/E=
48
+ github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc=
49
+ github.com/jinzhu/now v1.1.5 h1:/o9tlHleP7gOFmsnYNz3RGnqzefHA47wQpKrrdTIwXQ=
50
+ github.com/jinzhu/now v1.1.5/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
51
+ github.com/jmoiron/sqlx v1.3.5 h1:vFFPA71p1o5gAeqtEAwLU4dnX2napprKtHr7PYIcN3g=
52
+ github.com/jmoiron/sqlx v1.3.5/go.mod h1:nRVWtLre0KfCLJvgxzCsLVMogSvQ1zNJtpYr2Ccp0mQ=
53
+ github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0=
54
+ github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4=
55
+ github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
56
+ github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
57
+ github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
58
+ github.com/klauspost/cpuid/v2 v2.2.7 h1:ZWSB3igEs+d0qvnxR/ZBzXVmxkgt8DdzP6m9pfuVLDM=
59
+ github.com/klauspost/cpuid/v2 v2.2.7/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
60
+ github.com/knz/go-libedit v1.10.1/go.mod h1:MZTVkCWyz0oBc7JOWP3wNAzd002ZbM/5hgShxwh4x8M=
61
+ github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
62
+ github.com/kr/pretty v0.3.0/go.mod h1:640gp4NfQd8pI5XOwp5fnNeVWj67G7CFk/SaSQn7NBk=
63
+ github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
64
+ github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
65
+ github.com/ledongthuc/pdf v0.0.0-20240201131950-da5b75280b06 h1:kacRlPN7EN++tVpGUorNGPn/4DnB7/DfTY82AOn6ccU=
66
+ github.com/ledongthuc/pdf v0.0.0-20240201131950-da5b75280b06/go.mod h1:imJHygn/1yfhB7XSJJKlFZKl/J+dCPAknuiaGOshXAs=
67
+ github.com/ledongthuc/pdf v0.0.0-20250511090121-5959a4027728 h1:QwWKgMY28TAXaDl+ExRDqGQltzXqN/xypdKP86niVn8=
68
+ github.com/ledongthuc/pdf v0.0.0-20250511090121-5959a4027728/go.mod h1:1fEHWurg7pvf5SG6XNE5Q8UZmOwex51Mkx3SLhrW5B4=
69
+ github.com/leodido/go-urn v1.4.0 h1:WT9HwE9SGECu3lg4d/dIA+jxlljEa1/ffXKmRjqdmIQ=
70
+ github.com/leodido/go-urn v1.4.0/go.mod h1:bvxc+MVxLKB4z00jd1z+Dvzr47oO32F/QSNjSBOlFxI=
71
+ github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
72
+ github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
73
+ github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
74
+ github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
75
+ github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
76
+ github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
77
+ github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
78
+ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9Gz0M=
79
+ github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
80
+ github.com/ollama/ollama v0.3.6 h1:nA/N0AmjP327po5cZDGLqI40nl+aeei0pD0dLa92ypE=
81
+ github.com/ollama/ollama v0.3.6/go.mod h1:YrWoNkFnPOYsnDvsf/Ztb1wxU9/IXrNsQHqcxbY2r94=
82
+ github.com/pelletier/go-toml/v2 v2.2.2 h1:aYUidT7k73Pcl9nb2gScu7NSrKCSHIDE89b3+6Wq+LM=
83
+ github.com/pelletier/go-toml/v2 v2.2.2/go.mod h1:1t835xjRzz80PqgE6HHgN2JOsmgYu/h4qDAS4n929Rs=
84
+ github.com/pgvector/pgvector-go v0.2.2 h1:Q/oArmzgbEcio88q0tWQksv/u9Gnb1c3F1K2TnalxR0=
85
+ github.com/pgvector/pgvector-go v0.2.2/go.mod h1:u5sg3z9bnqVEdpe1pkTij8/rFhTaMCMNyQagPDLK8gQ=
86
+ github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
87
+ github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
88
+ github.com/rogpeppe/go-internal v1.9.0 h1:73kH8U+JUqXU8lRuOHeVHaa/SZPifC7BkcraZVejAe8=
89
+ github.com/rogpeppe/go-internal v1.9.0/go.mod h1:WtVeX8xhTBvf0smdhujwtBcq4Qrzq/fJaraNFVN+nFs=
90
+ github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
91
+ github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
92
+ github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
93
+ github.com/stretchr/objx v0.5.2/go.mod h1:FRsXN1f5AsAjCGJKqEizvkpNtU+EGNCLh3NxZ/8L+MA=
94
+ github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
95
+ github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
96
+ github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
97
+ github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
98
+ github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
99
+ github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
100
+ github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
101
+ github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
102
+ github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc h1:9lRDQMhESg+zvGYmW5DyG0UqvY96Bu5QYsTLvCHdrgo=
103
+ github.com/tmthrgd/go-hex v0.0.0-20190904060850-447a3041c3bc/go.mod h1:bciPuU6GHm1iF1pBvUfxfsH0Wmnc2VbpgvbI9ZWuIRs=
104
+ github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
105
+ github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
106
+ github.com/ugorji/go/codec v1.2.12 h1:9LC83zGrHhuUA9l16C9AHXAqEV/2wBQ4nkvumAE65EE=
107
+ github.com/ugorji/go/codec v1.2.12/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
108
+ github.com/uptrace/bun v1.1.12 h1:sOjDVHxNTuM6dNGaba0wUuz7KvDE1BmNu9Gqs2gJSXQ=
109
+ github.com/uptrace/bun v1.1.12/go.mod h1:NPG6JGULBeQ9IU6yHp7YGELRa5Agmd7ATZdz4tGZ6z0=
110
+ github.com/uptrace/bun/dialect/pgdialect v1.1.12 h1:m/CM1UfOkoBTglGO5CUTKnIKKOApOYxkcP2qn0F9tJk=
111
+ github.com/uptrace/bun/dialect/pgdialect v1.1.12/go.mod h1:Ij6WIxQILxLlL2frUBxUBOZJtLElD2QQNDcu/PWDHTc=
112
+ github.com/uptrace/bun/driver/pgdriver v1.1.12 h1:3rRWB1GK0psTJrHwxzNfEij2MLibggiLdTqjTtfHc1w=
113
+ github.com/uptrace/bun/driver/pgdriver v1.1.12/go.mod h1:ssYUP+qwSEgeDDS1xm2XBip9el1y9Mi5mTAvLoiADLM=
114
+ github.com/vmihailenco/bufpool v0.1.11 h1:gOq2WmBrq0i2yW5QJ16ykccQ4wH9UyEsgLm6czKAd94=
115
+ github.com/vmihailenco/bufpool v0.1.11/go.mod h1:AFf/MOy3l2CFTKbxwt0mp2MwnqjNEs5H/UxrkA5jxTQ=
116
+ github.com/vmihailenco/msgpack/v5 v5.3.5 h1:5gO0H1iULLWGhs2H5tbAHIZTV8/cYafcFOr9znI5mJU=
117
+ github.com/vmihailenco/msgpack/v5 v5.3.5/go.mod h1:7xyJ9e+0+9SaZT0Wt1RGleJXzli6Q/V5KbhBonMG9jc=
118
+ github.com/vmihailenco/tagparser v0.1.2 h1:gnjoVuB/kljJ5wICEEOpx98oXMWPLj22G67Vbd1qPqc=
119
+ github.com/vmihailenco/tagparser v0.1.2/go.mod h1:OeAg3pn3UbLjkWt+rN9oFYB6u/cQgqMEUPoW2WPyhdI=
120
+ github.com/vmihailenco/tagparser/v2 v2.0.0 h1:y09buUbR+b5aycVFQs/g70pqKVZNBmxwAhO7/IwNM9g=
121
+ github.com/vmihailenco/tagparser/v2 v2.0.0/go.mod h1:Wri+At7QHww0WTrCBeu4J6bNtoV6mEfg5OIWRZA9qds=
122
+ golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
123
+ golang.org/x/arch v0.8.0 h1:3wRIsP3pM4yUptoR96otTUOXI367OS0+c9eeRi9doIc=
124
+ golang.org/x/arch v0.8.0/go.mod h1:FEVrYAQjsQXMVJ1nsMoVVXPZg6p2JE2mx8psSWTDQys=
125
+ golang.org/x/crypto v0.25.0 h1:ypSNr+bnYL2YhwoMt2zPxHFmbAN1KZs/njMG3hxUp30=
126
+ golang.org/x/crypto v0.25.0/go.mod h1:T+wALwcMOSE0kXgUAnPAHqTLW+XHgcELELW8VaDgm/M=
127
+ golang.org/x/net v0.25.0 h1:d/OCCoBEUq33pjydKrGQhw7IlUPI2Oylr+8qLx49kac=
128
+ golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
129
+ golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
130
+ golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
131
+ golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
132
+ golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
133
+ golang.org/x/sys v0.22.0 h1:RI27ohtqKCnwULzJLqkv897zojh5/DwS/ENaMzUOaWI=
134
+ golang.org/x/sys v0.22.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
135
+ golang.org/x/text v0.16.0 h1:a94ExnEXNtEwYLGJSIUxnWoxoRz/ZcCsV63ROupILh4=
136
+ golang.org/x/text v0.16.0/go.mod h1:GhwF1Be+LQoKShO3cGOHzqOgRrGaYc9AvblQOmPVHnI=
137
+ google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
138
+ google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
139
+ gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
140
+ gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
141
+ gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
142
+ gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
143
+ gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
144
+ gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
145
+ gorm.io/driver/postgres v1.5.4 h1:Iyrp9Meh3GmbSuyIAGyjkN+n9K+GHX9b9MqsTL4EJCo=
146
+ gorm.io/driver/postgres v1.5.4/go.mod h1:Bgo89+h0CRcdA33Y6frlaHHVuTdOf87pmyzwW9C/BH0=
147
+ gorm.io/gorm v1.25.5 h1:zR9lOiiYf09VNh5Q1gphfyia1JpiClIWG9hQaxB/mls=
148
+ gorm.io/gorm v1.25.5/go.mod h1:hbnx/Oo0ChWMn1BIhpy1oYozzpM15i4YPuHDmfYtwg8=
149
+ mellium.im/sasl v0.3.1 h1:wE0LW6g7U83vhvxjC1IY8DnXM+EU095yeo8XClvCdfo=
150
+ mellium.im/sasl v0.3.1/go.mod h1:xm59PUYpZHhgQ9ZqoJ5QaCqzWMi8IeS49dhp6plPCzw=
151
+ nullprogram.com/x/optparse v1.0.0/go.mod h1:KdyPE+Igbe0jQUrVfMqDMeJQIJZEuyV7pjYmp6pbG50=
152
+ rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
index.html ADDED
@@ -0,0 +1,227 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>RAGTAG MVP</title>
7
+ <link rel="icon" type="image/x-icon" href="/favicon.ico">
8
+ <style>
9
+ body {
10
+ font-family: 'Courier New', monospace;
11
+ background-color: #000;
12
+ color: #00ff00;
13
+ margin: 0;
14
+ padding: 0;
15
+ height: 100vh;
16
+ display: flex;
17
+ flex-direction: column;
18
+ }
19
+ #content {
20
+ flex-grow: 1;
21
+ overflow-y: auto;
22
+ padding: 20px;
23
+ }
24
+ #chat-container {
25
+ border: 1px solid #00ff00;
26
+ padding: 10px;
27
+ margin-bottom: 10px;
28
+ min-height: 200px;
29
+ }
30
+ #input-container {
31
+ display: flex;
32
+ padding: 10px 20px;
33
+ background-color: #000;
34
+ border-top: 1px solid #00ff00;
35
+ }
36
+ #prompt {
37
+ color: #00ff00;
38
+ margin-right: 5px;
39
+ }
40
+ #user-input {
41
+ flex-grow: 1;
42
+ background-color: #000;
43
+ border: none;
44
+ color: #00ff00;
45
+ font-family: 'Courier New', monospace;
46
+ font-size: 16px;
47
+ }
48
+ #user-input:focus {
49
+ outline: none;
50
+ }
51
+ .message {
52
+ margin-bottom: 10px;
53
+ }
54
+ .user-message {
55
+ color: #ffffff;
56
+ }
57
+ .ai-message {
58
+ color: #00ff00;
59
+ }
60
+ </style>
61
+ </head>
62
+ <body>
63
+ <div id="content">
64
+ <a href="/docmanager" style="color: #00ff00; margin-bottom: 10px; display: block;">Document Manager</a>
65
+ <a href="/describer" style="color: #00ff00; margin-bottom: 10px; display: block;">Image Describer</a>
66
+ <div id="sessionInfo" style="color: #00ff00; margin-bottom: 10px;">
67
+ <div>Session ID: <span id="sessionIdDisplay"></span></div>
68
+ <div>Title Filter: <span id="titleFilterDisplay">None</span></div>
69
+ </div>
70
+ <button id="clearButton" style="background-color: #003300; color: #00ff00; border: 1px solid #00ff00; padding: 5px 10px; cursor: pointer; margin-bottom: 10px;">Clear Chat</button>
71
+ <div id="chat-container"></div>
72
+ </div>
73
+ <div id="input-container">
74
+ <span id="prompt">$</span>
75
+ <input type="text" id="user-input" placeholder="Enter your query...">
76
+ </div>
77
+
78
+ <script>
79
+ const chatContainer = document.getElementById('chat-container');
80
+ const userInput = document.getElementById('user-input');
81
+ const clearButton = document.getElementById('clearButton');
82
+ const sessionIdDisplay = document.getElementById('sessionIdDisplay');
83
+ const titleFilterDisplay = document.getElementById('titleFilterDisplay');
84
+ let queryHistory = [];
85
+ let historyIndex = -1;
86
+ let sessionId = Date.now().toString();
87
+ let titleFilter = '';
88
+
89
+ // Display initial session ID
90
+ sessionIdDisplay.textContent = sessionId;
91
+
92
+ function addMessage(sender, message) {
93
+ const messageElement = document.createElement('div');
94
+ messageElement.classList.add('message');
95
+ messageElement.classList.add(sender === 'You' ? 'user-message' : 'ai-message');
96
+ messageElement.textContent = `${sender === 'You' ? '$ ' : ''}${message}`;
97
+ chatContainer.appendChild(messageElement);
98
+ chatContainer.scrollTop = chatContainer.scrollHeight;
99
+ }
100
+
101
+ async function sendQuery(query) {
102
+ try {
103
+ const response = await fetch('/query', {
104
+ method: 'POST',
105
+ headers: {
106
+ 'Content-Type': 'application/json',
107
+ },
108
+ body: JSON.stringify({ query: query, sessionId: sessionId, titleFilter: titleFilter }),
109
+ });
110
+
111
+ if (!response.ok) {
112
+ throw new Error('Network response was not ok');
113
+ }
114
+
115
+ const reader = response.body.getReader();
116
+ addMessage('AI', ''); // Add an empty AI message to start
117
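+ const decoder = new TextDecoder(); // one decoder, so multi-byte characters split across chunks decode correctly
118
+ let buffer = '';
119
+ while (true) {
120
+ const { value, done } = await reader.read();
121
+ if (done) break;
122
+
123
+ buffer += decoder.decode(value, { stream: true });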
124
+ const lines = buffer.split('\n');
125
+ buffer = lines.pop() || '';
126
+
127
+ for (const line of lines) {
128
+ if (line.startsWith('data:')) {
129
+ const content = line.slice(5);
130
+ if (content) {
131
+ appendToLastAIMessage(content);
132
+ } else {
133
+ appendToLastAIMessage('\n');
134
+ }
135
+ }
136
+ }
137
+ }
138
+ } catch (error) {
139
+ console.error('Error:', error);
140
+ addMessage('System', 'An error occurred while processing your request.');
141
+ }
142
+ }
143
+
144
+ function appendToLastAIMessage(content) {
145
+ const messages = document.querySelectorAll('.message');
146
+ const lastAIMessage = Array.from(messages).reverse().find(msg => msg.classList.contains('ai-message'));
147
+ if (lastAIMessage) {
148
+ if (content === '\n') {
149
+ lastAIMessage.appendChild(document.createElement('br'));
150
+ } else {
151
+ lastAIMessage.appendChild(document.createTextNode(content));
152
+ }
153
+ lastAIMessage.scrollIntoView({ behavior: 'smooth', block: 'end' });
154
+ } else {
155
+ addMessage('AI', content);
156
+ }
157
+ }
158
+
159
+ async function clearSession() {
160
+ try {
161
+ const response = await fetch('/clear_session', {
162
+ method: 'POST',
163
+ headers: {
164
+ 'Content-Type': 'application/json',
165
+ },
166
+ body: JSON.stringify({ sessionId: sessionId }),
167
+ });
168
+
169
+ if (!response.ok) {
170
+ throw new Error('Network response was not ok');
171
+ }
172
+
173
+ chatContainer.innerHTML = '';
174
+ queryHistory = [];
175
+ historyIndex = -1;
176
+ sessionId = Date.now().toString();
177
+ titleFilter = '';
178
+ sessionIdDisplay.textContent = sessionId;
179
+ titleFilterDisplay.textContent = 'None';
180
+ addMessage('System', 'Chat session cleared.');
181
+ } catch (error) {
182
+ console.error('Error:', error);
183
+ addMessage('System', 'An error occurred while clearing the session.');
184
+ }
185
+ }
186
+
187
+ function processQuery(query) {
188
+ if (query.includes('@')) {
189
+ const parts = query.split('@');
190
+ if (parts.length > 1) {
191
+ titleFilter = parts[1].split(' ')[0];
192
+ titleFilterDisplay.textContent = titleFilter;
193
+ }
194
+ }
195
+ return query;
196
+ }
197
+
198
+ userInput.addEventListener('keydown', (e) => {
199
+ if (e.key === 'Enter') {
200
+ const query = userInput.value.trim();
201
+ if (query) {
202
+ addMessage('You', query);
203
+ const processedQuery = processQuery(query);
204
+ sendQuery(processedQuery);
205
+ queryHistory.unshift(query);
206
+ historyIndex = -1;
207
+ userInput.value = '';
208
+ }
209
+ } else if (e.key === 'ArrowUp') {
210
+ e.preventDefault();
211
+ if (historyIndex < queryHistory.length - 1) {
212
+ historyIndex++;
213
+ userInput.value = queryHistory[historyIndex];
214
+ }
215
+ } else if (e.key === 'ArrowDown') {
216
+ e.preventDefault();
217
+ if (historyIndex > -1) {
218
+ historyIndex--;
219
+ userInput.value = historyIndex === -1 ? '' : queryHistory[historyIndex];
220
+ }
221
+ }
222
+ });
223
+
224
+ clearButton.addEventListener('click', clearSession);
225
+ </script>
226
+ </body>
227
+ </html>
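The page's read loop treats each `data:` line as a token and an empty `data:` payload as a line break. The same parsing can be sketched in Go, for example to exercise the `/query` stream from a test. This is a minimal sketch under stated assumptions: `parseSSEData` is a hypothetical helper that is not part of this commit, and it assumes the stream uses the `event:` / `data:` line layout that the page's loop expects.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSSEData (hypothetical helper) collects the payload of every "data:"
// line in an SSE stream. An empty payload is kept as "\n", mirroring how
// the page turns an empty data line into a line break.
func parseSSEData(stream string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // skip "event:" lines and blank separators
		}
		if payload := strings.TrimPrefix(line, "data:"); payload != "" {
			out = append(out, payload)
		} else {
			out = append(out, "\n")
		}
	}
	return out
}

func main() {
	// Assumed sample stream: two message events followed by a done event.
	stream := "event:message\ndata:Hello\n\nevent:message\ndata: world\n\nevent:done\ndata:\n\n"
	fmt.Printf("%q\n", strings.Join(parseSSEData(stream), ""))
}
```

Note that, like the page, this sketch does not distinguish `error` or `done` events from `message` events; a stricter client would track the preceding `event:` line.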
init-db.sql ADDED
@@ -0,0 +1,8 @@
1
+ CREATE EXTENSION IF NOT EXISTS vector;
2
+
3
+ CREATE TABLE IF NOT EXISTS items (
4
+ id SERIAL PRIMARY KEY,
5
+ title TEXT,
6
+ doc TEXT,
7
+ embedding vector(4096)
8
+ );
main.go ADDED
@@ -0,0 +1,698 @@
1
+ package main
2
+
3
+ import (
4
+ "context"
5
+ "fmt"
6
+ "io"
7
+ "log"
8
+ "net/http"
9
+ "net/url"
10
+ "os"
11
+ "path/filepath"
12
+ "strings"
13
+
14
+ got "github.com/joho/godotenv"
15
+
16
+ "bytes"
17
+ "encoding/base64"
18
+ "encoding/json"
19
+
20
+ "unicode/utf8"
21
+
22
+ "github.com/gin-gonic/gin"
23
+ "github.com/jackc/pgx/v5"
24
+ "github.com/ledongthuc/pdf"
25
+ "github.com/ollama/ollama/api"
26
+ "github.com/pgvector/pgvector-go"
27
+ )
28
+
29
+ // @title RAGTAG API
30
+ // @version 1.0
31
+ // @description This is the API for the RAGTAG system.
32
+ // @termsOfService http://swagger.io/terms/
33
+
34
+ // @contact.name James Campbell
35
+ // @contact.email [email protected]
36
+
37
+ // @license.name Apache 2.0
38
+ // @license.url http://www.apache.org/licenses/LICENSE-2.0.html
39
+
40
+ // @host localhost:8080
41
+ // @BasePath /
42
+
43
+ type Session struct {
44
+ Messages []api.Message
45
+ TitleFilter string
46
+ }
47
+
48
+ var sessions = make(map[string]*Session)
49
+
50
+ func generateEmbedding(input string) ([]float32, error) {
51
+ ollamaHost := os.Getenv("OLLAMA_HOST")
52
+ if ollamaHost == "" {
53
+ ollamaHost = "localhost" // fallback to localhost if not set
54
+ }
55
+ ollamaURL, err := url.Parse(fmt.Sprintf("http://%s:11434", ollamaHost))
56
+ if err != nil {
57
+ return nil, err
58
+ }
59
+ client := api.NewClient(ollamaURL, http.DefaultClient)
60
+
61
+ // Create an embedding request
62
+ req := &api.EmbedRequest{
63
+ Model: "llama3.1", // Ensure this is an embedding-capable model
64
+ Input: input,
65
+ }
66
+
67
+ // Call the Embed function
68
+ resp, err := client.Embed(context.Background(), req)
69
+ if err != nil {
70
+ return nil, err
71
+ }
72
+
73
+ return resp.Embeddings[0], nil
74
+ }
75
+
76
+ func insertItem(conn *pgx.Conn, title string, docText string, embedding []float32) error {
77
+ // Combine title and docText for embedding
78
+ combinedText := title + " " + docText
79
+
80
+ _, err := conn.Exec(context.Background(),
81
+ "INSERT INTO items (title, doc, embedding) VALUES ($1, $2, $3)",
82
+ title, combinedText, pgvector.NewVector(embedding))
83
+
84
+ return err
85
+ }
86
+
87
+ func queryEmbeddings(conn *pgx.Conn, query string, session *Session, c *gin.Context) error {
88
+ // Generate embedding for the query
89
+ queryEmbedding, err := generateEmbedding(query)
90
+ if err != nil {
91
+ return err
92
+ }
93
+
94
+ // Prepare the SQL query and run the similarity search
95
+ sqlQuery := "SELECT doc, COALESCE(title, 'Untitled') FROM items"
96
+ args := []any{pgvector.NewVector(queryEmbedding)}
97
+ if session.TitleFilter != "" {
98
+ sqlQuery += " WHERE title LIKE '%' || $2::text || '%'" // filter bound as $2, not interpolated (avoids SQL injection)
99
+ args = append(args, session.TitleFilter)
100
+ }
101
+ sqlQuery += " ORDER BY embedding <-> $1 LIMIT 5"
102
+ rows, err := conn.Query(context.Background(), sqlQuery, args...)
103
+ if err != nil {
104
+ return err
105
+ }
106
+ defer rows.Close()
107
+
108
+ var docs []string
109
+ var sources []string
110
+ for rows.Next() {
111
+ var doc, title string
112
+ if err := rows.Scan(&doc, &title); err != nil {
113
+ return err
114
+ }
115
+ docs = append(docs, doc)
116
+ sources = append(sources, fmt.Sprintf("Source: %s", title))
117
+ }
118
+
119
+ // Combine the retrieved documents
120
+ contextText := strings.Join(docs, "\n\n")
121
+
122
+ // Create a chat request
123
+ ollamaHost := os.Getenv("OLLAMA_HOST")
124
+ if ollamaHost == "" {
125
+ ollamaHost = "localhost" // fallback to localhost if not set
126
+ }
127
+ ollamaURL, err := url.Parse(fmt.Sprintf("http://%s:11434", ollamaHost))
128
+ if err != nil {
129
+ return err
130
+ }
131
+ client := api.NewClient(ollamaURL, http.DefaultClient)
132
+
133
+ // Add the new query to the session
134
+ session.Messages = append(session.Messages, api.Message{Role: "user", Content: query})
135
+
136
+ // Prepare the messages for the chat request
137
+ messages := []api.Message{
138
+ {Role: "system", Content: "You are an assistant that answers questions based on the given context."},
139
+ {Role: "user", Content: "Here's the context:\n" + contextText},
140
+ }
141
+ messages = append(messages, session.Messages...)
142
+
143
+ req := &api.ChatRequest{
144
+ Model: "llama3.1",
145
+ Messages: messages,
146
+ Stream: new(bool), // Use new(bool) to create a pointer to a boolean
147
+ }
148
+ *req.Stream = true // Set the value to true
149
+
150
+ // Call the Chat function with streaming, keeping the full reply for the session history
151
+ var reply strings.Builder
152
+ err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
153
+ if resp.Message.Content != "" {
154
+ reply.WriteString(resp.Message.Content)
155
+ c.SSEvent("message", resp.Message.Content) // raw content, unmodified
156
+ c.Writer.Flush() // Ensure the content is sent immediately
157
+ }
158
+ return nil
159
+ })
160
+ if err != nil {
161
+ return err
162
+ }
163
+ // Add the AI response to the session
164
+ session.Messages = append(session.Messages, api.Message{Role: "assistant", Content: reply.String()})
165
+
166
+ return nil
167
+ }
168
+
169
+ func getDocuments(conn *pgx.Conn) ([]map[string]interface{}, error) {
170
+ rows, err := conn.Query(context.Background(), "SELECT DISTINCT ON (SPLIT_PART(title, '_chunk_', 1)) SPLIT_PART(title, '_chunk_', 1) as title, COUNT(*) as count FROM items GROUP BY SPLIT_PART(title, '_chunk_', 1)")
171
+ if err != nil {
172
+ return nil, err
173
+ }
174
+ defer rows.Close()
175
+
176
+ var documents []map[string]interface{}
177
+ for rows.Next() {
178
+ var title string
179
+ var count int
180
+ if err := rows.Scan(&title, &count); err != nil {
181
+ return nil, err
182
+ }
183
+ documents = append(documents, map[string]interface{}{
184
+ "title": title,
185
+ "count": count,
186
+ })
187
+ }
188
+
189
+ return documents, nil
190
+ }
191
+
192
+ func deleteDocument(conn *pgx.Conn, title string) error {
193
+ _, err := conn.Exec(context.Background(), "DELETE FROM items WHERE title LIKE $1 || '%'", title)
194
+ return err
195
+ }
196
+
197
+ func uploadDocument(c *gin.Context, conn *pgx.Conn) {
198
+ title := c.PostForm("title")
199
+ file, header, err := c.Request.FormFile("file")
200
+ if err != nil {
201
+ log.Printf("Error getting file: %v", err)
202
+ c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
203
+ return
204
+ }
205
+ defer file.Close()
206
+
207
+ // Create uploads directory if it doesn't exist
208
+ uploadsDir := "uploads"
209
+ if err := os.MkdirAll(uploadsDir, 0755); err != nil {
210
+ log.Printf("Error creating uploads directory: %v", err)
211
+ c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create uploads directory"})
212
+ return
213
+ }
214
+
215
+ filename := filepath.Join(uploadsDir, header.Filename)
216
+ out, err := os.Create(filename)
217
+ if err != nil {
218
+ log.Printf("Error creating file: %v", err)
219
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
220
+ return
221
+ }
222
+ defer out.Close()
223
+
224
+ _, err = io.Copy(out, file)
225
+ if err != nil {
226
+ log.Printf("Error copying file: %v", err)
227
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
228
+ return
229
+ }
230
+
231
+ // Debug: log file size and first 16 bytes
232
+ stat, statErr := os.Stat(filename)
233
+ if statErr == nil {
234
+ log.Printf("Uploaded file size: %d bytes", stat.Size())
235
+ fcheck, ferr := os.Open(filename)
236
+ if ferr == nil {
237
+ buf := make([]byte, 16)
238
+ n, _ := fcheck.Read(buf)
239
+ log.Printf("First 16 bytes: % x", buf[:n])
240
+ fcheck.Close()
241
+ }
242
+ }
243
+
244
+ var textContent string
245
+ ext := strings.ToLower(filepath.Ext(filename))
246
+ if ext == ".jpg" || ext == ".jpeg" || ext == ".png" {
247
+ // Generate image summary using the llava model
248
+ summary, err := generateImageSummary(filename)
249
+ if err != nil {
250
+ log.Printf("Error generating image summary: %v", err)
251
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
252
+ return
253
+ }
254
+ textContent = summary
255
+ } else if ext == ".pdf" {
256
+ // Check PDF signature before parsing
257
+ f, rErr := os.Open(filename)
258
+ if rErr != nil {
259
+ log.Printf("Error opening PDF: %v", rErr)
260
+ c.JSON(http.StatusInternalServerError, gin.H{"error": rErr.Error()})
261
+ return
262
+ }
263
+ defer f.Close()
264
+ buf := make([]byte, 5)
265
+ _, err := f.Read(buf)
266
+ if err != nil || string(buf) != "%PDF-" {
267
+ log.Printf("Uploaded file is not a valid PDF (missing %%PDF- header): %s", filename)
268
+ c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded file is not a valid PDF (missing %PDF- header)"})
269
+ return
270
+ }
271
+ stat, _ := f.Stat()
272
+ // Loosen EOF check: search last 1KB for %%EOF
273
+ eofCheckSize := int64(1024)
274
+ if stat.Size() < eofCheckSize {
275
+ eofCheckSize = stat.Size()
276
+ }
277
+ endBuf := make([]byte, eofCheckSize)
278
+ _, err = f.ReadAt(endBuf, stat.Size()-eofCheckSize)
279
+ if err != nil || !strings.Contains(string(endBuf), "%%EOF") {
280
+ log.Printf("Uploaded file is not a valid PDF (missing %%%%EOF): %s", filename)
281
+ c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded file is not a valid PDF (missing %%EOF)"})
282
+ return
283
+ }
284
+ // Reset file pointer for pdf.NewReader
285
+ f.Seek(0, 0)
286
+ reader, pdfErr := pdf.NewReader(f, stat.Size())
287
+ if pdfErr != nil {
288
+ log.Printf("Error reading PDF: %v", pdfErr)
289
+ c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded file is not a valid PDF or is corrupted"})
290
+ return
291
+ }
292
+ var sb strings.Builder
293
+ for i := 1; i <= reader.NumPage(); i++ {
294
+ page := reader.Page(i)
295
+ if page.V.IsNull() {
296
+ continue
297
+ }
298
+ content, err := page.GetPlainText(nil)
299
+ if err != nil {
300
+ log.Printf("Error extracting text from page %d: %v", i, err)
301
+ continue
302
+ }
303
+ sb.WriteString(content)
304
+ }
305
+ textContent = sb.String()
306
+ log.Printf("Extracted text length: %d", len(textContent))
307
+ } else {
308
+ content, err := os.ReadFile(filename)
309
+ if err != nil {
310
+ log.Printf("Error reading file: %v", err)
311
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
312
+ return
313
+ }
314
+ textContent = string(content)
315
+ }
316
+
317
+ // Remove null bytes (Postgres TEXT cannot contain 0x00)
318
+ textContent = strings.ReplaceAll(textContent, "\x00", "")
319
+
320
+ // Validate UTF-8
321
+ if !utf8.ValidString(textContent) {
322
+ log.Printf("Invalid UTF-8 detected in document: %s", filename)
323
+ c.JSON(http.StatusBadRequest, gin.H{"error": "Uploaded document is not valid UTF-8"})
324
+ return
325
+ }
326
+
327
+ // Generate embedding for the text content using llama3.1
328
+ embedding, err := generateEmbedding(textContent)
329
+ if err != nil {
330
+ log.Printf("Error generating embedding: %v", err)
331
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
332
+ return
333
+ }
334
+
335
+ // Insert the document into the database
336
+ err = insertItem(conn, title, textContent, embedding)
337
+ if err != nil {
338
+ log.Printf("Error inserting item: %v", err)
339
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
340
+ return
341
+ }
342
+
343
+ c.JSON(http.StatusOK, gin.H{"message": "Document uploaded and processed successfully"})
344
+ }
345
+
346
+ func chunkText(text string, chunkSize int) []string {
347
+ words := strings.Fields(text)
348
+ var chunks []string
349
+ for i := 0; i < len(words); i += chunkSize {
350
+ end := i + chunkSize
351
+ if end > len(words) {
352
+ end = len(words)
353
+ }
354
+ chunks = append(chunks, strings.Join(words[i:end], " "))
355
+ }
356
+ return chunks
357
+ }
358
+
359
+ func generateImageSummary(imagePath string) (string, error) {
360
+ imageData, err := os.ReadFile(imagePath)
361
+ if err != nil {
362
+ return "", fmt.Errorf("failed to read image file: %w", err)
363
+ }
364
+
365
+ base64Image := base64.StdEncoding.EncodeToString(imageData)
366
+
367
+ payload := map[string]interface{}{
368
+ "model": "llava",
369
+ "prompt": "Describe this image in detail:",
370
+ "images": []string{base64Image},
371
+ "stream": true,
372
+ }
373
+
374
+ jsonPayload, err := json.Marshal(payload)
375
+ if err != nil {
376
+ return "", fmt.Errorf("failed to marshal JSON payload: %w", err)
377
+ }
378
+
379
+ ollamaHost := os.Getenv("OLLAMA_HOST")
380
+ if ollamaHost == "" {
381
+ ollamaHost = "localhost"
382
+ }
383
+ url := fmt.Sprintf("http://%s:11434/api/generate", ollamaHost)
384
+
385
+ resp, err := http.Post(url, "application/json", bytes.NewBuffer(jsonPayload))
386
+ if err != nil {
387
+ return "", fmt.Errorf("failed to send POST request: %w", err)
388
+ }
389
+ defer resp.Body.Close()
390
+
391
+ if resp.StatusCode != http.StatusOK {
392
+ body, _ := io.ReadAll(resp.Body)
393
+ return "", fmt.Errorf("unexpected response status: %d, body: %s", resp.StatusCode, string(body))
394
+ }
395
+
396
+ var summary strings.Builder
397
+ decoder := json.NewDecoder(resp.Body)
398
+ for {
399
+ var result struct {
400
+ Response string `json:"response"`
401
+ Done bool `json:"done"`
402
+ }
403
+ if err := decoder.Decode(&result); err != nil {
404
+ if err == io.EOF {
405
+ break
406
+ }
407
+ return "", fmt.Errorf("failed to decode JSON response: %w", err)
408
+ }
409
+ summary.WriteString(result.Response)
410
+ if result.Done {
411
+ break
412
+ }
413
+ }
414
+
415
+ if summary.Len() == 0 {
416
+ return "", fmt.Errorf("empty response from llava model")
417
+ }
418
+
419
+ fmt.Println("The summary of the image is: ", summary.String())
420
+
421
+ return summary.String(), nil
422
+ }
423
+
424
+ func main() {
425
+ // Set up the database connection
426
+ // load env variables
427
+ got.Load()
428
+ conn, err := pgx.Connect(context.Background(), os.Getenv("DB_URL"))
429
+ if err != nil {
430
+ log.Fatal("Unable to connect to database:", err)
431
+ }
432
+ defer conn.Close(context.Background())
433
+
434
+ // Set up the Gin router
435
+ r := gin.Default()
436
+
437
+ // Define the /add_document endpoint
438
+ r.POST("/add_document", func(c *gin.Context) {
439
+ var request struct {
440
+ Title string `json:"title"`
441
+ DocText string `json:"doc_text"`
442
+ }
443
+ if err := c.BindJSON(&request); err != nil {
444
+ c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
445
+ return
446
+ }
447
+
448
+ // Generate the embedding
449
+ embedding, err := generateEmbedding(request.DocText)
450
+ if err != nil {
451
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
452
+ return
453
+ }
454
+
455
+ // Insert the document and its embedding into the items table
456
+ err = insertItem(conn, request.Title, request.DocText, embedding)
457
+ if err != nil {
458
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
459
+ return
460
+ }
461
+
462
+ c.JSON(http.StatusOK, gin.H{"message": "Document chunk embedded and stored successfully!"})
463
+ })
464
+
465
+ // Add the new /query endpoint
466
+ r.POST("/query", func(c *gin.Context) {
467
+ var request struct {
468
+ Query string `json:"query"`
469
+ SessionID string `json:"sessionId"`
470
+ }
471
+ if err := c.BindJSON(&request); err != nil {
472
+ c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
473
+ return
474
+ }
475
+
476
+ session, ok := sessions[request.SessionID]
477
+ if !ok {
478
+ session = &Session{
479
+ Messages: []api.Message{
480
+ {Role: "system", Content: "You are an assistant that answers questions based on the given context."},
481
+ },
482
+ TitleFilter: "",
483
+ }
484
+ sessions[request.SessionID] = session
485
+ }
486
+
487
+ // Check for @title in the query
488
+ if strings.Contains(request.Query, "@") {
489
+ parts := strings.Split(request.Query, "@")
490
+ if len(parts) > 1 {
491
+ session.TitleFilter = strings.Split(parts[1], " ")[0]
492
+ request.Query = strings.Replace(request.Query, "@"+session.TitleFilter, "", 1)
493
+ }
494
+ }
495
+
496
+ c.Header("Content-Type", "text/event-stream")
497
+ c.Header("Cache-Control", "no-cache")
498
+ c.Header("Connection", "keep-alive")
499
+ c.Header("Access-Control-Allow-Origin", "*")
500
+ c.Header("Access-Control-Allow-Credentials", "true")
501
+ c.Header("Access-Control-Allow-Headers", "Content-Type")
502
+ c.Header("Access-Control-Allow-Methods", "POST")
503
+ c.Header("encoding", "chunked")
504
+
505
+ err := queryEmbeddings(conn, request.Query, session, c)
506
+ if err != nil {
507
+ c.SSEvent("error", err.Error())
508
+ }
509
+ c.SSEvent("done", "")
510
+ })
+
+ 	// Serve the index.html file
+ 	r.GET("/", func(c *gin.Context) {
+ 		c.File("index.html")
+ 	})
+
+ 	// Serve the docmanager.html file
+ 	r.GET("/docmanager", func(c *gin.Context) {
+ 		c.File("docmanager.html")
+ 	})
+
+ 	// Add a new endpoint to fetch documents
+ 	r.GET("/documents", func(c *gin.Context) {
+ 		documents, err := getDocuments(conn)
+ 		if err != nil {
+ 			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ 			return
+ 		}
+ 		c.JSON(http.StatusOK, documents)
+ 	})
+
+ 	// Add a new endpoint to delete documents
+ 	r.POST("/delete_document", func(c *gin.Context) {
+ 		var request struct {
+ 			Title string `json:"title"`
+ 		}
+ 		if err := c.BindJSON(&request); err != nil {
+ 			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
+ 			return
+ 		}
+
+ 		err := deleteDocument(conn, request.Title)
+ 		if err != nil {
+ 			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ 			return
+ 		}
+
+ 		c.JSON(http.StatusOK, gin.H{"message": "Document deleted successfully"})
+ 	})
+
+ 	// Add a new endpoint to upload documents
+ 	r.POST("/upload_document", func(c *gin.Context) {
+ 		uploadDocument(c, conn)
+ 	})
+
+ 	// Add a new endpoint to clear the chat session
+ 	r.POST("/clear_session", func(c *gin.Context) {
+ 		var request struct {
+ 			SessionID string `json:"sessionId"`
+ 		}
+ 		if err := c.BindJSON(&request); err != nil {
+ 			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
+ 			return
+ 		}
+
+ 		delete(sessions, request.SessionID)
+ 		c.JSON(http.StatusOK, gin.H{"message": "Chat session cleared successfully"})
+ 	})
+
+ 	// Add a new endpoint to check if Twitter data exists
+ 	r.GET("/check_data", func(c *gin.Context) {
+ 		rows, err := conn.Query(context.Background(), "SELECT DISTINCT title FROM items WHERE title LIKE '%twitter%'")
+ 		if err != nil {
+ 			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ 			return
+ 		}
+ 		defer rows.Close()
+
+ 		var titles []string
+ 		for rows.Next() {
+ 			var title string
+ 			if err := rows.Scan(&title); err != nil {
+ 				c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ 				return
+ 			}
+ 			titles = append(titles, title)
+ 		}
+ 		// rows.Next() also returns false on failure, so check for an iteration error
+ 		if err := rows.Err(); err != nil {
+ 			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ 			return
+ 		}
+
+ 		c.JSON(http.StatusOK, gin.H{"twitter_titles": titles})
+ 	})
+
+ 	// Serve the describer.html file
+ 	r.GET("/describer", func(c *gin.Context) {
+ 		c.File("describer.html")
+ 	})
+
+ 	// Handle image description
+ 	r.POST("/describe_image", func(c *gin.Context) {
+ 		file, _, err := c.Request.FormFile("file")
+ 		if err != nil {
+ 			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
+ 			return
+ 		}
+ 		defer file.Close()
+
+ 		// Create a temporary file to store the uploaded image
+ 		tempFile, err := os.CreateTemp("", "uploaded-*.jpg")
+ 		if err != nil {
+ 			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ 			return
+ 		}
+ 		defer os.Remove(tempFile.Name())
+ 		defer tempFile.Close()
+
+ 		// Copy the uploaded file to the temporary file
+ 		_, err = io.Copy(tempFile, file)
+ 		if err != nil {
+ 			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ 			return
+ 		}
+
+ 		c.Header("Content-Type", "text/event-stream")
+ 		c.Header("Cache-Control", "no-cache")
+ 		c.Header("Connection", "keep-alive")
+ 		// Wildcard origin: credentials cannot be combined with "*", so no Allow-Credentials header is sent
+ 		c.Header("Access-Control-Allow-Origin", "*")
+ 		c.Header("Access-Control-Allow-Headers", "Content-Type")
+ 		c.Header("Access-Control-Allow-Methods", "POST")
+ 		// No explicit encoding header: net/http chunks the response automatically when Content-Length is unset
+
+ 		imageData, err := os.ReadFile(tempFile.Name())
+ 		if err != nil {
+ 			c.SSEvent("error", err.Error())
+ 			return
+ 		}
+
+ 		base64Image := base64.StdEncoding.EncodeToString(imageData)
+
+ 		payload := map[string]interface{}{
+ 			"model":  "llava",
+ 			"prompt": "Describe this image in detail:",
+ 			"images": []string{base64Image},
+ 			"stream": true,
+ 		}
+
+ 		jsonPayload, err := json.Marshal(payload)
+ 		if err != nil {
+ 			c.SSEvent("error", err.Error())
+ 			return
+ 		}
+
+ 		ollamaHost := os.Getenv("OLLAMA_HOST")
+ 		if ollamaHost == "" {
+ 			ollamaHost = "localhost"
+ 		}
+ 		url := fmt.Sprintf("http://%s:11434/api/generate", ollamaHost)
+
+ 		resp, err := http.Post(url, "application/json", bytes.NewBuffer(jsonPayload))
+ 		if err != nil {
+ 			c.SSEvent("error", err.Error())
+ 			return
+ 		}
+ 		defer resp.Body.Close()
+
+ 		if resp.StatusCode != http.StatusOK {
+ 			body, _ := io.ReadAll(resp.Body)
+ 			c.SSEvent("error", fmt.Sprintf("Unexpected response status: %d, body: %s", resp.StatusCode, string(body)))
+ 			return
+ 		}
+
+ 		// Decode the streamed JSON objects one at a time and forward each text fragment
+ 		decoder := json.NewDecoder(resp.Body)
+ 		for {
+ 			var result struct {
+ 				Response string `json:"response"`
+ 				Done     bool   `json:"done"`
+ 			}
+ 			if err := decoder.Decode(&result); err != nil {
+ 				if err == io.EOF {
+ 					break
+ 				}
+ 				c.SSEvent("error", err.Error())
+ 				return
+ 			}
+ 			if result.Response != "" {
+ 				c.SSEvent("message", result.Response)
+ 				c.Writer.Flush() // Ensure the content is sent immediately
+ 			}
+ 			if result.Done {
+ 				break
+ 			}
+ 		}
+
+ 		c.SSEvent("done", "")
+ 	})
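Review note: the `/describe_image` handler above reads the model's streaming reply as a sequence of JSON objects, forwards every non-empty `response` fragment, and stops once `done` is true. The same loop can be sketched in Python, assuming the upstream sends one JSON object per line. `collect_fragments` and the sample lines are hypothetical, made up for illustration, and are not part of this commit.

```python
import json
from typing import Iterable, List


def collect_fragments(lines: Iterable[str]) -> List[str]:
    """Return the non-empty `response` fragments, stopping at the first `done` object."""
    fragments = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between objects
        obj = json.loads(raw)
        text = obj.get("response", "")
        if text:
            fragments.append(text)
        if obj.get("done"):
            break  # anything after the final object is ignored
    return fragments


# Made-up sample stream; real replies carry additional fields.
sample = [
    '{"response": "A cat", "done": false}',
    '{"response": " on a sofa.", "done": false}',
    '{"response": "", "done": true}',
    '{"response": "ignored", "done": false}',
]
print("".join(collect_fragments(sample)))
```

As in the Go loop, the final object usually has an empty `response`, so it contributes no text and only ends the iteration.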
+
+ 	// Run the Gin server
+ 	r.Run(":8080")
+ }
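Review note: the streaming handlers above emit named server-sent events (`message`, `error`, `done`). Because the endpoints are POST routes, a client has to read the response body and split it into events itself. The sketch below is a simplified parser, assuming each event arrives as `event:` and `data:` lines followed by a blank line; `parse_sse` is a hypothetical helper, not something this commit provides. It also yields events whose data is empty, so the `done` marker is visible to the caller.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE lines."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data or event != "message":
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Made-up stream resembling what the handlers send.
stream = ["event:message", "data:Hello", "", "event:done", "data:", ""]
for name, payload in parse_sse(stream):
    print(name, repr(payload))
```

A caller would typically append `message` payloads to the output, surface `error` payloads, and stop reading at `done`.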
ollama-entrypoint.sh ADDED
@@ -0,0 +1,17 @@
+ #!/bin/bash
+ set -e
+
+ # Start Ollama in the background
+ ollama serve &
+
+ # Wait for Ollama to be ready
+ until curl -s -o /dev/null -w "%{http_code}" http://localhost:11434/api/tags | grep -q "200"; do
+     echo "Waiting for Ollama to be ready..."
+     sleep 5
+ done
+
+ # Pull the llama3.1 and llava models
+ ollama pull llama3.1
+ ollama pull llava
+ # Keep the container running
+ wait
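Review note: the entrypoint above polls `/api/tags` every five seconds until it answers with status 200, and only then pulls the models. The same wait-until-ready pattern can be sketched in Python. Unlike the shell loop, this version gives up after a bounded number of attempts. `wait_until_ready`, `http_ok`, and the default values are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str = "http://localhost:11434/api/tags") -> bool:
    """Return True if a GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(check: Callable[[], bool], attempts: int = 60, delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` until it returns True or `attempts` run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no pause after the last failed attempt
    return False


# Demonstration with a fake check that succeeds on the third call.
answers = iter([False, False, True])
print(wait_until_ready(lambda: next(answers), attempts=5, delay=0))
```

In a real script the call would be `wait_until_ready(http_ok)`. Passing `sleep` as a parameter keeps the loop testable without actually waiting.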
ragtag4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85c0b559c6b2545fe5174963fd47313c490dda984c16bbd4db0d2b8c9eaa56dd
+ size 18022930
screenplays/downloader.py ADDED
@@ -0,0 +1,63 @@
+ import os
+ import sys
+ import requests
+ from bs4 import BeautifulSoup
+ import argparse
+
+ # Default URL of the script to download
+ DEFAULT_SCRIPT_URL = "https://www.dailyscript.com/scripts/twelve_monkeys.html"
+
+ # Directory to save the downloaded script
+ save_dir = "scripts"
+ os.makedirs(save_dir, exist_ok=True)
+
+ # Function to download a script
+ def download_script(script_url, save_dir):
+     response = requests.get(script_url, timeout=30)  # avoid hanging on an unresponsive server
+     response.raise_for_status()
+
+     # Parse the HTML content
+     soup = BeautifulSoup(response.text, "html.parser")
+
+     # Try to find the <pre> tag
+     pre_tag = soup.find("pre")
+
+     if pre_tag:
+         # Extract the text content from the <pre> tag
+         text_content = pre_tag.get_text()
+         # Strip the line-number prefix (everything up to the first "|") from each line
+         lines = text_content.split("\n")
+         cleaned_lines = [line.split("|", 1)[-1] for line in lines]
+         cleaned_text = "\n".join(cleaned_lines)
+     else:
+         # If no <pre> tag, get the text from the body
+         body = soup.find("body")
+         if body:
+             cleaned_text = body.get_text()
+         else:
+             raise ValueError("Could not find script content in the HTML")
+
+     # Extract the filename from the URL and change the extension to .txt
+     filename = os.path.basename(script_url).split(".")[0] + ".txt"
+     save_path = os.path.join(save_dir, filename)
+
+     with open(save_path, "w", encoding="utf-8") as file:
+         file.write(cleaned_text)
+
+     print(f"Downloaded: {save_path}")
+     return save_path
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="Download a screenplay from a given URL.")
+     parser.add_argument("--url", type=str, default=DEFAULT_SCRIPT_URL,
+                         help="URL of the screenplay to download (default: Twelve Monkeys)")
+
+     args = parser.parse_args()
+     script_url = args.url
+
+     # Extract the script name from the URL
+     script_name = os.path.basename(script_url).split(".")[0] + ".txt"
+     save_path = os.path.join(save_dir, script_name)
+
+     # Download the script
+     download_script(script_url, save_dir)
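Review note: `download_script` cleans each line by dropping everything up to the first `|` and derives the output name from the URL's basename. The sketch below isolates both steps so their behaviour is easy to see. `strip_prefix` and `filename_from_url` are hypothetical helper names, the sample line is made up, and the filename helper is a variant (not the committed logic) that ignores query strings and keeps inner dots.

```python
import os
from urllib.parse import urlparse


def strip_prefix(line: str) -> str:
    """Drop everything up to and including the first '|'; lines without one pass through."""
    return line.split("|", 1)[-1]


def filename_from_url(url: str) -> str:
    """Build '<stem>.txt' from the URL path, ignoring any query string or fragment."""
    stem = os.path.splitext(os.path.basename(urlparse(url).path))[0]
    return stem + ".txt"


print(repr(strip_prefix("  12| INT. OFFICE - DAY")))
print(repr(strip_prefix("no prefix here")))
print(filename_from_url("https://www.dailyscript.com/scripts/twelve_monkeys.html"))
```

One caveat of splitting on `|`: a screenplay line that itself contains a `|` loses the text before it, so a stricter pattern (digits followed by `|`) may be safer if that ever occurs in the source pages.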
screenplays/scripts/AmericanBeauty_final.txt ADDED
The diff for this file is too large to render. See raw diff
 
screenplays/scripts/thefifthelement.txt ADDED
The diff for this file is too large to render. See raw diff
 
screenplays/scripts/twelve_monkeys.txt ADDED
The diff for this file is too large to render. See raw diff
 
screenplays/send_screenplay.py ADDED
@@ -0,0 +1,48 @@
+ import os
+ import requests
+ import psycopg2
+ import math
+
+ def chunk_text(text, chunk_size=4096):
+     words = text.split()
+     return [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]
+
+ def is_screenplay_in_db(cursor, title):
+     cursor.execute("SELECT COUNT(*) FROM items WHERE title LIKE %s", (f"{title}%",))
+     return cursor.fetchone()[0] > 0
+
+ def send_screenplay(file_path, cursor):
+     with open(file_path, 'r', encoding='utf-8') as file:
+         content = file.read()
+
+     title = os.path.splitext(os.path.basename(file_path))[0]
+
+     if is_screenplay_in_db(cursor, title):
+         print(f"Screenplay '{title}' already exists in the database. Skipping.")
+         return
+
+     chunks = chunk_text(content)
+
+     for i, chunk in enumerate(chunks):
+         payload = {
+             "title": f"{title}_chunk_{i+1}",
+             "doc_text": chunk
+         }
+         response = requests.post("http://localhost:8080/add_document", json=payload)
+         print(f"Chunk {i+1} response: {response.status_code}")
+
+ def process_scripts_folder():
+     conn = psycopg2.connect("dbname=ragtag user=jc host=localhost")  # password is read by libpq from PGPASSWORD or ~/.pgpass
+     cursor = conn.cursor()
+
+     scripts_folder = "scripts"
+     for filename in os.listdir(scripts_folder):
+         if filename.endswith(".txt"):
+             file_path = os.path.join(scripts_folder, filename)
+             send_screenplay(file_path, cursor)
+
+     cursor.close()
+     conn.close()
+
+ if __name__ == "__main__":
+     process_scripts_folder()
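Review note: `chunk_text` splits on whitespace, so `chunk_size` counts words rather than characters or model tokens, and the original line breaks are not preserved inside a chunk. The sketch below repeats the function from the file above and runs it on a made-up sentence with a small `chunk_size` to show how the chunks and their `_chunk_N` titles line up. The sample text and the `twelve_monkeys` title are illustrative assumptions.

```python
def chunk_text(text, chunk_size=4096):
    # Split on any whitespace, then regroup into runs of at most chunk_size words.
    words = text.split()
    return [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]


sample = "FADE IN:\na quiet street at dawn,   nothing moves"
chunks = chunk_text(sample, chunk_size=4)
titles = [f"twelve_monkeys_chunk_{i+1}" for i, _ in enumerate(chunks)]

for title, chunk in zip(titles, chunks):
    print(title, "->", repr(chunk))
```

The last chunk is simply shorter than the others. If the embedding model has a strict token limit, a word count is only a rough proxy for it, and a smaller `chunk_size` leaves more headroom.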