9farccontioshi Th3BossC committed on
Commit
a0228c6
·
0 Parent(s):

Duplicate from Th3BossC/TranscriptApi

Co-authored-by: Diljith P Dileep <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,8 @@
+ #python cache
+ **/__pycache__/
+
+
+ #my files
+
+ trial.py
+ test/
Dockerfile ADDED
@@ -0,0 +1,23 @@
+ # read the doc: https://huggingface.co/docs/hub/spaces-sdks-docker
+ # you will also find guides on how best to write your Dockerfile
+
+ FROM python:3.9
+
+ WORKDIR /code
+
+ COPY ./requirements.txt /code/requirements.txt
+
+ RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+ RUN apt update && apt install -y ffmpeg
+
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ WORKDIR $HOME/app
+
+
+ COPY --chown=user . $HOME/app
+
+ CMD ["python", "app.py"]
README.md ADDED
@@ -0,0 +1,88 @@
+ ---
+ title: TranscriptApi
+ emoji: ⚡
+ colorFrom: pink
+ colorTo: green
+ sdk: docker
+ pinned: false
+ duplicated_from: Th3BossC/TranscriptApi
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+
+
+ # TranscriptApi
+
+ TranscriptApi is a backend service written in Flask that provides a RESTful API for summarizing YouTube videos or uploaded files using deep learning models. It extracts and summarizes the textual content of YouTube videos, PDF files, text files, and raw text, giving quick access to the key information.
+
+ ## Table of Contents
+ - [Features](#features)
+ - [Installation](#installation)
+ - [Usage](#usage)
+
+ ## Features
+
+ - Extract and summarize textual content from YouTube videos or uploaded files.
+ - Uses deep learning models (Whisper for transcription, a T5-small checkpoint for summarization).
+ - Provides a RESTful API for easy integration with other applications.
+ - Supports customization and configuration options to meet specific requirements.
+
+ ## Installation
+
+ 1. Clone the repository:
+
+ ```
+ git clone https://github.com/th3bossc/TranscriptApi.git
+ ```
+
+ 2. Navigate to the project directory:
+
+ ```
+ cd TranscriptApi
+ ```
+
+ 3. Install the required dependencies using pip:
+
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 4. Set up the necessary configuration variables, such as API keys, in the `.env` file.
+
+ 5. Run the Flask development server:
+
+ ```
+ python app.py
+ ```
+
+ The server should now be running locally at `http://localhost:5000`.
+
+ ## Usage
+
+ To use the TranscriptApi service, make requests to the API endpoints. Each example below shows a cURL command followed by the equivalent call with Python's `requests` library:
+
+ ```bash and python request examples
+
+ # summarizing a video
+ curl -X GET http://localhost:5000/video_api/your-video-id
+ requests.get("http://localhost:5000/video_api/your-video-id")
+
+ # summarizing a PDF file (multipart upload; the Content-Type header is set automatically)
+ curl -X POST -F "file=@yourfile.pdf" http://localhost:5000/file_api/pdf
+ requests.post("http://localhost:5000/file_api/pdf", files = {'file' : open('yourfile.pdf', 'rb')})
+
+ # summarizing a text file (multipart upload; the Content-Type header is set automatically)
+ curl -X POST -F "file=@yourfile.txt" http://localhost:5000/file_api/txt
+ requests.post("http://localhost:5000/file_api/txt", files = {'file' : open('yourfile.txt', 'rb')})
+
+ # summarizing raw text data
+ curl -X POST -H "Content-Type: application/json" -d '{"text" : "your-text-data"}' http://localhost:5000/file_api/direct_text
+ requests.post("http://localhost:5000/file_api/direct_text", json = {'text' : 'your-text-data'})
+
+ ```
+
+ Replace `your-video-id` with the actual YouTube video ID you want to summarize.
+ Replace `yourfile` with the actual file path of the file you want to summarize.
+ Replace `your-text-data` with the actual text string you want to summarize.
+
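Review note on the raw-text example: `/file_api/direct_text` reads `request.json['text']`, so the body has to be valid JSON sent with a JSON content type. The following is a minimal standard-library sketch that only builds such a request without sending it. The helper name `build_direct_text_request` and its default base URL are illustrative assumptions, not part of the repository.

```python
import json
import urllib.request


def build_direct_text_request(text, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request for /file_api/direct_text."""
    # The endpoint expects a JSON object with a single "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/file_api/direct_text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_direct_text_request("Some long passage to summarize.")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server.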
TranscriptApi/__init__.py ADDED
@@ -0,0 +1,22 @@
+ from flask import Flask
+ from flask_sqlalchemy import SQLAlchemy
+ from flask_cors import CORS
+ import os
+ db = SQLAlchemy()
+
+ SQLALCHEMY_DATABASE_URI = 'sqlite:///site.db'
+
+ def create_app():
+     app = Flask(__name__)
+     CORS(app)
+     app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///site.db'
+     app.config['UPLOAD_FOLDER'] = 'TranscriptApi/common/files/'
+     db.init_app(app)
+
+     from TranscriptApi.resources.routes import resources
+     app.register_blueprint(resources)
+
+     from TranscriptApi.main.routes import main
+     app.register_blueprint(main)
+
+     return app
TranscriptApi/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (790 Bytes). View file
 
TranscriptApi/__pycache__/models.cpython-310.pyc ADDED
Binary file (1.39 kB). View file
 
TranscriptApi/common/__init__.py ADDED
File without changes
TranscriptApi/common/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (162 Bytes). View file
 
TranscriptApi/common/__pycache__/utils.cpython-310.pyc ADDED
Binary file (6.07 kB). View file
 
TranscriptApi/common/utils.py ADDED
@@ -0,0 +1,218 @@
+ import os
+ import librosa
+ import soundfile as sf
+ from pytube import YouTube
+ import urllib.parse as urlparse
+ from moviepy.editor import VideoFileClip
+ import shutil
+ import whisper
+ import torch
+ from transformers import pipeline
+ from tqdm.auto import tqdm
+ from PyPDF2 import PdfReader
+
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ # device = 'cpu'
+
+
+ checkpoint = 'Th3BossC/SummarizationModel_t5-small_opeai_tldr'
+
+
+
+
+
+
+ ############### video queries ###############
+ def title(video_id):
+     return YouTube('https://www.youtube.com/watch?v=' + video_id).title
+
+ def get_video_id(video_url):
+     url_data = urlparse.urlparse(video_url)  # parse the caller's URL, not a hard-coded sample
+     query = urlparse.parse_qs(url_data.query)
+     video = query["v"][0]
+     return video
+
+ def get_video(video_url, location, filename = 'audio'):
+     if not os.path.exists(location):
+         os.makedirs(location)
+     video_filename = location + filename + '.mp4'
+     audio_filename = location + filename + '.mp3'
+     print('[INFO] downloading video...')
+     video = YouTube(video_url).streams.filter(file_extension = 'mp4').first().download(filename = video_filename)
+     video = VideoFileClip(video_filename)
+     print('[INFO] extracting audio from video...')
+     video.audio.write_audiofile(audio_filename)
+     #os.remove(video_filename)
+
+     return audio_filename
+
+ ############################################################
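Review note on `get_video_id`: reading the `v` query parameter only covers `watch?v=...` links. The sketch below also handles `youtu.be` short links and returns `None` instead of raising when no ID is present. The function name `extract_video_id` is hypothetical and the set of supported URL shapes is an assumption, so this is an illustration rather than the repository's behaviour.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(video_url):
    """Return the YouTube video ID from a watch or youtu.be URL, or None."""
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None
    # Watch links carry the ID in the "v" query parameter.
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


print(extract_video_id("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"))  # prints z_AbfPXTKms
```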
+
+
+ ############### Audio ###############
+ def chunk_audio(filename, segment_length, output_dir):
+     if not os.path.isdir(output_dir):
+         os.mkdir(output_dir)
+     audio, sr = librosa.load(filename, sr = 44100)
+     duration = librosa.get_duration(y = audio, sr = sr)
+     num_segments = int(duration / segment_length) + 1
+     print(f'[INFO] Chunking {num_segments} chunks...')
+
+     audio_files = []
+
+     for i in range(num_segments):
+         start = i*segment_length*sr
+         end = (i+1)*segment_length*sr
+         segment = audio[start:end]
+         sf.write(os.path.join(output_dir, f"segment_{i}.mp3"), segment, sr)
+         audio_files.append(os.path.join(output_dir, f'segment_{i}.mp3'))
+
+     print(audio_files)
+     #os.remove(filename)
+     return audio_files
+
+ def transcribe_audio(audio_files, output_file = None, model = whisper.load_model('base', device = device)):
+     print('[INFO] converting audio to text...')
+     transcripts = []
+     model.to(device)
+     for audio_file in audio_files:
+         response = model.transcribe(audio_file)
+         transcripts.append(response['text'])
+
+     if output_file is not None:
+         with open(output_file, 'w') as f:
+             for transcript in transcripts:
+                 f.write(transcript + '\n')
+
+     return transcripts
+
+ ############################################################
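Review note on `chunk_audio`: `int(duration / segment_length) + 1` yields one extra, empty segment whenever the duration is an exact multiple of the segment length. The sketch below computes the sample ranges with a ceiling division instead. It works on plain integers so it can be checked without audio files, and the helper name `segment_bounds` is hypothetical.

```python
import math


def segment_bounds(num_samples, sample_rate, segment_seconds):
    """Return (start, end) sample indices that cover the audio with no empty chunk."""
    samples_per_segment = segment_seconds * sample_rate
    num_segments = math.ceil(num_samples / samples_per_segment)
    return [
        # Clamp the last segment so it ends exactly at the end of the audio.
        (i * samples_per_segment, min((i + 1) * samples_per_segment, num_samples))
        for i in range(num_segments)
    ]


# 25 minutes of audio at 44.1 kHz, split into 10-minute segments.
bounds = segment_bounds(25 * 60 * 44100, 44100, 600)
print(len(bounds), bounds[-1])
```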
+
+
+ ############################################################
+
+ ############### Compile all functions ###############
+ def summarize_youtube_video(video_url, outputs_dir):
+     print(f'[INFO] running on {device}')
+     raw_audio_dir = f'{outputs_dir}/raw_audio/'
+     chunks_dir = f'{outputs_dir}/chunks/'
+     transcripts_file = f'{outputs_dir}/transcripts.txt'
+     summary_file = f'{outputs_dir}/summary.txt'
+     segment_length = 60*10
+
+     if os.path.exists(outputs_dir):
+         shutil.rmtree(outputs_dir)
+     os.mkdir(outputs_dir)
+
+     audio_filename = get_video(video_url, raw_audio_dir)
+     chunked_audio_files = chunk_audio(audio_filename, segment_length, chunks_dir)
+     transcriptions = transcribe_audio(chunked_audio_files, transcripts_file)
+
+
+     # splitting transcription into sentences
+     sentences = []
+     for transcript in transcriptions:
+         sentences += transcript.split('.')
+
+     sentences_len = [len(sentence) for sentence in sentences]
+     sentence_mean_length = max(1, sum(sentences_len) // max(1, len(sentences_len)))  # avoid division by zero
+
+     num_sentences_per_step = max(1, int(1600 / (sentence_mean_length)))  # at least one sentence per step
+     num_steps = (len(sentences) // num_sentences_per_step) + (len(sentences) % num_sentences_per_step != 0)
+
+     print(f"""
+     [INFO] sentences_len : {len(sentences_len)}
+     [INFO] sentence_mean_length : {sentence_mean_length},
+     [INFO] num_sentences_per_step : {num_sentences_per_step},
+     [INFO] num_steps : {num_steps}
+     """)
+
+     summarizer = pipeline('summarization', model = checkpoint, tokenizer = checkpoint, max_length = 200, truncation = True)
+
+     summaries = []
+
+     for i in tqdm(range(num_steps)):
+         chunk = ' '.join(sentences[num_sentences_per_step*i : num_sentences_per_step*(i+1)])
+         summary = summarizer(chunk, do_sample = False)[0]['summary_text']
+         summaries.append(summary)
+
+     complete_summary = ' '.join(summaries)
+     with open(summary_file, 'w') as f:
+         f.write(complete_summary)
+
+     with open(transcripts_file, 'r') as f:
+         complete_transcript = f.read()
+     return {'transcript': complete_transcript, 'summary' : complete_summary}
+ ############################################################
+
+
+
+ ############ File Summarize ############
+
+ def extract_text_pdf(file_location = 'TranscriptApi/static/files/temp.pdf'):
+     reader = PdfReader(file_location)
+     text = ""
+     for page in reader.pages:
+         text += page.extract_text()
+     return text
+
+ def extract_text_txt(file_location = 'TranscriptApi/static/files/temp.txt'):
+     with open(file_location, "r") as f:
+         text = f.read()
+     return text
+
+
+
+
+ def summarize_string(text : str):
+     sentences = text.split('.')
+
+     summarizer = pipeline('summarization', model = checkpoint, tokenizer = checkpoint, max_length = 200, truncation = True, device = 0 if device == 'cuda' else -1)
+
+     sentences_len = [len(sentence) for sentence in sentences]
+     sentence_mean_length = max(1, sum(sentences_len) // len(sentences_len))  # avoid division by zero
+
+     num_sentences_per_step = max(1, int(1600 / (sentence_mean_length)))  # at least one sentence per step
+     num_steps = (len(sentences) // num_sentences_per_step) + (len(sentences) % num_sentences_per_step != 0)
+
+     print(f"""
+     [INFO] sentences_len : {len(sentences_len)}
+     [INFO] sentence_mean_length : {sentence_mean_length},
+     [INFO] num_sentences_per_step : {num_sentences_per_step},
+     [INFO] num_steps : {num_steps}
+     """)
+
+
+     summaries = []
+     for i in tqdm(range(num_steps)):
+         chunk = ' '.join(sentences[num_sentences_per_step*i : num_sentences_per_step*(i+1)])
+         summary = summarizer(chunk, do_sample = False)[0]['summary_text']
+         summaries.append(summary)
+
+     complete_summary = ' '.join(summaries)
+     return complete_summary
+
+
+ ################################################
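Review note on the batching in `summarize_string` and `summarize_youtube_video`: both size their batches from the mean sentence length against a budget of roughly 1600 characters. One alternative is to pack sentences greedily against that budget, which needs no division. This is a sketch under that assumption: the helper name `batch_sentences` is hypothetical, and the 1600-character default is taken from the constant in the code above.

```python
def batch_sentences(text, max_chars=1600):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        added = len(sentence) + 2  # account for the ". " separator
        if current and current_len + added > max_chars:
            # The budget would be exceeded, so close the current chunk first.
            chunks.append(". ".join(current) + ".")
            current, current_len = [], 0
        current.append(sentence)
        current_len += added
    if current:
        chunks.append(". ".join(current) + ".")
    return chunks


print(batch_sentences("A. B. C.", max_chars=6))
```

Each returned chunk could then be passed to the summarizer in turn, in place of the index-based slicing.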
+
+
+ def summarize_file(file_location, file_extension, working_dir = "TranscriptApi/static/files"):
+     # _, file_extension = os.path.splitext(file_location)
+     text = ""
+     if file_extension == 'pdf':
+         text = extract_text_pdf(file_location)
+     elif file_extension == 'txt':
+         text = extract_text_txt(file_location)
+     else:
+         return [text, "[ERROR]"]  # keep the [transcript, summary] shape the caller unpacks
+
+     if os.path.exists(working_dir):
+         shutil.rmtree(working_dir)
+     os.mkdir(working_dir)
+     return [text, summarize_string(text)]
+
+ def answer(question: str, context : str):
+     # qa = pipeline(task = "question-answering", model = "Th3BossC/QuestionAnsweringModel", tokenizer = "Th3BossC/QuestionAnsweringModel")
+     qa = pipeline(task = "question-answering", model = "deepset/roberta-base-squad2")
+     return qa(question = question, context = context)['answer']
TranscriptApi/main/__init__.py ADDED
File without changes
TranscriptApi/main/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (160 Bytes). View file
 
TranscriptApi/main/__pycache__/routes.cpython-310.pyc ADDED
Binary file (634 Bytes). View file
 
TranscriptApi/main/routes.py ADDED
@@ -0,0 +1,13 @@
+ from flask import Blueprint, render_template, url_for
+ from TranscriptApi.resources.routes import api
+ main = Blueprint('main', __name__)
+
+ @main.route('/')
+ @main.route('/home')
+ def home():
+     return render_template('home.html')
+
+
+ @main.route('/online')
+ def online():
+     return {"online" : "yes"}, 200
TranscriptApi/models.py ADDED
@@ -0,0 +1,24 @@
+ from TranscriptApi import db
+ from datetime import datetime
+
+ class VideoSummary(db.Model):
+     id = db.Column(db.Integer, primary_key = True)
+     date = db.Column(db.DateTime(), nullable = False, default = datetime.utcnow)
+     video_id = db.Column(db.String(11), unique = True, nullable = False)  # YouTube IDs are 11 characters
+     title = db.Column(db.String(100), nullable = False)
+     transcript = db.Column(db.Text(), nullable = False)
+     summary = db.Column(db.Text(), nullable = False)
+
+     def __repr__(self):
+         return f'VideoSummary({self.id}, {self.video_id}, {self.title})'
+
+
+ class FileSummary(db.Model):
+     id = db.Column(db.Integer, primary_key = True)
+     date = db.Column(db.DateTime(), nullable = False, default = datetime.utcnow)
+     title = db.Column(db.String(100), nullable = False)
+     transcript = db.Column(db.Text(), nullable = False)
+     summary = db.Column(db.Text(), nullable = False)
+
+     def __repr__(self):
+         return f"FileSummary({self.id}, {self.title})"
TranscriptApi/resources/__init__.py ADDED
File without changes
TranscriptApi/resources/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (165 Bytes). View file
 
TranscriptApi/resources/__pycache__/routes.cpython-310.pyc ADDED
Binary file (3.77 kB). View file
 
TranscriptApi/resources/routes.py ADDED
@@ -0,0 +1,90 @@
+ from flask import Blueprint, request, current_app
+ from flask_restful import Api, Resource
+ from TranscriptApi.common.utils import title, summarize_youtube_video, summarize_file, summarize_string, answer
+ from TranscriptApi.models import VideoSummary, FileSummary
+ from TranscriptApi import db
+ import os
+ import shutil
+
+ resources = Blueprint('resources', __name__)
+ api = Api(resources)
+
+
+ class VideoTranscript(Resource):
+     def get(self, video_id):
+         print(request)
+         summaryExist = VideoSummary.query.filter_by(video_id = video_id).first()
+         if summaryExist is not None:
+             return {'video_id' : video_id, 'title' : summaryExist.title, 'summary' : summaryExist.summary}, 200
+         try:
+             video_title = title(video_id)
+         except Exception:
+             return {'error' : 'Video ID not valid'}, 400
+         try:
+             output = summarize_youtube_video('https://www.youtube.com/watch?v=' + video_id, 'TranscriptApi/common/audio')
+             newVideo = VideoSummary(title = video_title, video_id = video_id, transcript = f"The title of the video is {video_title}. {output['transcript']}", summary = output['summary'])
+             db.session.add(newVideo)
+             db.session.commit()
+             return {'video_id' : video_id, 'title' : video_title, 'summary' : output['summary']}, 200
+         except Exception as e:
+             return {'error' : 'something went wrong'}, 500
+ api.add_resource(VideoTranscript, '/video_api/<string:video_id>')
+
+
+ class FileTranscript(Resource):
+     def post(self, type):
+         if type == 'pdf' or type == 'txt':
+             print(request.files)
+             file = request.files['file']
+             file_location = os.path.join(current_app.config.get('UPLOAD_FOLDER'), file.filename)
+             file.save(os.path.join(current_app.config.get('UPLOAD_FOLDER'), file.filename))
+             transcript, summary = summarize_file(file_location = file_location, file_extension = type)
+             file_name = file.filename
+         elif type == 'direct_text':
+             transcript, summary = request.json['text'], summarize_string(request.json['text'])  # summarize_string returns only the summary
+             file_name = "Entered Text"
+         if summary == "[ERROR]":
+             if os.path.exists(current_app.config.get('UPLOAD_FOLDER')):
+                 shutil.rmtree(current_app.config.get('UPLOAD_FOLDER'))
+             os.mkdir(current_app.config.get('UPLOAD_FOLDER'))
+             return {'error' : 'We are experiencing some issues...'}, 500
+         else:
+             newSummary = FileSummary(title = file_name, transcript = transcript, summary = summary)
+             db.session.add(newSummary)
+             db.session.commit()
+             if os.path.exists(current_app.config.get('UPLOAD_FOLDER')):
+                 shutil.rmtree(current_app.config.get('UPLOAD_FOLDER'))
+             os.mkdir(current_app.config.get('UPLOAD_FOLDER'))
+             return {'title' : file_name, 'summary' : summary}, 200
+ api.add_resource(FileTranscript, '/file_api/<string:type>')
+
+
+ class VideoQuestions(Resource):
+     def post(self, video_id):
+         print(request.json)
+         videoExists = VideoSummary.query.filter_by(video_id = video_id).first()
+         if videoExists is None:
+             output = summarize_youtube_video('https://www.youtube.com/watch?v=' + video_id, 'TranscriptApi/common/audio')  # returns a dict
+             video_title = title(video_id)
+             videoExists = VideoSummary(title = video_title, video_id = video_id, transcript = f"The title of the video is {video_title}. {output['transcript']}", summary = output['summary'])
+             db.session.add(videoExists)
+             db.session.commit()
+         data = request.json # {question : "blabla"}
+         try:
+             ans = answer(question = data["question"], context = videoExists.transcript)
+             return {'question' : data['question'], 'answer' : ans}, 200
+         except Exception:
+             return {'error' : 'something went wrong'}, 500
+ api.add_resource(VideoQuestions, '/video_question_api/<string:video_id>')
+
+
+ class FileQuestions(Resource):
+     def post(self, id):
+         transcriptData = FileSummary.query.filter_by(id = id).first()
+         print(transcriptData)
+         if transcriptData is not None:
+             ans = answer(question = request.json['question'], context = transcriptData.transcript)
+             return {'question' : request.json['question'], 'answer' : ans}, 200
+         else:
+             return {'error' : 'file not found'}, 400
+ api.add_resource(FileQuestions, '/file_question_api/<int:id>')
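Review note on `FileTranscript.post`: the client-supplied `file.filename` is joined directly onto `UPLOAD_FOLDER`, so a name containing directory components could point outside that folder. In a Flask app the usual tool is Werkzeug's `secure_filename`. The standard-library sketch below only illustrates the idea, and `safe_upload_path` is a hypothetical helper, not something the repository defines.

```python
import os


def safe_upload_path(upload_folder, filename):
    """Join filename onto upload_folder, keeping only its final path component."""
    # Normalise Windows separators, then drop any directory components.
    name = os.path.basename(filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("invalid upload filename")
    return os.path.join(upload_folder, name)


print(safe_upload_path("TranscriptApi/common/files/", "../../report.pdf"))
```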
TranscriptApi/static/app.js ADDED
@@ -0,0 +1,118 @@
+ function youtube_video_id(url){
+     var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#&?]*).*/;
+     var match = url.match(regExp);
+     return (match&&match[7].length==11)? match[7] : false;
+ }
+
+
+ // Theme implementation
+
+ const theme = localStorage.getItem('theme');
+ const navbar_bg = localStorage.getItem('navbar-bg');
+ const navbar_color = localStorage.getItem('navbar-color');
+ const button_content = localStorage.getItem('button-content');
+
+
+ const themeButton = document.getElementById('theme');
+ const body = document.body;
+ const nav = document.getElementById('navbar');
+
+ body.classList.add(theme || 'light');
+ nav.classList.add(navbar_bg || 'bg-light');
+ nav.classList.add(navbar_color || 'navbar-light')
+ themeButton.innerHTML = button_content || '<i class="bi bi-moon-fill"></i> Toggle Theme';
+
+
+ themeButton.onclick = () => {
+     if (body.classList.contains('light')) {
+         body.classList.replace('light', 'dark');
+         nav.classList.replace('bg-light', 'bg-dark');
+         nav.classList.replace('navbar-light', 'navbar-dark');
+         themeButton.innerHTML = '<i class="bi bi-brightness-high-fill"></i> Toggle Theme'
+
+         localStorage.setItem('theme', 'dark');
+         localStorage.setItem('navbar-bg', 'bg-dark');
+         localStorage.setItem('navbar-color', 'navbar-dark');
+         localStorage.setItem('button-content', themeButton.innerHTML);
+     }
+     else {
+         body.classList.replace('dark', 'light');
+         nav.classList.replace('bg-dark', 'bg-light');
+         nav.classList.replace('navbar-dark', 'navbar-light');
+         themeButton.innerHTML = '<i class="bi bi-moon-fill"></i> Toggle Theme';
+
+         localStorage.setItem('theme', 'light');
+         localStorage.setItem('navbar-bg', 'bg-light');
+         localStorage.setItem('navbar-color', 'navbar-light');
+         localStorage.setItem('button-content', themeButton.innerHTML);
+     }
+ }
+
+ // darkButton.onclick = () => {
+ //     body.classList.replace('light', 'dark');
+ //     nav.classList.replace('bg-light', 'bg-dark');
+ //     nav.classList.replace('navbar-light', 'navbar-dark');
+ //     darkButton.classList.add('active');
+ //     darkButton.classList.add('disabled');
+ //     lightButton.classList.remove('active');
+ //     lightButton.classList.remove('disabled');
+ // };
+
+ // lightButton.onclick = () => {
+ //     body.classList.replace('dark', 'light');
+ //     nav.classList.replace('bg-dark', 'bg-light');
+ //     nav.classList.replace('navbar-dark', 'navbar-light');
+ //     lightButton.classList.add('active');
+ //     lightButton.classList.add('disabled');
+ //     darkButton.classList.remove('active');
+ //     darkButton.classList.remove('disabled');
+ // };
+
+
+ const main_content = document.getElementById('main-content');
+ const video_title = document.getElementById('video-title');
+ const video_summary = document.getElementById('video-summary');
+
+ const button = document.getElementById('submit-btn');
+ const form = document.getElementById('url-form');
+ const url = document.getElementById('url')
+
+ async function getApiData(video_id) {
+     const response = await fetch('http://localhost:5000/video_api/' + video_id);
+     const jsonData = await response.json();
+
+     console.log(jsonData);
+     video_title.innerHTML = jsonData['title'];
+     return video_summary.innerHTML = jsonData['summary'];
+
+ }
+
+
+
+ form.addEventListener('submit', async (e) => {
+     e.preventDefault();
+     const video_url = url.value;
+     if (video_url == "")
+         return;
+     const video_id = youtube_video_id(video_url);
+     video_title.innerHTML = 'Summarizing...';
+     console.log(main_content.classList);
+     //main_content.classList.remove('visually-hidden');
+     main_content.style.clipPath = 'circle(200% at 50% 50%)';
+     video_summary.innerHTML = '<div class="progress" role="progressbar" aria-label="Animated striped example" aria-valuenow="75" aria-valuemin="0" aria-valuemax="100"> \
+         <div class="progress-bar progress-bar-striped progress-bar-animated" style="width: 100%"></div> \
+     </div>';
+     if (video_id == false) {
+         video_title.innerHTML = '[Error]';
+         video_summary.innerHTML = 'Invalid video URL';
+         return;
+     }
+     try {
+         await getApiData(video_id); // await so that a rejected request reaches the catch block
+     }
+     catch {
+         video_title.innerHTML = '[Error]'
+         video_summary.innerHTML = 'Error Video not found';
+     }
+ });
+
TranscriptApi/static/images/background-dark.svg ADDED
TranscriptApi/static/images/background-light.svg ADDED
TranscriptApi/static/styles.css ADDED
@@ -0,0 +1,125 @@
+ .dark {
+     /* --bg : #353941; */
+     --heading-bg : #26282B;
+     --button-bg : #5F85DB;
+     --button-hover-bg : #90B8F8;
+     --text-color : white;
+     --rev-text-color : black;
+
+     --bg : url('images/background-dark.svg');
+ }
+
+
+ .light {
+     /* --bg : #448EF6; */
+     --heading-bg : #75C2F6;
+     --button-bg : #65DAF7;
+     --button-hover-bg : #FFE981;
+     --text-color : black;
+     --rev-text-color : white;
+
+     --bg : url('images/background-light.svg');
+ }
+
+ nav {
+     transition: all 200ms ease-in-out;
+     transition-delay : 0ms;
+ }
+
+ body {
+     background : var(--bg);
+     background-size: cover;
+     transition: background 200ms ease-in-out, color 1000ms ease-in-out;
+     /* overflow: hidden; */
+ }
+
+ .grid {
+     display: flex;
+     flex-direction: column;
+     flex-wrap: wrap;
+     /* gap: 1rem; */
+     grid-template-columns: minmax(240px, 1fr);
+     grid-template-rows: 240px;
+     margin : 10px;
+     padding : 20px;
+ }
+
+
+
+
+ .heading {
+     color : var(--text-color);
+     margin : minmax(10px, 100px);
+     padding: 50px;
+     text-align: center;
+     align-self: center;
+     font-family: 'Open Sans', sans-serif;
+     font-style: italic;
+     font-weight: 800;
+     /* background-color: var(--heading-bg); */
+     border-radius: 8px;
+     /* filter: drop-shadow(.3rem .3rem 4px black); */
+     transition: all 100ms ease-in-out;
+     transition-delay : 200ms;
+ }
+
+ .url-submit-form {
+     padding : 50px;
+     display: flex;
+     flex-direction: column;
+     align-items: center;
+     justify-content: center;
+ }
+
+ input[type = 'text'] {
+     text-align : center;
+     border: none;
+ }
+
+
+ input[type = 'text']::placeholder {
+     color: var(--text-color);
+     opacity: 0.4;
+ }
+
+ .btn-primary {
+     background-color : var(--button-bg) !important;
+     border-color : var(--button-bg) !important;
+     color : var(--text-color) !important;
+ }
+
+ .btn-primary:hover {
+     background-color: var(--button-hover-bg) !important;
+     border-color : var(--button-hover-bg) !important;
+     color : black !important;
+
+ }
+
+
+ .text {
+     /* grid-column : span 1 / auto; */
+     color : var(--text-color);
+     padding : 30px;
+     border: 2px solid var(--rev-text-color);
+     border-radius: 8px;
+     backdrop-filter: blur(10px);
+     clip-path: circle(0% at 50% 0%);
+     transition : all 200ms ease-in-out, clip-path 500ms ease-in-out;
+     transition-delay : 400ms;
+ }
+
+
+ .title {
+     font-family :'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
+     font-weight : bold;
+     font-size: large;
+     text-align: center;
+ }
+
+ .content {
+     font-family: 'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
+     margin: 5px;
+     padding : 10px;
+     text-align: center;
+ }
+
TranscriptApi/templates/home.html ADDED
@@ -0,0 +1,66 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+     <meta charset="UTF-8">
+     <meta http-equiv="X-UA-Compatible" content="IE=edge">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>Document</title>
+
+     <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-KK94CHFLLe+nY2dmCWGMq91rCGa5gtU4mk92HdvYe+M/SXH301p5ILy+dN9+nJOZ" crossorigin="anonymous">
+     <link href = "{{url_for('static', filename = 'styles.css')}}" rel = "stylesheet">
+     <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/animate.css/4.1.1/animate.min.css">
+     <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/font/bootstrap-icons.css">
+
+     <link rel="preconnect" href="https://fonts.googleapis.com">
+     <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
+     <link href="https://fonts.googleapis.com/css2?family=Open+Sans:ital,wght@1,800&display=swap" rel="stylesheet">
+
+     <script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js" integrity="sha384-ENjdO4Dr2bkBIFxQpeoTz1HIcje39Wm4jDKdf19U8gI4ddQ3GYNS7NTKfAdVQSZe" crossorigin="anonymous"></script>
+     <script defer src = "{{url_for('static', filename = 'app.js')}}"></script>
+ </head>
+ <body class = ''>
+     <nav class="navbar navbar-expand-lg sticky-top" id = "navbar">
+         <div class="container-fluid">
+             <a class="navbar-brand" href="#">Video summarizer</a>
+             <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarNavAltMarkup" aria-controls="navbarNavAltMarkup" aria-expanded="false" aria-label="Toggle navigation">
+                 <span class="navbar-toggler-icon"></span>
+             </button>
+             <div class="collapse navbar-collapse" id="navbarNavAltMarkup">
+                 <div class="navbar-nav">
+                     <button class="nav-link" type="button" aria-current="page" onclick = "location.reload();">Home</button>
+                     <a class="nav-link" href="#" id = 'theme' style = 'transition: all 200ms ease-in-out;'>
+                     </a>
+                 </div>
+             </div>
+         </div>
+     </nav>
+
+     <section class = 'grid'>
+         <h1 class = 'animate__animated animate__slideInDown heading'>
+             This page is redundant. Please visit <a href="https://th3bossc.github.io/SummarizationApp">here</a> for the actual site.
+         </h1>
+
+         <!-- <div class = 'url-submit-form animate__animated animate__slideInUp'>
+             <form class="input-group mb-3" id = "url-form">
+                 <input type="text" class="form-control hid" id = 'url' style = "background-color: var(--heading-bg); color : var(--text-color); transition : all 200ms ease; transition-delay : 300ms;" placeholder="Enter URL here">
+             </form>
+             <button class = "btn btn-primary hid" id = 'submit-btn' type = 'submit' form = "url-form">
+                 Summarize
+             </button>
+         </div> -->
+
+
+         <!-- <div class = 'text' id = 'main-content'>
+             <div class = 'title'>
+                 <strong id = 'video-title'>
+                     Text
+                 </strong>
+                 <hr>
+             </div>
+             <div class = 'content' id = 'video-summary'>
+                 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
+             </div>
+         </div> -->
+     </section>
+ </body>
+ </html>
__pycache__/app.cpython-310.pyc ADDED
Binary file (332 Bytes).
app.py ADDED
@@ -0,0 +1,18 @@
+ from TranscriptApi import create_app
+ from threading import Thread  # only needed by the commented-out keep_alive helper below
+
+ app = create_app()
+
+
+ if __name__ == '__main__':
+     app.run(debug=False, host="0.0.0.0", port=7860)  # listen on all interfaces
+
+
+ # def run():
+ #     app.run(host = "0.0.0.0", port = 8080)
+
+ # def keep_alive():
+ #     t = Thread(target = run)
+ #     t.start()
+
+ # keep_alive()
instance/site.db ADDED
Binary file (32.8 kB).
requirements.txt ADDED
@@ -0,0 +1,68 @@
+ aniso8601==9.0.1
+ appdirs==1.4.4
+ audioread==3.0.0
+ blinker==1.6.2
+ certifi==2023.5.7
+ cffi==1.15.1
+ charset-normalizer==3.1.0
+ click==8.1.3
+ colorama==0.4.6
+ decorator==4.4.2
+ ffmpeg-python==0.2.0
+ filelock==3.12.0
+ Flask==2.3.2
+ Flask-Cors==3.0.10
+ Flask-RESTful==0.3.10
+ Flask-SQLAlchemy==3.0.3
+ fsspec==2023.5.0
+ future==0.18.3
+ greenlet==2.0.2
+ huggingface-hub==0.15.1
+ idna==3.4
+ imageio==2.31.0
+ imageio-ffmpeg==0.4.8
+ itsdangerous==2.1.2
+ Jinja2==3.1.2
+ joblib==1.2.0
+ lazy_loader==0.2
+ librosa==0.10.0.post2
+ llvmlite==0.40.0
+ MarkupSafe==2.1.3
+ more-itertools==9.1.0
+ moviepy==1.0.3
+ mpmath==1.3.0
+ msgpack==1.0.5
+ networkx==3.1
+ numba==0.57.0
+ numpy==1.24.3
+ openai-whisper==20230314
+ packaging==23.1
+ Pillow==9.5.0
+ pooch==1.6.0
+ proglog==0.1.10
+ pycparser==2.21
+ PyPDF2==3.0.1
+ pytube==15.0.0
+ pytz==2023.3
+ PyYAML==6.0
+ regex==2023.6.3
+ requests==2.31.0
+ safetensors==0.3.1
+ scikit-learn==1.2.2
+ scipy==1.10.1
+ six==1.16.0
+ soundfile==0.12.1
+ soxr==0.3.5
+ SQLAlchemy==2.0.15
+ sympy==1.12
+ threadpoolctl==3.1.0
+ tiktoken==0.3.1
+ tokenizers==0.13.3
+ torch==2.0.1
+ torchaudio==2.0.2
+ torchvision==0.15.2
+ tqdm==4.65.0
+ transformers==4.30.0
+ typing_extensions==4.6.3
+ urllib3==2.0.3
+ Werkzeug==2.3.5