dwarkesh committed
Commit 00f0540 · 1 Parent(s): 1fb905d
Files changed (3):
  1. requirements.txt +2 -1
  2. test.txt +53 -0
  3. transcript.py +33 -34
requirements.txt CHANGED
@@ -4,4 +4,5 @@ google-generativeai
 anthropic
 pandas
 youtube-transcript-api
-pydub
+pydub
+assemblyai
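
(For anyone reproducing this Space locally, a minimal sketch of picking up the new dependency, assuming a standard pip workflow; the commit itself does not specify environment setup:)

    pip install -r requirements.txt   # now pulls in assemblyai alongside pydub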
test.txt ADDED
@@ -0,0 +1,53 @@
+Speaker A 00:00:00
+
+Today, I'm chatting with Adam Brown, a founder and lead of the Blueshift team at Google DeepMind, which is cracking maths and reasoning, and a theoretical physicist at Stanford. Adam, welcome.
+
+Speaker B 00:00:11
+
+Super excited to be here. Let's do this.
+
+Speaker A 00:00:13
+
+First question. What is going to be the ultimate fate of the universe? And how much confidence should we have?
+
+Speaker B 00:00:19
+
+I think it depends on physics we don't yet fully understand because the ultimate fate is a long time away. That extends a long way out into the future. It also probably depends on us. It's probably in our hands, depending on how the unknown physics breaks out.
+
+Our idea of the answer to that question has changed quite a lot over the last century. In the 1930s, when they turned on the big telescopes, they discovered that the universe was expanding, which they were not previously aware of. The question is, how fast is it expanding?
+
+Then in the 1990s, we discovered something that really surprised us. There had been a learned debate up to that point about whether it was expanding so slowly that it would just expand and then recollapse in a big crunch, or whether it was expanding sufficiently fast that it would just keep going forever, maybe slowing down in its expansion but never recollapsing.
+
+Then, in possibly the worst day in human history in terms of expected value, in the 90s, we discovered something that had not been anticipated: not only is it expanding, but the rate at which it's expanding is accelerating. It's getting faster and faster as it expands. This is what's called a cosmological constant or dark energy.
+
+That completely changes the answer to the question, "What is the ultimate fate?" if it's really there. Because it means that distant galaxies, galaxies that are more than maybe 20 billion light-years away from us right now, are being dragged away from us by the expansion of the universe. We'll never reach them. We'll never get to them because even if we headed towards them at the speed of light, the expansion of the universe is dragging them away faster than we'll be able to catch up with them.
+
+That's really bad news because we have plans for those galaxies. Maybe we could go get them and turn them into tropical Edens or computronium or whatever we had a plan for. We can't if the cosmological constant is really there because they're being dragged away from us by the expansion of the universe.
+
+So how confident of that picture should we be? In answer to your question, according to that picture, the ultimate fate will just be that those distant galaxies get dragged away. Only the galaxies that are currently within a dozen billion light-years of us will we be able to reach.
+
+Speaker A 00:02:57
+
+Wait, a dozen light-years?
+
+Speaker B 00:02:58
+
+Sorry, a dozen billion light-years. A dozen light-years is not many other galaxies.
+
+Maybe a dozen billion light-years, those ones we'll be able to run out and grab. But anything beyond that is just going to be dragged away from us by the cosmological constant. So that's just a finite number of galaxies and a finite amount of resources.
+
+But then you ask, how confident should we be? On first principles grounds, you should not be particularly confident in that answer at all. We've had a number of radical reimaginings of what the expansion and fate of the universe is in the last century, including in my lifetime.
+
+So just on first principles grounds, you might imagine that you shouldn't be very confident, and indeed you shouldn't. We're not totally confident that the dark energy that currently seems to be pushing the universe apart is indeed going to be a feature of our universe forever. Things could change a lot.
+
+Including, you could imagine that a future civilization could manipulate the cosmological constant and bleed it away, or manipulate it in some way, in order to avoid the heat death.
+
+Speaker A 00:04:10
+
+Can you say more about that? How would one do this, and how far would it apply? How much would it expand the cosmic horizon?
+
+Speaker B 00:04:18
+
+Now we're getting to more speculative levels, but it does seem to be a feature of our best theories, a completely untested feature, but a feature nevertheless, of our best theories that combine quantum mechanics and gravity, that the cosmological constant isn't just some fixed value.
+
+In fact, it can take different values: the amount of dark energy, the energy density of dark energy, in what are called different vacuums. For example, string theory has this property, if string theory is correct, that there are many, many vacuums in which the cosmological constant can take very different values. And that perhaps provides some hope.
transcript.py CHANGED
@@ -1,40 +1,34 @@
 import gradio as gr
-from deepgram import DeepgramClient, PrerecordedOptions
+import assemblyai as aai
 from google import generativeai
 import os
 from pydub import AudioSegment
 
 # Initialize API clients
-DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
+ASSEMBLYAI_API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
 GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
 
-dg_client = DeepgramClient(DEEPGRAM_API_KEY)
+aai.settings.api_key = ASSEMBLYAI_API_KEY
 generativeai.configure(api_key=GOOGLE_API_KEY)
 model = generativeai.GenerativeModel("gemini-exp-1206")
 
 
 def format_timestamp(seconds):
     """Convert seconds to HH:MM:SS format"""
-    h = int(float(seconds)) // 3600
-    m = (int(float(seconds)) % 3600) // 60
-    s = int(float(seconds)) % 60
+    h = int(seconds) // 3600
+    m = (int(seconds) % 3600) // 60
+    s = int(seconds) % 60
     return f"{h:02d}:{m:02d}:{s:02d}"
 
 
 def get_transcript(audio_path):
-    """Get transcript from Deepgram with speaker diarization"""
-    with open(audio_path, "rb") as audio:
-        options = PrerecordedOptions(
-            smart_format=True,
-            diarize=True,
-            utterances=True,
-            model="nova-2",
-            language="en-US",
-        )
-        response = dg_client.listen.rest.v("1").transcribe_file(
-            {"buffer": audio, "mimetype": "audio/mp3"}, options
-        )
-        return response.results.utterances
+    """Get transcript from AssemblyAI with speaker diarization"""
+    config = aai.TranscriptionConfig(speaker_labels=True, language_code="en")
+
+    transcriber = aai.Transcriber()
+    transcript = transcriber.transcribe(audio_path, config=config)
+
+    return transcript.utterances
 
 
 def format_transcript(utterances):
@@ -49,9 +43,9 @@ def format_transcript(utterances):
         if current_speaker != utterance.speaker:
             # Write out the previous section if it exists
             if current_text:
-                timestamp = format_timestamp(current_start)
-                # Add double line break after speaker/timestamp
-                section = f"Speaker {current_speaker + 1} {timestamp}\n\n{' '.join(current_text).strip()}"
+                # Convert milliseconds to seconds for timestamp
+                timestamp = format_timestamp(float(current_start) / 1000)
+                section = f"Speaker {current_speaker} {timestamp}\n\n{' '.join(current_text).strip()}"
                 formatted_sections.append(section)
                 current_text = []
 
@@ -59,12 +53,15 @@ def format_transcript(utterances):
             current_speaker = utterance.speaker
             current_start = utterance.start
 
-        current_text.append(utterance.transcript.strip())
+        current_text.append(utterance.text.strip())
 
     # Add the final section
     if current_text:
-        timestamp = format_timestamp(current_start)
-        section = f"Speaker {current_speaker + 1} {timestamp}\n\n{' '.join(current_text).strip()}"
+        # Convert milliseconds to seconds for timestamp
+        timestamp = format_timestamp(float(current_start) / 1000)
+        section = (
+            f"Speaker {current_speaker} {timestamp}\n\n{' '.join(current_text).strip()}"
+        )
        formatted_sections.append(section)
 
     return "\n\n".join(formatted_sections)
@@ -110,7 +107,7 @@ Enhance the following transcript, starting directly with the speaker format:
 """
 
     response = model.generate_content(
-        [prompt, {"mime_type": "audio/mp3", "data": audio_segment.read()}]
+        [prompt, chunk_text, {"mime_type": "audio/mp3", "data": audio_segment.read()}]
     )
     return response.text
 
@@ -125,11 +122,13 @@ def create_chunks(utterances, target_tokens=7500):
     for utterance in utterances:
         # Start new chunk if this is first utterance
         if not current_chunk:
-            current_start = utterance.start
+            current_start = float(utterance.start) / 1000  # Convert ms to seconds
             current_chunk = [utterance]
-            current_end = utterance.end
+            current_end = float(utterance.end) / 1000  # Convert ms to seconds
         # Check if adding this utterance would exceed token limit
-        elif (float(utterance.end) - float(current_start)) * 25 > target_tokens:
+        elif (
+            len(" ".join(u.text for u in current_chunk)) + len(utterance.text)
+        ) / 4 > target_tokens:
             # Save current chunk and start new one
             chunks.append(
                 {
@@ -139,12 +138,12 @@ def create_chunks(utterances, target_tokens=7500):
                 }
             )
             current_chunk = [utterance]
-            current_start = utterance.start
-            current_end = utterance.end
+            current_start = float(utterance.start) / 1000
+            current_end = float(utterance.end) / 1000
         else:
             # Add to current chunk
             current_chunk.append(utterance)
-            current_end = utterance.end
+            current_end = float(utterance.end) / 1000
 
     # Add final chunk
     if current_chunk:
@@ -157,7 +156,7 @@
 
 def process_audio(audio_path):
     """Main processing pipeline"""
-    print("Stage 1: Getting raw transcript from Deepgram...")
+    print("Stage 1: Getting raw transcript from AssemblyAI...")
     transcript_data = get_transcript(audio_path)
 
     print("Stage 2: Processing in chunks...")
@@ -209,7 +208,7 @@
         gr.Textbox(label="Enhanced Transcript"),
     ],
     title="Audio Transcript Enhancement",
-    description="Upload an MP3 file to get both the original and enhanced transcripts using Deepgram and Gemini.",
+    description="Upload an MP3 file to get both the original and enhanced transcripts using AssemblyAI and Gemini.",
     cache_examples=False,
 )
 
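
(A note on the migration for readers of this diff: AssemblyAI reports utterance start/end in milliseconds, which is why the new code divides by 1000 before formatting timestamps, and its speaker labels are letters rather than zero-based integers, which is why the f-strings no longer add 1 to the speaker. The chunking heuristic also changes from audio duration, roughly 25 tokens per second, to transcript length, roughly 4 characters per token. Below is a minimal standalone sketch of the new flow; it uses only calls that appear in this commit, and episode.mp3 is a hypothetical input file:)

    import os

    import assemblyai as aai

    # Same configuration as transcript.py; assumes ASSEMBLYAI_API_KEY is set
    aai.settings.api_key = os.getenv("ASSEMBLYAI_API_KEY")

    # speaker_labels=True enables diarization, mirroring get_transcript() above
    config = aai.TranscriptionConfig(speaker_labels=True, language_code="en")
    transcript = aai.Transcriber().transcribe("episode.mp3", config=config)  # hypothetical input

    for u in transcript.utterances:
        # AssemblyAI timestamps are in milliseconds; convert to seconds
        total = int(u.start) // 1000
        h, m, s = total // 3600, (total % 3600) // 60, total % 60
        print(f"Speaker {u.speaker} {h:02d}:{m:02d}:{s:02d}\n\n{u.text.strip()}\n")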