dwarkesh committed
Commit 00f0540 · 1 Parent(s): 1fb905d
Files changed (3):
  1. requirements.txt +2 -1
  2. test.txt +53 -0
  3. transcript.py +33 -34
requirements.txt CHANGED
@@ -4,4 +4,5 @@ google-generativeai
 anthropic
 pandas
 youtube-transcript-api
-pydub
+pydub
+assemblyai
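
(For anyone reproducing this Space locally, a minimal sketch of picking up the new dependency, assuming a standard pip workflow; the commit itself does not specify environment setup:)

    pip install -r requirements.txt   # now pulls in assemblyai alongside pydub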
test.txt ADDED
@@ -0,0 +1,53 @@
+Speaker A 00:00:00
+
+Today, I'm chatting with Adam Brown, a founder and lead of the Blueshift team at Google DeepMind, which is cracking maths and reasoning, and a theoretical physicist at Stanford. Adam, welcome.
+
+Speaker B 00:00:11
+
+Super excited to be here. Let's do this.
+
+Speaker A 00:00:13
+
+First question. What is going to be the ultimate fate of the universe? And how much confidence should we have?
+
+Speaker B 00:00:19
+
+I think it depends on physics we don't yet fully understand because the ultimate fate is a long time away. That extends a long way out into the future. It also probably depends on us. It's probably in our hands, depending on how the unknown physics breaks out.
+
+Our idea of the answer to that question has changed quite a lot over the last century. In the 1930s, when they turned on the big telescopes, they discovered that the universe was expanding, which they were not previously aware of. The question is, how fast is it expanding?
+
+Then in the 1990s, we discovered something that really surprised us. There had been a learned debate up to that point about whether it was expanding so slowly that it would just expand and then recollapse in a big crunch, or whether it was expanding sufficiently fast that it would just keep going forever, maybe slowing down in its expansion but never recollapsing.
+
+Then, in possibly the worst day in human history in terms of expected value, in the 90s, we discovered something that had not been anticipated: not only is it expanding, but the rate at which it's expanding is accelerating. It's getting faster and faster as it expands. This is what's called a cosmological constant or dark energy.
+
+That completely changes the answer to the question, "What is the ultimate fate?" if it's really there. Because it means that distant galaxies, galaxies that are more than maybe 20 billion light-years away from us right now, are being dragged away from us by the expansion of the universe. We'll never reach them. We'll never get to them because even if we headed towards them at the speed of light, the expansion of the universe is dragging them away faster than we'll be able to catch up with them.
+
+That's really bad news because we have plans for those galaxies. Maybe we could go get them and turn them into tropical Edens or computronium or whatever we had a plan for. We can't if the cosmological constant is really there because they're being dragged away from us by the expansion of the universe.
+
+So how confident of that picture should we be? In answer to your question, according to that picture, the ultimate fate will just be that those distant galaxies get dragged away. Only the galaxies that are currently within a dozen billion light-years of us will we be able to reach.
+
+Speaker A 00:02:57
+
+Wait, a dozen light-years?
+
+Speaker B 00:02:58
+
+Sorry, a dozen billion light-years. A dozen light-years is not many other galaxies.
+
+Maybe a dozen billion light-years, those ones we'll be able to run out and grab. But anything beyond that is just going to be dragged away from us by the cosmological constant. So that's just a finite number of galaxies and a finite amount of resources.
+
+But then you ask, how confident should we be? On first principles grounds, you should not be particularly confident in that answer at all. We've had a number of radical reimaginings of what the expansion and fate of the universe is in the last century, including in my lifetime.
+
+So just on first principles grounds, you might imagine that you shouldn't be very confident, and indeed you shouldn't. We're not totally confident that the dark energy that currently seems to be pushing the universe apart is indeed going to be a feature of our universe forever. Things could change a lot.
+
+Including, you could imagine that a future civilization could manipulate the cosmological constant and bleed it away, or manipulate it in some way, in order to avoid the heat death.
+
+Speaker A 00:04:10
+
+Can you say more about that? How would one do this, and how far would it apply? How much would it expand the cosmic horizon?
+
+Speaker B 00:04:18
+
+Now we're getting to more speculative levels, but it does seem to be a feature of our best theories, a completely untested feature, but a feature nevertheless, of our best theories that combine quantum mechanics and gravity, that the cosmological constant isn't just some fixed value.
+
+In fact, it can take different values: the amount of dark energy, the energy density of dark energy, in what are called different vacuums. For example, string theory has this property, if string theory is correct, that there are many, many vacuums in which the cosmological constant can take very different values. And that perhaps provides some hope.
transcript.py CHANGED
@@ -1,40 +1,34 @@
 import gradio as gr
-from deepgram import DeepgramClient, PrerecordedOptions
+import assemblyai as aai
 from google import generativeai
 import os
 from pydub import AudioSegment
 
 # Initialize API clients
-DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
+ASSEMBLYAI_API_KEY = os.getenv("ASSEMBLYAI_API_KEY")
 GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
 
-dg_client = DeepgramClient(DEEPGRAM_API_KEY)
+aai.settings.api_key = ASSEMBLYAI_API_KEY
 generativeai.configure(api_key=GOOGLE_API_KEY)
 model = generativeai.GenerativeModel("gemini-exp-1206")
 
 
 def format_timestamp(seconds):
     """Convert seconds to HH:MM:SS format"""
-    h = int(float(seconds)) // 3600
-    m = (int(float(seconds)) % 3600) // 60
-    s = int(float(seconds)) % 60
+    h = int(seconds) // 3600
+    m = (int(seconds) % 3600) // 60
+    s = int(seconds) % 60
     return f"{h:02d}:{m:02d}:{s:02d}"
 
 
 def get_transcript(audio_path):
-    """Get transcript from Deepgram with speaker diarization"""
-    with open(audio_path, "rb") as audio:
-        options = PrerecordedOptions(
-            smart_format=True,
-            diarize=True,
-            utterances=True,
-            model="nova-2",
-            language="en-US",
-        )
-        response = dg_client.listen.rest.v("1").transcribe_file(
-            {"buffer": audio, "mimetype": "audio/mp3"}, options
-        )
-        return response.results.utterances
+    """Get transcript from AssemblyAI with speaker diarization"""
+    config = aai.TranscriptionConfig(speaker_labels=True, language_code="en")
+
+    transcriber = aai.Transcriber()
+    transcript = transcriber.transcribe(audio_path, config=config)
+
+    return transcript.utterances
 
 
 def format_transcript(utterances):
@@ -49,9 +43,9 @@ def format_transcript(utterances):
         if current_speaker != utterance.speaker:
             # Write out the previous section if it exists
             if current_text:
-                timestamp = format_timestamp(current_start)
-                # Add double line break after speaker/timestamp
-                section = f"Speaker {current_speaker + 1} {timestamp}\n\n{' '.join(current_text).strip()}"
+                # Convert milliseconds to seconds for timestamp
+                timestamp = format_timestamp(float(current_start) / 1000)
+                section = f"Speaker {current_speaker} {timestamp}\n\n{' '.join(current_text).strip()}"
                 formatted_sections.append(section)
                 current_text = []
 
@@ -59,12 +53,15 @@ def format_transcript(utterances):
             current_speaker = utterance.speaker
             current_start = utterance.start
 
-        current_text.append(utterance.transcript.strip())
+        current_text.append(utterance.text.strip())
 
     # Add the final section
     if current_text:
-        timestamp = format_timestamp(current_start)
-        section = f"Speaker {current_speaker + 1} {timestamp}\n\n{' '.join(current_text).strip()}"
+        # Convert milliseconds to seconds for timestamp
+        timestamp = format_timestamp(float(current_start) / 1000)
+        section = (
+            f"Speaker {current_speaker} {timestamp}\n\n{' '.join(current_text).strip()}"
+        )
        formatted_sections.append(section)
 
     return "\n\n".join(formatted_sections)
@@ -110,7 +107,7 @@ Enhance the following transcript, starting directly with the speaker format:
 """
 
     response = model.generate_content(
-        [prompt, {"mime_type": "audio/mp3", "data": audio_segment.read()}]
+        [prompt, chunk_text, {"mime_type": "audio/mp3", "data": audio_segment.read()}]
     )
     return response.text
 
@@ -125,11 +122,13 @@ def create_chunks(utterances, target_tokens=7500):
     for utterance in utterances:
         # Start new chunk if this is first utterance
         if not current_chunk:
-            current_start = utterance.start
+            current_start = float(utterance.start) / 1000  # Convert ms to seconds
             current_chunk = [utterance]
-            current_end = utterance.end
+            current_end = float(utterance.end) / 1000  # Convert ms to seconds
         # Check if adding this utterance would exceed token limit
-        elif (float(utterance.end) - float(current_start)) * 25 > target_tokens:
+        elif (
+            len(" ".join(u.text for u in current_chunk)) + len(utterance.text)
+        ) / 4 > target_tokens:
             # Save current chunk and start new one
             chunks.append(
                 {
@@ -139,12 +138,12 @@ def create_chunks(utterances, target_tokens=7500):
                 }
             )
             current_chunk = [utterance]
-            current_start = utterance.start
-            current_end = utterance.end
+            current_start = float(utterance.start) / 1000
+            current_end = float(utterance.end) / 1000
         else:
             # Add to current chunk
             current_chunk.append(utterance)
-            current_end = utterance.end
+            current_end = float(utterance.end) / 1000
 
     # Add final chunk
     if current_chunk:
@@ -157,7 +156,7 @@
 
 def process_audio(audio_path):
     """Main processing pipeline"""
-    print("Stage 1: Getting raw transcript from Deepgram...")
+    print("Stage 1: Getting raw transcript from AssemblyAI...")
     transcript_data = get_transcript(audio_path)
 
     print("Stage 2: Processing in chunks...")
@@ -209,7 +208,7 @@
         gr.Textbox(label="Enhanced Transcript"),
     ],
     title="Audio Transcript Enhancement",
-    description="Upload an MP3 file to get both the original and enhanced transcripts using Deepgram and Gemini.",
+    description="Upload an MP3 file to get both the original and enhanced transcripts using AssemblyAI and Gemini.",
     cache_examples=False,
 )
 
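
(A note on the migration for readers of this diff: AssemblyAI reports utterance start/end in milliseconds, which is why the new code divides by 1000 before formatting timestamps, and its speaker labels are letters rather than zero-based integers, which is why the f-strings no longer add 1 to the speaker. The chunking heuristic also changes from audio duration, roughly 25 tokens per second, to transcript length, roughly 4 characters per token. Below is a minimal standalone sketch of the new flow; it uses only calls that appear in this commit, and episode.mp3 is a hypothetical input file:)

    import os

    import assemblyai as aai

    # Same configuration as transcript.py; assumes ASSEMBLYAI_API_KEY is set
    aai.settings.api_key = os.getenv("ASSEMBLYAI_API_KEY")

    # speaker_labels=True enables diarization, mirroring get_transcript() above
    config = aai.TranscriptionConfig(speaker_labels=True, language_code="en")
    transcript = aai.Transcriber().transcribe("episode.mp3", config=config)  # hypothetical input

    for u in transcript.utterances:
        # AssemblyAI timestamps are in milliseconds; convert to seconds
        total = int(u.start) // 1000
        h, m, s = total // 3600, (total % 3600) // 60, total % 60
        print(f"Speaker {u.speaker} {h:02d}:{m:02d}:{s:02d}\n\n{u.text.strip()}\n")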