Level Up Your Note-Taking Game using Artificial Intelligence

Auto Summarize Lectures using Python

In this post, I will show you how to convert a lecture video or audio recording into a summarized written transcription. We will write a program that runs a speech recognition model on a recording and then divides the transcription into chapters. And it doesn’t end there: the model picks out the essential points from those chapters and summarizes them for us.

I wish I knew how to do this when I was in college. 🙂

If you are new to the Speech Recognition subject, please check the following articles: how to build a simple speech recognizer and how to build a real-time speech recognizer. Today, we will do something a little more complicated but more fun and exciting.

Let’s get started!

Table of Contents

  • Getting Started
  • Step 1 — Libraries
  • Step 2 — Lecture Audio Data
  • Step 3 — Speech to Text with Auto Summary
  • Final Step — Checking the Results 

Getting Started

Since our topic is Speech Recognition, let’s talk about the Speech Recognition API we will use in this project. Many different cloud services and APIs are available, each with specific benefits.

In today’s tutorial, we will use AssemblyAI’s Speech-to-Text API. It is a very well-trained artificial intelligence API. It’s free to use. You will get a unique API key after creating an account. We will use that API key to use the services and features.

After deciding on the API, it’s time to pick our coding environment. The best route is usually the one you know the best — so I will use Jupyter Notebooks; it’s my favorite for data science projects.

Step 1 — Libraries

First things first: Python packages, in other words, Python libraries. Python has excellent libraries; some are built into the language itself, and some are third-party libraries.

For our project, we will need three basic libraries. sys and time are part of Python’s standard library, while requests is a third-party package; if it isn’t already in your environment, install it with pip install requests. Then all we need to do is import them into our notebook.

import sys
import time
import requests

Step 2 — Lecture Audio Data

In this step, we will find a lecture recording and use its audio data. If you want to keep things basic, feel free to use a short recording of you talking about a random topic or reading a book page, etc.

My audio data will be from one of my favorite lectures: CS50 of Harvard.

Here is the link for the lecture video. Below the lecture video, you will see the audio version of the lecture available to be downloaded. Isn’t that amazing?

It’s an hour-long recording. I will work on the first 10 minutes of it.

Now, let’s go ahead and import the recording into the program. Below the code, you can also see a screenshot of my project folder.

audio_data = "lecture0.mp3"
Screenshot by the Author.

Now, let’s write a function to read this audio recording. By the way, the file format should be an audio format for our reading function to work correctly. CS50’s lecture audio version was in mp3 format, which works perfectly.

Here is the reading audio function:

def read_audio(audio_data, chunk_size=5242880):
    # stream the file in ~5 MB chunks so we never hold it all in memory
    with open(audio_data, 'rb') as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data

Now, let’s upload our audio data to the cloud with a POST request, authenticated with our API key.

headers = {
    "authorization": "Your API key goes here"
}

response = requests.post('https://api.assemblyai.com/v2/upload',
                         headers=headers,
                         data=read_audio(audio_data))

Perfect! After running this code block, we will receive a response from the API, and that response will include the URL address of the uploaded audio data.
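If you’d like to carry that address into the next step programmatically, you can read it from the response JSON. The upload_url field name comes from AssemblyAI’s upload endpoint; the value below is a made-up stand-in for a real response:

```python
# shape of the upload endpoint's JSON response (the value is illustrative);
# with the live response object you would write:
#   upload_url = response.json()["upload_url"]
upload_response = {"upload_url": "https://cdn.assemblyai.com/upload/example-id"}

upload_url = upload_response["upload_url"]
print(upload_url)
```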

Step 3 — Speech to Text with Auto Summary

Well, this step is where the magic happens. It will be a little longer than other steps, but nothing complicated is going on. We will see the benefits of using an API instead of reinventing the wheel. Our audio file is already in AssemblyAI’s cloud storage; now, it’s time to run well-trained machine learning models.

Here is the official documentation of the auto summary feature. 

I will share the code block below and then explain each variable.

speech_to_text_api = "https://api.assemblyai.com/v2/transcript"

data = {
    "audio_url": "the upload url address goes here",
    "auto_chapters": True
}

headers = {
    "authorization": "Your API key goes here",
    "content-type": "application/json"
}

response = requests.post(speech_to_text_api, json=data, headers=headers)
  • The first variable defines the API model we are planning to use.
  • The second variable is a dictionary. It contains two key-value pairs: audio_url and auto_chapters. To turn on the auto-summary feature when requesting the speech-to-text results, we must set the auto_chapters boolean key to True.
  • The third variable is also a dictionary. It contains our API key and the content type.
  • And lastly, we have a POST request that combines all the variables and sends them to the API. The API’s reply will be stored in the response variable.

Let’s go ahead and print out the response.
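With a live response object, print(response.json()) does the trick. To show the shape of what comes back, here is a hypothetical, truncated response body and how the ID is pulled from it (the id value is the one used later in this post; the other fields are illustrative):

```python
# hypothetical, truncated response body from the transcript endpoint
submission = {
    "id": "ok8tfaqsxb-b5be-4fe5-a6d9-16f72c00faa3",
    "status": "queued",
    "audio_url": "the upload url address goes here",
}

# this ID identifies our transcription job in the next step
request_id = submission["id"]
print(request_id)
```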


Before moving to the final step, let’s copy the ID value from this response. This ID value is the request-id of the request we just submitted. We will need it to check the request’s status and pull the results.

Final Step — Checking the Results

Almost there! 

In this final step, we are going to fetch our request’s results and then analyze them. I will share the code block below and explain each line.

request_url = "https://api.assemblyai.com/v2/transcript/ok8tfaqsxb-b5be-4fe5-a6d9-16f72c00faa3"

headers = {
    "authorization": "Your API key goes here"
}

response = requests.get(request_url, headers=headers)

auto_summary_report = response.json()['chapters']
  • First, we define the request URL variable. It’s the API URL followed by the request-id.
  • Secondly, we define the headers with our API key.
  • Thirdly, we send a GET request to pull the results down to our local machine.
  • The response contains many attributes; that’s why we filter for the chapters section.
  • And lastly, we display the variable to see the auto-summary report.
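One caveat: transcription takes a while, and the chapters field is only populated once the job finishes, so in practice you may want to poll before reading the report. Here is a small sketch of my own; the wait_for_transcript name and the fetch callable are assumptions, but the status values mirror the ones the transcript endpoint reports:

```python
import time

def wait_for_transcript(fetch, poll_interval=5):
    """Poll until the transcription job finishes.

    fetch is any zero-argument callable returning the transcript
    JSON as a dict, e.g.:
        fetch = lambda: requests.get(request_url, headers=headers).json()
    """
    while True:
        result = fetch()
        if result['status'] == 'completed':
            return result
        if result['status'] == 'error':
            raise RuntimeError(result.get('error', 'transcription failed'))
        time.sleep(poll_interval)
```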

Here is a screenshot of the report: 

Screenshot by the Author. 

Congrats! As you can see, we got an auto-generated summary of several chapters. Nothing was predefined; the artificial intelligence model listened to the lecture and then inferred the chapters. We can also see the start and end points in milliseconds, so we can tell which part of the lecture each chapter covers. What are your thoughts on this?
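To make the report easier to skim, you could convert those millisecond values into readable timestamps. This is a sketch of my own; the headline, summary, start, and end field names follow the chapter objects the API returns, but the sample entry below is made up:

```python
def ms_to_timestamp(ms):
    """Convert a millisecond offset to an M:SS string."""
    seconds = ms // 1000
    return f"{seconds // 60}:{seconds % 60:02d}"

# a made-up chapter entry, shaped like the ones in auto_summary_report
sample_chapters = [
    {"headline": "Welcome to CS50",
     "summary": "An introduction to the course and what to expect.",
     "start": 0,
     "end": 600000},
]

for chapter in sample_chapters:
    span = f"{ms_to_timestamp(chapter['start'])}-{ms_to_timestamp(chapter['end'])}"
    print(f"[{span}] {chapter['headline']}: {chapter['summary']}")
```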

I enjoyed working on this project; it’s great to see how machine learning and artificial intelligence can be applied in the real world. I hope you enjoyed reading it and learned something new today. Feel free to contact me if you have any questions.

If you are wondering what kind of articles I write, here are some:

