Speech Recognition using IBM Speech-to-Text API

Extract conversations easily from an audio recording with Python

In this article, we will learn how to use IBM’s Speech to Text API to recognize speech in an audio recording. We are going to use the free version of the API, which has some limitations, such as an audio length limit. I will talk more about this later in Step 4.

Background

If you are reading this article, I am sure you have heard the term “artificial intelligence” and how important it is. One of the nicest real-life uses of artificial intelligence is speech recognition.

Speech recognition lets us save time by speaking instead of typing, which makes our gadgets easier and more fun to use. It also lets us interact with these devices without writing a single line of code. Imagine if people had to know programming to give commands to Alexa or Siri. That would be insane. 😊

I can’t wait to show you the speech recognizer in action, so let’s get to work. We will follow five steps in this project: installing the library, importing an audio clip, defining the recognizer, running the recognizer, and exporting the result.

Speech Recognition Cloud Services

Many big tech companies have their own speech recognition models. I will share some of them here to give you the big picture. These APIs run in the cloud and can be accessed from anywhere in the world with an internet connection. Most of them are paid services, but they can be tested for free. For example, Microsoft offers a year of free access with an Azure cloud account.

Here are some of the most popular speech to text cloud services available:

Sonsuz Design is an ad-free blog. I want anyone to have access to these contents. Your support will make me happy.

Step 1 – Library

For this project, we need only one library: SpeechRecognition. SpeechRecognition is free and open-source, and it supports multiple speech recognition engines and APIs, such as Microsoft Azure Speech, Google Cloud Speech and IBM Watson Speech to Text. In this project, we will test the IBM Watson Speech to Text API. Feel free to check out the source code and documentation of the SpeechRecognition package here.

Let’s start by installing the package with pip, Python’s package manager.

pip install SpeechRecognition

After the installation is complete, open your code editor (Jupyter Notebook works too) and import the library:

import speech_recognition as s_r

Step 2 – Import an Audio Clip

I recorded a voice memo on my computer. It was in M4A format, but the recognizer doesn’t support M4A (SpeechRecognition’s AudioFile reads WAV, AIFF/AIFF-C and FLAC files), so I converted it to WAV format first.

audio_file = s_r.AudioFile('my_clip.wav')
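Since AudioFile only reads WAV, AIFF/AIFF-C and FLAC, it can be worth confirming that the converted clip is a readable WAV file before going further. Here is a minimal sketch using only Python’s standard wave module; the helper name describe_wav is hypothetical and not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    # wave.open raises wave.Error if the file is not a readable WAV file
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, describe_wav('my_clip.wav') would report the channel count, sample width, sample rate and length in seconds of the converted clip.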

Step 3 – Define the Recognizer

In this step, all we do is define the speech recognizer. We have already imported the library, so now we create an instance of its Recognizer class and assign it to a new variable.

rcgnzr = s_r.Recognizer()

Step 4 – Speech Recognizer in Action

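It’s show time! We will run IBM’s Speech to Text on our audio file. Before calling the recognizer, we open the file and call two methods. The first, adjust_for_ambient_noise, reads the first second of the audio to estimate the background noise level and calibrate the recognizer’s energy threshold. The second, record, reads the rest of the file into an AudioData object that we can send to the API. Note that neither method removes noise from the audio itself, and the second used for calibration is consumed, so it will not be transcribed.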

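with audio_file as source:
    # Sample the first second of the file to calibrate the energy threshold
    rcgnzr.adjust_for_ambient_noise(source)
    # Read the remaining audio into an AudioData object
    clean_audio = rcgnzr.record(source)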
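To make the calibration step a little less of a black box: roughly speaking, adjust_for_ambient_noise measures how much energy the opening stretch of audio carries and uses it to set the threshold. The sketch below computes the root-mean-square (RMS) energy of the first second of a 16-bit PCM WAV file with the standard library. It illustrates the idea only; it is not SpeechRecognition’s own code, and first_second_rms is a hypothetical name.

```python
import array
import math
import sys
import wave


def first_second_rms(path, seconds=1.0):
    """RMS energy of the opening `seconds` of a 16-bit PCM WAV file.

    Illustration only: not SpeechRecognition's implementation.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        n_frames = int(wav.getframerate() * seconds)
        raw = wav.readframes(n_frames)  # may return fewer frames for short files
    samples = array.array("h", raw)  # signed 16-bit ints, channels interleaved
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```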

Perfect, now our audio is loaded and ready to send. Let’s run IBM’s speech recognizer. (It took me a couple of hours to figure out how the IBM Speech-to-Text API integrates with the SpeechRecognition Python library.) Here is the shortest way to do it:

# "apikey" is the literal user name; the password is your IBM API key
recognized_speech_ibm = rcgnzr.recognize_ibm(clean_audio, username="apikey", password="your API Key")
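If you want more than a single string, recognize_ibm also accepts show_all=True, in which case it returns IBM’s raw JSON response. In that response, results holds one entry per stretch of recognized speech, and each entry’s alternatives list carries a transcript (plus a confidence score for final results), with the top-ranked hypothesis first. The sketch below joins the top transcript of each result; the helper name top_transcript and the sample response are made up for illustration.

```python
def top_transcript(response):
    """Join the first (top-ranked) transcript of every result in an IBM STT response."""
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # IBM transcripts often end with a trailing space, so strip them
            text = alternatives[0].get("transcript", "").strip()
            if text:
                pieces.append(text)
    return " ".join(pieces)


# Hypothetical response shaped like IBM's /v1/recognize output
sample_response = {
    "result_index": 0,
    "results": [
        {"final": True, "alternatives": [{"transcript": "hello world ", "confidence": 0.94}]},
        {"final": True, "alternatives": [{"transcript": "this is a test ", "confidence": 0.9}]},
    ],
}

print(top_transcript(sample_response))  # hello world this is a test
```

With the library, you could pass the return value of recognize_ibm(..., show_all=True) straight into top_transcript.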

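Note: IBM’s API doesn’t work without an API key, so we need to get one from the IBM Watson page. I created an account to test this Speech-to-Text model. The good thing about IBM’s model is that the free Lite plan still allows 500 minutes of audio per month, which is more than enough for learning purposes. The username/password form of recognize_ibm used here is the one from the SpeechRecognition 3.8.x releases; newer releases may take the key differently, so check the documentation of the version you have installed.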
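Hard-coding the API key in a notebook makes it easy to leak when you share your code. A common alternative is to store it in an environment variable and read it at run time. A minimal sketch; the variable name IBM_STT_API_KEY and the helper name load_ibm_api_key are assumptions, so use whatever names suit you.

```python
import os


def load_ibm_api_key(var_name="IBM_STT_API_KEY"):
    """Read the IBM Speech to Text API key from an environment variable.

    The default variable name is an assumption, not an IBM convention.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your IBM API key")
    return key
```

After running export IBM_STT_API_KEY=your-key in your shell (or setting it in your notebook’s environment), you would pass password=load_ibm_api_key() to recognize_ibm instead of the literal key.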

Final Step – Exporting the Result

We are almost done. It’s time to check the result. Our recognizer transcribed the speech in the audio file in the previous step, so let’s see how well it worked. If we are satisfied with the result, we will export it to a text document.

To check the recognized speech, let’s print the recognized_speech_ibm variable:

print(recognized_speech_ibm)

Looks good. It did a great job recognizing my audio recording, in which I was reading a paragraph from this article. If you are not satisfied with the result, there are many ways to preprocess the audio file to get better results. Here is a nice article with more detailed information about speech recognition and how to improve the recognizer’s accuracy.
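One simple preprocessing step you could try (one option among many, not necessarily what the linked article recommends) is peak normalization: scaling a quiet recording up so its loudest sample sits near full scale. The sketch below does this for a 16-bit PCM WAV file using only the standard library; peak_normalize and its default target of 0.9 are illustrative choices, and whether it helps recognition will depend on your recording.

```python
import array
import sys
import wave


def peak_normalize(src_path, dst_path, target=0.9):
    """Scale a 16-bit PCM WAV so its loudest sample reaches `target` of full scale.

    Returns the original peak sample value (hypothetical helper).
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian
    peak = max((abs(s) for s in samples), default=0)
    if peak:
        gain = target * 32767 / peak
        # Clamp to the int16 range so rounding can never overflow
        samples = array.array(
            "h", (max(-32768, min(32767, round(s * gain))) for s in samples)
        )
    if sys.byteorder == "big":
        samples.byteswap()  # back to little-endian for writing
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample width and rate
        dst.writeframes(samples.tobytes())
    return peak
```

For example, peak_normalize('my_clip.wav', 'my_clip_normalized.wav') writes a louder copy and returns the original peak sample value; you would then open the new file with s_r.AudioFile.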

Now, I will export the recognized speech to a text document. We will see the message “ready!” in our terminal when the export is complete.

# Write a header line followed by the transcript
with open('recognized_speech.txt', mode='w', encoding='utf-8') as file:
    file.write("Recognized Speech:")
    file.write("\n")
    file.write(recognized_speech_ibm)
print("ready!")

Congrats! If you’re reading this paragraph, you’ve built a speech recognizer. I hope you enjoyed this hands-on tutorial and learned something new today. The best way to practice your programming skills is by building fun projects. I’ve shared many more hands-on projects like this one. Feel free to reach out if you have any questions while implementing the program.

Let’s connect. Check out my Medium blog and YouTube channel to stay inspired. Thank you,

More hands-on projects for you:

Building a Photo Translator using Python with Google Translator API

Building a Chatbot in Python – Beginner’s Guide
