Convert Your Speech to Text using Python

Convert your speech to text in real-time using your microphone

In this post, I will show you how to convert your speech into a text document using Python. In programming terms, this process is called speech recognition. It is something we use in our daily lives, for example when you dictate a message to a friend instead of typing it. Another great example of speech to text is adding subtitles (closed captions) to a video of a person talking. Many of the subtitles that you see on Netflix shows or YouTube videos are created by machines using Artificial Intelligence. Can you imagine a group of people working all day just to add those subtitles to your favorite shows? I know, it’s hard to even think about. That is where the power of computer programming comes in. I still remember the day I learned about for loops; it felt like I had found a way to reach infinity in the real world. Anyway, enough with the introduction, let’s get some work done.

As you can tell from the title, in this post we will create a Python program that converts our speech to text and exports it as a text document. If you like to take notes, this program will save you time: you record yourself and also get a typed version of your recording. It’s like winning two trophies in one game 🙂

Let’s get to coding!

Importing library

We will use the SpeechRecognition module. If you don’t have it already, let’s install it real quick. No worries, installing a module in Python is super easy.

pip install SpeechRecognition

Yes, that was it. The SpeechRecognition module supports multiple recognition APIs, and the Google Web Speech API is one of them. You can learn more about the module in its official documentation. We will use Google’s recognizer in our code. Note that recording from a microphone also requires the PyAudio package (pip install PyAudio). Once the installation is complete, we can import the module into our program.

import speech_recognition as sr

Create a Recognizer

In this step, we are creating a recognizer instance.

r = sr.Recognizer()

Define your Microphone

Before defining our microphone instance, we will choose our input device. There might be multiple input devices plugged into your computer, and we need to choose which one we plan to use. As you know, machines are dummies: you have to tell them exactly what to do! Using the following code, you will be able to see your input devices.

print(sr.Microphone.list_microphone_names())

I recommend running this line before you define your microphone, because the devices on your machine will probably differ from mine. The call returns a list of device names. I want to use the “Built-in Microphone”, which is the first element of the list on my machine, so I pass index 0. Defining the microphone looks as follows:

mic = sr.Microphone(device_index=0)
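
Hard-coding the index works, but the order of the devices can change, for example when you plug in a headset. A small helper can look the device up by name instead. This is only a sketch: find_device_index is a hypothetical helper of mine, not part of the SpeechRecognition module, and the sample device names below are made up for illustration.

```python
def find_device_index(device_names, keyword):
    """Return the index of the first device whose name contains keyword.

    Hypothetical helper, not part of the SpeechRecognition module.
    The match is case-insensitive; None is returned if nothing matches.
    """
    keyword = keyword.lower()
    for index, name in enumerate(device_names):
        # Skip devices that were reported without a readable name
        if name and keyword in name.lower():
            return index
    return None


# Made-up sample list; on your machine, pass
# sr.Microphone.list_microphone_names() instead.
sample_names = ["Built-in Microphone", "Built-in Output", "USB Headset"]
print(find_device_index(sample_names, "built-in mic"))  # prints 0
```

With the real list, you could pass the returned index as device_index when creating sr.Microphone. If the helper returns None, sr.Microphone(device_index=None) uses the default input device of your system.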

Recognize Speech

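As mentioned earlier, we will be using the recognize_google method, which sends the recording to the Google Web Speech API, a speech recognition service created by our friends at Google. Thanks to them! Because the recognition happens on Google’s servers, you need an internet connection for this step.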

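# record from the microphone until a pause is detected
with mic as source:
    audio = r.listen(source)

# send the recording to Google and get the recognized text back
result = r.recognize_google(audio)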
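
In a noisy room, or when you keep talking for a long time, it may help to calibrate the recognizer first and to cap the recording length. The sketch below wraps the recording step in a function. record_once and its parameter names are hypothetical names of mine; adjust_for_ambient_noise and the phrase_time_limit argument of listen come from the SpeechRecognition module. The default values are guesses that you should tune for your setup.

```python
def record_once(recognizer, microphone, calibration_seconds=1, max_seconds=30):
    """Record a single phrase and return the captured audio.

    Hypothetical helper. `recognizer` is expected to be an sr.Recognizer
    and `microphone` an sr.Microphone, as created earlier in this post.
    """
    with microphone as source:
        # Listen to the background noise briefly so the recognizer can
        # tell speech apart from silence
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        # Stop recording after max_seconds even if the speaker keeps talking
        return recognizer.listen(source, phrase_time_limit=max_seconds)
```

You would call it as audio = record_once(r, mic) and then pass audio to r.recognize_google as before.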

If you want to check your result before exporting it to a text document, you can add the following line to your code.

print(result)
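
The recognition call can also fail, for example when the recording is silent or when you are offline. In those cases the module raises sr.UnknownValueError or sr.RequestError. Here is a minimal sketch of how you might catch both. recognize_or_none is a hypothetical helper name of mine, and the import sits inside the function only to keep the sketch self-contained; in your own script the import at the top of the file is enough.

```python
def recognize_or_none(recognizer, audio, language="en-US"):
    """Return the recognized text, or None if recognition failed.

    Hypothetical helper; `recognizer` is expected to be an sr.Recognizer.
    """
    # Imported here only to keep this sketch self-contained
    import speech_recognition as sr

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service could not make out any words in the recording
        print("Sorry, I could not understand the audio.")
    except sr.RequestError as error:
        # The service could not be reached, e.g. no internet connection
        print(f"Could not reach the recognition service: {error}")
    return None
```

You would then write result = recognize_or_none(r, audio) and skip the export step when result is None.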

Final step: Exporting our result

In this step, we create a text document and write the result from the previous step into it. You will see the line “Exporting process completed!” in your terminal window when the process is done.

with open('my_result.txt', mode='w', encoding='utf-8') as file:
    file.write("Recognized text:")
    file.write("\n")
    file.write(result)

print("Exporting process completed!")
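
If you plan to save more than one recording, the export step can be wrapped in a small function. This is a sketch under my own assumptions: export_transcript is a hypothetical helper name, and it writes the same layout as the snippet above, with a trailing newline added. It sets UTF-8 explicitly so that accented or non-Latin characters in the recognized text are saved the same way on every platform.

```python
from pathlib import Path


def export_transcript(text, path="my_result.txt"):
    """Write the recognized text to a UTF-8 file and return its path.

    Hypothetical helper; the default file name matches the one used above.
    """
    path = Path(path)
    # An existing file with the same name is overwritten
    path.write_text(f"Recognized text:\n{text}\n", encoding="utf-8")
    return path
```

You would call it as export_transcript(result), or pass a second argument to choose a different file name.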

The Code

# importing the module
import speech_recognition as sr

# create the recognizer
r = sr.Recognizer()

# define the microphone
mic = sr.Microphone(device_index=0)

# record your speech
with mic as source:
    audio = r.listen(source)

# speech recognition
result = r.recognize_google(audio)

# export the result to a text file
with open('my_result.txt', mode='w', encoding='utf-8') as file:
    file.write("Recognized text:")
    file.write("\n")
    file.write(result)

print("Exporting process completed!")

Congrats, my friend! You have created a program that converts your speech to text and exports it as a text document. I hope you enjoyed reading this post and learned something new today. Working on fun programming projects like this one will help you sharpen your coding skills.

Another project you may like: Converting your audio files to text.

Follow my blog to stay inspired and up-to-date with the deep learning field 🙂