Building a Speech Recognizer in Python

Convert your audio files into text using the SpeechRecognition library and Google's Web Speech API

In this post, I will show you how to convert audio files into a text document using Python. This conversion is called speech recognition, and it is widely used in the real world: voice assistants such as Google Assistant (on devices like the Google Home Mini), Amazon's Alexa, and Apple's Siri are just some of the popular examples.

Speech recognition saves time by letting you speak instead of type. It lets us communicate with our devices without writing a single line of code, which makes technology more accessible and easier to use. It is also a nice example of artificial intelligence at work in the real world.

Here is a scene from the Google I/O 2019 event: Google Assistant in action!

In this post, we will create a simple speech recognition program that detects sentences in an audio file and then exports them to a text document. In a future post, I would like to show you another great example of speech recognition, where you convert your speech into text in real time.

Are you ready? Let’s get to coding!

Importing the library

First, let's install the module so that we can import and use it in our program. The SpeechRecognition module supports multiple recognition APIs, and Google's Web Speech API is one of them. You can learn more about the module in its documentation.

pip install SpeechRecognition

Now we can import the library:

import speech_recognition as sr

Create a Recognizer

In this step, we will create our recognizer instance.

r = sr.Recognizer()

Import the audio file

File formats matter when importing the audio file into our program. SpeechRecognition's AudioFile reads WAV, AIFF/AIFF-C, and FLAC files. I tested my code with a couple of other formats, but the results for WAV were the best. You can use an online file converter to convert your audio files to WAV.

For example, if you record with the Voice Memos app on a MacBook, the audio file is saved in m4a format. Search Google for "convert m4a to wav online" and you will find plenty of good websites.
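If you would rather not upload your recordings to a website, a locally installed ffmpeg can do the same conversion. The sketch below assumes ffmpeg is installed and on your PATH; the helper names `build_ffmpeg_command` and `convert_to_wav` are hypothetical, and converting to mono at 16 kHz is an assumption (a common choice for speech), not a requirement of SpeechRecognition.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts `src` into a mono WAV file at `dst`."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file, e.g. an .m4a voice memo
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to sample_rate Hz
        dst,                      # ffmpeg infers the WAV container from the extension
    ]


def convert_to_wav(src: str, sample_rate: int = 16000) -> Path:
    """Convert an audio file to a .wav file next to it (requires ffmpeg on PATH)."""
    src_path = Path(src)
    if src_path.suffix.lower() == ".wav":
        return src_path  # already WAV, nothing to convert
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = src_path.with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(str(src_path), str(dst), sample_rate), check=True)
    return dst
```

For example, `convert_to_wav("memo.m4a")` would write `memo.wav` into the same folder, and you could pass that path to `sr.AudioFile`.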

AudioFile is the class we use to load the file, and sr is the name we gave the SpeechRecognition module when we imported it.

audio_file = sr.AudioFile('test.wav')
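Before handing a file to `sr.AudioFile`, it can help to confirm that it really is a PCM WAV file and to see its length and sample rate. This is a minimal sketch using only Python's standard `wave` module; `describe_wav` and `is_readable_wav` are hypothetical helper names, and the check covers WAV only, not the AIFF or FLAC files that AudioFile also accepts.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read a PCM WAV header and return its basic properties."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def is_readable_wav(path: str) -> bool:
    """Return True if Python's wave module can parse the file as PCM WAV."""
    try:
        describe_wav(path)
    except (wave.Error, EOFError):
        return False  # not RIFF/WAVE, unsupported encoding, or empty/truncated file
    return True
```

Calling `describe_wav('test.wav')` before recognition is a quick way to spot a mislabeled file or an unexpectedly long recording.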

Recognize text

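We are using the recognize_google method, which sends the audio to Google's Web Speech API (using the library's built-in default API key), so an internet connection is required. Google Cloud Speech-to-Text is a separate service; the library exposes it through recognize_google_cloud, which needs Google Cloud credentials.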

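with audio_file as source:
    audio = r.record(source)  # read the whole file into an AudioData object
result = r.recognize_google(audio)  # send it to the API; returns the transcript as a string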
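`r.record(source)` loads the whole file at once, which works well for short clips. For longer recordings, one option is to read and send the audio in fixed-length pieces. The sketch below uses the hypothetical helper names `chunk_spans` and `transcribe_in_chunks`, and assumes a 30-second chunk size (an arbitrary choice, not a documented API limit). It relies on these SpeechRecognition behaviors: an open AudioFile exposes its length as `source.DURATION`, consecutive `record()` calls on the same open source continue where the previous call stopped, and `recognize_google` raises `sr.UnknownValueError` when it cannot understand the audio and `sr.RequestError` when the request to the API fails.

```python
def chunk_spans(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (offset, duration) pairs."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    spans = []
    offset = 0.0
    while offset < total_seconds:
        duration = min(chunk_seconds, total_seconds - offset)
        spans.append((offset, duration))
        offset += duration
    return spans


def transcribe_in_chunks(path: str, chunk_seconds: float = 30.0) -> str:
    """Transcribe a long audio file piece by piece with recognize_google."""
    import speech_recognition as sr  # imported here so chunk_spans works without the library

    r = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        for _offset, duration in chunk_spans(source.DURATION, chunk_seconds):
            # record() continues from where the previous call stopped
            audio = r.record(source, duration=duration)
            try:
                pieces.append(r.recognize_google(audio))
            except sr.UnknownValueError:
                pieces.append("[unintelligible]")  # no speech recognized in this chunk
            except sr.RequestError as err:
                raise RuntimeError(f"Speech API request failed: {err}") from err
    return " ".join(pieces)
```

A word that falls on a chunk boundary may be cut in half and misrecognized; overlapping the chunks or splitting on silence would avoid that, at the cost of more code.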

Export your result into a text document

In the following code, we create a text file, write the result from the previous step into it, and print "ready!" in the terminal when the process is complete.

with open('test.txt', mode='w') as file:
    file.write("Recognized text:\n")
    file.write(result)  # the transcript returned by recognize_google
print("ready!")
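Transcripts can contain non-ASCII characters, for instance when you call `recognize_google(audio, language="fr-FR")`, and `open()` without an `encoding` argument falls back to a platform-dependent default. Here is a small sketch that pins the encoding to UTF-8 using `pathlib`; `save_transcript` is a hypothetical helper name.

```python
from pathlib import Path


def save_transcript(text: str, path: str, header: str = "Recognized text:") -> Path:
    """Write the header line and the transcript to a UTF-8 encoded text file."""
    out = Path(path)
    out.write_text(f"{header}\n{text}\n", encoding="utf-8")
    return out
```

Calling `save_transcript(result, "test.txt")` produces the same layout as the code above, with a trailing newline and a fixed encoding.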

Congrats! You have created your own speech recognition program with Python. I hope you enjoyed this tutorial and learned something new today. The best way to practice your coding skills is to build fun projects. In a future post, I would like to share another speech recognition project that listens to your voice and converts it to text in real time. Follow my blog to stay connected.

