Sonsuz Design

Hands-on Programming Tutorials for Everyone

Building a Speech Recognizer in Python

Convert your audio files into text using the Google Speech Recognition API

In this post, I will show you how to convert audio files into a text document using Python. This conversion is called speech recognition, and it is widely used in everyday products: personal voice assistants such as Google Home Mini, Amazon’s Alexa, and Apple’s Siri are just a few of the popular examples.

Speech recognition saves time by letting you speak instead of type. It also lets us communicate with our devices without writing a single line of code, which makes technology more accessible and easier to use. It is a nice example of artificial intelligence applied in the real world.

Here is a scene from the Google I/O 2019 event: Google Assistant in action!

In this post, we will build a simple speech recognition program that detects sentences in an audio file and then exports them to a text document. In a future post, I would like to show you another great example of speech recognition, where you convert your speech into text in real time.

Are you ready? Let’s get to coding!

Importing the library

First, let’s install the module so that we can import it and use it in our program. The SpeechRecognition module supports several recognition APIs, and Google’s speech recognition API is one of them. You can learn more about the module in its documentation.

pip install SpeechRecognition

Now we can import the library:

import speech_recognition as sr

Create a Recognizer

In this step, we will create our recognizer instance.

r = sr.Recognizer()

Import audio file

The file format matters when importing audio into our program. SpeechRecognition’s AudioFile reads WAV, AIFF/AIFF-C, and FLAC files; I’ve tested my code with a couple of other formats, but the WAV format gave the best results. You can use an online file converter to convert your audio files to WAV.

For example, if you record with the Voice Memos app on a MacBook, the audio file is saved in M4A format. Search on Google for “Convert m4a file to wav file format online” and you will find plenty of good websites.
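Before handing a converted file to the recognizer, it can be worth confirming that the conversion really produced a readable WAV file. The sketch below uses only Python’s built-in wave module, which reads uncompressed PCM WAV (the kind of WAV that AudioFile expects). The helper name describe_wav and the file name sample.wav are hypothetical, and the script first writes one second of silence so it runs without a real recording.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Typically raises wave.Error if the file is not a WAV the module can read
    (for example, an .m4a file that was only renamed to .wav).
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


if __name__ == "__main__":
    # Hypothetical demo file: one second of 16-bit mono silence at 16 kHz.
    with wave.open("sample.wav", "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
        out.setframerate(16000)   # 16,000 samples per second
        out.writeframes(b"\x00\x00" * 16000)

    print(describe_wav("sample.wav"))
```

If describe_wav raises an error on your converted file, that is a sign to try a different converter before moving on.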

AudioFile is the class that loads the audio file, and sr is the short name we gave the speech_recognition module when importing it.

audio_file = sr.AudioFile('test.wav')
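Because an unsupported file (such as an M4A voice memo) only fails once it is opened, a quick pre-flight check can give a clearer error message. This is a minimal sketch, not part of SpeechRecognition: the helper check_audio_path is hypothetical, and since AudioFile detects the format from the file contents, looking at the extension is only a heuristic.

```python
from pathlib import Path

# Formats that speech_recognition.AudioFile documents as supported.
# Checking the extension is a hypothetical convenience, not what AudioFile does.
SUPPORTED_SUFFIXES = {".wav", ".aif", ".aiff", ".aifc", ".flac"}


def check_audio_path(path):
    """Return path as a string if it has a supported extension and exists."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        # Catch .m4a, .mp3, etc. early with a hint about what to do.
        raise ValueError(
            f"{p.name}: '{p.suffix}' is not WAV/AIFF/FLAC; convert it to .wav first"
        )
    if not p.is_file():
        raise FileNotFoundError(p)
    return str(p)
```

With the library imported, it would be used as audio_file = sr.AudioFile(check_audio_path('test.wav')).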

Recognize text

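First, r.record reads the audio file into an AudioData object. Then the recognize_google method sends that audio to Google’s speech recognition service (the library calls it the Google Speech Recognition API; the separate Google Cloud Speech API is available through recognize_google_cloud and needs your own credentials). This step requires an internet connection.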

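with audio_file as source:
    # Read the whole file. Calling r.adjust_for_ambient_noise(source) first
    # would consume (and drop) the first second of the file, and the energy
    # threshold it sets is only used by r.listen(), not by r.record().
    audio = r.record(source)

# Send the audio to Google and get the transcript back as a string.
result = r.recognize_google(audio)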
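If a long recording fails or comes back incomplete, one workaround is to transcribe it in pieces: Recognizer.record accepts offset and duration arguments (in seconds). The sketch below only computes the (offset, duration) windows; the helper name chunk_windows and the 30-second chunk length are arbitrary assumptions, and the commented loop shows how the windows might be fed to the library. Keep in mind that recognize_google raises sr.UnknownValueError when it cannot understand the audio and sr.RequestError when the request itself fails.

```python
import math


def chunk_windows(total_seconds, chunk_seconds=30.0):
    """Split a recording of total_seconds into (offset, duration) pairs.

    The last window is shorter when total_seconds is not an exact multiple
    of chunk_seconds. The 30-second default is an arbitrary choice.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    for i in range(math.ceil(total_seconds / chunk_seconds)):
        offset = i * chunk_seconds
        windows.append((offset, min(chunk_seconds, total_seconds - offset)))
    return windows


# Hypothetical usage (needs speech_recognition, r from above, and internet):
#
# texts = []
# for offset, duration in chunk_windows(95.0):
#     with sr.AudioFile('test.wav') as source:  # reopened, so offset counts from the start
#         audio = r.record(source, offset=offset, duration=duration)
#     texts.append(r.recognize_google(audio))
# result = " ".join(texts)
```

Opening a fresh AudioFile for every window keeps each offset relative to the start of the file, which makes the windows easy to reason about.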

Export your result into a text document

The following code creates a text file, opens it for writing, and saves the result from the previous step into it. You will see “ready!” in your terminal when the process is complete.

# Write the transcript to test.txt (overwriting the file if it already exists).
with open('test.txt', mode='w', encoding='utf-8') as file:
    file.write("Recognized text:")
    file.write("\n")
    file.write(result)
    print("ready!")
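If you plan to run the program on several recordings, the export step can be wrapped in a small function. This is a sketch under my own naming (save_transcript is hypothetical); it writes the same header as above and uses UTF-8 explicitly so that accented or non-Latin characters in the transcript are saved correctly regardless of the operating system’s default encoding.

```python
from pathlib import Path


def save_transcript(text, path="test.txt"):
    """Write the recognized text under a header line and return the file's Path."""
    out = Path(path)
    # Explicit UTF-8 avoids encoding errors for non-ASCII characters.
    out.write_text("Recognized text:\n" + text + "\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    saved = save_transcript("hello world", "demo_transcript.txt")
    print(f"ready! wrote {saved}")
```

In the tutorial’s flow this would be called as save_transcript(result).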

Congrats! You have created your own speech recognition program with Python. I hope you enjoyed this tutorial and learned something new today. The best way to practice your coding skills is by making fun projects. In a future post, I would like to share another speech recognition project that detects your voice and converts it to text in real time. Follow my blog to stay connected.


Behic Guven

May 17, 2020

Deep Learning, Machine Learning, Programming

amazon alexa, apple siri, coding, Data Science, Google assistant, Google cloud, Machine Learning, Programming, Python, speech recognition, tutorial, voice assistant

4 responses to “Building a Speech Recognizer in Python”

Convert Your Speech to Text using Python says:

May 22, 2020 at 1:28 am

[…] Another project you may like: Converting your audio files to text. […]

Paul McGrath says:

May 26, 2020 at 4:43 pm

Hi – thanks for this post – it has given me a reason to start a home project. 😉
One point to note – your Medium post on this article – is missing the line “audio_file = sr.AudioFile(‘test.wav’)” from the code listing.

Best wishes

- sonsuzdesign says:
  
  June 7, 2020 at 4:45 pm
  
  Thank you Paul, I’ve updated the code. Appreciate it! 🙂
  
Building a Speech Emotion Recognizer using Python – Sonsuz Design says:

March 15, 2021 at 9:24 am

[…] Building a Speech Recognizer in Python […]
