Speech Recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to text.

You have probably seen it in sci-fi, and in personal assistants like Siri, Cortana, and Google Assistant, as well as other virtual assistants that you interact with through voice. To understand your voice, these virtual assistants need to perform speech recognition.

Speech recognition is a complex process, so I'm not going to teach you how to train a Machine Learning/Deep Learning model to do it. Instead, I will show you how to do it using the Google speech recognition API. As long as you know the basics of Python, you can complete this tutorial and build your own fully functioning speech recognition programs in Python.

Requirements

To complete this tutorial, you need the following Python libraries installed on your machine:

- PyAudio
- SpeechRecognition

Installation

```bash
pip install PyAudio
pip install SpeechRecognition
```

The SpeechRecognition library lets you perform speech recognition with several engines and APIs, both online and offline. Some of the supported engines are:

- CMU Sphinx (works offline)
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection (works offline)

In this tutorial, we are going to use the Google Speech Recognition API, which is free for basic use, although it limits how many requests you can send over a given period. Throughout the tutorial, you will perform speech recognition both on sound fed directly from a microphone and on audio from files.

Speech Recognition from Microphone

When performing speech recognition from a microphone, we first need to record the audio from the microphone.
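Recording from a microphone raises a practical question: when has speech actually started? As I understand the SpeechRecognition source, `listen()` compares the RMS energy of each audio buffer with `recognizer.energy_threshold`, and `adjust_for_ambient_noise()` nudges that threshold toward a multiple of the background-noise energy using a damped running update. The sketch below imitates that idea in plain Python on synthetic 16-bit samples. It is a simplification, not the library's code: the helper names are hypothetical, and the starting threshold (300), damping (0.15) and ratio (1.5) are assumed values meant to resemble the library's defaults.

```python
import math
import random

def rms(samples):
    """Root-mean-square energy of a list of 16-bit PCM sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def calibrate_threshold(noise_buffers, seconds_per_buffer,
                        threshold=300.0, damping=0.15, ratio=1.5):
    """Hypothetical helper: damped running update of the energy threshold.

    The starting threshold, damping and ratio values are assumptions.
    """
    d = damping ** seconds_per_buffer
    for buf in noise_buffers:
        target = rms(buf) * ratio              # aim a bit above the noise level
        threshold = threshold * d + target * (1 - d)
    return threshold

def speech_started(buffer, threshold):
    """A buffer counts as speech once its energy exceeds the threshold."""
    return rms(buffer) > threshold

RATE, CHUNK = 16000, 1024                      # samples per second, samples per buffer
random.seed(0)
# Synthetic ambient noise: ten buffers of small random values
noise = [[random.randint(-200, 200) for _ in range(CHUNK)] for _ in range(10)]
# Synthetic "speech": a loud 440 Hz sine wave
speech = [int(8000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(CHUNK)]

threshold = calibrate_threshold(noise, seconds_per_buffer=CHUNK / RATE)
print(f"calibrated threshold: {threshold:.1f}")
print("noise counts as speech:", speech_started(noise[0], threshold))  # → False
print("sine counts as speech:", speech_started(speech, threshold))     # → True
```

This is also why the tutorial's script spends a second listening to the room before recording: a threshold calibrated on the actual background noise makes the start of speech easier to detect.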
Then, we send the recorded audio to Google's speech-to-text recognition engine, which performs the recognition and returns the transcribed text.

Steps involved

1. Recording audio from the microphone (PyAudio)
2. Sending the audio to the speech recognition engine
3. Printing the recognized text to the screen

Below is sample code; it is pretty straightforward.

app.py

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Recording the sound
with sr.Microphone() as source:
    print("Adjusting noise ")
    # Listen to 1 second of background noise to calibrate the energy threshold
    recognizer.adjust_for_ambient_noise(source, duration=1)
    print("Recording (up to 4 seconds)")
    # timeout: max seconds to wait for speech to start
    # phrase_time_limit: max seconds to record once speech has started
    recorded_audio = recognizer.listen(source, timeout=4, phrase_time_limit=4)
    print("Done recording")

# Recognizing the audio
try:
    print("Recognizing the text")
    text = recognizer.recognize_google(recorded_audio, language="en-US")
    print("Decoded Text : {}".format(text))
except sr.UnknownValueError:
    print("Google could not understand the audio")
except sr.RequestError as ex:
    print("Could not reach the Google API: {}".format(ex))
```

Speech Recognition from Audio File

When it comes to performing speech recognition on audio files, the main change is the audio source: instead of using a microphone, we give the path to the audio file we want to transcribe to text. For the demo, I used the sample audio below.

Sample Audio

The code below is a sample script that performs speech recognition on the audio in a file.
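Before running the file-based script, it is worth checking the file itself: `sr.AudioFile` reads WAV, AIFF/AIFF-C and FLAC files, so an MP3 has to be converted first. The sketch below uses only Python's built-in `wave` module to report a WAV file's channels, sample width, sample rate and duration. To stay self-contained it first writes one second of silence to a temporary file; with your own recording you would pass its path (for example `./sample_audio/speech.wav`) instead. `describe_wav` is a hypothetical helper name, not part of SpeechRecognition.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Hypothetical helper: return basic format details of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }

# Create a throwaway file: one second of 16-bit mono silence at 16 kHz
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)            # 2 bytes = 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

info = describe_wav(path)
print(info)
# → {'channels': 1, 'sample_width_bytes': 2, 'sample_rate_hz': 16000, 'duration_s': 1.0}
```

If `wave.open` raises an error on your file, it is probably not a plain PCM WAV, and `sr.AudioFile` is likely to reject it as well.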
```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Loading the sound from a file instead of the microphone
with sr.AudioFile("./sample_audio/speech.wav") as source:
    # record() reads the whole file; listen() would stop after the first phrase
    recorded_audio = recognizer.record(source)
    print("Done recording")

# Recognizing the audio
try:
    print("Recognizing the text")
    text = recognizer.recognize_google(recorded_audio, language="en-US")
    print("Decoded Text : {}".format(text))
except sr.UnknownValueError:
    print("Google could not understand the audio")
except sr.RequestError as ex:
    print("Could not reach the Google API: {}".format(ex))
```

Output

```
kalebu@kalebu-PC:~$ python3 app_audio.py
Done recording
Recognizing the text
Decoded Text : python programming is the best of all by Jordan
```

Speech Recognition from Long Audio Source

When you have very long audio, loading the whole file into memory and sending it to the API in one request can be very slow. To overcome that, we split the long audio into small chunks and then perform speech recognition on each chunk.

We are going to use pydub to split the long audio into those chunks. To install pydub, use pip:

```bash
pip install pydub
```

Note that pydub needs ffmpeg installed to read MP3 files.

Use the link below to download the sample long audio:

Long Sample Audio

Below is a sample Python script that loads the long audio, splits it into segments, and then performs speech recognition on each chunk. To learn more about splitting audio, you can check out the DataCamp Tutorial.
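Before the full script, here is a simplified, standard-library-only sketch of what "splitting on silence" means, to make the `min_silence_len` and `silence_thresh` parameters concrete. It is an illustration of the idea, not pydub's implementation: it measures loudness in dBFS over fixed windows, treats windows quieter than `silence_thresh` as silent, and cuts wherever a silent run lasts at least `min_silence_windows` windows. The helper names and the window-based units are assumptions (pydub measures `min_silence_len` in milliseconds, and its real function also keeps a little silence at the chunk edges through its `keep_silence` parameter).

```python
import math

def dbfs(window, max_amplitude=32768):
    """Loudness of a window of 16-bit samples in dBFS (0 dBFS = full scale)."""
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return -math.inf if rms == 0 else 20 * math.log10(rms / max_amplitude)

def split_on_silence_sketch(samples, window_size, min_silence_windows, silence_thresh):
    """Hypothetical helper: split samples into chunks at long enough quiet stretches.

    A window is silent if its loudness is below silence_thresh (dBFS); a run of at
    least min_silence_windows silent windows ends the current chunk. Shorter pauses
    stay inside the chunk, and leading/trailing silence is dropped.
    """
    chunks, current, pending = [], [], []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        if dbfs(window) < silence_thresh:
            pending.append(window)
            if len(pending) >= min_silence_windows and current:
                chunks.append(current)      # the pause is long enough: close the chunk
                current = []
        else:
            if len(pending) < min_silence_windows and current:
                for quiet in pending:       # short pause inside speech: keep it
                    current.extend(quiet)
            pending = []
            current.extend(window)
    if current:
        chunks.append(current)
    return chunks

loud = [10000, -10000] * 50    # one 100-sample window of a loud square wave (about -10 dBFS)
quiet = [0] * 100              # one 100-sample window of digital silence
samples = loud * 3 + quiet + loud * 2 + quiet * 4 + loud * 2

chunks = split_on_silence_sketch(samples, window_size=100,
                                 min_silence_windows=3, silence_thresh=-40)
print([len(c) for c in chunks])  # → [600, 200]
```

The one-window pause stays inside the first chunk, while the four-window pause splits the audio in two. In the tutorial's script, `min_silence_len=1800` means a pause must last 1.8 seconds, and `silence_thresh=-17` treats anything quieter than -17 dBFS as silence, which is a fairly loud cutoff; both values will likely need tuning for your own recordings.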
```python
import os

import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence

recognizer = sr.Recognizer()

def load_chunks(filename):
    long_audio = AudioSegment.from_mp3(filename)
    # Split wherever there are at least 1800 ms quieter than -17 dBFS
    # (tune both values for your own recording)
    audio_chunks = split_on_silence(
        long_audio, min_silence_len=1800, silence_thresh=-17
    )
    return audio_chunks

for audio_chunk in load_chunks('./sample_audio/long_audio.mp3'):
    # Write the chunk to a temporary WAV file that sr.AudioFile can read
    audio_chunk.export("temp.wav", format="wav")
    with sr.AudioFile("temp.wav") as source:
        audio = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio)
        print("Chunk : {}".format(text))
    except Exception as ex:
        print("Error occurred")
        print(ex)

# Clean up the temporary file
if os.path.exists("temp.wav"):
    os.remove("temp.wav")
print("++++++")
```

Output

```
$ python long_audio.py
Chunk : by the time you finish reading this tutorial you have already covered several techniques and natural then
Chunk : learn more
Chunk : forgetting to subscribe to be updated on upcoming tutorials
++++++
```

Congrats, you now know how to perform speech recognition in Python. I can't wait to see what you're going to build with this knowledge!

To learn more Python, you can visit my blog, kalebujordan.com. In case of any comments, suggestions, or difficulties, comment below and I will get back to you ASAP.

Previously published at https://kalebujordan.com/python-speech-recognition/