Speech Recognition in Python using Google Speech API


Speech recognition is one of the most useful features in applications such as home automation and AI. This section shows how to perform speech recognition using Python and Google's Speech API.

Here, the audio for speech recognition comes from a microphone. Configuring the microphone involves a few parameters.

To follow along, install the SpeechRecognition module. A second module, PyAudio, is optional in general but required for microphone input; it can also be used to set different audio modes.

sudo pip3 install SpeechRecognition
sudo apt-get install python3-pyaudio
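After installing, it can be useful to confirm that both packages are importable before running the examples. The following is a minimal sketch; the helper name `missing_modules` is hypothetical, and it relies on the import names `speech_recognition` (for SpeechRecognition) and `pyaudio` (for PyAudio).

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names differ from the package names used at install time.
    missing = missing_modules(["speech_recognition", "pyaudio"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("SpeechRecognition and PyAudio are both available.")
```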

For an external or USB microphone, specify the exact microphone to avoid problems. On Linux, typing 'lsusb' shows information about the connected USB devices.
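Inside Python, the SpeechRecognition module can also list the audio devices it sees through `Microphone.list_microphone_names()`, and the position of a name in that list can be passed to `Microphone` as `device_index`. The sketch below assumes the wanted device has "usb" in its name, which will not hold for every system; `find_device_index` is a hypothetical helper, not part of the library.

```python
def find_device_index(device_names, keyword):
    """Return the index of the first device whose name contains `keyword`
    (case-insensitive), or None if no device matches."""
    keyword = keyword.lower()
    for index, name in enumerate(device_names):
        if name and keyword in name.lower():
            return index
    return None


if __name__ == "__main__":
    try:
        import speech_recognition as spreg
        names = spreg.Microphone.list_microphone_names()
    except (ImportError, AttributeError, OSError) as err:
        # SpeechRecognition raises AttributeError when PyAudio is missing.
        print("Could not list microphones:", err)
    else:
        for index, name in enumerate(names):
            print(index, name)
        # "usb" is an assumed keyword; adjust it to match the device name.
        print("Chosen device_index:", find_device_index(names, "usb"))
```

The chosen index can then be used as `spreg.Microphone(device_index=...)`; leaving `device_index` unset selects the system default microphone.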

The second parameter is the chunk size, which sets how much data is read at a time. It is typically a power of 2, such as 1024 or 2048.
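As a small illustration of the power-of-2 convention, the sketch below checks a candidate chunk size and rounds it up when needed. Both helper names are hypothetical and exist only for this example.

```python
def is_power_of_two(n):
    """True if the integer n is 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    for candidate in (1000, 1024, 5000):
        print(candidate, is_power_of_two(candidate), next_power_of_two(candidate))
```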

The sampling rate must also be specified; it determines how often audio values are recorded for processing.
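The chunk size and the sampling rate together determine how much time one chunk covers: a chunk of N frames at R frames per second spans N / R seconds. The following sketch works this out for a few sizes; the function name is hypothetical.

```python
def chunk_duration_seconds(chunk_size, sample_rate):
    """Seconds of audio covered by one chunk of `chunk_size` frames."""
    if chunk_size <= 0 or sample_rate <= 0:
        raise ValueError("chunk_size and sample_rate must be positive")
    return chunk_size / sample_rate


if __name__ == "__main__":
    for size in (1024, 2048, 8192):
        millis = chunk_duration_seconds(size, 48000) * 1000
        print(f"{size} frames at 48000 Hz is about {millis:.1f} ms per chunk")
```

With a chunk size of 8192 and a sampling rate of 48000 Hz, each chunk covers roughly 171 ms, so larger chunks mean fewer reads per second but coarser timing.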

Since some surrounding noise is unavoidable, the recognizer should be adjusted for ambient noise so that it picks up the actual speech.
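The idea behind ambient noise adjustment can be illustrated with a simplified model: measure the loudness (RMS energy) of a short stretch of background noise, then treat only sound that is clearly louder as speech. This is an illustrative sketch, not the SpeechRecognition library's actual calibration code; the function names and the ratio of 1.5 are assumptions made for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of audio sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(noise_chunks, ratio=1.5):
    """Return a speech threshold: `ratio` times the mean RMS of the noise chunks."""
    if not noise_chunks:
        raise ValueError("at least one noise chunk is required")
    levels = [rms(chunk) for chunk in noise_chunks]
    return ratio * sum(levels) / len(levels)


if __name__ == "__main__":
    # Made-up sample values standing in for recorded background noise.
    background = [[2, -2, 2, -2], [4, -4, 4, -4]]
    threshold = calibrate_threshold(background)
    loud_chunk = [20, -18, 22, -21]
    print("threshold:", threshold)
    print("treated as speech:", rms(loud_chunk) > threshold)
```

In the library itself this step is a single call, `adjust_for_ambient_noise(source)`, made while the microphone is open and before listening starts.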

Steps to Recognize Speech

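- Gather the microphone-related information.
- Configure the microphone using the chunk size, sampling rate, ambient noise adjustment, and so on.
- Wait for some time to capture the voice.
- Once the voice is captured, try to convert it to text; otherwise, raise an error.
- Stop the process.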

Example Code

import speech_recognition as spreg

# Set up the sampling rate (Hz) and the chunk size (frames read at a time)
sample_rate = 48000
data_size = 8192
recog = spreg.Recognizer()
with spreg.Microphone(sample_rate=sample_rate, chunk_size=data_size) as source:
   # Calibrate against the background noise before listening
   recog.adjust_for_ambient_noise(source)
   print('Tell Something: ')
   speech = recog.listen(source)
try:
   # Send the captured audio to Google's speech recognition service
   text = recog.recognize_google(speech)
   print('You have said: ' + text)
except spreg.UnknownValueError:
   print('Unable to recognize the audio')
except spreg.RequestError as e:
   print("Request error from Google Speech Recognition service; {}".format(e))

Output

$ python3 318.speech_recognition.py
Tell Something: 
You have said: here we are considering the asymptotic notation Pico to calculate the upper bound 
of the time complexity so then the definition of the big O notation is like this one
$

Instead of using the microphone, we can also take an audio file as input and convert its speech to text.

Example Code

import speech_recognition as spreg
sound_file = 'sample_audio.wav'
recog = spreg.Recognizer()
with spreg.AudioFile(sound_file) as source:
   speech = recog.record(source) # use record() instead of listen() for files
   try:
      text = recog.recognize_google(speech)
      print('The file contains: ' + text)
   except spreg.UnknownValueError:
      print('Unable to recognize the audio')
   except spreg.RequestError as e: 
      print("Request error from Google Speech Recognition service; {}".format(e))

Output

$ python3 318a.speech_recognition_file.py 
The file contains: staying ahead of the curve demand planning new technology it also helps you progress in your career
$
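Before passing a WAV file to `AudioFile`, it can help to check its basic properties with Python's standard `wave` module. The sketch below writes a short placeholder tone (not speech, so it is only useful for checking the file handling) and reads back its channel count, sample width, sampling rate, and duration. The helper names and the file name `tone.wav` are hypothetical.

```python
import math
import struct
import wave


def write_test_tone(path, sample_rate=16000, seconds=0.5, freq=440.0):
    """Write a mono 16-bit PCM sine tone to `path` as a placeholder input."""
    n_frames = int(sample_rate * seconds)
    frames = bytearray()
    for i in range(n_frames):
        value = int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate))
        frames += struct.pack("<h", value)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


def describe_wav(path):
    """Return channel count, sample width, sampling rate, and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": wav.getnframes() / rate,
        }


if __name__ == "__main__":
    write_test_tone("tone.wav")
    print(describe_wav("tone.wav"))
```

Running `describe_wav` on a real recording such as 'sample_audio.wav' shows the sampling rate and length of the audio that `recog.record(source)` will read.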