Turn your Raspberry Pi into a Translator with Speech Recognition and Playback (60+ languages)

UPDATE
This project has been picked up by Make Magazine and RadioShack, who turned it into a great step-by-step guide for their Weekend Project campaign. Check out the guide here, and the amazingly awesome video below:


I get many requests from people who are still looking for cheap, easy, and fun project ideas for their Raspberry Pis, so I wanted to share this translator project I’ve been working on. With very little effort, we can turn this $35 mini-computer into a feature-rich language translator that not only supports voice recognition and native speaker playback, but can also translate dynamically between thousands of language pairs, for FREE! Even if you are not interested in building this exact translation tool, there are still many parts of this tutorial that might interest you (speech recognition, text to speech, Microsoft/Google translation APIs). Just like the rest of my posts, this one starts with our shopping list. Most of my readers will probably already have most of these items around the house:

Shopping List

QTY  Required Items                      Price (USD)*
1    Raspberry Pi                        $35.00
1    Micro USB cable                     $5.49
1    Logitech USB Headset                $28.53
1    SD Card (class 4 and 4 GB minimum)  $13.10
     Total:                              $82.12

Optional Items
1    Power Supply                        $9.95
1    HDMI Cable                          $2.28
1    Case                                $12.75

*There are definitely cheaper options available for USB headsets; I chose the Logitech because it is plug and play. For alternatives, check this list of verified Raspberry Pi supported sound cards.

Assumptions

This tutorial assumes your Raspberry Pi has:
-the latest version of Raspbian installed
-an internet connection
-the correct sound card drivers for your headset

Configuring and Testing Your Headset

Before we start writing any code, let’s ensure that we can record and play back sound using our USB headset. The easiest way to do this is with the built-in Linux commands ‘arecord’ and ‘aplay’. But first, let’s make sure our installed packages are up to date.

sudo apt-get update
sudo apt-get upgrade

Now, plug in your USB Headset and run the following commands

cat /proc/asound/cards
cat /proc/asound/modules

You should see that the Logitech headset is listed as card 1. Additionally, the second command should show that the driver for card 0 (the default Raspberry Pi output) is snd_bcm2835 and the driver for card 1 (our Logitech headset) is snd_usb_audio.

[Screenshot: ALSA cards and modules output with the USB headset]

This is a problem because it shows that the Raspberry Pi defaults to sending sound through its built-in hardware (card 0), which has no audio input. To solve this, we need to update ALSA (Advanced Linux Sound Architecture) to use our headset as the default for audio input and output. This can be done with a quick change to the ALSA config file located at /etc/modprobe.d/alsa-base.conf:

sudo nano /etc/modprobe.d/alsa-base.conf

Near the end of this file, change the line that says

options snd-usb-audio index=-2

to

options snd-usb-audio index=0
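If you would rather script this change than edit the file by hand, here is a minimal sketch in Python 3. It only transforms the text of the config file; reading and writing /etc/modprobe.d/alsa-base.conf (with root permissions) is left to you. The function name set_usb_audio_index is my own invention for illustration and is not part of ALSA.

```python
import re

# Matches an existing "options snd-usb-audio index=N" line (N may be negative).
_INDEX_LINE = re.compile(
    r"^([ \t]*options[ \t]+snd-usb-audio[ \t]+index=)-?\d+[ \t]*$",
    re.MULTILINE,
)


def set_usb_audio_index(conf_text, index=0):
    """Return conf_text with the snd-usb-audio index option set to `index`.

    Hypothetical helper: it works on a string and never touches the real file.
    """
    new_text, count = _INDEX_LINE.subn(
        lambda match: match.group(1) + str(index), conf_text
    )
    if count == 0:
        raise ValueError("no 'options snd-usb-audio index=' line found")
    return new_text


# Illustrative input, not a full copy of alsa-base.conf.
sample = "options snd-pcsp index=-2\noptions snd-usb-audio index=-2\n"
print(set_usb_audio_index(sample))
```

Raising an error when the line is missing is deliberate: if your image ships a different config file, it is better to find out than to silently change nothing.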

Save and close the file and reboot the Raspberry Pi using the following command:

sudo reboot

After the system comes back online, the sound system should be reloaded so that when we rerun the above commands…

cat /proc/asound/cards
cat /proc/asound/modules

…we should see the USB Headset is now the default input/output device (card 0) as shown below.

[Screenshot: ALSA cards and modules output after the update]
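The same check can be done programmatically. The sketch below (Python 3) parses text in the layout that /proc/asound/cards normally uses and reports whether card 0 is a USB audio device. The sample text is illustrative rather than copied from my Pi, and parse_cards and default_card_is_usb are names I made up for this example.

```python
import re

# One header line per card, e.g.
# " 0 [Headset        ]: USB-Audio - Logitech USB Headset"
_CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.*?)\s*\]:\s*(.*?)\s+-\s+(.*)$")


def parse_cards(text):
    """Map card index -> {'id', 'driver', 'name'} from /proc/asound/cards text."""
    cards = {}
    for line in text.splitlines():
        match = _CARD_LINE.match(line)
        if match:
            cards[int(match.group(1))] = {
                "id": match.group(2),
                "driver": match.group(3),
                "name": match.group(4).strip(),
            }
    return cards


def default_card_is_usb(text):
    """True when card 0 exists and is handled by the USB audio driver."""
    card0 = parse_cards(text).get(0)
    return card0 is not None and card0["driver"] == "USB-Audio"


# Illustrative sample; on the Pi, pass open("/proc/asound/cards").read() instead.
SAMPLE = (
    " 0 [Headset        ]: USB-Audio - Logitech USB Headset\n"
    "                      Logitech Logitech USB Headset at usb-bcm2708_usb-1.2, full speed\n"
    " 1 [ALSA           ]: bcm2835 - bcm2835 ALSA\n"
    "                      bcm2835 ALSA\n"
)
print(default_card_is_usb(SAMPLE))
```

The exact driver and name strings vary between kernels and headsets, so treat the "USB-Audio" comparison as an assumption to adjust for your own output.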

We can now test this out by recording a 5 second clip from the microphone:

arecord -d 5 -r 48000 daveconroy.wav

and play it back through the headphone speakers:

aplay daveconroy.wav
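If the playback sounds wrong (or silent), it can help to confirm what was actually written to disk. Here is a small sketch using Python 3’s standard wave module to read the channel count, sample rate, and duration of a WAV file. The helper names and the generated silent test file are my own additions so the example can run without a microphone; on the Pi you would point describe_wav at daveconroy.wav instead.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return wav.getnchannels(), rate, wav.getnframes() / float(rate)


def write_silence(path, seconds, rate):
    """Write a mono, 8-bit silent WAV file (stand-in for a real recording)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(1)      # 1 byte per sample (unsigned 8-bit PCM)
        wav.setframerate(rate)
        wav.writeframes(b"\x80" * (seconds * rate))  # 0x80 is silence for 8-bit


# Demo: create a 5 second, 48000 Hz file and inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path, 5, 48000)
print(describe_wav(demo_path))
```

A duration close to 5 seconds and a rate of 48000 would match the arecord command above; anything else suggests the recording was cut short or the options were not applied.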

To adjust the levels you can use the built-in utility alsamixer. This tool handles both audio input and output levels.

sudo alsamixer

Now that our headset is configured, we can move onto the next step of converting from Speech to Text.

Speech to Text or Speech Recognition with a Raspberry Pi

There are a few options for speech recognition on the Raspberry Pi, but I thought the best solution for this tutorial was to use Google’s Speech to Text service. This service allows us to upload the file we just recorded and convert it to text (which we will later translate).

Let’s create a shell script to handle this process for us.

sudo nano stt.sh

with the following contents

#!/bin/bash
# Record from card 0, device 0 until Ctrl+C, encoding to FLAC on the fly.
echo "Recording your Speech (Ctrl+C to Transcribe)"
arecord -D plughw:0,0 -q -f cd -t wav -d 0 -r 16000 | flac - -f --best --sample-rate 16000 -s -o daveconroy.flac;
 
# Upload the FLAC file and keep the 12th quote-delimited field of the reply (the transcript).
echo "Converting Speech to Text..."
wget -q -U "Mozilla/5.0" --post-file daveconroy.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  > stt.txt
 
# Show the transcript.
echo "You Said:"
value=$(cat stt.txt)
echo "$value"

Make it executable

sudo chmod +x stt.sh

The last step before we can run the script is to install the FLAC codec, which is not included in the standard Raspbian image.

sudo apt-get install flac

Now we can run the script:

./stt.sh

This will automatically start recording your voice; just press Ctrl+C when you are done speaking. At that point the script uploads the sound file to Google, which transcribes it and returns the text so it can be displayed on our screen. Pretty impressive for only a few lines of code! Sample output below:
here is an example speech recognition raspberry pi
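The cut -d\" -f12 trick works, but it silently produces an empty stt.txt when the reply has a different shape (for example, when nothing was recognized). As a hedged alternative, here is a Python 3 sketch that parses the reply as JSON. The response layout is an assumption on my part, inferred from which field the cut command picks out; extract_utterance is a hypothetical helper and the sample reply is made up.

```python
import json


def extract_utterance(response_text):
    """Return the first recognized phrase, or None if there is none.

    Assumes a reply shaped like:
    {"status":0,"id":"...","hypotheses":[{"utterance":"...","confidence":0.9}]}
    """
    if not response_text.strip():
        return None
    try:
        data = json.loads(response_text)
    except ValueError:
        # Not JSON at all (for example an HTML error page).
        return None
    if not isinstance(data, dict):
        return None
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    return hypotheses[0].get("utterance")


# Made-up sample reply in the assumed layout.
sample = ('{"status":0,"id":"abc123","hypotheses":'
          '[{"utterance":"here is an example","confidence":0.93}]}')
print(extract_utterance(sample))
```

Returning None instead of an empty string makes it easy for a calling script to print a "nothing recognized" message rather than sending blank text on to the translator.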

Microsoft Translation and Google Text to Speech

Now that we can record our voice and convert it into text, we need to translate it to our desired foreign language. I would love to be able to use Google’s Translate tool for this, but unfortunately there is a $20 sign-up fee for use of this API. I plan on purchasing this for myself, but I wanted to make this project free so everyone has an opportunity to try it.

As an alternative, we will be using Microsoft’s translation service, which is currently still free for public use. The list of supported languages and their corresponding codes can be found here. In our previous example we used a simple shell script, but for the translation and playback process I’ve written a more powerful Python script.

All of this code can be found in my GitHub repository (contributions welcome!).

Let’s first create the file:

sudo nano PiTranslate.py

and add the following contents

import json
import requests
import urllib
import subprocess
import argparse
 
parser = argparse.ArgumentParser(description='This is a demo script by DaveConroy.com.')
parser.add_argument('-o','--origin_language', help='Origin Language',required=True)
parser.add_argument('-d','--destination_language', help='Destination Language', required=True)
parser.add_argument('-t','--text_to_translate', help='Text to Translate', required=True)
args = parser.parse_args()
 
## show values ##
print ("Origin: %s" % args.origin_language )
print ("Destination: %s" % args.destination_language )
print ("Text: %s" % args.text_to_translate )
 
text = args.text_to_translate
origin_language=args.origin_language
destination_language=args.destination_language
 
 
def speakOriginText(phrase):
    googleSpeechURL = "http://translate.google.com/translate_tts?tl="+ origin_language +"&q=" + phrase
    subprocess.call(["mplayer",googleSpeechURL], shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
 
def speakDestinationText(phrase):
    googleSpeechURL = "http://translate.google.com/translate_tts?tl=" + destination_language +"&q=" + phrase
    print (googleSpeechURL)
    subprocess.call(["mplayer",googleSpeechURL], shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
 
# OAuth request parameters (named oauth_args so the parsed command-line args are not overwritten)
oauth_args = {
        'client_id': '',#your client id here
        'client_secret': '',#your azure secret here
        'scope': 'http://api.microsofttranslator.com',
        'grant_type': 'client_credentials'
    }
 
oauth_url = 'https://datamarket.accesscontrol.windows.net/v2/OAuth2-13'
oauth_junk = json.loads(requests.post(oauth_url,data=urllib.urlencode(oauth_args)).content)
translation_args = {
        'text': text,
        'to': destination_language,
        'from': origin_language
        }
 
headers={'Authorization': 'Bearer '+oauth_junk['access_token']}
translation_url = 'http://api.microsofttranslator.com/V2/Ajax.svc/Translate?'
translation_result = requests.get(translation_url+urllib.urlencode(translation_args),headers=headers)
translation=translation_result.text[2:-1]
 
speakOriginText('Translating ' + translation_args["text"])
speakDestinationText(translation)

For the script to run, we need to install a few Python libraries and a media player.

sudo apt-get install python-pip mplayer
sudo pip install requests

The last thing we need to do before we can run the script is sign up for a Microsoft Azure Marketplace API key. To do so, simply visit the marketplace, register an application, and then enter your client id and secret passcode into the script above.
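If the client id or secret is missing or wrong, the token request will not contain an access_token and the script stops with a bare KeyError. Here is a minimal Python 3 sketch of a friendlier check on the decoded OAuth reply. The "error" and "error_description" keys are the standard OAuth 2.0 error fields; whether this particular service fills them in is an assumption, and extract_access_token is a hypothetical helper, not part of PiTranslate.py.

```python
def extract_access_token(oauth_response):
    """Return the access token from a decoded OAuth reply, or raise a clear error.

    `oauth_response` is the dict produced by json.loads() on the token reply.
    """
    token = oauth_response.get("access_token")
    if not token:
        # Standard OAuth 2.0 error fields; assumed, not confirmed, for this service.
        detail = (oauth_response.get("error_description")
                  or oauth_response.get("error")
                  or "no details returned")
        raise RuntimeError(
            "OAuth request did not return an access token "
            "(check client id and secret): %s" % detail
        )
    return token


# Made-up replies for illustration.
print(extract_access_token({"access_token": "abc123", "token_type": "Bearer"}))
try:
    extract_access_token({"error": "invalid_client"})
except RuntimeError as err:
    print(err)
```

The idea is simply to fail with a message that points at the credentials, since that is the most likely cause when the token is missing.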

Now we can run the script:

sudo python PiTranslate.py -o en -d es -t "hello my name is david conroy"

The script has 3 required inputs:
-o origin language
-d destination language
-t “text to translate”

hola me nombre david conroy

The above command starts in English and translates to Spanish. My favorite part about the whole tutorial is how quickly you can change between languages you are translating, and how the returned voice changes according to the destination language.
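One fragile spot in PiTranslate.py is the translation_result.text[2:-1] slice, which trims the reply by position and will happily pass an error message along to be spoken. The Python 3 sketch below is a more defensive take under two assumptions: that the raw reply is a quoted JSON string, possibly preceded by a byte-order mark (which is what the slice suggests), and that service errors arrive as text starting with "TranslateApiException". The function name clean_translation is my own.

```python
import json


def clean_translation(raw_text):
    """Strip a leading byte-order mark and JSON quoting from the raw reply.

    Raises RuntimeError if the reply looks like a service error message.
    """
    body = raw_text.lstrip(u"\ufeff").strip()
    try:
        decoded = json.loads(body)
    except ValueError:
        decoded = body           # not valid JSON, fall back to the raw text
    if not isinstance(decoded, str):
        decoded = body           # JSON, but not a string; keep the raw text
    if decoded.startswith("TranslateApiException"):
        raise RuntimeError("translation failed: %s" % decoded)
    return decoded


# Made-up replies in the assumed layout.
print(clean_translation(u'\ufeff"hola"'))
try:
    clean_translation(u'\ufeff"TranslateApiException: no active subscription"')
except RuntimeError as err:
    print(err)
```

Decoding with json.loads also turns escaped characters back into real text, which positional slicing would leave untouched.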

Putting it all Together

It is actually very easy to combine the two scripts we created in this tutorial. In fact, it only takes one line of code added to the bottom of the stt.sh shell script we created earlier (assuming PiTranslate.py and stt.sh are in the same directory).

sudo nano stt.sh

python PiTranslate.py -o en -d es -t "$value"

For those of you who skipped around in this tutorial, here is the entire script again with that line added:

#!/bin/bash
# Record from card 0, device 0 until Ctrl+C, encoding to FLAC on the fly.
echo "Recording your Speech (Ctrl+C to Transcribe)"
arecord -D plughw:0,0 -f cd -t wav -d 0 -q -r 16000 | flac - -s -f --best --sample-rate 16000 -o daveconroy.flac;
 
# Upload the FLAC file and keep the 12th quote-delimited field of the reply (the transcript).
echo "Converting Speech to Text..."
wget -q -U "Mozilla/5.0" --post-file daveconroy.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  > stt.txt
 
# Show the transcript.
echo "You Said:"
value=$(cat stt.txt)
echo "$value"
 
#translate from English to Spanish and play over speakers
python PiTranslate.py -o en -d es -t "$value"

Now, run the Speech To Text script again, and it will translate it from English to Spanish by default.

./stt.sh

Change your origin and destination languages in the last line as desired, and the PiTranslate.py script will do the rest! There are literally thousands of language pairs supported here. Here is a screenshot:

[Screenshot: PiTranslate]

Video Demo

I apologize that this video is a little shaky; it was difficult holding the headset to the phone while running the scripts.

Known Limitations and Additional Resources

Both the origin and destination languages have to be supported by Microsoft Translate and Google Translate in order for this script to work.

Language Codes:
Microsoft
Google

Some special characters in certain languages will also cause trouble with the translation services, but I am working on a fix for that.
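One likely cause is that PiTranslate.py pastes the phrase into the text to speech URL without encoding it. The sketch below (Python 3, so it uses urllib.parse rather than the Python 2 urllib module from the script) percent-encodes the phrase as UTF-8 and adds the ie=UTF-8 parameter that a reader suggested in the comments. This is not my final fix, just an illustration; build_tts_url is a hypothetical helper and I have not confirmed that ie=UTF-8 is required for every language.

```python
from urllib.parse import quote

TTS_BASE = "http://translate.google.com/translate_tts"


def build_tts_url(language, phrase):
    """Build the text to speech URL with the phrase percent-encoded as UTF-8."""
    return "%s?tl=%s&ie=UTF-8&q=%s" % (
        TTS_BASE,
        quote(language, safe=""),
        quote(phrase, safe=""),   # spaces, accents, '&', '?' all get escaped
    )


print(build_tts_url("es", u"hola se\u00f1or"))
```

Encoding the phrase also protects against characters such as & or ? in the text, which would otherwise be read as part of the URL itself.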

Conclusion

I really enjoyed working on this project as it incorporates a wide range of technology and tools to create something immediately useful and fun to play with. Plus, it’s all FREE. If you have any questions at all regarding this project, just leave a comment below or on GitHub and I’d be happy to help you!


71 comments

  1. Amazing! You’re amazing.

  2. One of the greatest uses of Pi. Congrats.

  3. Could it be made so that it continually records and translates into text until a key is pressed?

    • Yes, I bet this could be done with sphinx, another speech recognition tool

  4. This is a great work. Thank you.

  5. when you show the text with the code are you hitting enter to submit the code at the end of the text box, i want to do this for my informatics class final project so the more detail the better!

    • Yes, press enter after every command.

      Be careful though, as some of the text boxes are full of content for necessary files (stt.sh and PiTranslate.py)

  6. Doug Barry

    Very cool stuff, thank you. I was thinking about doing something like this. I wonder if we could get a native speech recognition system to activate this kind of thing when it overhears someone say “computer”, then record the next maybe 10 seconds, trim silence, and do stuff based on content. Wish I had the time to experiment right now :)

    • You could maybe do something where you would start arecord with a set time of 10 seconds, but in the background (by adding an ampersand behind the command). Then you tell bash to wait for five seconds and start another recording of 10 secs in the background. This way you could have overlapping audio samples that you could then further analyse and store until you don’t need them any longer.
      Be careful though when using this technique, as you could easily forkbomb your own computer if you don’t set a time for the record command.

  7. hi, i want to transmet my voice when i speek from my PC to my raspberry to be amplify whith a baffle connected to the R.pi
    what should i do??
    1.creating a connexion PC-PI whith wifi (adhoc)
    2.transmette the voice from my pc
    3.recept the voice with my pi
    4.redirection to the baffle
    please help me, how can i transmette the voice from my pc?? or any step 2,3,4

  8. You are amazing.

  9. Long Nguyen

    I have problem

    http://translate.google.com/translate_tts?tl=fr&q=TranslateApiException: Cannot find an active Azure Market Place Translator Subscription associated with the request credentials. : ID=0728.V2_Json.Translate.E213E5F
    anonymous@anonymous-K40IN:~/Documents$

    • Long Nguyen

      when I translate Ennlish to French, Russian, …v.v…

    • I have the same issue from time to time.
      One moment it works fine, the other (without any changes) it throws this error.
      I have tried changing the password to a simpler one, as suggested when googling this error.

  10. Hi, I’m from Mexico. Your project is really cool, congratulations.

  11. Emiliano

    Hi Dave.
    Have you tried some generic USB audio card, instead of the Logitech?
    The audio card costs a few bucks.

    Do you know if the steps are the same with this card?

    thanks!!

    • I have, the process is the same as long as you can record your voice with the linux command arecord.

    • Please make sure the soundcard is compatible with rPi

  12. Hi Boss,
    Nice Tool need an help sorry for this Quest which is out of this Title saw ur audio recording can the audio be able to streamed like as mjpg streamer? HELP NEEDED boss pls help..!!

  13. I managed to record some sound using arecord.
    But when i execute te script I don’t get any text back,
    also the stt.sh is empty.

    Any experience with this?

    • Hrm. Can you play back the sound with aplay? Do you know what id number your sound card has? Perhaps updating the section in stt.sh (arecord -D plughw:0,0) to see if that helps. The script may be using the wrong hardware

  14. Andy Crofts

    Thanks! Works beautifully! Finally I can understand what my Finnish girlfriend is saying to me! (I have 2 scripts – enfi.sh and fien.sh)

    I actually got it working with a laptop using Linux Mint, because no joy with the webcam mic. on the Pi. Resorted to normal mic.
    The only thing that held me up a bit was the Azure bit – took awhile to sort out where to get the two required codes from. It was the “Register Application” button, that was off-screen.
    Lovely – now off to buy a decent USB soundcard. Also I want that HDMI-Pi monitor! Perfect combo.

  15. Jerome Avondo

    I actually did this a while back, some useful findings from my project to share:

    – There is a very nice “unofficial” google translate php class here: https://github.com/Stichoza/google-translate-php

    – Yandex (russian search engine) also supply an easy to use free translate api: http://api.yandex.com/translate/

    I wrapped all my functionality into a nice light-weight jquery mobile web page so I can use this on the move. You can even use the new webkit speech input on chrome,

    – Webkit-Speech https://code.google.com/p/metalmouth/wiki/webkitSpeechAPIComments

    and in general a nice little project if you want to do more voice stuff with your Pi is StevenHickson’s PiAUISuite I use this to do all sorts of stuff, like IR commands to my TV, lighting for my Philips Hue, web searches, weather etc…

    – PiAUISuite: https://github.com/StevenHickson/PiAUISuite

    That’s it!
    Nice work.

    J.

  16. I like it and I hacked a little bit to use a TTS installed in my machine with several voices and using a own bash script to translate using google translate.

    Result is amazing because you can read and listen the translation.

  17. Very cool; thanks for the tutorial. Per this stackoverflow post I added “&ie=UTF-8” between the “tl=__” and “&q=__” in the URL for the speakDestinationText() function, and it is parsing foreign characters correctly (Russian, Japanese, et cetera).

    • Thanks soo much, it took a long time to figure out how to translate Chinese in it and that totally worked! But how does it work?

  18. In my tests it seems like mplayer needs -user-agent “Mozilla/5.0” to work with Asian languages.

  19. I get the following error:

    Traceback (most recent call last):
    File “PiTranslate.py”, line 47, in
    headers={‘Authoriziation': ‘Bearer ‘+oauth_junk['access_token']}
    KeyError: ‘access_token’]}

    Did i forgot sth.? I dont know what i have to do now. Need your help.
    Thanks in advance.

  20. Tim Burns

    Hi Dave,
    Great tutorial. Thank you for all the detail. Very nice. I’m struggling with the following syntax error in the python code: invalid syntax referring to the single quote at the end of ‘client_secret’
    Here is my code:
    args = {
    ‘client_id': ”,tims_pi_translator88
    ‘client_secret': ”, 48abasecw23ddfjsdfkjX44
    ‘scope': ‘http://api.microsofttranslator.com’,
    ‘grant_type': ‘client_credentials’
    }
    Every time I run it fails on the closing single quote on ‘client_secret’. Any suggestions are appreciated. Thank you!

  21. sorry, i’m french tester , when i use “Ctrl+C” it ‘s write
    ” $\r” command not found, i don’t understand.
    thank you for your help.

  22. how do i leave the first file after i change the 2 to a 0. i tried ^X but i didnt do anything, im trying to save and reboot but it doesnt seem to do anything. any advice?

  23. how do you save and exit the file after you change snd-usb-audio index=2 to 0?

  24. when i try to run the script ./stt.sh every line says command not found. what do i have to do?

  25. Thanx, THanx, THANKYOU…this is super.

    Ragger

    Holland

  26. This is an incredible idea. I’ve already taken your code and modified it to permit a conversation between two people using the raspberry. Thank you so much for the groundwork!

    The only problem is Microsoft and their azure api. It’s been very sporatic so far. It’ll work for 10 seconds and then not at all. Grr…
    I’ve done the one possible solution (change secret password) but it hasn’t done anything noticable. Does it work better after 24hrs?

    Or.. Would it be easier to modify the .py script and switch over to other services? I wish I knew more python…

  27. Gerardo Gomez Esteban

    David, congratulations, great work. How can I record the sound to present it in a video, the way you do in your demo video? Would you allow me to present your work with the appropriate credit? Thank you for your support. Big hug.

    • Thank you. It was difficult to record the sound. I held my iPhone up to the headset to record the video. Both the headset microphone and the iPhone microphone could pick up my voice. The iPhone could also record the playback from the headset. It was not easy.

      (editor’s note: I do not speak Spanish, so I used translation services to write my original reply in Spanish)

  28. Gerardo Gomez Esteban

    David, congratulations, great job. How to record sound to present it in a video like you do it in your demo video?. May I present their work with the corresponding copyright?. Thank you for your support. Strong hug.

    • You may share my work, yes.

      Dave

      • Gerardo Gomez Esteban

        Thank you, David.
        Very easy to implement and verify this nice work. Again congratulations for this wonderful job.

  29. I tried several times to execute “speech to text recognition” after making the shell script following to your instructions.

    What displayed on my screen were the following as below.

    Recording your speech
    arecord: pnc_read:1801: *********:**/********
    -:WARNING: unexpected EOF; expected ********** samples. got o samples
    converting speech to text
    You Said

    Please advise me.

    • Have you verified your mic is working? Are you able to record and playback with arecord and aplay?

  30. Hi Dave,
    I had tried to do the same stt file but my system doesn’t save nothing in the stt.txt file, so my program does’t answer anything:

    pi@raspberrypi ~/Domotica $ ./stt.sh
    Recording… Press Ctrl+C to Stop.
    ^CProcessing…
    You Said:

    I have the same code as you, my micro records well so the problem i think is in any plugin i need to install or something like that,do you know what could be the problem?

    Thank you,
    Jorge
    (Sorry for my bad english)

  31. umur can

    Hello, firstly thank you for such a smooth and helpful tutorial. Are there any way to cap off listening with seconds instead of CTRL-C command. For example can it wait for 5 seconds and translate the voice? thank you again.

  32. LUIS ANDRANGO

    I will appreciate very much if you could you please tell me how to correct this error.
    Thank you!

    Arturo

  33. LUIS ANDRANGO

    I will appreciate very much if you could you please tell me how to correct this error.
    Thank you!

    Arturo

  34. steve b

    Hi Dave, installed the new GITHUB versions of code, but, what is the XXX ‘Key’ required in the new text-to-translate.py file?? I have my google dev API licence (paid for etc) so wondered what number was required? I have client ID for compute engine and server account etc… any chance you could update Pi sites etc, or githiub readme… When I run ./stt.sh, it translates the whole http:google connection translate string, not my words.. Know I have something wrong :-)

    If I solve issue before, will post..

  35. this is realy realy wonderful! I will definitely make my hands dirty on this during this weekend, thanks a lot!

  36. Hello
    When I enter ./stt.sh pi returns permission denied. I copy and paste all the code so there shouldt be a problem there. And sudo ./stt.sh does not work either.
