Smart Halloween: A Talking Pumpkin Created with IBM Watson and Raspberry PI

by Eugene MelnikovNovember 8, 2016
At Altoros, we enjoy celebrating Halloween. This year, our IoT team carved a talking pumpkin with a bit of artificial intelligence in its head.

In this article, we describe how to assemble such a smart pumpkin and what issues you may face while having fun with a similar IoT project.


What’s in the pumpkin brain?

To make the pumpkin smarter, we used a Raspberry PI device (as a host) and a Logitech C270 webcam (as a microphone), as well as Node.js and text/speech recognition services from IBM and Google.

Speech to Text (STT). For recognizing the English language, we tested IBM Watson’s Speech to Text. Even though the service failed to identify “Hello,” it was pretty good at converting other words.

While Speech to Text supports only eight languages, works with 50 languages. (We used for the Russian language.)

Chatbot. For generating answers, we chose the service that understands 39 languages. (Do not confuse it with You can play with the service in the browser and use voice, but its API supports only text messages. It looks like the service had been trained on teenager chats, so we sometimes got funny answers when we included slang in communication.

If your goal is not just fun, you should consider creating your own chatbot and training it. The services like,, and IBM Watson’s Conversation enable you to do that. Read our article about developing a five-minute “Hello, World” chatbot with IBM Watson and Bluemix.

Text to Speech (TTS). IBM Watson’s Text to Speech supports nine languages, and detects the language automatically. In addition, you can control voice expressiveness and customize the English voice using seven types of transformations. However, these features are not available for other languages. The service returns files in .ogg, .flac, .wav, and several other formats.

Google’s Text-to-Speech supports 30 languages, but it sounds more robotic. You specify the language; it is possible to adjust the speed of speech. The service output is in the .mp3 format.


Data workflow

This project flow is completely asynchronous.


Depending on the API, you could get more than one response from STT, and SoX will try to play them simultaneously. It allows the chatbot to start talking with itself. Although it might sound funny, we wanted to prevent this and just skipped the responses that came while we were playing the first response. Creating a queue of responses is also an option.

If you use Wi-Fi, you may experience some lags when working with audio formats that are not compressed. We selected the .flac output in IBM Watson, and Google uses .mp3 by default.

To save traffic and money, it also makes sense to either implement offline recognition for the wake-up word via PocketSphinx and then apply the online services of your choice to convert the rest of the speech or add a push-to-talk button.


Assembling the hardware

We summarized the steps you need to follow when making your own smart pumpkin. Before you start, ensure that you have these supplies:

  • a Raspberry PI device
  • an SD card
  • a USB microphone
  • an external speaker
  • Ethernet and power supply cables


To prepare the hardware:

  1. Flash the SD card using Raspbian and Etcher.
  2. Insert the SD card into your Raspberry PI.
  3. Connect the following components:
    • The USB microphone to the first USB port of the Raspberry PI device.
    • The external speaker to the AUX port.
    • The Ethernet cable to the Ethernet port. Your network must have a DHCP server.
    • The power supply cable to the micro-USB port.


Configuring Raspberry PI

After you connect the hardware, configure the Raspberry PI device:

  1. Find the device with the raspberrypi hostname in your network:

    arp -a | grep raspberrypi
  2. SSH to the Raspberry PI using the default password—raspberry:

    ssh pi@ip_from_previous_step
  3. (Optional) Enable VNC and change the password:

    sudo raspi-config

    To have VNC started on boot, add the nano ~/.config/autostart/tightvnc.desktop file with the following content:

    [Desktop Entry]
    Exec=vncserver :1
  4. (Optional) To get rid of the Ethernet cable, connect to your Wi-Fi.

    By default, channels 12 and 13 are not enabled, and PEAP-MS-CHAP v2 authentication is not supported. You might also want to use a power bank instead of the power supply cable.

  5. To set the USB microphone as the default input device, create the nano ~/.asoundrc file with the following content:

    pcm.!default {
    	type asym
    	playback.pcm {
    		type plug
    		slave.pcm "hw:0,0"
    	capture.pcm {
    		type plug
    		slave.pcm "hw:1,0"

    Here, 1 is the number of the Raspberry PI USB port.


Installation instructions


To install the dependencies—Node.js, SoX, and Forever—run the following commands:

curl -sL -o
sudo bash
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install sox libsox-fmt-all nodejs build-essential
sudo npm install forever -g


Running the application

We created an application that uses the mentioned web services to recognize voice, generate textual answers, and convert them to speech. Here are the instructions for running the app:

  1. Clone the repository and change your current directory:

    git clone
    cd pumpkin
  2. Create a application or IBM Watson services and export them using the corresponding command below:



  3. Start your application:

    forever start index.js

To stop the application, run this command:

forever stop index.js

At this point, you can back up the SD card. First, shut down your Raspberry PI via sudo shutdown or VNC. Then, insert the SD card into your card reader and follow these instructions.



Initially, we tried to use Jasper, an open-source project that allows you to create your own Siri offline or using service providers of your choice. It is possible to work with existing modules, which perform actions depending on your request, or make your own.

We spent one day compiling dependencies, but the platform failed to recognize even a wake-up word. Most of the libraries are obsolete, and it is hard to use them for languages other than English. So, we decided to proceed with external services.

However, in my opinion, using AI for a chatbot today can be beneficial only if it is trained for a specific case. Otherwise, it is just fun.

IBM Watson’s Text to Speech service is good enough to be used, but you still may notice that the voice is synthetic. The Speech to Text service did not perform very well, especially offline, and made a lot of mistakes. It is also quite expensive for full-time usage. If you need only English support, check out this comparison of voice recognition tools.


Some services provide support for a huge number of languages, but quantity does not equal to quality. For pet projects, consider free as a means of converting speech to text and Google Text-to-Speech as a tool for the opposite task. For business cases, IBM Watson seems to be a more reasonable choice.


Related reading