
Learn how to create a voice-controlled robot using the ESP32 and TensorFlow Lite with this step-by-step guide to creating the neural network, generating the training data, and implementing the firmware.

Transcript

[0:00] “Forward”
[0:02] “Right”
[0:04] “Forward”
[0:06] “Right”
[0:08] “Forward”
[0:10] “Left”
[0:12] “Backward”
[0:15] “Backward”
[0:17] “Left”
[0:19] “Forward”
[0:21] Hey Everyone,
[0:22] We’re back with another dive into some speech recognition.
[0:26] In the last video, we built our very own Alexa using wake word detection running on the ESP32.
[0:33] “Marvin”
[0:35] “Tell me a joke”
[0:39] “What goes up and down but does not move”
[0:43] “Stairs…”
[0:45] The actual processing of the user’s request is performed by a service called Wit.ai
[0:51] which takes speech and converts it into an intention that can be executed by the ESP32.
[0:58] In this video we’re going to do some limited voice recognition on the ESP32 and build a
[1:04] voice-controlled robot!
[1:09] Once again we’ll be using the Commands Dataset as our training data.
[1:13] I’ve selected a set of words that would be suitable for controlling a small robot:
[1:17] “left”, “right”, “forward”, and “backward”
[1:21] We’ll train up a neural network to recognise these words and then run that model on the
[1:27] ESP32 using TensorFlow Lite.
[1:31] We’re going to be able to reuse a lot of the code from our previous video with some minor
[1:36] modifications.
[1:37] Let’s have a quick look at generating our training data.
[1:41] We have our standard set of imports and some constants.
[1:45] In a departure from our previous Alexa work we’re going to split the words into two sections,
[1:51] command words and nonsense words.
[1:53] We’ll train our model to recognise the command words and reject the nonsense words and background noise.
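As a rough illustration of that split (these word lists are illustrative, not necessarily the exact ones used in the project):

```python
# Illustrative split of the dataset's words: the four words we want to act on,
# and everything else lumped together as "nonsense"/invalid.
command_words = ["left", "right", "forward", "backward"]

nonsense_words = [
    "yes", "no", "up", "down", "on", "off", "stop", "go",
    "one", "two", "three", "bed", "bird", "cat", "dog", "house",
]

# One output class per command word plus a single "invalid" class that
# covers the nonsense words and background noise.
class_names = command_words + ["_invalid"]
```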
[2:00] We have the same set of helper functions for getting the list of files and validating the audio
[2:05] and we have our function for extracting the spectrogram from audio data.
[2:10] Once again, we’re going to augment our data - we’ll randomly reposition the word within the audio segment
[2:17] and we’ll add some random background noise to the word.
[2:21] To get sufficient data for our command words we’ll repeat them multiple times
[2:26] this will give our neural network more data to train on and should help it to generalise.
[2:32] A couple of the words, forward and backward, have fewer examples, so I’ve repeated these more often.
[2:40] For our nonsense words we won’t bother repeating them as we have quite a few examples.
[2:46] As before we’ll include background noise and we’ll also include the same problem noises
[2:51] we identified in the previous project.
[2:53] With the training data generation completed we just save it to disk.
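As a sketch of what that augmentation step can look like in Python (the function and the exact parameters here are illustrative, assuming 16kHz one-second segments):

```python
import numpy as np

SAMPLE_RATE = 16000
SEGMENT_LENGTH = SAMPLE_RATE  # one second of audio

def augment(word_audio, background_audio, max_noise=0.1):
    """Randomly reposition a word within a one-second segment and mix in
    a random amount of background noise."""
    word_audio = word_audio[:SEGMENT_LENGTH]
    segment = np.zeros(SEGMENT_LENGTH, dtype=np.float32)
    # drop the word in at a random position within the segment
    offset = np.random.randint(0, SEGMENT_LENGTH - len(word_audio) + 1)
    segment[offset:offset + len(word_audio)] = word_audio
    # mix in a random one-second slice of background noise
    noise_start = np.random.randint(0, len(background_audio) - SEGMENT_LENGTH)
    noise = background_audio[noise_start:noise_start + SEGMENT_LENGTH]
    return segment + np.random.uniform(0, max_noise) * noise

# Command words get repeated (and re-augmented) several times; "forward" and
# "backward" have fewer recordings, so they are repeated more often.
# These repeat counts are illustrative.
REPEATS = {"left": 2, "right": 2, "forward": 4, "backward": 4}
```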
[2:58] Here are some examples of the words in their spectrogram format.
[3:05] In our previous project we trained the model to recognise just one word; now we want to recognise multiple words.
[3:11] Once again we have our usual includes, and we have the lists of words that we want to recognise.
[3:16] We load up our data and if we plot a histogram we can see the distribution of words.
[3:22] Ideally we’d have a bit more of a balanced dataset but having more negative examples
[3:26] may actually help us.
[3:28] We have a fairly simple convolutional neural network, with 2 convolution layers followed
[3:33] by a fully connected layer which is then followed by our output layer.
[3:38] As we are now trying to recognise multiple different words we use the “softmax” activation
[3:42] function and “CategoricalCrossentropy” as our loss function.
[3:47] I do have a couple of introductory videos on TensorFlow that explain these terms in a bit more detail.
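A minimal Keras sketch of that kind of network (the layer sizes and the spectrogram input shape are illustrative; they depend on the feature extraction settings):

```python
import tensorflow as tf

NUM_CLASSES = 5  # "left", "right", "forward", "backward" and "invalid"

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(99, 43, 1)),  # spectrogram: time x frequency x 1
    tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(40, activation="relu"),              # fully connected layer
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # one output per class
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"],
)
```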
[3:54] After training our model we get just under 92% accuracy on our training data and just
[3:59] over 92% accuracy on our validation data.
[4:04] Our test dataset gives us a similar level of performance.
[4:10] Looking at the confusion matrix we can see that it’s mostly misclassifying our words as invalid.
[4:15] This is probably what we’d prefer as ideally we’d like to err on the side of false negatives
[4:20] instead of false positives.
[4:23] Since we don’t appear to be overfitting the model I’ve trained it on the complete dataset.
[4:28] This gives us a final accuracy of around 94% and looking at the confusion matrix we see a lot better results.
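For reference, a confusion matrix like that can be generated with something along these lines (X_test and Y_test here stand in for the test spectrograms and their one-hot labels):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

predictions = model.predict(X_test)
cm = confusion_matrix(
    np.argmax(Y_test, axis=1),       # true class indices
    np.argmax(predictions, axis=1),  # predicted class indices
)
print(cm)  # rows are the true classes, columns the predicted classes
```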
[4:35] It’s possible that now we might have some overfitting, but let’s try it in the real world.
[4:41] For that we are going to need a robot!
[4:45] I’m going to build a very simple two-wheeled robot.
[4:48] We’re going to use two continuous servos and a small power cell.
[4:52] We’ll need quite a wide wheelbase as the breadboard with the ESP32 on it is quite large.
[4:58] After a couple of iterations, I’ve ended up with something that looks like it will work.
[5:04] Assembly is pretty straightforward: we just need to bolt the two servos onto the chassis
[5:09] and attach the wheels.
[5:11] The breadboard just sits on top of the whole contraption.
[5:18] motor noises…
[5:24] Let’s have a look at the firmware.
[5:27] We have some helper libraries:
[5:29] The tfmicro library contains all the TensorFlow Lite code.
[5:33] We have a wrapper around that to make it slightly easier to use.
[5:37] This library contains the trained model exported as C code along with a helper class to run
[5:43] the neural network prediction.
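Getting a trained Keras model into that form is usually a matter of converting it to a TensorFlow Lite flatbuffer and dumping the bytes as a C array (the file names here are just examples):

```python
import tensorflow as tf

# Convert the trained Keras model to a TensorFlow Lite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)

# The flatbuffer can then be turned into a C array for the firmware, e.g.:
#   xxd -i converted_model.tflite > model_data.cc
```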
[5:45] We then have our audio processing.
[5:47] This recreates the code that we used when we generated the training data.
[5:52] This processes a one-second window of samples and generates the spectrogram that will be
[5:56] used by the neural network.
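On the training side that spectrogram step looks roughly like this (the frame sizes and pooling are illustrative; the firmware reimplements the equivalent steps in C++):

```python
import numpy as np
import tensorflow as tf

def get_spectrogram(audio, frame_length=320, frame_step=160):
    """Turn a one-second window of 16kHz samples into a small log spectrogram."""
    # short-time Fourier transform over overlapping frames
    stft = tf.signal.stft(
        tf.convert_to_tensor(audio, dtype=tf.float32),
        frame_length=frame_length,
        frame_step=frame_step,
    )
    spectrogram = tf.abs(stft)
    # reduce the frequency resolution with average pooling, then take the log
    pooled = tf.nn.avg_pool2d(
        spectrogram[tf.newaxis, :, :, tf.newaxis],
        ksize=[1, 1, 6, 1],
        strides=[1, 1, 6, 1],
        padding="SAME",
    )[0]
    return tf.math.log(pooled + 1e-6).numpy()
```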
[5:59] Finally, we have our audio input library.
[6:03] This will read samples either from the internal ADC for analogue microphones or from the I2S
[6:08] interface for digital microphones.
[6:12] In the main application code we have the setup function which creates our command processor and our command detector.
[6:20] The command detector is run by a task that waits for audio samples to become available
[6:25] and then services the command detector.
[6:28] Our command detector rewinds the audio data by one second, gets the spectrogram and then
[6:34] runs the prediction.
[6:36] To improve the robustness of our detection we sample the prediction over multiple audio segments
[6:42] and also reject any detections that happen within one second of a previous detection.
[6:48] If we detect a command then we queue it up for processing by the command processor.
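Sketched in Python for clarity (the actual firmware does the equivalent in C++, and the window size, threshold and cooldown values here are illustrative), the smoothing and debouncing logic boils down to something like this:

```python
import time

command_words = ["left", "right", "forward", "backward"]
class_names = command_words + ["_invalid"]

SMOOTHING_WINDOW = 3       # number of recent predictions to average over
DETECTION_THRESHOLD = 0.9  # average probability needed to accept a command
COOLDOWN_SECONDS = 1.0     # ignore detections soon after a previous one

recent_scores = {name: [] for name in class_names}
last_detection = 0.0

def process_prediction(probabilities, now=None):
    """Average each class's score over the last few predictions and only report
    a command when the average is high enough and we're outside the cooldown."""
    global last_detection
    now = time.monotonic() if now is None else now
    for name, score in zip(class_names, probabilities):
        scores = recent_scores[name]
        scores.append(score)
        del scores[:-SMOOTHING_WINDOW]  # keep only the most recent scores
    for name in command_words:
        average = sum(recent_scores[name]) / len(recent_scores[name])
        if average > DETECTION_THRESHOLD and now - last_detection > COOLDOWN_SECONDS:
            last_detection = now
            return name  # hand this command to the command processor's queue
    return None
```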
[6:54] Our command processor runs a task that listens on this queue for commands;
[6:58] when a command arrives it changes the PWM signal that is being sent to the motors to either stop them
[7:04] or set the required direction.
[7:07] To move forward we drive both motors forward; for backwards we drive both motors backward.
[7:13] For left we reverse the left motor and drive the right motor forward and for right we do
[7:18] the opposite, right motor reverse and left motor forward.
[7:22] With our continuous servos a pulse width of 1500µs should hold them stopped; lower than
[7:29] this should reverse them and higher should drive them forward.
[7:34] I’ve slightly tweaked the right motor’s forward value as it was not turning
[7:38] as fast as the left motor, which caused the robot to veer off to one side.
[7:44] Note that because we have the right motor upside down
[7:48] to drive it forward we actually run it in reverse
[7:51] and to drive it backwards we run it forward.
[7:54] You may need to calibrate your own motors to get the robot to go in a straight line.
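The command-to-motor mapping then reduces to a small lookup of pulse widths; here it is sketched in Python with made-up calibration numbers (the real firmware does this in C++ via the ESP32's PWM peripheral, and your values will differ):

```python
STOP = 1500  # microseconds: roughly the neutral pulse for a continuous servo

# (left_pulse_us, right_pulse_us) for each command. The right servo is mounted
# upside down, so driving that wheel forward means reversing the servo. The
# slightly different right-forward value is the kind of tweak used to stop the
# robot veering off when one motor runs faster than the other.
PULSES = {
    "stop":     (STOP, STOP),
    "forward":  (1700, 1250),  # left servo forward, right servo reversed
    "backward": (1300, 1700),  # left servo reversed, right servo forward
    "left":     (1300, 1250),  # left wheel backward, right wheel forward
    "right":    (1700, 1700),  # left wheel forward, right wheel backward
}

def pulses_for(command):
    """Return the (left, right) servo pulse widths for a detected command."""
    return PULSES.get(command, PULSES["stop"])
```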
[8:01] So, that’s the firmware code. Let’s see the robot in action again!
[8:37] How well does it actually work?
[8:40] Reasonably well…
[8:41] It’s a nice technology demonstration and fun project.
[8:45] It does occasionally confuse words and mix up left and right.
[8:48] It’s got a mind of its own and will just start wandering around if you don’t talk to it.
[9:05] We’re starting to reach the limits of what’s really possible.
[9:08] We have a limited amount of RAM to play with and the models are starting to get very big.
[9:14] We also have a limited amount of CPU to play with.
[9:16] The larger models take longer to process, making real-time detection harder.
[9:21] Having said that, there are a lot of improvements that can be made.
[9:25] So, thanks for watching, I hope you found the video useful and interesting, please subscribe if you did.
[9:32] All the code is on GitHub - let me know how you get on in the comments!
[9:36] See you in the next video!

