Want to keep up to date with the latest posts and videos? Subscribe to the newsletter
HELP SUPPORT MY WORK: If you're feeling flush then please stop by Patreon Or you can make a one off donation via ko-fi

I’ve been working on my version of the TinyTV for a while now. It’s all based around the ESP32, a popular and powerful WiFi-enabled microcontroller.

For the first version I did the streaming from a simple Python server running on my desktop and used the WiFi capabilities of the module. I chose to do this first as in my mind it was the simplest thing.

Our server just needed three endpoints: one to get audio data, another to get a frame of video data, and finally an endpoint to get metadata about the available videos.

Web Server Endpoints

There are a couple of options for presenting the video frames - we could send completely uncompressed image data that would be ready to display directly on the screen. The screen I’m using is 280x240 pixels and each pixel is RGB565 - that’s 5 bits for red, 6 for green and 5 for blue.

This would give us a total size of 131.25KB per frame. At any reasonable frame rate this would swamp the WiFi interface and we’d spend all our time just transferring data.
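As a rough check on that number, here is a minimal sketch of the arithmetic. The function names are mine, and the 25 frames per second used in the note below is just an assumed illustrative rate, not something measured on the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one uncompressed frame. */
static size_t frame_bytes(size_t width, size_t height, size_t bits_per_pixel) {
    assert(bits_per_pixel % 8 == 0); /* RGB565 is 16 bits, so whole bytes */
    return width * height * (bits_per_pixel / 8);
}

/* Raw bandwidth in bits per second at a given frame rate. */
static size_t raw_bits_per_second(size_t bytes_per_frame, size_t fps) {
    return bytes_per_frame * 8 * fps;
}
```

For 280x240 at 16 bits per pixel this gives 134,400 bytes (131.25KB) per frame, and at an assumed 25 frames per second that is 26,880,000 bits per second of raw pixel data before any protocol overhead.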

A much more popular choice for this kind of thing is individual JPEG frames - which is what MJPEG or Motion JPEG is based around.

A bunch of JPEG frames from a video

With the right library (I’m using JPEGDEC) an ESP32 can decode and display a 280x240 JPEG image in about 36ms - which gives us a theoretical frame rate of about 28 frames per second.

To make things really efficient we can use the dual cores of the ESP32 and have two threads - one downloading frames from the server and another decoding and displaying the last frame.

For the audio side of things we can keep things simple and just serve up 8-bit 16kHz PCM data. Audio takes up a lot less bandwidth than video so we don’t really need to worry about this - if we wanted to get clever then the ESP32 is capable of processing MP3 data - but that would take up processing power that we need for decoding JPEG files.

The only challenge that we have is keeping the audio and video in sync. There’s nothing more annoying than watching a show where the audio is out of sync with the video.

The approach I’ve taken is to use the audio samples as the time basis. I can then just request frames that match the current audio timestamp. We may skip a few frames every now and then, but we’ll stay matched up to the audio that’s being played.
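The idea of using the audio as the clock can be sketched like this. It is only an illustration: the function name is hypothetical and the real project code may track time differently.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed playback time in milliseconds for a count of samples played.
   With 8-bit mono PCM one sample is one byte, so this is also the
   number of audio bytes sent to the speaker so far. */
static uint32_t audio_elapsed_ms(uint64_t samples_played, uint32_t sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return (uint32_t)((samples_played * 1000u) / sample_rate_hz);
}
```

At 16kHz, 8,000 samples played means 500ms have elapsed, and that value is what would be used as the timestamp when asking the server for the next frame.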

This works really well and you can see it all working in this video:

The server is really nice and simple as well, it doesn’t need to do any clever video encoding or variable bitrate matching with the client, it just needs to extract the requested block of audio samples and a frame at the requested timestamp.

What About Locally Stored Videos?

Streaming over WiFi is nice, but it’s not exactly convenient and ultimately it would be nice to have a portable battery powered video player. I’ve also got lots of issues with my home WiFi network that make it perform really badly. Ideally we want to be able to stream video from an SD Card connected to our ESP32.

It’s surprisingly easy to connect an SD Card to an ESP32 - SD Cards support SPI and the ESP32 has built in support for this (it’s actually how it talks to the display I’m using).

SPI Pinout on an SD Card

As you can see from my increasingly Heath Robinson setup you can just connect directly from the SD Card pins to the ESP32.

That’s a lot of wires…

So storage is pretty easy - and we can connect pretty large SD Cards, so big files are not really a problem.

Our real problem is what format should we store our video files in. And the issue is that most media container formats are completely bonkers. The spec for MP4 files runs to over 80 pages - there’s no way that I want to try and implement that on an ESP32.

So I consulted with an “expert” (cough ChatGPT…) to see what my options were and it had a couple of pretty interesting suggestions:

  • Export two separate files from the movie file
  • One containing MJPEG data - so basically a bunch of JPEG files concatenated together
  • Another containing the 8 bit PCM audio data
  • Create a simple custom format with simple header structures followed by blocks of either JPEG or audio data.

The first option is very tempting - the two files can be generated by simple ffmpeg commands:

ffmpeg -i input.mp4 -c:v mjpeg -q:v 3 output.mjpg
ffmpeg -i input.mp4 -vn -c:a pcm_u8 -ar 16000 -f u8 output.raw
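To give a feel for what reading the concatenated MJPEG file would involve, here is a hedged sketch that finds frame boundaries in a buffer by scanning for the JPEG start (FF D8) and end (FF D9) markers. The function name is made up for illustration, and the scan is naive: it assumes the frames have no embedded thumbnails, which carry their own start and end markers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the next JPEG in buf, searching from *pos. On success returns 1,
   stores the frame's start offset and length, and moves *pos past it.
   Returns 0 when no complete frame is left in the buffer. */
static int next_jpeg(const uint8_t *buf, size_t len, size_t *pos,
                     size_t *start, size_t *length) {
    assert(buf && pos && start && length);
    size_t i = *pos;
    /* look for the Start Of Image marker FF D8 */
    while (i + 1 < len && !(buf[i] == 0xFF && buf[i + 1] == 0xD8)) i++;
    if (i + 1 >= len) return 0;
    size_t s = i;
    i += 2;
    /* look for the End Of Image marker FF D9 */
    while (i + 1 < len && !(buf[i] == 0xFF && buf[i + 1] == 0xD9)) i++;
    if (i + 1 >= len) return 0;
    *start = s;
    *length = i + 2 - s;
    *pos = i + 2;
    return 1;
}
```

Calling it repeatedly with the same pos variable walks through the frames one at a time. Note that with this format there is no index, so every byte has to be scanned to find the next frame.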

The second option is a little more convenient as it would only generate one file, making things easier to manage, but it would involve writing some custom code to create the video format - potentially doable, but life is way too short for that kind of thing.

However, the biggest issue with both these formats is that you wouldn’t be able to easily play the files on your desktop computer. So it’s not very convenient.

So after a bit more back and forth we came to the conclusion that the AVI file format might be a suitable option.

AVI files are relatively straightforward - provided you make a bunch of assumptions and ignore a lot of the metadata.

The basic structure looks like this:

RIFF "AVI " (space at end)
    LIST "hdrl"
        DATA "avih", len: 56
        LIST "strl"
            DATA "strh", len: 56
            DATA "strf"
        LIST "strl"
    LIST "INFO" (optional)
    LIST "movi"
        DATA "00dc"
        DATA "01wb"
    DATA "idx1"
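The 56 bytes of the "avih" chunk are fourteen 32-bit values. As a sketch, they could be mapped onto a struct like the one below. The field names follow Microsoft's AVIMAINHEADER documentation, but the struct name and the helper function are my own, and reading the bytes straight into the struct assumes a little-endian CPU (which the ESP32 is).

```c
#include <assert.h>
#include <stdint.h>

/* Payload of the "avih" chunk: fourteen 32-bit little-endian fields. */
typedef struct {
    uint32_t dwMicroSecPerFrame;    /* frame period in microseconds */
    uint32_t dwMaxBytesPerSec;
    uint32_t dwPaddingGranularity;
    uint32_t dwFlags;
    uint32_t dwTotalFrames;
    uint32_t dwInitialFrames;
    uint32_t dwStreams;             /* e.g. 2 for one video + one audio stream */
    uint32_t dwSuggestedBufferSize;
    uint32_t dwWidth;
    uint32_t dwHeight;
    uint32_t dwReserved[4];
} AviMainHeader;

static_assert(sizeof(AviMainHeader) == 56, "avih payload should be 56 bytes");

/* Frame rate implied by the header. */
static double avi_frames_per_second(const AviMainHeader *h) {
    assert(h->dwMicroSecPerFrame > 0);
    return 1000000.0 / h->dwMicroSecPerFrame;
}
```

A header with dwMicroSecPerFrame set to 40,000 describes a 25 frames per second video. None of this is needed just to find the frames, which is why most of the header can be ignored.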

If we look at a hex dump of an AVI file we can see this quite clearly

Hex dump of an AVI file

We have four bytes with the values “RIFF” which tell us this is a Resource Interchange File Format file. This is followed by four bytes that give us the length of the data (in this case we have 208,853,828 bytes which is the size of our file minus the 8 bytes of the header - 4 bytes for “RIFF” + 4 bytes for the size).
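RIFF stores its sizes as little-endian 32-bit values, so the least significant byte comes first in the hex dump. Here is a small sketch of decoding one by hand. The helper names are hypothetical, and the byte sequence in the note below is computed from the size quoted above rather than copied from the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value from four bytes, independent of
   the byte order of the CPU running the code. */
static uint32_t read_le32(const uint8_t *b) {
    assert(b != NULL);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Total file size implied by the RIFF size field: the size excludes the
   4 byte "RIFF" tag and the 4 byte size field itself. */
static uint64_t riff_file_size(uint32_t riff_size) {
    return (uint64_t)riff_size + 8;
}
```

The bytes 44 DB 72 0C decode to 208,853,828, which corresponds to a file of 208,853,836 bytes on disk.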

Following this header we have the subtype of the file “AVI ” which tells us this is an AVI file.

This is then followed by chunks of data. Each one prefixed with 4 bytes telling us the chunk type and 4 bytes telling us the length of the chunk data. So we can easily read through the AVI file using the following code:

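#include <stdio.h>
#include <string.h>

typedef struct {
    char chunkId[4];
    unsigned int chunkSize; // assumes a 32 bit unsigned int on a little-endian CPU
} ChunkHeader;

void processListChunk(FILE *file, unsigned int chunkSize);

// returns 1 if a complete chunk header was read, 0 at the end of the file
int readChunk(FILE *file, ChunkHeader *header) {
    if (fread(header->chunkId, 4, 1, file) != 1 ||
        fread(&header->chunkSize, 4, 1, file) != 1) {
        return 0;
    }
    printf("ChunkId %c%c%c%c, size %u\n",
           header->chunkId[0], header->chunkId[1],
           header->chunkId[2], header->chunkId[3],
           header->chunkSize);
    return 1;
}

int main(int argc, char *argv[]) {
    FILE *file = fopen(argv[1], "rb");
    ChunkHeader header;
    // Read RIFF header
    readChunk(file, &header); // you could check it's "RIFF"
    // next four bytes are the RIFF type which should be 'AVI '
    char riffType[4];
    fread(riffType, 4, 1, file); // you could check it's "AVI "
    // stop when no more headers can be read - fseek clears the EOF flag,
    // so testing feof() here could loop forever
    while (readChunk(file, &header)) {
        if (strncmp(header.chunkId, "LIST", 4) == 0) {
            processListChunk(file, header.chunkSize);
        } else {
            // skip the chunk data bytes (plus one pad byte if the size is odd)
            fseek(file, header.chunkSize + (header.chunkSize % 2), SEEK_CUR);
        }
    }
    fclose(file);
    return 0;
}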

I’ve omitted any error handling and checking that the file is present and that it has valid headers. But this code will run through an AVI file printing out each chunk of data it finds along with the length.

We just need to find a “LIST” chunk that has a subtype of “movi” - if we look at the hex dump we can see how to process this:

LIST - movi Hexdump

We have the “LIST” chunk type followed by four bytes for the length. And then we have the subtype of the list “movi”. This is where we can find our video and audio data.

The LIST chunk contains child chunks. Video data has the type “00dc” and audio data has the type “01wb”.

And if we look a bit further along we can see the header of a JPEG file. JPEG data always starts with the bytes FF D8, and in these files that is followed by some additional bytes and then the string “JFIF”.

The only slight trap which I missed is that the chunks of data are always word aligned. But the length does not include any padding. So if you have an odd chunk length you have to skip a byte at the end of reading it.
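The padding rule boils down to rounding the length up to the next even number. A minimal sketch, with helper names of my own choosing:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to skip to get past a chunk's data, including the pad byte that
   follows when the stored length is odd. */
static uint32_t chunk_skip_bytes(uint32_t chunk_size) {
    assert(chunk_size < UINT32_MAX); /* avoid overflow when adding the pad */
    return chunk_size + (chunk_size & 1u);
}

/* Bytes a whole chunk occupies in the file: 4 byte type, 4 byte length,
   then the padded data. */
static uint32_t chunk_total_bytes(uint32_t chunk_size) {
    return 8u + chunk_skip_bytes(chunk_size);
}
```

So a chunk with a stored length of 5 has 6 bytes to skip and takes up 14 bytes in total, while a chunk of length 4 needs no padding at all.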

We can now add our function processListChunk:

void processMovieList(FILE *fp, unsigned int chunkSize) {
    ChunkHeader header;
    while (chunkSize >= 8) {
        readChunk(fp, &header);
        if (strncmp(header.chunkId, "00dc", 4) == 0) {
            printf("Found video frame.\n");
        } else if (strncmp(header.chunkId, "01wb", 4) == 0) {
            printf("Found audio data.\n");
        }
        // chunks are word aligned, but the stored length excludes the pad byte
        unsigned int padding = header.chunkSize % 2;
        fseek(fp, header.chunkSize + padding, SEEK_CUR);
        // count the 8 byte header, the data and any padding as consumed
        unsigned int consumed = 8 + header.chunkSize + padding;
        if (consumed > chunkSize) {
            break; // malformed or truncated list - avoid unsigned underflow
        }
        chunkSize -= consumed;
    }
}

void processListChunk(FILE *fp, unsigned int chunkSize) {
    char listType[4];
    fread(listType, 4, 1, fp);
    chunkSize -= 4;
    // check for the movi list
    if (strncmp(listType, "movi", 4) == 0) {
        processMovieList(fp, chunkSize);
    } else {
        // not the list we want - skip its contents
        fseek(fp, chunkSize, SEEK_CUR);
    }
}

So we can now easily locate our video frames and our audio data. This code is simple enough that it will run nicely on the ESP32.

It also kind of fits in with the current architecture - again we can split the video and audio processing into two separate processes - we just need to open the AVI file twice, once to read the audio data and once to read the video data.

Again we stream the audio data out to the speaker and use this to keep track of elapsed time. For the video rendering we make sure we keep in sync with the audio by skipping over video frames as needed.
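The frame skipping decision can be sketched as below. This is an illustration under assumptions rather than the code from the repository: the function name is hypothetical, and it assumes the frame period in microseconds is known, for example from the "avih" header.

```c
#include <assert.h>
#include <stdint.h>

/* How many video frames to skip so the next frame shown matches the
   audio clock.
   next_frame:   index of the next unread video frame in the file
   audio_ms:     elapsed audio playback time in milliseconds
   us_per_frame: frame period in microseconds
   Returns 0 when the video is on time or ahead of the audio. */
static uint32_t frames_to_skip(uint32_t next_frame, uint32_t audio_ms,
                               uint32_t us_per_frame) {
    assert(us_per_frame > 0);
    uint32_t due_frame = (uint32_t)(((uint64_t)audio_ms * 1000u) / us_per_frame);
    return due_frame > next_frame ? due_frame - next_frame : 0;
}
```

With a 40,000 microsecond frame period (25 frames per second), one second of audio means frame 25 is due. If the video reader is only at frame 10 it should skip 15 frames, and skipping is cheap because it is just a seek past each "00dc" chunk without decoding it.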

It works surprisingly well as you can see in this video:

The code is all up on GitHub - some of it is a bit rough and ready, but it should work on pretty much any ESP32 board - you just need to connect a screen and an SD Card - and for that you can use one of the many breakout boards that are available - soldering wires like I have is completely optional!



Chris Greening




A collection of slightly mad projects, instructive/educational videos, and generally interesting stuff. Building projects around the Arduino and ESP32 platforms - we'll be exploring AI, Computer Vision, Audio, 3D Printing - it may get a bit eclectic...
