View All Posts
Want to keep up to date with the latest posts and videos? Subscribe to the newsletter
HELP SUPPORT MY WORK: If you're feeling flush then please stop by Patreon, or you can make a one-off donation via ko-fi.

Learn how to choose ideal parameters for dma_buf_count and dma_buf_len when working with I2S audio and DMA, exploring the impact of these values on CPU load, latency, and memory usage.

Related Content

[0:00] I2S audio and DMA - it’s a mysterious world.
[0:04] What are these parameters: dma_buf_count and dma_buf_len?
[0:09] What are they for?
[0:10] And what values should they be set to?
[0:12] Let’s try and answer these questions.
[0:15] Before we get started I’d like to thank PCBWay for sponsoring the channel,
[0:20] PCBWay offer PCB Production, CNC and 3D Printing, PCB Assembly and much much more.
[0:27] They are great to deal with and offer excellent quality, service and value for money.
[0:32] Check out the link in the description.
[0:34] Since we’re looking at audio, I’m going to focus on streaming sample data to and from
[0:38] the CPU to the I2S peripherals - generally microphones and speakers.
[0:44] We’ll start off with a definition of what DMA is.
[0:48] DMA stands for Direct Memory Access.
[0:51] DMA allows peripherals to directly access the system memory without involving the CPU.
[0:58] When using DMA, the CPU initiates a transfer between memory and the peripheral.
[1:03] The transfer is all taken care of by the DMA controller
[1:06] and the CPU is free to go off and do other work.
[1:10] When the DMA transfer is completed the CPU receives an interrupt and
[1:14] can then process the data that has been received or set up more data to be transmitted.
[1:20] This gives us our first data point on how to choose a size for our DMA buffer.
[1:26] Small DMA buffers mean the CPU has to do more work as it will be interrupted more often.
[1:32] Large buffers mean the CPU has to do less work as it will receive fewer interrupts.
[1:37] Taking audio as our example - suppose we are sampling in stereo at 44.1KHz with 16 bits per sample.
[1:46] This gives a data transfer rate of around 176KBytes per second.
[1:52] If we had a DMA buffer size of 8 samples we’d be interrupting the CPU every 181 microseconds.
[2:00] If we had a buffer size of 1024 samples we’d be interrupting the CPU every 23 milliseconds.
[2:06] You can see this is quite a large difference.
[2:10] The naive conclusion from this is that we should make our DMA buffers as large as possible.
[2:15] But there is a tradeoff here - and that’s latency.
[2:19] We need to wait for the DMA transfer to complete before we can start reading from the buffer.
[2:24] Generally, with audio, we don’t have very hard real-time constraints,
[2:29] but you can easily imagine scenarios where a delay of 23 milliseconds could have a real impact on your application.
[2:36] What are the actual limits on the values for dma_buf_len?
[2:40] It’s easy enough to test, and we get a helpful error message when we try something out of range.
[2:45] We can have at most 1024 samples per buffer and must have at least 8.
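For reference, these two parameters are fields on the I2S driver's configuration struct in the legacy ESP-IDF driver. This is a sketch only - the field values here are illustrative, not recommendations, and newer ESP-IDF versions have renamed these fields:

```c
#include "driver/i2s.h"  /* legacy ESP-IDF I2S driver */

/* Example I2S configuration - values are illustrative only. */
i2s_config_t i2s_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX,
    .sample_rate = 44100,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,   /* number of DMA buffers */
    .dma_buf_len = 1024,  /* samples per buffer: 8..1024 */
    .use_apll = false,
};
```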
[2:50] One interesting thing to note is that this value is in samples,
[2:53] so to calculate the number of bytes that are actually being used we need to use this formula:
[2:59] the number of bytes per sample multiplied by the number of channels, the buffer length and the buffer count.
[3:06] So in a concrete example, with 16 bits per sample, stereo left and right channels, dma_buf_len set to 1024 and dma_buf_count set to 2
[3:15] we have a total of 8K allocated.
[3:18] There’s another good tradeoff to talk about here - DMA buffers are allocated from memory
[3:23] so any space we use for DMA buffers is taken away from space that can be used by the rest of your code.
[3:28] There is also a limitation that DMA buffers cannot be allocated in PSRAM
[3:32] so we are limited to the internal SRAM of the chips.
[3:37] This limits us to a maximum of 328K.
[3:40] In testing with my Walkie-Talkie application I managed to allocate a total of around 100K
[3:45] before the application started to run out of memory and crash.
[3:49] This leads us nicely on to discussing what to set dma_buf_count to.
[3:54] Let’s break this question down into two parts.
[3:57] The first part is “why do I need more than one DMA buffer?”
[4:01] The second part is how much total space do I need to allocate to DMA buffers?
[4:06] Remember that we are limited to a maximum of 1024 samples per buffer
[4:11] so to have more space we need multiple DMA buffers.
[4:15] Let’s answer the first question - why do I need more than one DMA buffer?
[4:22] The problem with having only one buffer is that it does not give us any time to process the data.
[4:28] Without some serious hacking around, the DMA buffer can only be used by either the CPU or the DMA controller.
[4:34] They cannot access the buffer at the same time.
[4:38] This means that we can only start processing the buffer
[4:41] once the DMA controller has finished transferring data to it.
[4:44] If we only have one buffer
[4:46] we need to complete our processing and give the buffer back to the DMA controller before any new data needs to be transferred.
[4:53] With a sample rate of 44.1KHz, we only have 22 microseconds before the next sample comes in from the device.
[5:02] This is not very long - it’s unlikely we could do anything meaningful in this time.
[5:07] Our processing task may not even be scheduled quickly enough to catch the data.
[5:11] This means that you typically want to set the dma_buf_count to at least 2.
[5:16] With two buffers you can be processing one buffer with the CPU while the other buffer
[5:21] is being filled by the DMA controller.
[5:23] So we definitely need more than one buffer - let’s move onto the second question.
[5:29] How much total space do I need to allocate to my DMA buffers?
[5:33] To understand this we need to think about the components of our system.
[5:37] We have the I2S peripheral that is generating samples from our audio source.
[5:42] This will be generating data at a fixed rate that is set by the sample rate.
[5:46] We then have something that is consuming and processing this data.
[5:50] To understand what we need to set the total buffer size to
[5:53] we need to understand how much time this processing will take.
[5:56] Suppose for example we are sending data to a server somewhere
[5:59] and the server takes on average 100 milliseconds to process a request.
[6:04] Sometimes it is faster, sometimes it is slower.
[6:06] 100ms of audio at 44.1KHz sampling rate works out at 4410 samples
[6:14] or around 17KBytes of data for stereo 16-bit samples.
[6:19] This sets a lower limit on our DMA buffer size.
[6:22] We need space to store 4410 samples while we are busy sending the data to our server.
[6:28] We also need to allow for the fact that sometimes our server takes longer to respond.
[6:32] What if it sometimes takes 150ms to respond?
[6:36] We need to allow a larger buffer size to take this into account.
[6:40] 150ms is 6615 samples.
[6:44] For a safety margin we might just bump this up to 10000 samples.
[6:48] So we might set our dma_buf_len to 1000 samples and our dma_buf_count to 10.
[6:54] I’ve made a small visualisation of this.
[6:56] We have a constant stream of data being generated which is going into a buffer.
[7:00] And we have a process that takes chunks of data from the buffer at semi-random intervals.
[7:05] We need to make sure our buffer is large enough that the probability of it overflowing is low enough for our use case.
[7:11] This covers the case of processing audio data coming into the system.
[7:16] What about pushing samples out? How should we think about this?
[7:20] We now have something that wants to be fed samples at a constant rate.
[7:23] And we have a source of samples that may not be able to consistently push samples into the buffer.
[7:30] To think about this, we need to consider how quickly on average we need to generate samples
[7:34] and we need to think about what the worst case delay there could be in generating those samples.
[7:40] How quickly we need to generate samples is set by our sampling rate.
[7:44] If we cannot generate samples fast enough to meet our sample rate our system is not going to work.
[7:50] No amount of buffering will help us.
[7:52] Unless of course, we can generate all of the audio data upfront in one very big buffer.
[7:57] An example of this as a solution would be to load an entire audio file into memory from
[8:02] slow storage and play it directly from RAM.
[8:06] Assuming we can generate data fast enough for our sample rate,
[8:10] we need to think about the length of random delays that may mean we can’t deliver some
[8:13] samples exactly when the output requires them.
[8:16] A good example of this is from our walkie-talkie project.
[8:20] We are playing samples from a stream of UDP packets.
[8:24] Depending on network conditions our packets will be delayed by some variable time.
[8:29] We need to have sufficient headroom in our buffer that our sample sink does not run out
[8:33] of data when there are delays in delivery.
[8:38] Taking our UDP example of 1436 bytes once the header has been removed - which is 718 samples.
[8:46] We would need to queue up at least this many samples into the output buffer.
[8:50] To allow for packet delays we might want to have a buffer of twice this size
[8:54] so we can queue up two packets before playing.
[8:56] Once again, some safe values for this might be 2-3 DMA buffers of 1024 samples each.
[9:03] One last point to think about is that we can also have buffers in our application code.
[9:08] We don’t need to rely solely on DMA buffers to solve our problems.
[9:12] You may choose to have relatively small DMA buffers and use your own buffers to handle things.
[9:18] There may be some good reasons for taking this path
[9:20] you may have multiple different processing steps, some that require very low latency
[9:25] and some that have quite variable time delays.
[9:28] The low latency requirement forces you to use quite small DMA buffers
[9:32] and then have your own quite large memory buffers to handle the variable time delays.
[9:36] Hopefully this has given you some insight into how to choose values for these two parameters
[9:41] - as always with these things, the real answer is
[9:43] “It depends…”
[9:45] But there is some guidance:
[9:47] Use dma_buf_len to trade off latency against CPU load.
[9:51] A small value improves latency but comes at the cost of more CPU load.
[9:56] Use dma_buf_count to trade off memory usage against total buffer space.
[10:02] More buffer space gives you more time to process data,
[10:05] but comes at the cost of using extra memory.
[10:09] Now I need to go back to my code and check all the values are correct!
[10:13] Thanks for watching, I hope this video was useful and interesting,
[10:16] I’m sure it will spark some lively debates!
[10:18] I’ll see you in the next video.

Chris Greening


A collection of slightly mad projects, instructive/educational videos, and generally interesting stuff. Building projects around the Arduino and ESP32 platforms - we'll be exploring AI, Computer Vision, Audio, 3D Printing - it may get a bit eclectic...
