Learn how to capture audio data on an ESP32 using its built-in analog-to-digital converters in this detailed tutorial. See how the I2S peripheral and its DMA controller stream samples without tying up the CPU, and how to get cleaner audio from the MAX4466 and MAX9814 microphone breakout boards.
[0:00] hey everyone for my next project i need
[0:01] to get audio data into the esp32
[0:04] to do this we’re going to use the
[0:06] built-in analog-to-digital converters
[0:08] on the esp32 now the esp32
[0:12] integrates two 12-bit analog to digital
[0:14] converters
[0:15] adc1 has 8 channels and adc2
[0:18] has 10 channels adc2 is used by the wifi
[0:22] driver so we can only use
[0:23] adc2 when not using wi-fi in addition
[0:26] some of the pins attached to adc2 are
[0:28] used as strapping pins
[0:30] so they cannot be used freely i’m going to
[0:31] be using wifi in my project so i’ll
[0:33] stick to using
[0:34] adc1 for simple low frequency sampling
[0:37] we can use the arduino analogRead
[0:39] function or we can use the espressif adc
[0:42] functions directly if you need
[0:44] very accurate readings then you can
[0:45] calibrate your analog to digital
[0:47] converter on chips manufactured
[0:49] recently this may have been done
[0:50] in the factory but if required you can
[0:52] also do this manually check the
[0:54] description
[0:55] for some links on how to do this to get
[0:56] a calibrated value from the adc we need
[0:59] to do a few steps we need a fallback
[1:01] value for the vref
[1:02] if nothing is available from the
[1:03] calibration system we then need to set
[1:05] up our adc
[1:06] in a known configuration in this example
[1:09] code we are setting our adc
[1:10] to 12 bit resolution giving us a range
[1:13] from naught
[1:13] to 4095 we’re also setting our
[1:16] attenuation
[1:17] to 11 db this should give us the full
[1:19] input range
[1:20] from naught to 3.3 volts we can now pull
[1:23] back
[1:23] the calibration characteristics for
[1:25] these settings and now
[1:26] we can get the raw value from the adc
[1:29] and map it onto a voltage
[1:31] using the calibration characteristics to
[1:33] test this
[1:34] i’ve hooked up a potentiometer to one of
[1:37] the adc
[1:38] channels and printed out both the raw
[1:41] value
[1:42] and the calibrated value in a loop
[1:46] as we vary the voltage on the pin the
[1:49] value read from the adc
[1:52] changes
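To make the raw-to-voltage step concrete, here is a minimal sketch of an idealised mapping. It assumes a perfectly linear 12-bit converter with a 3.3 volt full-scale input; the calibrated value described above comes from per-chip characteristics instead, so treat this only as an illustration of the idea. The names `raw_to_millivolts`, `ADC_MAX_RAW` and `FULL_SCALE_VOLTS` are made up for this example.

```python
# Idealised mapping from a raw 12-bit ADC reading to a voltage.
# Assumption: a perfectly linear converter with a 3.3 V full-scale input.
# A calibrated reading on real hardware uses per-chip characteristics instead.

ADC_MAX_RAW = 4095        # largest value a 12-bit converter can return
FULL_SCALE_VOLTS = 3.3    # nominal input range assumed for this sketch


def raw_to_millivolts(raw):
    """Convert a raw reading (0..4095) to millivolts, assuming linearity."""
    if not 0 <= raw <= ADC_MAX_RAW:
        raise ValueError("raw reading out of 12-bit range")
    return raw / ADC_MAX_RAW * FULL_SCALE_VOLTS * 1000.0


if __name__ == "__main__":
    # Print a few points across the range, as in the potentiometer test.
    for raw in (0, 2048, 4095):
        print(raw, round(raw_to_millivolts(raw), 1))
```

On a real board the calibrated value will differ from this straight line, which is why the raw and calibrated values are printed side by side.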
[1:58] you can see from my dev board that it’s
[2:00] been factory calibrated
[2:02] with a vref value using the adc in this
[2:07] way is fine
[2:08] if you only need to get a value every so
[2:10] often or you want to sample
[2:12] at very low frequencies for very low
[2:16] quality speech
[2:17] we need to capture a bandwidth of around
[2:19] four kilohertz
[2:21] for good quality speech we need around
[2:24] eight kilohertz
[2:25] and for high quality audio we need a
[2:27] bandwidth of 20 kilohertz
[2:31] to avoid issues with aliasing we need to
[2:34] sample at the nyquist rate or above
[2:36] which is twice the highest frequency you
[2:38] want to capture
[2:39] for audio this would be around 40
[2:41] kilohertz
[2:42] you can see in this animation the effect
[2:45] when the sampling rate
[2:46] is too low as the input frequency
[2:49] goes past the nyquist limit we
[2:52] start to see aliasing
[2:53] and the output signal cannot be
[2:55] reconstructed from the samples
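The folding effect can be worked out with a few lines of arithmetic. This is a small sketch, not part of the project code: the function name `alias_frequency` is hypothetical, and it simply folds an input tone back into the band between zero and half the sample rate.

```python
def alias_frequency(input_hz, sample_rate_hz):
    """Return the frequency a pure tone appears at after sampling.

    Tones at or below half the sample rate come back unchanged; anything
    above that limit folds back into the 0..sample_rate/2 band.
    """
    folded = input_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded


if __name__ == "__main__":
    # A 10 kHz tone sampled at 40 kHz is below the limit and is preserved.
    print(alias_frequency(10_000, 40_000))
    # A 25 kHz tone sampled at 40 kHz is above the 20 kHz limit and folds back.
    print(alias_frequency(25_000, 40_000))
```

Once a tone has folded back, the samples alone cannot tell it apart from a genuine tone at the lower frequency.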
[2:59] since our audio signals will go up to 20
[3:01] kilohertz
[3:02] we need to sample at a minimum of 40
[3:05] kilohertz
[3:07] now doing this in a loop on the cpu
[3:09] would leave us no time
[3:11] for doing any other work our cpu
[3:14] would be constantly polling the adc for
[3:16] new samples
[3:17] and would not have much time for any
[3:19] other processing
[3:21] we would also struggle to achieve a
[3:22] consistent sampling period
[3:24] leading to weird artifacts in our audio
[3:27] fortunately
[3:28] there is an alternative mechanism for
[3:30] reading samples from the adc
[3:33] we can use the i2s peripheral to read
[3:35] the samples
[3:37] this has a dedicated dma controller that
[3:40] allows us to stream samples
[3:41] straight into ram buffers independently
[3:44] from the cpu
[3:46] i’ve written a simple sketch that uses
[3:48] i2s
[3:49] to read the samples and as audio data
[3:52] becomes available
[3:53] it streams it to a server running on my
[3:56] desktop computer
[3:57] which writes the audio data to disk
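For the desktop side, one possible way to get received samples onto disk is a WAV file. This is only a sketch under assumptions: the samples are taken to be signed 16-bit mono values that have already been received and decoded, and `write_wav` is a hypothetical helper built on Python's standard `wave` module, not the server used in the video.

```python
import io
import struct
import wave


def write_wav(target, samples, sample_rate_hz):
    """Write signed 16-bit mono samples to a WAV file or file-like object."""
    samples = list(samples)
    # Pack as little-endian signed 16-bit integers, the usual WAV PCM layout.
    frames = struct.pack("<%dh" % len(samples), *samples)
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(frames)


if __name__ == "__main__":
    # Write three samples to an in-memory buffer instead of a real file.
    buffer = io.BytesIO()
    write_wav(buffer, [0, 1000, -1000], 40_000)
    print(len(buffer.getvalue()), "bytes written")
```

Passing a filename instead of a buffer writes a file that any audio player can open.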
[4:01] i’ve set a sample rate of 40 kilohertz
[4:04] and allocated four dma buffers at 1024
[4:08] bytes each
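A quick calculation shows how much time those buffers represent. The sketch below assumes each sample occupies 2 bytes, which is an assumption rather than something stated here; the constant and function names are made up for the example.

```python
SAMPLE_RATE_HZ = 40_000    # sample rate from the sketch
DMA_BUFFER_COUNT = 4       # number of DMA buffers
DMA_BUFFER_BYTES = 1024    # size of each buffer in bytes
BYTES_PER_SAMPLE = 2       # assumption: one 16-bit word per sample


def buffer_duration_ms(buffer_bytes, bytes_per_sample, sample_rate_hz):
    """Return how many milliseconds of audio one buffer holds."""
    samples = buffer_bytes // bytes_per_sample
    return samples / sample_rate_hz * 1000.0


if __name__ == "__main__":
    one = buffer_duration_ms(DMA_BUFFER_BYTES, BYTES_PER_SAMPLE, SAMPLE_RATE_HZ)
    print("one buffer:", one, "ms")
    print("all buffers:", one * DMA_BUFFER_COUNT, "ms")
```

Under that assumption each buffer holds 512 samples, or about 12.8 milliseconds of audio, and the four buffers together hold about 51 milliseconds.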
[4:09] this should give us sufficient time to
[4:11] process any buffers without them being
[4:13] overwritten
[4:14] with new data we’ve set up the i2s
[4:17] peripheral
[4:18] to read from adc1 channel 7 this
[4:21] equates to
[4:23] gpio35 and then we set up a task
[4:26] to read the values from the i2s queue
[4:28] looking at our reader task
[4:31] it waits for an I2S_EVENT_RX_DONE event to be
[4:34] received
[4:35] and then reads bytes from the dma buffer
[4:37] into our own local buffer
[4:41] once we’ve read all the samples we’re
[4:42] interested in we do whatever processing
[4:45] we need to do
[4:46] and in our case i’m just sending the
[4:47] samples over to a local server
[4:50] running on my desktop machine here’s the
[4:52] output when we use our potentiometer
[4:54] to change the values on the adc for this
[4:57] example
[4:58] i’ve just used a sampling rate of 10
[5:00] kilohertz
[5:02] we can now move on to getting audio into
[5:04] the system
[5:05] i’ve got two microphone breakout boards
[5:08] the max 4466 and the max
[5:11] 9814 now the max 4466
[5:15] has an adjustable gain from 25 times
[5:18] to 125 times and the max
[5:22] 9814 has a built-in automatic gain
[5:25] control
[5:26] this will make quiet sounds louder and
[5:28] loud sounds
[5:29] quieter both these modules are simple
[5:32] and easy to wire up
[5:33] only requiring power and ground
[5:37] so here’s an audio recording from the
[5:39] max 4466
[5:47] testing testing
[5:55] as you can hear we have some audio data
[5:58] but it’s terribly noisy and it’s not
[6:00] going to be usable
[6:02] here’s the audio recording from the
[6:03] max 9814
[6:11] testing testing one two three
[6:17] it’s a bit better but there is still
[6:19] quite a lot of noise coming through
[6:21] it looks like the noise coincides with
[6:24] our transmission of data over wi-fi
[6:26] i’ve attached an oscilloscope to the 3.3
[6:29] volt output from the dev board
[6:31] and the output from the max 4466
[6:35] you can see that our 3.3 volt line has a
[6:38] lot of noise
[6:39] when the module is transmitting this
[6:42] noise is being amplified by the
[6:43] sensitive op amps
[6:44] on the microphone board looking at the vin
[6:47] line
[6:48] we can see a similar problem so what can
[6:50] we do
[6:51] we could connect up a separate power
[6:53] supply or we could use a battery
[6:56] to power our microphone boards but this
[6:58] is not going to be very convenient
[7:01] we could also turn off wi-fi but i need
[7:04] wi-fi
[7:04] for my project so the underlying problem
[7:08] seems to be the current draw when the
[7:10] esp needs to transmit
[7:12] this is causing a large amount of noise
[7:14] on the 3v3
[7:16] power rail which is then picked up by
[7:18] the very sensitive microphone amplifiers
[7:21] i’ve tried adding capacitors which have
[7:23] a small effect
[7:25] but to really fix it we would need to
[7:27] add excessively large capacitors
[7:30] to both the vin and the 3v3 rails
[7:33] so our microphone boards don’t require
[7:36] very much current
[7:37] so what we can do is take the vin line
[7:41] and pass it through an rc filter for my
[7:44] tests i’m just using a
[7:45] 30 ohm resistor and a 470
[7:48] microfarad capacitor we can then feed
[7:51] this filtered signal
[7:53] into a low dropout regulator to generate
[7:56] a clean
[7:57] 3v3 power supply for the microphone
[7:59] boards
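To see why this filter helps, we can estimate its cutoff frequency with the standard first-order RC formula f = 1 / (2πRC). This is a rough sketch that treats the filter as an ideal RC low-pass and ignores the load of the regulator; `rc_cutoff_hz` is a name made up for the example.

```python
import math


def rc_cutoff_hz(resistance_ohms, capacitance_farads):
    """Return the -3 dB cutoff of an ideal first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * resistance_ohms * capacitance_farads)


if __name__ == "__main__":
    # 30 ohm resistor and 470 microfarad capacitor from the test setup.
    print(round(rc_cutoff_hz(30, 470e-6), 2), "Hz")
```

With these values the cutoff works out to roughly 11 hertz, so only very slow changes on the supply should pass through to the regulator.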
[8:00] looking at the low-pass filtered vin we
[8:03] can see that we have a
[8:04] cleaner signal and on the 3v3
[8:07] output of the regulator we now have a
[8:10] pretty clean signal
[8:11] with noise down to around 20 millivolts
[8:14] instead of the 200 millivolts
[8:16] that we were seeing on the original 3v3
[8:18] line
[8:19] from the board looking at a trace from
[8:21] our microphone boards
[8:23] here’s the max 4466
[8:26] we don’t see any noise now when the
[8:28] esp32
[8:30] transmits and the same is true for the
[8:32] max
[8:33] 9814 so here’s an audio sample
[8:37] from the max4466 testing testing
[8:42] one two three and an audio sample
[8:45] from the max 9814
[8:48] testing testing one two three
[8:53] they are both better but still quite
[8:55] noisy
[8:57] one thing we can do is filter out some
[8:59] of the noise
[9:00] by oversampling and then taking an
[9:03] average value
[9:04] as the actual sample value so i’ve tried
[9:06] using a
[9:07] median filter here and you can see
[9:10] that our simulation shows quite an
[9:12] impact
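The oversampling idea can be sketched as follows: capture several raw readings for every output sample and reduce each group to a single value. This is an illustration under assumptions, not the code from the project; `decimate` is a hypothetical helper, and the reducer can be a mean or a median since both are mentioned here.

```python
import statistics


def decimate(samples, factor, reducer=statistics.mean):
    """Reduce each group of `factor` readings to one output sample.

    Any leftover readings that do not fill a whole group are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    usable = len(samples) - len(samples) % factor
    return [
        reducer(samples[start:start + factor])
        for start in range(0, usable, factor)
    ]


if __name__ == "__main__":
    noisy = [1, 2, 100, 4, 5, 6]   # one spike in the first group
    print(decimate(noisy, 3))                      # mean is pulled up by the spike
    print(decimate(noisy, 3, statistics.median))   # median ignores the spike
```

The trade-off is that the ADC has to run at `factor` times the output sample rate, so the oversampling factor is limited by how fast the converter can go.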
[9:14] and here’s the audio from the max 4466
[9:21] testing testing one two three
[9:24] and the audio from the max 9814
[9:28] testing testing one two three
[9:33] so which module should we actually use
[9:35] for my next project
[9:38] i think the max 9814
[9:41] comes out the winner it seems to be less
[9:44] susceptible to
[9:45] power supply noise and the automatic
[9:48] gain control makes it very useful
[9:51] we don’t have to fiddle with a
[9:53] potentiometer to change the volume
[9:56] if you really need high quality
[9:59] low noise input then the built-in adcs
[10:03] on the esp32
[10:05] are probably not suitable and i would
[10:07] take a look at some external boards
[10:09] for this kind of project but on balance
[10:12] i think we’ll use the
[10:13] max 9814 for my next project
[10:17] so thanks for watching i hope you found
[10:19] this video
[10:20] useful and interesting all the code is
[10:23] in a github repo
[10:24] the link is in the description if you did
[10:26] find the video useful
[10:28] then please subscribe to the channel and
[10:30] hit the like button
[10:32] there’s another video coming soon where
[10:34] i’ll actually do something with the
[10:36] audio data
[10:37] so see you in the next video