Confusion about FFT length and Max overlap

neodog · Nov 26, 2024

The manual has this to say about FFT length:

As the FFT length is increased the analyser starts to overlap its FFTs, calculating a new FFT for every block of input data. The degree of overlap is 50% for 16k, 75% for 32k, 87.5% for 64k and 93.75% for 128k.

And about Max Overlap:

The spectrum/RTA plot can be updated for every block of audio data that is captured from the input, overlapping sequences of the chosen FFT length. This can present a significant processor load for large FFT lengths. The processor loading can be reduced by limiting the overlap allowed using this control.

Is the overlap % the amount of the block overlapping, or is it % not overlapping? I ask because I get the slowest computations setting Max Overlap to 0%, and the fastest computations setting it to 93.75%; the values in between follow this pattern. This is exactly opposite of what I would expect: more overlap should mean more computations. It's almost as if the drop-down values have the reverse their intended ids.

John Mulcahy · Nov 26, 2024

More overlap means more frequent updates since fewer new samples are required to calculate the next FFT result.

neodog · Nov 27, 2024

Ah, of course, I wasn't compute-bound there, it just takes more time to fill out a 1M sample window.

So FFT Length and Overlap normally govern the frequency of update, determined by the duration of FFT_Length * (1-Overlap). And ideally, it seems that every sample gets processed in real-time on these intervals.

However, it seems that the smoothing algorithm has a significant impact on the update speed. No smoothing gets updates that match the FFT_Length * (1-Overlap) formula, but psychoaccoustic slows it down an order of magnitude. Is this compute-bound, then? Since it's not just affecting update speed, but the average count as well, it must mean that it is no longer processing samples in real-time at that point. That must mean samples get dropped, right? If so, which ones? At some point there would have to be a window without overlap integrity.

I might have hoped for a mode where samples are processed as quickly as real-time requires, but updates can be skipped if they fall behind. I see that the Update interval can be set as high as 4, but that's not enough to get any of the smoothing modes to collect averages as quickly as "no smoothing" would.

John Mulcahy · Nov 27, 2024

I can't follow the logic of combining a long FFT with heavy smoothing, but that is very likely to result in skipped sample blocks. The graph update is a negligible burden these days, the limit will be processing since the entire FFT length has to be processed to apply the smoothing. With a long FFT and high overlap that would be asking a lot since the smoothing has to run on a single core due to the recursive nature of the implementation.

neodog · Dec 9, 2024

John Mulcahy said:
I can't follow the logic of combining a long FFT with heavy smoothing

I can see the apparent contradiction, that's probably a conversation in its own right. I'm trying to use the RTA to analyze and improve noise floor, but struggling to find a good way to do it.

Specifically, I'm hearing changes in baseline noise that sound to me like +20b but only look like +3db (or even less) in the graphs and RMS. I've been searching for a way to isolate and measure these audible changes in noise.

Right off the bat, smoothing seems to be required to see anything visually at all, since with no smoothing all the measurements look like the same chaotic mess, even when they sound substantially different. However, smoothing still seems to underrepresent what I hear, so I was trying long FFT to see if there was some order hiding in the chaos that's more visible with narrower frequency bins. It didn't really work, but it does make a difference: longer FFT pushes down a smoothed graph (the smoothed shelf drops something like 5db when switching from 1M to 4M samples). This seems like it would follow from taller peaks and lower shoulders being seen at higher resolutions.

My best guess is that the noise is hiding in the local peaks of the spectrum, and that winds up hiding in every other presentation: the RMS averages, the unsmoothed graphs (which are indistinguishable due to noise), and the smoothed graphs, which do illustrate differences but substantially downplay their role. My suspicion says what I need is a smoothing algorithm that does little more than connect local maxima and ignores the troughs and shoulders. Psychoaccoustic does seem to appreciate the peaks more, but it's still rather diminished compared to what I hear. That still wouldn't improve the representativeness of RMS, which, while frequency weighted for psychoacoustics, still doesn't seem to account for disproportionately audible noise. I freely admit that my suspicions don't carry much weight here, I'm a layman in every aspect other than software engineering.

John Mulcahy said:
The graph update is a negligible burden these days, the limit will be processing since the entire FFT length has to be processed to apply the smoothing. With a long FFT and high overlap that would be asking a lot since the smoothing has to run on a single core due to the recursive nature of the implementation.

When I set FFT to 128k and 0 overlap, I see a noticeable delay with 'psychoaccoustic' vs 'no smoothing'. If I'm reading you correctly, anything that causes the "N averages" counter to slow down is going to correspond to dropped blocks. I take it it's non-trivial to double-buffer and run the smoothing asynchronously? The smoothing appears to be presentation only, since I can apparently turn off smoothing, collect the samples without dropping any blocks, then pause sampling and turn smoothing on at the end for a smoothed graph. That's like a Rube Goldberg prototype for an asynchronous solution. I've built many such in my career lol.

John Mulcahy · Dec 9, 2024

neodog said:
longer FFT pushes down a smoothed graph

Longer FFT = narrower bins = less energy per bin. To avoid that choose the volts per √Hz Y axis option to display the amplitude spectral density instead.

neodog said:
My best guess is that the noise is hiding in the local peaks of the spectrum, and that winds up hiding in every other presentation: the RMS averages, the unsmoothed graphs (which are indistinguishable due to noise), and the smoothed graphs, which do illustrate differences but substantially downplay their role. My suspicion says what I need is a smoothing algorithm that does little more than connect local maxima and ignores the troughs and shoulders. Psychoacoustic does seem to appreciate the peaks more, but it's still rather diminished compared to what I hear. That still wouldn't improve the representativeness of RMS, which, while frequency weighted for psychoacoustics, still doesn't seem to account for disproportionately audible noise. I freely admit that my suspicions don't carry much weight here, I'm a layman in every aspect other than software engineering.

That doesn't really make much sense. It more or less boils down to "I want smoothing that doesn't smooth".

Some noise sources, such as popcorn noise, have an intermittency that doesn't lend itself to a spectral view. The scope might better capture such effects.

Sensitivity to noise contributions follows the ear's sensitivity curve. You could create a cal file corresponding to a loudness contour to create a view that better reflected level relative to hearing thresholds.

neodog said:
The smoothing appears to be presentation only

It isn't, but it is reversible since the unsmoothed data is retained.

neodog · Dec 10, 2024

John Mulcahy said:
That doesn't really make much sense. It more or less boils down to "I want smoothing that doesn't smooth".

I'm basically describing a smoothing function like:

over all x:
value[x] = max(get_bin_values_between((x-30hz),(x+30hz))

The noise seems to have a pattern of spikes roughly every 120hz, so smoothing in terms of the maxima across blocks around that size would let me see generally how tall these spikes are. Then by toggling smoothing, I can alternate between comparing the curves (which aren't really comparable without smoothing) and studying the characteristics of the spikes themselves. This would among other things let me evaluate the hypothesis that the severity of these spikes was important to the audibility of the noise, even when the surrounding energy is low.

I don't know if you object to calling it smoothing because "max" isn't a conventional aggregator, or if it just wasn't clear what I meant. I also don't want to claim that it's actually a good idea, I just wanted it to be clear.

Here's an example of how it looks without smoothing:

There are 10 different measurements here, but I can't make out most of them because they substantially stomp on each other.

Compare that to the image below, in which I can make out each of the lines. That is helpful, but you can see in the picture above how many of the spikes in light blue, above 2khz, reach nearly to -65db, but the graph below smooths them down to -80db or below. As a result I can't really compare the height of the purple spikes to the height of the blue ones, because they are never both visible at the same time (they either occlude each other without smoothing, or they get smoothed out of existence).

John Mulcahy said:
It isn't, but it is reversible since the unsmoothed data is retained.

Understood. I guess I'll stop hoping it's an easy refactor, then. In any case, I doubt data drops are actually causing me problems.

neodog · Dec 10, 2024

John Mulcahy said:
Longer FFT = narrower bins = less energy per bin.

I came very close to forming this thought, but missed. Your phrasing it this way triggered me to rethink everything, and maybe identify the key to the problem.

I think the problem I've been facing reduces to an "area under the curve" problem: the area under the curve (especially the smoothed curves) did not explain the noise I hear. But that appears to have been nothing more than an illusion presented by logarithmic scaling. If I'm not mistaken, the bins are uniform, and the energy itself is distributed linearly across the spectrum, but with a logarithmic display most of the energy ends up squished into the right side of the graph, rendering "area under the curve" a meaningless way to read the chart. When I flip it to linear, those minor curve differences above 1k wind up covering most of the spectrum, and suddenly area under the curve becomes fairly explanatory.

I had observed that RMS also didn't align with the noise, but this turns out to be because the quieter configurations were actually quite a bit noisier at frequencies above 10khz, where they have substantially less perceived impact. So changes I made to improve noise wound up pushing it to higher frequencies, so they wound up with a very similar RMS. I did notice the higher frequency contributions, and made some progress by setting the LPF for RMS calculations to 10khz, but that wasn't quite satisfying. Flipping to linear mode really seems to clear up the mystery, proportionately illustrating the contributions to RMS across the spectrum.

To avoid that choose the volts per √Hz Y axis option to display the amplitude spectral density instead.

I don't know what amplitude spectral density is, looks like I have some reading to do.

John Mulcahy · Dec 10, 2024

The spikes at multiples of 60 Hz are usually mains hum. Multiples of 1 kHz can be USB related.

Confusion about FFT length and Max overlap

neodog

Registered

John Mulcahy

REW Author

neodog

Registered

John Mulcahy

REW Author

neodog

Registered

John Mulcahy

REW Author

neodog

Registered

neodog

Registered

John Mulcahy

REW Author

Popular tags

OUR MISSION

Quick Navigation

Buy us a cup of coffee!