UltraVox XT computes the SFT with the Fast Fourier Transform (FFT) algorithm; for this reason the frame length N is always a power of 2 (64, 128, ..., 2048). The higher N, the smaller the frequency spacing df.
If the SFT frame is 2048 samples long, the SFT analysis gives you 1025 equally spaced frequency bins (N/2 + 1) from 0 Hz up to the sampling frequency divided by two. Increasing the SFT length reduces the spacing of the frequency bins according to the formula df = sampling frequency / SFT length. This increases the frequency resolution of the analysis.
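As a minimal sketch of the formula df = sampling frequency / SFT length, the following Python snippet computes the bin spacing and lists the bin frequencies from 0 Hz up to half the sampling frequency. The function names are hypothetical and are not part of UltraVox XT; the snippet assumes a real-valued input frame, for which the bins from 0 Hz to half the sampling frequency (both included) number N/2 + 1.

```python
def bin_spacing_hz(sampling_frequency_hz, sft_length):
    """Frequency spacing df between adjacent bins, in Hz."""
    return sampling_frequency_hz / sft_length


def bin_frequencies_hz(sampling_frequency_hz, sft_length):
    """Bin frequencies from 0 Hz up to half the sampling frequency."""
    df = bin_spacing_hz(sampling_frequency_hz, sft_length)
    # Assumption: real-valued frame, so only bins 0 .. N/2 are distinct.
    return [k * df for k in range(sft_length // 2 + 1)]


bins = bin_frequencies_hz(384_000, 2048)
print(bin_spacing_hz(384_000, 2048))  # 187.5
print(bins[1], bins[-1])              # 187.5 192000.0
```

Doubling the SFT length halves the value returned by `bin_spacing_hz`, which is the finer frequency resolution described above.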
However, any timing information within an SFT frame is lost in the analysis, since all temporal changes are lumped together in a single frame. If the SFT length is 2048 and the sampling frequency is 384 kHz, the frame duration is 2048/384000 s = 5.33 ms. This means that the first spectrum is created at t = 0, and the next spectrum is created at t = 5.33 ms. Two successive events occurring within this interval cannot be distinguished.
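The loss of timing information can be illustrated with a short Python sketch that maps an event time to the frame containing it. It assumes non-overlapping frames with the first frame starting at t = 0; `frame_index` is a hypothetical helper, not an UltraVox XT function.

```python
def frame_index(event_time_ms, sampling_frequency_hz, sft_length):
    """Index of the non-overlapping frame that contains an event."""
    # Convert the event time to a sample number, then to a frame number.
    sample = round(event_time_ms * sampling_frequency_hz / 1000)
    return sample // sft_length


# Two events at 1.0 ms and 4.0 ms, sampled at 384 kHz.
for n in (2048, 256):
    print(n, frame_index(1.0, 384_000, n), frame_index(4.0, 384_000, n))
# 2048 0 0   -> both events fall into the same frame
# 256 1 6    -> the events fall into different frames
```

With an SFT length of 2048 both events contribute to the same spectrum, while with a length of 256 they end up in separate spectra.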
The following table shows the inverse relation between frequency resolution df (= sampling frequency / SFT length) and time resolution dt (= SFT length / sampling frequency) at a sampling frequency of 384 kHz.
| SFT length | df (= 384000 / SFT length) (Hz) | dt (= SFT length / 384000) (ms) |
|---|---|---|
| 64 | 6000 | 0.17 |
| 128 | 3000 | 0.33 |
| 256 | 1500 | 0.67 |
| 512 | 750 | 1.33 |
| 1024 | 375 | 2.67 |
| 2048 | 187.5 | 5.33 |
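The table values can be reproduced for any sampling frequency with a few lines of Python. This is only an illustrative sketch: the constant and function names are assumptions, and the list of SFT lengths is the power-of-2 range 64 to 2048 stated earlier. The last printed column shows the inverse relation directly, since df multiplied by dt (in seconds) is always 1.

```python
SAMPLING_FREQUENCY_HZ = 384_000
SFT_LENGTHS = [64, 128, 256, 512, 1024, 2048]


def resolution_row(sft_length, fs=SAMPLING_FREQUENCY_HZ):
    """Return (SFT length, df in Hz, dt in ms) for one table row."""
    df_hz = fs / sft_length
    dt_ms = 1000 * sft_length / fs
    return sft_length, df_hz, dt_ms


for n, df_hz, dt_ms in map(resolution_row, SFT_LENGTHS):
    # dt is converted back to seconds for the product df * dt.
    product = df_hz * dt_ms / 1000
    print(f"{n:5d}  {df_hz:7.1f} Hz  {dt_ms:5.2f} ms  df*dt = {product:.3f}")
```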
The time resolution dt is the width of the spectrogram pixel, when Overlap = 0.
If you increase the SFT length, there will be more “pixels” along the frequency axis of the spectrogram; however, they are now “wider”, since each one represents a longer time interval: the larger the SFT length, the longer this interval. This relationship is the trade-off between frequency resolution and time resolution.
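One practical way to handle this trade-off is to pick the shortest SFT length that still meets a required frequency spacing, so that the time resolution stays as fine as possible. The Python sketch below shows the idea; `shortest_sft_length` is a hypothetical helper, and the candidate lengths are the power-of-2 values from 64 to 2048 mentioned earlier.

```python
SFT_LENGTHS = (64, 128, 256, 512, 1024, 2048)


def shortest_sft_length(target_df_hz, sampling_frequency_hz=384_000):
    """Smallest SFT length whose bin spacing df is at most target_df_hz.

    Returns None if even the longest length is too coarse.
    """
    for n in SFT_LENGTHS:  # ascending, so the first match is the shortest
        if sampling_frequency_hz / n <= target_df_hz:
            return n
    return None


print(shortest_sft_length(500))  # 1024 (df = 375 Hz, dt = 2.67 ms)
print(shortest_sft_length(100))  # None (2048 only reaches 187.5 Hz)
```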
See also
The following videos:
Short-time Fourier Transform and the Spectrogram
http://www.youtube.com/watch?v=NA0TwPsECUQ
FFT basic concepts
http://www.youtube.com/watch?v=z7X6jgFnB6Y