Explanation
This is an artifact of the FFT settings. A high SFT length increases frequency resolution but decreases time resolution, because each spectrum is computed over a longer stretch of the signal. As a result, with high SFT length values the onset time of a vocalization is established with less precision.
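The trade-off can be illustrated with a small calculation. The following is only a sketch: the function name `stft_resolution` and the 250 kHz sample rate are assumptions made for illustration, and the software's own windowing and overlap settings may shift the exact numbers.

```python
def stft_resolution(sample_rate_hz, window_length):
    """Return (frequency resolution in Hz, window duration in ms).

    Hypothetical helper, not part of the software described here.
    window_length is the number of samples per transform.
    """
    # Spacing between adjacent frequency bins: finer with longer windows.
    freq_res_hz = sample_rate_hz / window_length
    # Duration of signal covered by one transform: longer with longer windows.
    window_ms = 1000.0 * window_length / sample_rate_hz
    return freq_res_hz, window_ms


SAMPLE_RATE_HZ = 250_000  # assumed sample rate, for illustration only

for n in (256, 1024, 4096):
    freq_res, window_ms = stft_resolution(SAMPLE_RATE_HZ, n)
    print(f"{n:>5} samples: {freq_res:8.1f} Hz per bin, {window_ms:6.2f} ms per window")
```

Each increase in the transform length narrows the frequency bins but stretches the time span that one spectrum covers, so the onset of a call can only be located to within that longer span.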
Solution
1. Right-click the spectrogram area.
2. Under Fourier transform settings, reduce the SFT length. See Spectrogram settings.
3. Click Detect calls in this recording to update the position of the rectangle around the calls.