Time delay estimation (TDE) techniques have many diverse signal processing applications: for instance, in such fields as radar, sonar, seismology, geophysics, and ultrasonics for identifying and localizing radiating sources.
In this case study, we evaluate the performance of two algorithms developed by Markus Nentwig to find delay and scaling factor between two cyclic signals. The first algorithm uses linear least-squares fitting to estimate the delay. The second algorthm is iterative and relies on FFT-based cross-correlation. A MATLAB® implementation of both approaches can be found in Algorithm 1 and Algorithm 2, respectively. As the author pointed out, the algorithms are not suited for real-time applications since the whole signal needs to be known in advance. However, they can be very useful as general-purpose tool for simulations and measurements to align input and output samples: properly used, the attained accuracy can be orders of magnitude higher than that for typical “ad-hoc” solutions.
The following code snippet, provided by the author, has been used to generate delayed signals:
% generate random signals of length n: s2 is a delayed replica s1 with (optionally) some noise added function [s1,s2] = generate_signals(n) % create first random signal fd = randn(1, n) + 1i * randn(1, n); % apply lowpass filter f = (mod(((0:n-1)+floor(n/2)), n)-floor(n/2))/n; fd(abs(f) > 0.045) = 0; s1 = real(ifft(fd)) * sqrt(n); % create the 2nd delayed signal dTest_samples = 12.3456; cTest = 1.23456; cTest = cTest + 1i; % disp('*** test: complex coeff enabled ***'); % cTest = -cTest; disp('*** test: negative coeff enabled ***'); s2 = cTest * cs_delay(s1, 1, dTest_samples); s2 = s2 + 0.1*randn(size(s2)); disp('*** test: noise enabled ***'); end % delay cyclic signal by phase shift function waveform = cs_delay(waveform, rate_Hz, delay_s) rflag = isreal(waveform); n = numel(waveform); cycLen_s = n / rate_Hz; nCyc = delay_s / cycLen_s(); f = 0:(n - 1); f = f + floor(n / 2); f = mod(f, n); f = f - floor(n / 2); phase = -2 * pi * f * nCyc; rot = exp(1i*phase); waveform = ifft(fft(waveform) .* rot); if rflag waveform = real(waveform); end end
The next picture demonstrates signal correction using the second iterative algorithm. Here the signal s2 is a scaled and phase-shifted version of the “reference” signal s1, with some noise added:
We have benchmarked both algorithms on the following testing platform:
- 4x Intel(R) Xeon(R) CPU E7-4860 (40 cores / 80 threads via Hyper-Threading)
- 256 GB RAM (32 * 8GB)
- 4x NVIDIA Tesla S2050 (GF100)
running MATLAB® R2012a under 64-bit Linux. The results of experiments with and without GPU acceleration for increasing signal lengths are shown below:
To summarize, using Jacket we have been able to achive on the average 2.5x speed-up for the first algorithm and more than 20x speed-up for the second approach. We also note that the MATLAB® implementation of both algorithms with Jacket requires only very little effort (up to changing the number types) since they are readily suitable for GPU acceleration.
Least-squares based algorithm with Jacket (original comments are provided by the author):
% ****************************************************************************** % delay-matching between two signals (complex/real-valued) M. Nentwig % % * matches the continuous-time equivalent waveforms of the signal vectors % (reconstruction at Nyquist limit => ideal lowpass filter) % * Signals are considered cyclic. Use arbitrary-length zero-padding to turn a % one-shot signal into a cyclic one. % % * output: % => coeff: complex scaling factor that scales 'ref' into 'signal' % => delay 'deltaN' in units of samples (subsample resolution) % apply both to minimize the least-square residual % => 'shiftedRef': a shifted and scaled version of 'ref' that % matches 'signal' % => (signal - shiftedRef) gives the residual (vector error) % % ****************************************************************************** function [coeff, shiftedRef, deltaN] = fitSignal_FFT(signal, ref) // cast the input signals to GPU signal = gdouble(signal); ref = gdouble(ref); n = length(signal); forceReal = isreal(signal) && isreal(ref); % Calculate the frequency that corresponds to each FFT bin [-0.5..0.5[ binFreq = (mod((gdouble(0:n-1)+floor(n/2)), n)-floor(n/2))/n; % Delay calculation starts: Convert to frequency domain... sig_FD = fft(signal); ref_FD = fft(ref, n); % ... calculate crosscorrelation between signal and reference... u=sig_FD .* conj(ref_FD); if mod(n, 2) == 0 % for an even sized FFT the center bin represents a signal % [-1 1 -1 1 ...] (subject to interpretation). It cannot be delayed. % The frequency component is therefore excluded from the calculation. u(length(u)/2 + 1)=0; end Xcor=abs(ifft(u)); % Each bin in Xcor corresponds to a given delay in samples. The bin with the % highest absolute value corresponds to the delay where maximum correlation % occurs. integerDelay = find(gdouble(Xcor==max(Xcor))); % (1): in case there are several bitwise identical peaks, use the first one % Minus one: Delay 0 appears in bin 1 integerDelay=integerDelay(1)-1; % Fourier transform of a pulse shifted by one sample rotN = exp(2i*pi*integerDelay .* binFreq); uDelayPhase = -2*pi*binFreq; % Since the signal was multiplied with the conjugate of the reference, the % phase is rotated back to 0 degrees in case of no delay. Delay appears as % linear increase in phase, but it has discontinuities. Use the known phase % (with +/- 1/2 sample accuracy) to rotate back the phase. This removes the % discontinuities. u=u .* rotN; % Obtain the delay using linear least mean squares fit. The phase is weighted % according to the amplitude. This suppresses the error caused by frequencies % with little power, that may have radically different phase. weight = abs(u); constRotPhase = 1 .* weight; uDelayPhase = uDelayPhase .* weight; ang = angle(u) .* weight; r = [constRotPhase; uDelayPhase] .' ang.'; % linear mean square % the same will be obtained via the phase of 'coeff' further down fractionalDelay=r(2); % Finally, the total delay is the sum of integer part and fractional part. deltaN = integerDelay + fractionalDelay; % provide shifted and scaled 'ref' signal this is effectively time-convolution % with a unit pulse shifted by deltaN rotN = exp(-2i*pi*deltaN .* binFreq); ref_FD = ref_FD .* rotN; shiftedRef = ifft(ref_FD); % Again, crosscorrelation with the now time-aligned signal coeff=sum(signal .* conj(shiftedRef)) / sum(shiftedRef .* conj(shiftedRef)); shiftedRef=shiftedRef * coeff; if forceReal shiftedRef = real(shiftedRef); end end
The iterative algorihtm with Jacket:
% ***************************************************************************** % The functions below implement a fallback solution, if the above algorithm % fails. This variant performs better at low signal-to-noise ration. The % algorithm iteratively determines the crosscorrelation, predicts the peak % location between bins, time-shifts and repeats. % Documentation: http://www.dsprelated.com/showcode/288.php % It is not as accurate and slower as the first algorithm, but should always % converge, even if "the" delay between two signals is not well-defined because % of group delay variations, multiple periods etc. % ****************************************************************************** function out = fitSignal_corrSearch(signal, ref) // cast the input signals to GPU signal = gdouble(signal); ref = gdouble(ref); [deltaN, coeff] = iterDelayEst(signal, ref); coeff = 1 / coeff; deltaN = -deltaN; % correct it out.shifted_ref = cs_delay(ref, 1, deltaN); out.shifted_ref = out.shifted_ref * coeff; end % estimates delay and scaling factor function [delay_samples, coeff] = iterDelayEst(s1, s2) s1 = s1(:) .'; % force row vectors s2 = s2(:) .'; rflag = isreal(s1) && isreal(s2); n = numel(s1); halfN = floor(n/2); assert(numel(s2) == n, 'signals must have same length'); thr_samples = 1e-12; % exit if uncertainty below threshold nIter = 250; % exit after fixed number of iterations % frequency domain representation of signals fd1 = fft(s1); fd2 = fft(s2); tau = []; % first round: No delay was applied fd2Tau = fd2; % delayed s2 in freq. domain % frequency corresponding to each FFT bin -0.5..0.5 f = (mod((gdouble(0:n-1)+floor(n/2)), n)-floor(n/2))/n; % normalization factor nf = real(sqrt((fd1 * fd1') * (fd2 * fd2'))) / n; % normalizes to 1 % search window: known maximum and two surrounding points x1 = -1; x2 = -1; x3 = -1; y1 = -1; y2 = -1; y3 = -1; for count = 1:nIter % crosscorrelation with time-shifted signal xcorr = abs(ifft(fd2Tau .* conj(fd1)))/ nf; % detect peak if isempty(tau) % startup: initialize with three adjacent bins around peak ix = find(gdouble(xcorr == max(xcorr))); ix = ix(1); % use any, if multiple bitwise identical peaks % indices of three bins around peak ixLow = mod(ix-1-1, n) + 1; % one below ixMid = ix; ixHigh = mod(ix-1+1, n) + 1; % one above % delay corresponding to the three bins tauLow = mod(ixLow -1 + halfN, n) - halfN; tauMid = mod(ixMid -1 + halfN, n) - halfN; tauHigh = mod(ixHigh -1 + halfN, n) - halfN; % crosscorrelation value for the three bins xcLow = (xcorr(ixLow)); xcMid = (xcorr(ixMid)); xcHigh = (xcorr(ixHigh)); x1 = tauLow; x2 = tauMid; x3 = tauHigh; y1 = xcLow; y2 = xcMid; y3 = xcHigh; else % only main peak at first bin is of interest tauMid = tau; xcMid = xcorr(1); if xcMid > y2 % improve midpoint if tauMid > x2 % midpoint becomes lower point x1 = x2; y1 = y2; else % midpoint becomes upper point x3 = x2; y3 = y2; end x2 = tauMid; y2 = xcMid; elseif tauMid = x1); % bitwise identical is OK assert(tauMid > x1 || xcMid > y1); % expect improvement x1 = tauMid; y1 = xcMid; elseif tauMid > x2 % improve high point assert(tauMid <= x3); % bitwise identical is OK assert((tauMid y3)); % expect improvement x3 = tauMid; y3 = xcMid; else assert(false, '?? evaluated for existing tau ??'); end end % calculate uncertainty (window width) eIter = abs(x3 - x1); if eIter x1) && (tau < x3)); else usePoly = 0; end end if ~usePoly % revert to linear interpolation on the side with the less-accurate % outer sample. Note: There is no guarantee that the side with the % more accurate outer sample is the right one, as the samples aren't % placed on a regular grid! Therefore, iterate to improve the "worse" % side, which will eventually become the "better side", and iteration % converges. tauLow = (x1 + x2) / 2; tauHigh = (x2 + x3) / 2; if y1 < y3 o = [tauLow, tauHigh]; else o = [tauHigh, tauLow]; end % don't choose point that is identical to one that is already known tau = o(1); if tau == x1 || tau == x2 || tau == x3 tau = o(2); if tau == x1 || tau == x2 || tau == x3 break; end end end % advance 2nd signal according to location of maximum % phase shift in frequency domain - delay in time domain fd2Tau = fd2 .* exp(2i * pi * f * tau); end % for % the delay estimate is the final location of the delay that % maximized crosscorrelation (center of window). delay_samples = x2; % Coefficient: Turn signal 1 into signal 2 coeff = fd2Tau * fd1' ./ (fd1 * fd1'); % chop roundoff error, if input signals are known to be real-valued. if rflag coeff = real(coeff); end end