Time delay estimation algorithms with Jacket

Unlock the secrets of time delay estimation with our in-depth analysis of advanced algorithms, showcasing how to achieve remarkable accuracy in signal processing applicationsâ€”from radar to ultrasonics. Dive into the details and discover how these techniques can enhance your projects!

John Melonakos

Sep 10, 2012

9 min read

Time delay estimation (TDE) techniques have many diverse signal processing applications: for instance, in such fields as radar, sonar, seismology, geophysics, and ultrasonics for identifying and localizing radiating sources.

In this case study, we evaluate the performance of two algorithms developed by Markus Nentwig to find delay and scaling factor between two cyclic signals. The first algorithm uses linear least-squares fitting to estimate the delay. The second algorthm is iterative and relies on FFT-based cross-correlation. A MATLABÂ® implementation of both approaches can be found in Algorithm 1 and Algorithm 2, respectively. As the author pointed out, the algorithms are not suited for real-time applications since the whole signal needs to be known in advance. However, they can be very useful as general-purpose tool for simulations and measurements to align input and output samples: properly used, the attained accuracy can be orders of magnitude higher than that for typical "ad-hoc" solutions.

The following code snippet, provided by the author, has been used to generate delayed signals:

% generate random signals of length n: s2 is a delayed replica s1 with (optionally) some noise added
function [s1,s2] = generate_signals(n)
   % create first random signal
   fd = randn(1, n) + 1i * randn(1, n);
   % apply lowpass filter
   f = (mod(((0:n-1)+floor(n/2)), n)-floor(n/2))/n;
   fd(abs(f) > 0.045) = 0;
   s1 = real(ifft(fd)) * sqrt(n);
   % create the 2nd delayed signal
   dTest_samples = 12.3456;
   cTest = 1.23456;
   cTest = cTest + 1i; % disp('*** test: complex coeff enabled ***');
   % cTest = -cTest; disp('*** test: negative coeff enabled ***');
   s2 = cTest * cs_delay(s1, 1, dTest_samples);
   s2 = s2 + 0.1*randn(size(s2)); disp('*** test: noise enabled ***');
end

% delay cyclic signal by phase shift
function waveform = cs_delay(waveform, rate_Hz, delay_s)
    rflag = isreal(waveform);
    n = numel(waveform);
    cycLen_s = n / rate_Hz;
    nCyc = delay_s / cycLen_s();
    f = 0:(n - 1);
    f = f + floor(n / 2);
    f = mod(f, n);
    f = f - floor(n / 2);
    phase = -2 * pi * f * nCyc;
    rot = exp(1i*phase);
    waveform = ifft(fft(waveform) .* rot);
    if rflag
        waveform = real(waveform);
    end
end

The next picture demonstrates signal correction using the second iterative algorithm. Here the signal s2 is a scaled and phase-shifted version of the "reference" signal s1, with some noise added:

We have benchmarked both algorithms on the following testing platform:

4x Intel(R) Xeon(R) CPU E7-4860 (40 cores / 80 threads via Hyper-Threading)
256 GB RAM (32 * 8GB)
4x NVIDIA Tesla S2050 (GF100)

running MATLABÂ® R2012a under 64-bit Linux. The results of experiments with and without GPU acceleration for increasing signal lengths are shown below:

To summarize, using Jacket we have been able to achive on the average 2.5x speed-up for the first algorithm and more than 20x speed-up for the second approach. We also note that the MATLABÂ® implementation of both algorithms with Jacket requires only very little effort (up to changing the number types) since they are readily suitable for GPU acceleration.

Least-squares based algorithm with Jacket (original comments are provided by the author):

% ******************************************************************************
% delay-matching between two signals (complex/real-valued) M. Nentwig
%
% * matches the continuous-time equivalent waveforms of the signal vectors
% (reconstruction at Nyquist limit => ideal lowpass filter)
% * Signals are considered cyclic. Use arbitrary-length zero-padding to turn a
% one-shot signal into a cyclic one.
%
% * output:
%   => coeff: complex scaling factor that scales 'ref' into 'signal'
%   => delay 'deltaN' in units of samples (subsample resolution)
%      apply both to minimize the least-square residual
%   => 'shiftedRef': a shifted and scaled version of 'ref' that
%      matches 'signal'
%   => (signal - shiftedRef) gives the residual (vector error)
%
% ******************************************************************************
function [coeff, shiftedRef, deltaN] = fitSignal_FFT(signal, ref)
    // cast the input signals to GPU
    signal = gdouble(signal); ref = gdouble(ref);
    n = length(signal);
    forceReal = isreal(signal) && isreal(ref);

    % Calculate the frequency that corresponds to each FFT bin [-0.5..0.5[
    binFreq = (mod((gdouble(0:n-1)+floor(n/2)), n)-floor(n/2))/n;

    % Delay calculation starts: Convert to frequency domain...
    sig_FD = fft(signal); ref_FD = fft(ref, n);

    % ... calculate crosscorrelation between signal and reference...
    u=sig_FD .* conj(ref_FD);
    if mod(n, 2) == 0
        % for an even sized FFT the center bin represents a signal
        % [-1 1 -1 1 ...] (subject to interpretation). It cannot be delayed.
        % The frequency component is therefore excluded from the calculation.
        u(length(u)/2 + 1)=0;
    end
    Xcor=abs(ifft(u));

    % Each bin in Xcor corresponds to a given delay in samples. The bin with the
    % highest absolute value corresponds to the delay where maximum correlation
    % occurs.
    integerDelay = find(gdouble(Xcor==max(Xcor)));

    % (1): in case there are several bitwise identical peaks, use the first one
    % Minus one: Delay 0 appears in bin 1
    integerDelay=integerDelay(1)-1;

    % Fourier transform of a pulse shifted by one sample
    rotN = exp(2i*pi*integerDelay .* binFreq);
    uDelayPhase = -2*pi*binFreq;

    % Since the signal was multiplied with the conjugate of the reference, the
    % phase is rotated back to 0 degrees in case of no delay. Delay appears as
    % linear increase in phase, but it has discontinuities. Use the known phase
    % (with +/- 1/2 sample accuracy) to rotate back the phase. This removes the
    % discontinuities.
    u=u .* rotN;

    % Obtain the delay using linear least mean squares fit. The phase is weighted
    % according to the amplitude. This suppresses the error caused by frequencies
    % with little power, that may have radically different phase.
    weight = abs(u);
    constRotPhase = 1 .* weight;
    uDelayPhase = uDelayPhase .* weight;
    ang = angle(u) .* weight;
    r = [constRotPhase; uDelayPhase] .'  ang.'; % linear mean square

    % the same will be obtained via the phase of 'coeff' further down
    fractionalDelay=r(2);
    % Finally, the total delay is the sum of integer part and fractional part.
    deltaN = integerDelay + fractionalDelay;

    % provide shifted and scaled 'ref' signal this is effectively time-convolution
    % with a unit pulse shifted by deltaN
    rotN = exp(-2i*pi*deltaN .* binFreq);
    ref_FD = ref_FD .* rotN;
    shiftedRef = ifft(ref_FD);

    % Again, crosscorrelation with the now time-aligned signal
    coeff=sum(signal .* conj(shiftedRef)) / sum(shiftedRef .* conj(shiftedRef));
    shiftedRef=shiftedRef * coeff;

    if forceReal
        shiftedRef = real(shiftedRef);
    end
end

The iterative algorihtm with Jacket:

% *****************************************************************************
% The functions below implement a fallback solution, if the above algorithm
% fails. This variant performs better at low signal-to-noise ration. The
% algorithm iteratively determines the crosscorrelation, predicts the peak
% location between bins, time-shifts and repeats.
% Documentation: http://www.dsprelated.com/showcode/288.php
% It is not as accurate and slower as the first algorithm, but should always
% converge, even if "the" delay between two signals is not well-defined because
% of group delay variations, multiple periods etc.
% ******************************************************************************
function out = fitSignal_corrSearch(signal, ref)
   // cast the input signals to GPU
   signal = gdouble(signal); ref = gdouble(ref);
   [deltaN, coeff] = iterDelayEst(signal, ref);
   coeff = 1 / coeff; deltaN = -deltaN;
   % correct it
   out.shifted_ref = cs_delay(ref, 1, deltaN);
   out.shifted_ref = out.shifted_ref * coeff;
end

% estimates delay and scaling factor
function [delay_samples, coeff] = iterDelayEst(s1, s2)
    s1 = s1(:) .'; % force row vectors
    s2 = s2(:) .';
    rflag = isreal(s1) && isreal(s2);
    n = numel(s1);  halfN = floor(n/2);
    assert(numel(s2) == n, 'signals must have same length');

    thr_samples = 1e-12; % exit if uncertainty below threshold
    nIter = 250; % exit after fixed number of iterations

    % frequency domain representation of signals
    fd1 = fft(s1); fd2 = fft(s2);

    tau = []; % first round: No delay was applied
    fd2Tau = fd2; % delayed s2 in freq. domain

    % frequency corresponding to each FFT bin -0.5..0.5
    f = (mod((gdouble(0:n-1)+floor(n/2)), n)-floor(n/2))/n;
    % normalization factor
    nf = real(sqrt((fd1 * fd1') * (fd2 * fd2'))) / n; % normalizes to 1

    % search window: known maximum and two surrounding points
    x1 = -1; x2 = -1; x3 = -1;
    y1 = -1; y2 = -1; y3 = -1;

    for count = 1:nIter
        % crosscorrelation with time-shifted signal
        xcorr = abs(ifft(fd2Tau .* conj(fd1)))/ nf;
        % detect peak
        if isempty(tau)
            % startup: initialize with three adjacent bins around peak
            ix = find(gdouble(xcorr == max(xcorr)));
            ix = ix(1); % use any, if multiple bitwise identical peaks

            % indices of three bins around peak
            ixLow = mod(ix-1-1, n) + 1; % one below
            ixMid = ix;
            ixHigh = mod(ix-1+1, n) + 1; % one above

            % delay corresponding to the three bins
            tauLow = mod(ixLow -1 + halfN, n) - halfN;
            tauMid = mod(ixMid -1 + halfN, n) - halfN;
            tauHigh = mod(ixHigh -1 + halfN, n) - halfN;

            % crosscorrelation value for the three bins
            xcLow = (xcorr(ixLow));
            xcMid = (xcorr(ixMid));
            xcHigh = (xcorr(ixHigh));

            x1 = tauLow; x2 = tauMid; x3 = tauHigh;
            y1 = xcLow; y2 = xcMid; y3 = xcHigh;
        else
            % only main peak at first bin is of interest
            tauMid = tau;
            xcMid = xcorr(1);
            if xcMid > y2 % improve midpoint
                if tauMid > x2 % midpoint becomes lower point
                    x1 = x2; y1 = y2;
                else % midpoint becomes upper point
                    x3 = x2; y3 = y2;
                end
                x2 = tauMid; y2 = xcMid;

            elseif tauMid = x1); % bitwise identical is OK
                assert(tauMid > x1 || xcMid > y1); % expect improvement
                x1 = tauMid; y1 = xcMid;

            elseif tauMid > x2 % improve high point
                assert(tauMid <= x3); % bitwise identical is OK
                assert((tauMid  y3)); % expect improvement
                x3 = tauMid; y3 = xcMid;
            else
                assert(false, '?? evaluated for existing tau ??');
            end
        end

        % calculate uncertainty (window width)
        eIter = abs(x3 - x1);
        if eIter  x1) && (tau < x3));
            else
                usePoly = 0;
            end
        end
        if ~usePoly
            % revert to linear interpolation on the side with the less-accurate
            % outer sample. Note: There is no guarantee that the side with the
            % more accurate outer sample is the right one, as the samples aren't
            % placed on a regular grid! Therefore, iterate to improve the "worse"
            % side, which will eventually become the "better side", and iteration
            % converges.
            tauLow = (x1 + x2) / 2;
            tauHigh = (x2 + x3) / 2;
            if y1 < y3
                o = [tauLow, tauHigh];
            else
                o = [tauHigh, tauLow];
            end
            % don't choose point that is identical to one that is already known
            tau = o(1);
            if tau == x1 || tau == x2 || tau == x3
                tau = o(2);
                if tau == x1 || tau == x2 || tau == x3
                    break;
                end
            end
        end
        % advance 2nd signal according to location of maximum
        % phase shift in frequency domain - delay in time domain
        fd2Tau = fd2 .* exp(2i * pi * f * tau);
    end % for

    % the delay estimate is the final location of the delay that
    % maximized crosscorrelation (center of window).
    delay_samples = x2;
    % Coefficient: Turn signal 1 into signal 2
    coeff = fd2Tau * fd1' ./ (fd1 * fd1');
    % chop roundoff error, if input signals are known to be real-valued.
    if rflag
        coeff = real(coeff);
    end
end