Phase-Only Filtering for the Masses (of DNA Data): A New Approach to Sequence Alignment
February 2006
Andrzej K. Brodzik, The MITRE Corporation
ABSTRACT
Alignment of DNA segments containing repetitive nucleotide base patterns is an important task
in several genomics applications, including DNA sequencing, DNA fingerprinting, pathogen detection,
and gene finding. One of the most efficient procedures for this task is the cross-correlation
method, whose main computations are the discrete Fourier transform and a pointwise multiplication
of two complex Fourier transform sequences. In this work the standard magnitude-and-phase
cross-correlation technique is compared with the lesser-known but closely related phase-only
cross-correlation method. It is shown that for a periodic DNA sequence the standard approach
produces significant sidelobes in the cross-correlation, whose magnitude increases with sequence
length, while the phase-only approach yields a perfect cross-correlation with zero sidelobes. For a
DNA sequence that contains both irregularly distributed symbols and periodic patterns, the
difference in performance is less pronounced but still significant. Numerical experiments on
synthesized and real data demonstrate that the phase-only approach is robust to isolated symbol
insertions and deletions and that it can identify the positions of matching segments in the sequence.
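The two computations the abstract compares can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say how bases are mapped to numbers, so the fourth-root-of-unity mapping (`BASE_TO_COMPLEX`) is a hypothetical choice, as are the helper names `encode`, `cross_correlation`, and `best_shift`, the `eps` guard against division by zero, and the use of circular (FFT-length) correlation on equal-length sequences with no zero-padding. Both variants take the DFT of each sequence and multiply the spectra pointwise; the phase-only variant additionally divides each bin by its magnitude.

```python
import numpy as np

# Hypothetical encoding (an assumption, not taken from the paper): each
# nucleotide is mapped to a fourth root of unity, giving a complex signal.
BASE_TO_COMPLEX = {"A": 1.0 + 0j, "C": 1j, "G": -1.0 + 0j, "T": -1j}


def encode(seq: str) -> np.ndarray:
    """Map a DNA string to a complex-valued numeric sequence."""
    return np.array([BASE_TO_COMPLEX[base] for base in seq.upper()], dtype=complex)


def cross_correlation(x: np.ndarray, y: np.ndarray, phase_only: bool = False,
                      eps: float = 1e-12) -> np.ndarray:
    """Circular cross-correlation r[k] = sum_n x[n + k] * conj(y[n]) via the DFT.

    Standard (magnitude-and-phase): keep the full cross-spectrum X * conj(Y).
    Phase-only: divide each bin by its magnitude so only the phase difference
    remains; bins whose magnitude is below eps are set to zero to avoid 0/0.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length for circular correlation")
    cross = np.fft.fft(x) * np.conj(np.fft.fft(y))  # pointwise product of spectra
    if phase_only:
        mag = np.abs(cross)
        cross = np.where(mag > eps, cross / np.maximum(mag, eps), 0.0)
    return np.fft.ifft(cross)


def best_shift(r: np.ndarray) -> int:
    """Lag (in symbols) at which the correlation magnitude peaks."""
    return int(np.argmax(np.abs(r)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = "".join(rng.choice(list("ACGT"), size=64))
    shifted = ref[-5:] + ref[:-5]  # circular shift of ref by 5 positions
    for phase_only in (False, True):
        r = cross_correlation(encode(shifted), encode(ref), phase_only=phase_only)
        peak = best_shift(r)
        print(f"phase_only={phase_only}: peak lag {peak}, |r[peak]| = {abs(r[peak]):.3f}")
```

For a purely shifted copy both variants locate the shift, but they normalize differently: with this unit-magnitude encoding the standard peak height equals the sequence length, while the phase-only spectrum is flattened to unit magnitude, so its correlation collapses toward a single peak of height at most 1. Handling insertions, deletions, or periodic content as studied in the paper would require more than this sketch.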

Additional Search Keywords
DNA sequence alignment, DNA symbol repeat, cross-correlation, matched filter, phase-only filtering