DNA Sequence Analysis: The Genome Is More Than The Sum of Its Parts

By Andrzej Brodzik

SUMMARY: DNA sequencing technology has made rapid strides in recent years, generating a deluge of data. However, the design of robust tools that can extract useful and usable information in the face of random variations and inevitable errors has lagged behind. MITRE researchers are developing new mathematical techniques to address this challenge.

Planks vs. Boat

One of the pioneers of microbial genomics, Antoine Danchin, recalls in his book The Delphic Boat an ancient question originally posed to the oracle of Delphi: is a boat whose planks have all been replaced, one by one over time, still the same boat? Danchin echoes this question by asking whether it is the individual components of a gene or the relationships among them that define a gene. Should we attempt to deduce the biological function and the genetic family tree of an organism simply by comparing individual parts, or should we set a more ambitious goal and try to guess the overall blueprint for the genomic construction?

This question seems to be of both fundamental and timely importance now that DNA sequencing figures so heavily into such important endeavors as DNA fingerprinting, pathogen detection, gene finding, genealogy study, and evolutionary tree reconstruction. These tasks all rely, in large part, on comparing an unfamiliar DNA sequence to one that is already known, establishing islands of similarity, and then forming a hypothesis about the function or meaning of the new sequence.

Graduating from Spelling

A DNA sequence is analogous to a sentence in English in which the letters correspond to four types of organic molecules, called bases. These bases are adenine (A), guanine (G), cytosine (C), and thymine (T). In DNA fingerprinting, for example, an unknown collection of DNA fragments, typically a few tens to thousands of bases long, is compared with one of several known collections of DNA fragments contained in a library. Either or both of these collections might be incomplete or unordered, or might contain errors, including symbol insertions and symbol deletions. Finding a match between collections establishes genome identity.

The analogy with the English language can be taken further. To determine the message contained in a text or the author of the text, we concentrate on the words in the text, what they mean, and how they fit together, rather than on the individual letters of the text. And even when we can't accurately read certain letters or sentence fragments, we can still decipher much of the meaning of the text.

It is therefore appropriate that in processing DNA sequence data, even data plagued by missing or corrupted information, we should similarly identify the key constituent components of genomes and investigate how they fit into the overall genomic design. Instead of comparing the letters of DNA sequences, we should compare the words and the sentences they form.
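To make the letters-versus-words contrast concrete, here is a minimal Python sketch that compares two sequences by their overlapping k-letter words (k-mers) instead of position by position. This is a generic, widely used technique chosen purely for illustration; it is not the method MITRE developed. The function names, the choice of k = 3, and the example sequences are all assumptions.

```python
from collections import Counter


def positional_identity(a, b):
    """Fraction of positions holding the same letter (letter-by-letter view)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def kmer_counts(seq, k):
    """Count every overlapping length-k 'word' in a DNA string."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_similarity(a, b, k=3):
    """Jaccard similarity of the two multisets of k-letter words."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())   # words present in both
    total = sum((ca | cb).values())    # words present in either
    return shared / total if total else 0.0


# Hypothetical example: the same sequence with one base deleted.
reference = "ACGTTGCAACGTAGCA"
damaged = reference[:5] + reference[6:]

print("letter view:", positional_identity(reference, damaged))
print("word view:  ", word_similarity(reference, damaged))
```

A single deletion shifts every downstream letter, so the position-by-position score drops sharply, while most of the words survive the error and the word-based score stays comparatively high.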

INSIDE VIEW

Andrzej Brodzik has been a scientist for 25 years, and in that time, he has seen the focus of institutional research shift from curiosity-driven to problem-driven. "This is unfortunate because it puts blinders on the eyes of scientists. Today's problems are highly complex and require fittingly complex solutions. These solutions cannot be obtained simply by tinkering, without the laborious and difficult but necessary process of a theoretical model development. That does not mean, of course, that consideration of utility is not important. Both components are truly indispensable, and, indeed, choosing one over the other almost always leads to lowering of quality." The work on the structure of the DNA sequence provides a perfect illustration of the synergy of basic and applied research. "I found this work immensely satisfying because I took a highly theoretical approach and, in effect, obtained very broadly applicable results."

Speed Reading

The first step toward decoding a genome is to identify the atomic components of the genomic sequence.

MITRE researchers began this journey by marking the genomic atoms with short random strings associated with algebraic constructs. The team then explored the utility of these strings in DNA sequence homology evaluation, that is, a sequence comparison performed by matching the marker versions of the original sequences rather than the sequences themselves. Such an evaluation lets us identify, at reduced computational cost and from a large pool of data, the sequences that are in some way close to the query sequence, and perform a rough alignment that indicates the overall sequence similarity.

This procedure consists of two essentially algebraic computations: the construction of the homology marker sequences and the alignment of these sequences by a fast implementation of the cross-correlation method. Replacing the analysis of DNA sequences with an analysis of homology marker sequences significantly reduces the number of computations required; the reduction is proportional to the ratio of the lengths of the two kinds of sequence. MITRE chose this design for its low computational complexity and because it places no limit on the size of the DNA sequence that can be analyzed.
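The alignment step can be illustrated with a small, self-contained Python sketch of cross-correlation computed through a fast Fourier transform. The article does not describe how the homology marker sequences are built, so this sketch makes a loud simplifying assumption: it correlates the raw DNA letters directly, using one 0/1 indicator vector per base. The function names and test sequences are hypothetical, and the recursive FFT is a textbook radix-2 version, not MITRE's implementation.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.
    The inverse transform is returned unscaled (caller divides by n)."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def match_counts(reference, query):
    """Matching-base count for each offset at which `query` fits
    entirely inside `reference` (assumes len(query) <= len(reference))."""
    size = 1
    while size < len(reference) + len(query):
        size *= 2                      # zero-pad to avoid wrap-around
    totals = [0.0] * size
    for base in "ACGT":
        ref = [1.0 if c == base else 0.0 for c in reference]
        qry = [1.0 if c == base else 0.0 for c in query]
        ref += [0.0] * (size - len(ref))
        qry += [0.0] * (size - len(qry))
        # Correlation theorem: corr = IFFT(FFT(ref) * conj(FFT(qry)))
        product = [r * q.conjugate() for r, q in zip(fft(ref), fft(qry))]
        corr = fft(product, inverse=True)
        for s in range(size):
            totals[s] += corr[s].real / size
    offsets = len(reference) - len(query) + 1
    return [round(totals[s]) for s in range(offsets)]


def best_offset(reference, query):
    """Offset with the highest number of matching bases."""
    counts = match_counts(reference, query)
    return max(range(len(counts)), key=counts.__getitem__)


# Hypothetical example: the query occurs in the reference at offset 4.
reference = "TTGACCGTAAGCTT"
query = "CCGTAAG"
print(match_counts(reference, query))
print(best_offset(reference, query))
```

Scoring every offset directly costs on the order of the product of the two sequence lengths, whereas the FFT route grows roughly as N log N in the padded length N. Running the same correlation on shorter marker sequences would shrink N further, which is consistent with the savings described above.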

MITRE's research into the mathematics of DNA sequence analysis is far from a mere academic exercise. Future progress in molecular biology will bear directly on the development of effective strategies for mitigating such threats as bioterrorism and biowarfare. And the rate of that progress will ultimately depend on an effective merger of the mathematical and life sciences.

For more information, please contact Andrzej Brodzik using the employee directory.


Page last updated: February 22, 2011
