05/10/2010 10:00 am
05/10/2010 11:00 am
Category:
Ph.D. Dissertation Proposal
Advisor:
Dr. Yi Pan

Aligning multiple biological sequences, such as protein or DNA/RNA sequences, is a fundamental task in bioinformatics and sequence analysis. Studying these alignments gives scientists the information needed to predict the sequences’ structures, determine the evolutionary relationships among them, or discover drug-like compounds that can bind to them. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it harder both to align the sequences and to evaluate the resulting alignments. In this research proposal, we propose a new scoring method for multiple sequence alignment. Our scoring method encapsulates the stereochemical properties of sequence residues and their substitution probabilities in a tree-structured scoring scheme. This technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we will introduce new multiple sequence alignment and sequence clustering algorithms to compute the phylogenetic relationships among the sequences. The new MSA algorithms rely on sequence knowledge databases, or sequence consistency databases, to identify the most reliable alignment. The new sequence clustering algorithms use a novel overlapping technique to classify homologous sequences. The overlapping technique allows distantly related sequences to belong to multiple clusters, deferring the clustering decision on them until more information becomes available. This may lead to a more reliable classification of the sequences.
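To illustrate the general idea of a tree-structured residue score, here is a minimal Python sketch. It assumes a hypothetical three-level property tree (top-level class, subclass, residue) and scores two residues by the depth of their deepest shared node. The grouping, the names `PROPERTY_TREE`, `tree_score`, and `sum_of_pairs`, and the depth-based rule are illustrative assumptions, not the scheme proposed in this work, and substitution probabilities are not modeled here.

```python
from itertools import combinations

# Hypothetical property tree (an assumption for illustration):
# top-level class -> subclass -> one-letter residue codes.
PROPERTY_TREE = {
    "hydrophobic": {"aliphatic": "AVLIM", "aromatic": "FWY"},
    "polar": {"uncharged": "STNQ", "positive": "KRH", "negative": "DE"},
    "special": {"glycine": "G", "proline": "P", "cysteine": "C"},
}


def build_paths(tree):
    """Map each residue to its root-to-leaf path in the property tree."""
    paths = {}
    for top, subgroups in tree.items():
        for sub, residues in subgroups.items():
            for r in residues:
                paths[r] = (top, sub, r)
    return paths


PATHS = build_paths(PROPERTY_TREE)


def tree_score(a, b):
    """Score two residues by the depth of their deepest shared tree node (0 to 3)."""
    pa, pb = PATHS[a.upper()], PATHS[b.upper()]
    depth = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        depth += 1
    return depth


def sum_of_pairs(column):
    """Sum tree scores over all residue pairs in one alignment column, skipping gaps ('-')."""
    residues = [c for c in column if c != "-"]
    return sum(tree_score(a, b) for a, b in combinations(residues, 2))


if __name__ == "__main__":
    print(tree_score("L", "I"))  # same aliphatic subclass
    print(tree_score("L", "D"))  # different top-level classes
    print(sum_of_pairs("LI-V"))
```

Each residue comparison is a short path walk, so scoring stays cheap; a real scheme would also need to fold in substitution probabilities, which this sketch leaves out.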
Committee
Department Conference Room