Talks at the International Computer Science Institute

The International Computer Science Institute
is pleased to present a talk:


"Applications of sequence motifs in protein sequence analysis"

Asa Ben-Hur
Stanford University

Friday, April 30, 2004
ICSI, Conference Room 6A
12:00 pm

Abstract:

Protein sequence motifs provide a representation of sequence elements that are conserved across proteins with similar structure and function. Sequence motifs are often considered the building blocks of proteins. We explore this idea from a machine learning perspective by showing the usefulness of motifs in several tasks in protein sequence analysis: protein function prediction, remote homology detection, and prediction of protein-protein interactions. We show that the function of a protein can be accurately inferred from its motif composition; in that respect, our method is similar to the "bag of words" representation that is commonly used in text categorization. To address the high dimensionality of the data we use feature selection, and find that most classes of enzymes can be predicted using a handful of motifs, yielding interpretable classifiers whose performance is similar to that obtained using BLAST, the standard method for measuring sequence similarity between proteins. A motif-based approach also provides state-of-the-art performance in detecting remote homology, i.e., distant evolutionary relationships between sequences that are very hard to discover. For the prediction of protein-protein interactions, we show work in progress that is based on the presence of motif pairs.
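
As a rough illustration of the "bag of words" analogy in the abstract, the sketch below turns a protein sequence into a vector of motif occurrence counts. It is not the speaker's method: the motif names and patterns, the example sequence, and the function name motif_composition are all hypothetical, and writing motifs as regular expressions is an assumption made only to keep the example short.

```python
import re

# Hypothetical motifs written as regular expressions. The names and patterns
# are illustrative only and are not taken from the talk.
MOTIFS = {
    "m1": r"G.GK[ST]",  # G, any residue, G, K, then S or T
    "m2": r"C..C",      # two C residues separated by any two residues
    "m3": r"HE..H",     # H, E, any two residues, then H
}


def motif_composition(sequence, motifs=MOTIFS):
    """Return one count per motif: non-overlapping matches in the sequence."""
    return [len(re.findall(pattern, sequence)) for pattern in motifs.values()]


# Made-up example sequence; each position in the vector corresponds to one motif.
vector = motif_composition("MAGSGKTLLCAACHEAAH")
print(dict(zip(MOTIFS, vector)))
```

Vectors of this kind could then be passed to any standard classifier, with feature selection used to keep only the few motifs that matter for a given class.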