Bioinformatics Projects
Analysis of Genome-Wide Association Studies for Common Diseases
In these studies, sets of cases (individuals carrying a disease) and controls (background population) are collected and genotyped for genetic variants, normally single nucleotide polymorphisms (SNPs). Our group is collaborating closely with groups of geneticists and epidimiologists who have collected such samples. We take part in the analysis of these studies, and in some cases also in the design of the studies. Our current collaborations include studies of cardiovascular diseases, ovarian cancer, breast cancer, and non-Hodgkin's lymphoma; for the latter, see our recent publication in Nature Genetics.
Computational Methods for the Identification of Disease-Genotype Associations
We develop computational methods that aid in the analysis of genome-wide association studies, or other studies that involve the inference of a relation between a genetic variant such as an SNP or a copy-number variant (CNV), with a given phenotype that was measured for the studied population. These methods include haplotype inference methods, ancestry inference methods, and the incorporation of these in a statistical or machine-learning framework that is used to test for an association of a genetic marker with a phenotype.
Statistical Genetics and Populaton Genetics
We develop computational methods for the inference of evolutionary and genetic characteristics, such as the inference of recombination events, estimation of mutation rates, human history, etc., from genetic data.
Methods for the Analysis of High-Throughput Sequencing Data
We are currently developing methods for the design and analysis of studies that involve high-throughput sequencing technologies, such as the Solexa, 454, or Solid platforms.
Finding Conserved Protein Modules
A long-term goal of computational molecular biology is to extract, from large data sets, information about how proteins work together to carry out life processes at a cellular level. We are investigating protein-protein interaction (PPI) networks, in which the vertices are the proteins within a species and the edges indicate direct interactions between proteins. Our goal is to discover conserved protein modules: richly interacting sets of proteins whose patterns of interaction are conserved across two or more species. Such modules correspond to subgraphs of a PPI network that have many internal edges, relatively few external edges, and a structure that persists from one species to another. We have developed several algorithms and software packages for finding conserved protein modules, and compared them systematically with previous work.
This research leads naturally to fundamental abstract problems in computational graph theory, such as the problem of partitioning a graph into vertex-disjoint subgraphs with a high density of edges, or the colorful subgraph problem, in which each vertex of a graph is assigned a color, and the goal is to find a small connected subgraph containing at least one vertex of each color. We will systematically derive and evaluate heuristic algorithms for these NP-hard problems.
Analysis of Brain Networks
We are developing computational tools to analyze structural and functional brain networks derived from dMRI, fMRI, and MEG brain images. These are being used in a variety of joint studies with researchers at UC San Francisco to increase our understanding and improve diagnoses for a variety of neurological disorders, including agenesis of the corpus callosum (a birth defect), multiple sclerosis, and traumatic brain injuries. This project is led by Eric Friedman.
Heuristic Combinatorial Algorithms
Analysis of Heuristic Combinatorial Algorithms
In many practical situations heuristic algorithms reliably give satisfactory solutions to real-life instances of optimization problems, despite evidence from computational complexity theory that the problems are intractable. Our goal is to understand this seeming contradiction, and to put the construction and evaluation of heuristic algorithms on a firmer footing.
We have recently developed heuristic combinatorial algorithms for a number of problems arising in computational biology, including multiple global alignment of genomes, identification of protein modules containing proteins of specified types, and the partitioning of graphs into small, clique-like components.
We will develop a general empirical method for selecting an optimal choice of parameters and subroutines within a well defined heuristic algorithmic strategy. The method assumes the availability of a supply of typical problem instances which can be used to train the algorithmic strategy and evaluate its performance. The search for optimal parameters and subroutines will be treated as an optimization problem in its own right.
More about the Algorithms Research Group
>>
top |