A Model-Based Approach for Analysis of Spatial Structure in Genetic Data

Title: A Model-Based Approach for Analysis of Spatial Structure in Genetic Data
Publication Type: Technical Report
Year of Publication: 2012
Authors: Yang, W.-Y., Novembre, J., Eskin, E., & Halperin, E.
Other Numbers: 3349
Abstract

Characterizing genetic diversity within and between populations has broad applications in studies of human disease and evolution. Two key steps toward this objective are spatially global ancestry inference, which aims to predict the geographical locations of an individual's ancestry, and spatially local ancestry inference, which aims to predict the geographical locations of chromosome segments, or ancestry blocks. We propose a new approach, SPALL (SPatial Ancestry analysis LocaL), that solves both inference problems within a unified probabilistic model. The model takes linkage disequilibrium into account and can be estimated efficiently with the Expectation-Maximization (EM) algorithm in conjunction with the forward-backward algorithm. The method allows us to assign geographical locations to the parents, grandparents, and more distant ancestors of a given individual. It also allows us to assign a geographical location to each locus-specific variant. We analyzed a European dataset and a worldwide dataset and showed that SPALL predicts locations with high accuracy. The proposed model is built as a generalization of our recently published method, Spatial Ancestry Analysis (SPA), which explicitly models the spatial distribution of each SNP by treating its allele frequency as a continuous function of geographic space. This allows us to assign an individual, including an admixed individual, to geographical locations rather than to predefined population categories. Software implementing all of the proposed methods is freely available on our website, http://genetics.cs.ucla.edu/spa.
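To make the idea of "allele frequency as a continuous function of geographic space" concrete, here is a minimal sketch of spatially global assignment. It assumes a logistic function of a 2-D location for each SNP, Hardy-Weinberg binomial sampling of diploid genotypes, and per-SNP parameters `a` and `b` that have already been estimated. The function names are hypothetical, and the grid search is a simple stand-in for whatever optimization SPA and SPALL actually use. The sketch ignores linkage disequilibrium and the forward-backward machinery that SPALL adds for local ancestry.

```python
import numpy as np


def allele_freq(loc, a, b):
    """Assumed logistic slope function: reference-allele frequency of each SNP
    at a 2-D location. loc: shape (2,), a: shape (n_snps, 2), b: shape (n_snps,)."""
    return 1.0 / (1.0 + np.exp(-(a @ loc + b)))


def genotype_loglik(g, loc, a, b):
    """Log-likelihood of diploid genotypes g (0, 1 or 2 copies of the reference
    allele) at location loc, assuming independent SNPs and binomial sampling."""
    # Clip to avoid log(0) for extreme frequencies.
    f = np.clip(allele_freq(loc, a, b), 1e-9, 1 - 1e-9)
    # The binomial coefficient C(2, g) does not depend on loc, so it is dropped.
    return float(np.sum(g * np.log(f) + (2 - g) * np.log(1 - f)))


def assign_location(g, a, b, grid):
    """Return the candidate location (row of grid) with the highest log-likelihood."""
    scores = [genotype_loglik(g, loc, a, b) for loc in grid]
    return grid[int(np.argmax(scores))]


if __name__ == "__main__":
    # Synthetic demo with made-up parameters (not data from the paper).
    rng = np.random.default_rng(0)
    n_snps = 500
    a = rng.normal(size=(n_snps, 2))
    b = rng.normal(size=n_snps)
    true_loc = np.array([0.5, -0.5])
    g = rng.binomial(2, allele_freq(true_loc, a, b))
    axis = np.linspace(-1.0, 1.0, 9)
    grid = np.array([[x, y] for x in axis for y in axis])
    print("true:", true_loc, "assigned:", assign_location(g, a, b, grid))
```

With enough informative SNPs the assigned grid point should fall near the true location. A continuous optimizer (for example, gradient ascent on `genotype_loglik`) would avoid the grid's resolution limit.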

Acknowledgment

W.-Y.Y. and E.E. are supported by grants from the US National Science Foundation (0513612, 0731455, 0729049, 0916676 and 1065276) and the US National Institutes of Health (K25 HL080079, U01 DA024417, P01 HL30568 and PO1 HL28481). J.N. is supported by a National Science Foundation grant (0933731) and by the Searle Scholars Program. E.H. is a faculty fellow of the Edmond J. Safra Program at Tel Aviv University and was supported in part by the Israeli Science Foundation (grant 04514831) and by the IBM Open Collaborative Research award program.

URL: http://www.icsi.berkeley.edu/pubs/algorithms/spatialstructureingeneticdata12.pdf
Bibliographic Notes

Presented at the Annual Meeting of the American Society of Human Genetics, San Francisco, California.

Abbreviated Authors

W. Yang, J. Novembre, E. Eskin, and E. Halperin

ICSI Research Group

Research Initiatives

ICSI Publication Type

Article in conference proceedings