
Identity-by-Descent Matrix Decomposition Using Latent Ancestral Allele Models

Authors:	Cajo J. F. ter Braak, Martin P. Boer, L. Radu Totir, Christopher R. Winkler, Oscar S. Smith, Marco C. A. M. Bink

Affiliation:	*Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands, 6700 AC and †Pioneer Hi-Bred International, A DuPont Business, Johnston, Iowa 50131

Abstract:	Genetic linkage and association studies are empowered by proper modeling of relatedness among individuals. Such relatedness can be inferred from marker and/or pedigree information. In this study, the genetic relatedness among n inbred individuals at a particular locus is expressed as an n × n square matrix Q. The elements of Q are identity-by-descent probabilities, that is, probabilities that two individuals share an allele descended from a common ancestor. In this representation the definition of the ancestral alleles and their number remains implicit. For human inspection and further analysis, an explicit representation in terms of the ancestral allele origin and the number of alleles is desirable. To this purpose, we decompose the matrix Q by a latent class model with K classes (latent ancestral alleles). Let P be an n × K matrix with assignment probabilities of n individuals to K classes constrained such that every element is nonnegative and each row sums to 1. The problem then amounts to approximating Q by PP^T, while disregarding the diagonal elements. This is not an eigenvalue problem because of the constraints on P. An efficient algorithm for calculating P is provided. We indicate the potential utility of the latent ancestral allele model. For representative locus-specific Q matrices constructed for a set of maize inbreds, the proposed model recovered the known ancestry.

High-throughput techniques allow extensive genotyping of individuals for thousands of SNP markers (Gibbs et al. 2003) and thereby provide accurate information about the genetic diversity within a population at many chromosomal loci. If two individuals within this population carry the same DNA sequence at a locus, and this sequence can be traced to the same common ancestor, the individuals are said to be identical by descent (IBD) for this segment (Chapman and Thompson 2003).
Quite often, however, the ancestral source of a chromosomal segment is ambiguous and thus IBD relationships between haplotypes are given as probabilities. Various methods have been described to estimate the IBD probability of pairs of chromosomal segments (Meuwissen and Goddard 2001; Leutenegger et al. 2003). When pedigree relationships are known, these can be included to estimate IBD probabilities (Wang et al. 1995; Heath 1997; George et al. 2000; Meuwissen and Goddard 2000; Besnier and Carlborg 2007).

In quantitative genetic analysis we seek to find and characterize associations between the large number of SNPs that are now available for many organisms and phenotypic variation for traits of interest (e.g., grain yield and time to flowering). Many current methods developed for this purpose make use of IBD information. For example, a locus-specific matrix of IBD probabilities can be incorporated into restricted maximum-likelihood (REML) procedures for fine mapping quantitative trait loci (Bink and Meuwissen 2004) as well as for marker-based genetic evaluation (Fernando and Grossman 1989) using mixed models. The IBD matrix takes the role of a covariance matrix in the REML procedure.

Other approaches, however, require that chromosome segments (also referred to here as haplotypes or alleles) are assigned to independent ancestors. These approaches include regression approaches with genetic predictors (Malosetti et al. 2006) and Bayesian oligo-allelic approaches that sample the ancestral origin of each chromosomal segment (Heath 1997; Uimari and Sillanpää 2001; Bink et al. 2008a). In the IBD matrix representation the ancestral alleles and their number remain implicit. For these approaches, the locus-specific matrix of IBD probabilities must therefore be decomposed into a matrix that links the chromosomal segments to independent ancestral alleles. This decomposition is addressed in this article.

The individuals that we consider in this article are inbred. For n inbred individuals the IBD matrix at a given chromosomal position is thus n × n, because there is no need to distinguish between identical chromosomes. In diploid, outbred populations, each individual would be represented by two haplotypes (alleles) and the matrix would be 2n × 2n (Fernando and Grossman 1989). This is feasible if any phase ambiguity can be resolved. From now on, the term “individual” thus means chromosomal segment or haplotype. Analogously, “ancestor” will be shorthand for ancestral allele (ancestral haplotype).

We propose two models of IBD matrix decomposition, a simple threshold model (TIBD) and a more sophisticated latent ancestral allele model (LAAM), that provide (1) an estimate of the number of independent ancestral alleles, (2) a concise, easy-to-interpret summary of the relatedness, (3) an explicit (probabilistic) representation of the descent of alleles, and (4) the ability to sample alleles for each individual from a set of ancestral alleles in such a way that the probability that a pair of individuals shares the same allele corresponds to their IBD probability.

The last two features of the model are essential for its use in Bayesian oligo-allelic approaches to quantitative trait locus (QTL) analysis (Uimari and Sillanpää 2001; Bink et al. 2008a).
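
Feature (4) can be checked numerically. If each individual i draws its ancestral allele independently from row i of P, then for i ≠ j the probability that both draw the same allele is the sum over k of P[i][k] * P[j][k], which is element (i, j) of PP^T. The sketch below is only an illustration of that property, not the authors' algorithm for estimating P: the matrix P holds made-up values, and the function names (ibd_from_assignments, sample_alleles, empirical_sharing) are hypothetical.

```python
import random

# Hypothetical assignment matrix P: 3 individuals (rows), K = 2 latent
# ancestral alleles (columns). Entries are nonnegative; each row sums to 1.
P = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.2, 0.8],
]


def ibd_from_assignments(P):
    """Return P P^T as a list of lists; only off-diagonal entries are modeled."""
    n = len(P)
    return [
        [sum(a * b for a, b in zip(P[i], P[j])) for j in range(n)]
        for i in range(n)
    ]


def sample_alleles(P, rng):
    """Draw one ancestral allele index per individual, independently, from its row of P."""
    K = len(P[0])
    return [rng.choices(range(K), weights=row)[0] for row in P]


def empirical_sharing(P, draws, seed=0):
    """Estimate, by simulation, how often each pair of individuals shares an allele."""
    rng = random.Random(seed)
    n = len(P)
    counts = [[0] * n for _ in range(n)]
    for _ in range(draws):
        alleles = sample_alleles(P, rng)
        for i in range(n):
            for j in range(n):
                if alleles[i] == alleles[j]:
                    counts[i][j] += 1
    return [[c / draws for c in row] for row in counts]


Q_model = ibd_from_assignments(P)          # off-diagonals: 0.5, 0.2, 0.5
Q_sim = empirical_sharing(P, draws=20000)  # Monte Carlo estimate of sharing
```

The simulated diagonal is always 1, because an individual trivially shares its own allele, whereas the diagonal of PP^T is the sum of squared row entries and is generally smaller than 1. This mismatch is one way to see why the approximation of Q by PP^T disregards the diagonal elements.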

