A simple and efficient algorithm is presented for finding a maximum likelihood pedigree using microsatellite (STR) genotype information on a complete sample of related individuals. The computational complexity of the algorithm is at worst $O(n^3 2^n)$, where n is the number of individuals. Thus it is possible to exhaustively search the space of all pedigrees of up to thirty individuals for one that maximizes the likelihood. A priori age and sex information can be used if available, but is not essential. The algorithm is applied in a simulation study, and to some real data on humans.
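
To give a sense of the scale implied by the worst-case bound, the rough sketch below evaluates n^3 * 2^n for a few sample sizes. The function name `worst_case_ops` is hypothetical, and the count ignores constant factors, so it indicates an order of magnitude rather than an actual running time.

```python
def worst_case_ops(n: int) -> int:
    """Return n^3 * 2^n, the worst-case operation count up to a constant factor."""
    return n ** 3 * 2 ** n


# Print the bound for a few sample sizes, up to the thirty individuals
# mentioned in the abstract.
for n in (10, 20, 30):
    print(f"n = {n:2d}: about {worst_case_ops(n):.2e} operations")
```

For n = 30 the bound is roughly 2.9e13, which is large but still within reach of an exhaustive search; each additional individual more than doubles it.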

J Wang 《Heredity》2013,111(2):165-174
Many methods have been proposed to reconstruct the pedigree of a sample of individuals from their multilocus marker genotypes. These methods, like those in other fields of statistical inference, may suffer from both type I (falsely related) and type II (falsely unrelated) errors. In sibship reconstruction, type I errors come from the spurious fusion of two or more small sibships into a single sibship, and type II errors originate from the spurious splitting of a large sibship into two or more small sibships. In this study I investigate the tendencies of both types of errors made by the likelihood methods in sibship reconstruction, using both analytical and simulation approaches. I propose an improvement on the likelihood methods to reduce sibship splitting, and thus type II errors, by downscaling the number of inferred siblings sharing the same genotype at a locus. Simulations are then conducted to compare the accuracy of the original and improved likelihood methods in sibship reconstruction of a large sample of individuals in full-sib families of the same small size, the same large size and highly variable sizes, using a variable number of loci with a variable number of alleles per locus. The methods were also applied to the analysis of a salmon data set. I show that my scaling scheme effectively prevents the splitting of large sibships, and greatly reduces type II errors with little increase in type I errors. As a result, it improves the overall accuracy of sibship assignments, except when sibships are expected to be uniformly small or marker information is unrealistically scarce.

We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise.
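
The core idea can be illustrated without MCMC in a conjugate setting. The sketch below assumes a binomial likelihood with a Beta prior, an illustrative choice that is not one of the ecological models in the abstract. Cloning the data K times gives a Beta posterior in closed form: its mean approaches the maximum likelihood estimate, and K times its variance approaches the asymptotic variance of that estimate, whatever the prior. The function name, prior parameters, and example counts are hypothetical.

```python
def cloned_posterior(successes, trials, k, a=2.0, b=5.0):
    """Posterior mean and variance for a binomial proportion after
    cloning the data k times under a Beta(a, b) prior."""
    alpha = a + k * successes
    beta = b + k * (trials - successes)
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, var


successes, trials = 7, 20
mle = successes / trials
mle_var = mle * (1 - mle) / trials  # inverse Fisher information at the MLE

# As k grows, the posterior mean moves to the MLE and k * variance
# moves to the asymptotic variance of the MLE.
for k in (1, 10, 100, 1000):
    mean, var = cloned_posterior(successes, trials, k)
    print(k, round(mean, 4), round(k * var, 6))
print("MLE:", mle, "variance:", round(mle_var, 6))
```

In realistic hierarchical models the posterior is not available in closed form, which is where MCMC sampling of the cloned-data posterior would take the place of the formulas used here.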

Simulations are used to compare four statistics for the detection of a quantitative trait locus (QTL) in daughter and grand-daughter designs as defined by Soller and Genizi (1978) and Weller et al. (1990): (1) the Fisher test of a linear model including a marker effect within sire or grand-sire effect; (2) the likelihood ratio test of a segregation analysis without the information given by the marker; (3) the likelihood ratio test of a segregation analysis considering the information from the marker; and (4) the lod score which is the likelihood ratio test of absence of linkage between the marker and the QTL. In all cases the two segregation analyses are more powerful for QTL detection than are either the linear method or the lod score. The differences in power are generally limited but may be significant (in a ratio of 1 to 3 or 4) when the QTL has a small effect (0.2 standard deviations) and is not closely linked to the marker (recombination rate of 20% or more).

Ancestral state reconstruction is a method used to study the evolutionary trajectories of quantitative characters on phylogenies. Although efficient methods for univariate ancestral state reconstruction under a Brownian motion model have been described for at least 25 years, to date no generalization has been described to allow more complex evolutionary models, such as multivariate trait evolution, non‐Brownian models, missing data, and within‐species variation. Furthermore, even for simple univariate Brownian motion models, most phylogenetic comparative R packages compute ancestral states via inefficient tree rerooting and full tree traversals at each tree node, making ancestral state reconstruction extremely time‐consuming for large phylogenies. Here, a computationally efficient method for fast maximum likelihood ancestral state reconstruction of continuous characters is described. The algorithm has linear complexity relative to the number of species and outperforms the fastest existing R implementations by several orders of magnitude. The described algorithm is capable of performing ancestral state reconstruction on a 1,000,000‐species phylogeny in fewer than 2 s using a standard laptop, whereas the next fastest R implementation would take several days to complete. The method is generalizable to more complex evolutionary models, such as phylogenetic regression, within‐species variation, non‐Brownian evolutionary models, and multivariate trait evolution. Because this method enables fast repeated computations on phylogenies of virtually any size, implementation of the described algorithm can drastically alleviate the computational burden of many otherwise prohibitively time‐consuming tasks requiring reconstruction of ancestral states, such as phylogenetic imputation of missing data, bootstrapping procedures, Expectation‐Maximization algorithms, and Bayesian estimation. 
The described ancestral state reconstruction algorithm is implemented in the Rphylopars functions anc.recon and phylopars.
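
To show why a single tree traversal can suffice, the sketch below computes the maximum likelihood root state under univariate Brownian motion with one postorder pass, in the style of the classic pruning (independent contrasts) recursion. It is not the Rphylopars algorithm, and it covers only the root rather than every internal node. The nested-list tree encoding, the function name, and the example values are assumptions made for illustration.

```python
def root_state(node):
    """Return (estimate, extra_variance) for the subtree rooted at `node`.

    A tip is a float (the observed trait value). An internal node is a list
    of (child, branch_length) pairs. Each node is visited exactly once.
    """
    if not isinstance(node, list):
        return float(node), 0.0  # tips are observed without error
    weight_sum = 0.0
    weighted_values = 0.0
    for child, branch_length in node:
        value, extra = root_state(child)
        # Precision of the child's estimate as seen from this node.
        weight = 1.0 / (branch_length + extra)
        weight_sum += weight
        weighted_values += weight * value
    return weighted_values / weight_sum, 1.0 / weight_sum


# Hypothetical tree ((A:1, B:1):1, C:2) with trait values A=1, B=3, C=5.
tree = [([(1.0, 1.0), (3.0, 1.0)], 1.0), (5.0, 2.0)]
estimate, variance = root_state(tree)
print(estimate)
```

Because every node is processed once, the cost grows linearly with the number of species. A recursive version like this one would hit Python's recursion limit on very deep trees, so a production implementation would more likely use an explicit stack.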

Maximum likelihood estimation via the ECM algorithm: A general framework   Total citations: 35, self-citations: 0, citations by others: 35
MENG  XIAO-LI; RUBIN  DONALD B. 《Biometrika》1993,80(2):267-278

Bias reduction of maximum likelihood estimates   Total citations: 9, self-citations: 0, citations by others: 9
FIRTH  DAVID 《Biometrika》1993,80(1):27-38

Owing to the exponential growth of genome databases, phylogenetic trees are now widely used to test a variety of evolutionary hypotheses. Nevertheless, computation time burden limits the application of methods such as maximum likelihood nonparametric bootstrap to assess reliability of evolutionary trees. As an alternative, the much faster Bayesian inference of phylogeny, which expresses branch support as posterior probabilities, has been introduced. However, marked discrepancies exist between nonparametric bootstrap proportions and Bayesian posterior probabilities, leading to difficulties in the interpretation of sometimes strongly conflicting results. As an attempt to reconcile these two indices of node reliability, we apply the nonparametric bootstrap resampling procedure to the Bayesian approach. The correlation between posterior probabilities, bootstrap maximum likelihood percentages, and bootstrapped posterior probabilities was studied for eight highly diverse empirical data sets and was also investigated using experimental simulation. Our results show that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices. Moreover, simulations corroborate empirical observations in suggesting that, being more conservative, the bootstrap approach might be less prone to strongly supporting a false phylogenetic hypothesis. Thus, apparent conflicts in topology recovered by the Bayesian approach were reduced after bootstrapping. Both posterior probabilities and bootstrap supports are of great interest to phylogeny as potential upper and lower bounds of node reliability, but they are surely not interchangeable and cannot be directly compared.
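
The resampling step behind "bootstrapped character matrices" can be sketched as follows: sites (columns) of the alignment are drawn with replacement until the replicate has as many sites as the original. This is a minimal illustration with made-up taxa and sequences; the function name is hypothetical, and the tree inference that would be run on each replicate (Bayesian or maximum likelihood) is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` maps taxon name -> sequence string; all sequences must have
    the same length. The replicate keeps the taxa and the number of sites.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


# Toy data; taxon names and sequences are made up for illustration.
alignment = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ATGTAC"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Support for a node would then be summarized across replicates, for instance as the share of replicate trees containing that node.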

On the existence of maximum likelihood estimates in logistic regression models   Total citations: 12, self-citations: 0, citations by others: 12
ALBERT  A.; ANDERSON  J. A. 《Biometrika》1984,71(1):1-10

Summary: The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted trees) from DNA sequence data was studied. Although there is some theoretical problem in the comparison of ML values conditional for each topology, it is possible to make a heuristic argument to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented. It is shown that under the assumption of a constant rate of evolution, the ML method and UPGMA always give the same rooted tree for the case of three operational taxonomic units (OTUs). This also seems to hold approximately for the case with four OTUs. When we consider unrooted trees with the assumption of a varying rate of nucleotide substitution, the efficiency of the ML method in obtaining the correct tree is similar to those of the maximum parsimony method and distance methods. The ML method was applied to Brown et al.'s data, and the tree topology obtained was the same as that found by the maximum parsimony method, but it was different from those obtained by distance methods.

Integer Linear Programming was used to maximize genetic gain from selection at a given level of relatedness. Variances and breeding values for total height were available for 296 plus-trees of Pinus sylvestris which had been evaluated by open-pollinated progeny testing at a single test site in northern Sweden. Second-generation breeding and selection scenarios for this breeding population were evaluated using simulated data derived deterministically from normal distributions of estimated breeding values of progeny around mid-parent family means. The study considered two mating designs, assortative and non-assortative single-pair mating, and two selection criteria, individual phenotype and performance of half-sib progeny. Relatedness (group coancestry) was restricted to a level equivalent to that given by within-family selection of 2 trees per family from each of 25 families (the current standard in Sweden). Selection that allows the best-performing families to contribute a greater number of progeny was superior, both when the breeding population size was limited to 50 individuals and when it was allowed to be larger. The selected set giving the greatest average breeding value under restricted group coancestry included the best individual from families that would have been rejected under application of standard within-family selection. We made a comparison of the present value on retrieved gain between phenotypic selection and evaluation by progeny testing. Received: 24 November 1998 / Accepted: 14 December 1998

Aitkin M 《Biometrics》1999,55(1):117-128
This paper describes an EM algorithm for nonparametric maximum likelihood (ML) estimation in generalized linear models with variance component structure. The algorithm provides an alternative analysis to approximate MQL and PQL analyses (McGilchrist and Aisbett, 1991, Biometrical Journal 33, 131-141; Breslow and Clayton, 1993, Journal of the American Statistical Association 88, 9-25; McGilchrist, 1994, Journal of the Royal Statistical Society, Series B 56, 61-69; Goldstein, 1995, Multilevel Statistical Models) and to GEE analyses (Liang and Zeger, 1986, Biometrika 73, 13-22). The algorithm, first given by Hinde and Wood (1987, in Longitudinal Data Analysis, 110-126), is a generalization of that for random effect models for overdispersion in generalized linear models, described in Aitkin (1996, Statistics and Computing 6, 251-262). The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully nonparametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters can be sensitive to the specification of a parametric form for the mixing distribution. The nonparametric analysis can be extended straightforwardly to general random parameter models, with full NPML estimation of the joint distribution of the random parameters. This can produce substantial computational saving compared with full numerical integration over a specified parametric distribution for the random parameters. A simple method is described for obtaining correct standard errors for parameter estimates when using the EM algorithm. Several examples are discussed involving simple variance component and longitudinal models, and small-area estimation.

Revealing mechanisms underlying complex diseases poses great challenges to biologists. The traditional linkage and linkage disequilibrium analyses that have been successful in the identification of genes responsible for Mendelian traits, however, have not led to similar success in discovering genes influencing the development of complex diseases. Emerging functional genomic and proteomic ('omic') resources and technologies provide great opportunities to develop new methods for systematic identification of genes underlying complex diseases. In this report, we propose a systems biology approach, which integrates omic data, to find genes responsible for complex diseases. This approach consists of five steps: (1) generate a set of candidate genes using gene-gene interaction data sets; (2) reconstruct a genetic network with the set of candidate genes from gene expression data; (3) identify differentially regulated genes between normal and abnormal samples in the network; (4) validate regulatory relationships between the genes in the network by perturbing the network using RNAi and monitoring the response using RT-PCR; and (5) genotype the differentially regulated genes and test their association with the diseases by direct association studies. To prove the concept in principle, the proposed approach is applied to genetic studies of the autoimmune disease scleroderma or systemic sclerosis.

In this work, we fit pattern-mixture models to data sets with responses that are potentially missing not at random (MNAR, Little and Rubin, 1987). In estimating the regression parameters that are identifiable, we use the pseudo maximum likelihood method based on exponential families. This procedure provides consistent estimators when the mean structure is correctly specified for each pattern, with further information on the variance structure giving an efficient estimator. The proposed method can be used to handle a variety of continuous and discrete outcomes. A test built on this approach is also developed for model simplification in order to improve efficiency. Simulations are carried out to compare the proposed estimation procedure with other methods. In combination with sensitivity analysis, our approach can be used to fit parsimonious semi-parametric pattern-mixture models to outcomes that are potentially MNAR. We apply the proposed method to an epidemiologic cohort study to examine cognitive decline among the elderly.

