Similar literature
Found 20 similar articles.
1.
PAML 4: phylogenetic analysis by maximum likelihood   (total citations: 41; self-citations: 1; citations by others: 41)
PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (dS and dN) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OSX, and UNIX systems. It is available at http://abacus.gene.ucl.ac.uk/software/paml.html.

2.
Malin is a software package for the analysis of eukaryotic gene structure evolution. It provides a graphical user interface for various tasks commonly used to infer the evolution of exon-intron structure in protein-coding orthologs. Implemented tasks include the identification of conserved homologous intron sites in protein alignments, as well as the estimation of ancestral intron content, lineage-specific intron losses and gains. Estimates are computed either with parsimony, or with a probabilistic model that incorporates rate variation across lineages and intron sites. Availability: Malin is available as a stand-alone Java application, as well as an application bundle for MacOS X, at the website http://www.iro.umontreal.ca/~csuros/introns/malin/. The software is distributed under a BSD-style license.

3.
THESEUS is a command line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While conventional superpositioning methods use ordinary least-squares (LS) as the optimization criterion, ML superpositions provide substantially improved accuracy by down-weighting variable structural regions and by correcting for correlations among atoms. ML superpositioning is robust and insensitive to the specific atoms included in the analysis, and thus it does not require subjective pruning of selected variable atomic coordinates. Output includes both likelihood-based and frequentist statistics for accurate evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. THESEUS performs principal components analysis for analyzing the complex correlations found among atoms within a structural ensemble. AVAILABILITY: ANSI C source code and selected binaries for various computing platforms are available under the GNU open source license from http://monkshood.colorado.edu/theseus/ or http://www.theseus3d.org.

4.
This study presents an effective procedure for the determination of a biologically inspired, black-box model of cultures of microorganisms (including yeasts, bacteria, plant and animal cells) in bioreactors. This procedure is based on sets of experimental data measuring the time-evolution of a few extracellular species concentrations, and makes use of maximum likelihood principal component analysis to determine, independently of the kinetics, an appropriate number of macroscopic reactions and their stoichiometry. In addition, this paper provides a discussion of the geometric interpretation of a stoichiometric matrix and the potential equivalent reaction schemes. The procedure is carefully evaluated within the stoichiometric identification framework of the growth of the yeast Kluyveromyces marxianus on cheese whey. Using Monte Carlo studies, it is also compared with two other previously published approaches.

5.
Identification of phenotypic modules, semiautonomous sets of highly correlated traits, can be accomplished through exploratory (e.g., cluster analysis) or confirmatory approaches (e.g., RV coefficient analysis). Although statistically more robust, confirmatory approaches are generally unable to compare across different model structures. For example, RV coefficient analysis finds support for both two‐ and six‐module models for the therian mammalian skull. Here, we present a maximum likelihood approach that takes into account model parameterization. We compare model log‐likelihoods of trait correlation matrices using the finite‐sample corrected Akaike Information Criterion, allowing for comparison of hypotheses across different model structures. Simulations varying model complexity and within‐ and between‐module contrast demonstrate that this method correctly identifies model structure and parameters across a wide range of conditions. We further analyzed a dataset of 3‐D data, consisting of 61 landmarks from 181 macaque (Macaca fuscata) skulls, distributed among five age categories, testing 31 models, including no modularity among the landmarks and various partitions of two, three, six, and eight modules. Our results clearly support a complex six‐module model, with separate within‐ and intermodule correlations. Furthermore, this model was selected for all five age categories, demonstrating that this complex pattern of integration in the macaque skull appears early and is highly conserved throughout postnatal ontogeny. Subsampling analyses demonstrate that this method is robust to relatively low sample sizes, as is commonly encountered in rare or extinct taxa. This new approach allows for the direct comparison of models with different parameterizations, providing an important tool for the analysis of modularity across diverse systems.
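
As a rough illustration of the model-comparison step described in this abstract, the finite-sample corrected Akaike Information Criterion can be computed from a model's maximized log-likelihood ln L, its number of free parameters k, and the sample size n as AICc = -2 ln L + 2k + 2k(k+1)/(n-k-1). The following is a minimal Python sketch under that standard definition, not the authors' implementation; the function names, model labels, log-likelihoods, and parameter counts are hypothetical values chosen only for illustration.

```python
def aicc(log_likelihood, k, n):
    """Finite-sample corrected AIC for a model with k free parameters
    fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def rank_models(models, n):
    """Rank models by AICc (lowest first).

    `models` maps a label to a (log_likelihood, k) pair.
    Returns a list of (label, aicc, delta_aicc) tuples.
    """
    scored = [(name, aicc(lnl, k, n)) for name, (lnl, k) in models.items()]
    scored.sort(key=lambda item: item[1])
    best = scored[0][1]
    return [(name, score, score - best) for name, score in scored]


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts, for illustration only.
    candidates = {"two-module": (-120.0, 3), "six-module": (-100.0, 10)}
    for name, score, delta in rank_models(candidates, n=181):
        print(f"{name}: AICc={score:.2f}, delta={delta:.2f}")
```

Because the correction term grows with k, a more heavily parameterized model is preferred only when its gain in log-likelihood outweighs the extra penalty, which is what allows models with different structures to be compared on one scale.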

6.
SUMMARY: TREE-PUZZLE is a program package for quartet-based maximum-likelihood phylogenetic analysis (formerly PUZZLE, Strimmer and von Haeseler, Mol. Biol. Evol., 13, 964-969, 1996) that provides methods for reconstruction, comparison, and testing of trees and models on DNA as well as protein sequences. To reduce waiting time for larger datasets the tree reconstruction part of the software has been parallelized using message passing that runs on clusters of workstations as well as parallel computers. AVAILABILITY: http://www.tree-puzzle.de. The program is written in ANSI C. TREE-PUZZLE can be run on UNIX, Windows and Mac systems, including Mac OS X. To run the parallel version of PUZZLE, a Message Passing Interface (MPI) library has to be installed on the system. Free MPI implementations are available on the Web (cf. http://www.lam-mpi.org/mpi/implementations/).

7.
Although there has been a recent proliferation in maximum‐likelihood (ML)‐based tree estimation methods based on a fixed sequence alignment (MSA), little research has been done on incorporating indel information in this traditional framework. We show, using a simple model on a single character example, that a trivial alignment of a different form than that previously identified for parsimony is optimal in ML under standard assumptions treating indels as “missing” data, but that it is not optimal when indels are incorporated into the character alphabet. We show that the optimality of the trivial alignment is not an artefact of simplified theory assumptions by demonstrating that trivial alignment likelihoods of five different multiple sequence alignment datasets exhibit this phenomenon. These results demonstrate the need for use of indel information in likelihood analysis on fixed MSAs, and suggest that caution must be exercised when drawing conclusions from software implementations claiming improvements in likelihood scores under an indels‐as‐missing assumption. © The Willi Hennig Society 2012.

8.
The Cox proportional hazards model or its discrete time analogue, the logistic failure time model, posit highly restrictive parametric models and attempt to estimate parameters which are specific to the model proposed. These methods are typically implemented when assessing effect modification in survival analyses despite their flaws. The targeted maximum likelihood estimation (TMLE) methodology is more robust than the methods typically implemented and allows practitioners to estimate parameters that directly answer the question of interest. TMLE will be used in this paper to estimate two newly proposed parameters of interest that quantify effect modification in the time to event setting. These methods are then applied to the Tshepo study to assess if either gender or baseline CD4 level modify the effect of two cART therapies of interest, efavirenz (EFV) and nevirapine (NVP), on the progression of HIV. The results show that women tend to have more favorable outcomes using EFV while males tend to have more favorable outcomes with NVP. Furthermore, EFV tends to be favorable compared to NVP for individuals at high CD4 levels.

9.
Testing congruence in phylogenomic analysis   (total citations: 1; self-citations: 0; citations by others: 1)
Phylogenomic analyses of large sets of genes or proteins have the potential to revolutionize our understanding of the tree of life. However, problems arise because estimated phylogenies from individual loci often differ because of different histories, systematic bias, or stochastic error. We have developed Concaterpillar, a hierarchical clustering method based on likelihood-ratio testing that identifies congruent loci for phylogenomic analysis. Concaterpillar also includes a test for shared relative evolutionary rates between genes indicating whether they should be analyzed separately or by concatenation. In simulation studies, the performance of this method is excellent when a multiple comparison correction is applied. We analyzed a phylogenomic data set of 60 translational protein sequences from the major supergroups of eukaryotes and identified three congruent subsets of proteins. Analysis of the largest set indicates improved congruence relative to the full data set and produced a phylogeny with stronger support for five eukaryote supergroups including the Opisthokonts, the Plantae, the stramenopiles + Apicomplexa (chromalveolates), the Amoebozoa, and the Excavata. In contrast, the phylogeny of the second largest set indicates a close relationship between stramenopiles and red algae, to the exclusion of alveolates, suggesting gene transfer from the red algal secondary symbiont to the ancestral stramenopile host nucleus during the origin of their chloroplast. Investigating phylogenomic data sets for conflicting signals has the potential to both improve phylogenetic accuracy and inform our understanding of genome evolution.
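
To make the idea of likelihood-ratio testing mentioned in this abstract concrete, a generic test compares the maximized log-likelihood of a constrained (null) model with that of a less constrained (alternative) model through the statistic 2(ln L_alt - ln L_null). The Python sketch below assumes the textbook chi-square approximation for the null distribution; the abstract does not say how Concaterpillar obtains its null distribution, so this is not a description of that program. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood-ratio statistic for nested models: 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df.

    Starts from the closed forms for df = 1 (odd) or df = 2 (even) and
    applies the recurrence
    Q(x | nu + 2) = Q(x | nu) + (x/2)**(nu/2) * exp(-x/2) / gamma(nu/2 + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu = math.exp(-x / 2.0), 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2.0)), 1
    while nu < df:
        q += (x / 2.0) ** (nu / 2.0) * math.exp(-x / 2.0) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


if __name__ == "__main__":
    # Hypothetical log-likelihoods: one shared tree (null) vs separate trees (alt).
    stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-995.0)
    print(stat, chi2_sf(stat, df=3))
```

When many pairs of loci are tested, as in a hierarchical clustering procedure, the resulting p-values would also need a multiple comparison correction, in line with the abstract's remark about simulation performance.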

10.

Background

Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps.

Results

Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce noise introduced by the alignment method. We introduce here the notion of split-inducing indels (splids) that define an approximate bipartition of the taxon set. We show both in simulated data and in case studies on real-life data that splids can be efficiently extracted from phylogenomic data sets.

Conclusions

Suitably processed gap patterns extracted from genome-wide alignments provide a surprisingly clear phylogenetic signal and allow the inference of accurate phylogenetic trees.

11.
Cancer heterogeneity is a significant factor in response to treatment and escape leading to relapse. Within an individual cancer, especially blood cancers, there exist multiple subclones as well as distinct clonal expansions unrelated to the clinically detected, dominant clone. Over time, multiple subclones and clones undergo emergence, expansion, and extinction. Although sometimes this intra-clonal and inter-clonal heterogeneity can be detected and/or quantified in tests that measure aggregate populations of cells, frequently, such heterogeneity can only be detected using single cell analysis to determine its frequency and to detect minor clones that may subsequently emerge to become drug resistant and dominant. Most genetic/genomic tests look at the pooled tumor population as a whole rather than at its individual cellular components. Yet, minor clones and cancer stem cells are unlikely to be detected against the background of expanded major clones. Because selective pressures are likely to govern much of what is seen clinically, single cell analysis allows identification of otherwise cryptic compartments of the malignancy that may ultimately mediate progression and relapse. Single cell analysis can track intra- or inter-clonal heterogeneity and provide useful clinical information, often before changes in the disease are detectable in the clinic. To a very limited extent, single cell analysis has already found roles in clinical care. Because inter- and intra-clonal heterogeneity likely occurs more frequently than can be currently appreciated on a clinical level, future use of single cell analysis is likely to have profound clinical utility.

12.
A Forcina. Biometrics, 1992, 48(3): 743-750
For linear models, assuming a within-experimental-units covariance structure that incorporates errors of measurement, serial correlation, and variation between units, results on explicit estimation of regression parameters are used to simplify maximum likelihood estimation of covariance parameters. The use of an analysis of variance table as a simpler alternative to likelihood inference is illustrated with two examples.

13.
Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net.

14.
MOTIVATION: In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However, for a large number of taxa, it is not feasible to construct large and accurate trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood. We express a number of concerns about the current set of parallel phylogenetic programs, which are severely limiting the widespread availability and use of parallel computing in maximum likelihood-based phylogenetic analysis. RESULTS: We have identified the suitability of phylogenetic analysis to large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree building program called distributed phylogeny reconstruction by maximum likelihood. It uses an already proven maximum likelihood-based tree building algorithm and a popular phylogenetic analysis library for all its likelihood calculations. It offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree building program that can achieve near-linear speedup while only using the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a 'free' ML supercomputer.

15.
Aitkin M. Biometrics, 1999, 55(1): 117-128
This paper describes an EM algorithm for nonparametric maximum likelihood (ML) estimation in generalized linear models with variance component structure. The algorithm provides an alternative analysis to approximate MQL and PQL analyses (McGilchrist and Aisbett, 1991, Biometrical Journal 33, 131-141; Breslow and Clayton, 1993, Journal of the American Statistical Association 88, 9-25; McGilchrist, 1994, Journal of the Royal Statistical Society, Series B 56, 61-69; Goldstein, 1995, Multilevel Statistical Models) and to GEE analyses (Liang and Zeger, 1986, Biometrika 73, 13-22). The algorithm, first given by Hinde and Wood (1987, in Longitudinal Data Analysis, 110-126), is a generalization of that for random effect models for overdispersion in generalized linear models, described in Aitkin (1996, Statistics and Computing 6, 251-262). The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully nonparametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters can be sensitive to the specification of a parametric form for the mixing distribution. The nonparametric analysis can be extended straightforwardly to general random parameter models, with full NPML estimation of the joint distribution of the random parameters. This can produce substantial computational saving compared with full numerical integration over a specified parametric distribution for the random parameters. A simple method is described for obtaining correct standard errors for parameter estimates when using the EM algorithm. Several examples are discussed involving simple variance component and longitudinal models, and small-area estimation.

16.

Background  

In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, an SCA may be hampered by the data blocks being subjected to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. This previous study, however, did not mimic typical functional genomics data sets in many respects, such as data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real life Escherichia coli metabolomics data set.

17.
SUMMARY: IQPNNI is a program to infer maximum-likelihood phylogenetic trees from DNA or protein data with a large number of sequences. We present an improved and MPI-parallel implementation showing very good scaling and speed-up behavior.

18.
In this article, we provide a template for the practical implementation of the targeted maximum likelihood estimator for analyzing causal effects of multiple time point interventions, for which the methodology was developed and presented in Part I. In addition, the application of this template is demonstrated in two important estimation problems: estimation of the effect of individualized treatment rules based on marginal structural models for treatment rules, and the effect of a baseline treatment on survival in a randomized clinical trial in which the time till event is subject to right censoring.

19.
Shiel R. J., Koste W., Tan L. W. Hydrobiologia, 1989, (1): 239-245
The results of four field surveys for Rotifera in Tasmania are summarized. Most new species and records in a 1987 survey were from acid waters (pH < 4.0) of dune lakes on the west coast (42° S). Marked intra- and interhabitat differences in rotifer communities of lakes and ponds were demonstrated by cluster analysis and related to habitat heterogeneity.

20.
Summary: The efficiency of obtaining the correct tree by the maximum likelihood method (Felsenstein 1981) for inferring trees from DNA sequence data was compared with that of distance methods. It was shown that the maximum likelihood method is superior to distance methods in efficiency, particularly when the evolutionary rate differs among lineages.
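
For readers unfamiliar with the distance methods referred to in this abstract, such methods start from pairwise evolutionary distances between sequences. One common choice, assumed here purely for illustration because the abstract does not name the distance used, is the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. A minimal Python sketch with hypothetical function names and toy sequences:

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy aligned sequences, invented for this example.
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jc69_distance(p))
```

A matrix of such pairwise distances is what a distance-based tree-building algorithm would take as input, whereas the maximum likelihood method evaluates candidate trees directly against the aligned sequences.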
