Similar articles
1.
In phylogenetic analyses of molecular sequence data, partitioning involves estimating independent models of molecular evolution for different sets of sites in a sequence alignment. Choosing an appropriate partitioning scheme is an important step in most analyses because it can affect the accuracy of phylogenetic reconstruction. Despite this, partitioning schemes are often chosen without explicit statistical justification. Here, we describe two new objective methods for the combined selection of best-fit partitioning schemes and nucleotide substitution models. These methods allow millions of partitioning schemes to be compared in realistic time frames and so permit the objective selection of partitioning schemes even for large multilocus DNA data sets. We demonstrate that these methods significantly outperform previous approaches, including both the ad hoc selection of partitioning schemes (e.g., partitioning by gene or codon position) and a recently proposed hierarchical clustering method. We have implemented these methods in an open-source program, PartitionFinder. This program allows users to select partitioning schemes and substitution models using a range of information-theoretic metrics (e.g., the Bayesian information criterion, Akaike information criterion [AIC], and corrected AIC). We hope that PartitionFinder will encourage the objective selection of partitioning schemes and thus lead to improvements in phylogenetic analyses. PartitionFinder is written in Python and runs under Mac OSX 10.4 and above. The program, source code, and a detailed manual are freely available from www.robertlanfear.com/partitionfinder.
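The information-theoretic metrics named in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of AIC, AICc, and BIC and is not taken from PartitionFinder itself. The function names, the three example schemes, their log-likelihoods and parameter counts, and the use of the number of alignment sites as the sample size are all assumptions made for illustration.

```python
import math


def information_criteria(log_likelihood, n_params, n_sites):
    """Return (AIC, AICc, BIC) for one candidate partitioning scheme.

    Assumption: the sample size is taken to be the number of alignment sites.
    AICc is undefined when n_sites <= n_params + 1.
    """
    aic = 2 * n_params - 2 * log_likelihood
    aicc = aic + (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    bic = n_params * math.log(n_sites) - 2 * log_likelihood
    return aic, aicc, bic


def best_scheme(schemes, n_sites, criterion="bic"):
    """Pick the scheme with the lowest score under the chosen criterion."""
    index = {"aic": 0, "aicc": 1, "bic": 2}[criterion]
    return min(
        schemes,
        key=lambda name: information_criteria(*schemes[name], n_sites)[index],
    )


# Hypothetical schemes: name -> (log-likelihood, number of free parameters).
schemes = {
    "unpartitioned": (-12500.0, 30),
    "by_gene": (-12300.0, 60),
    "by_gene_and_codon": (-12250.0, 150),
}
n_sites = 3000  # hypothetical alignment length

for criterion in ("aic", "aicc", "bic"):
    print(criterion, best_scheme(schemes, n_sites, criterion))
```

Lower scores are better under all three criteria. BIC penalizes each extra parameter by ln(n) rather than 2, so it tends to favor schemes with fewer partitions than AIC does.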

2.
Xia X 《Systematic biology》2000,49(1):87-100
The horseshoe crabs, known as living fossils, have maintained their morphology almost unchanged for the past 150 million years. The little morphological differentiation among horseshoe crab lineages has resulted in substantial controversy concerning the phylogenetic relationship among the extant species of horseshoe crabs, especially among the three species in the Indo-Pacific region. Previous studies suggest that the three species constitute a phylogenetically unresolvable trichotomy, the result of a cladogenetic process leading to the formation of all three Indo-Pacific species in a short geological time. Data from two mitochondrial genes (for 16S ribosomal RNA and cytochrome oxidase subunit I) and one nuclear gene (for coagulogen) in the four species of horseshoe crabs and outgroup species were used in a phylogenetic analysis with various substitution models. All three genes yield the same tree topology, with Tachypleus gigas and Carcinoscorpius rotundicauda grouped together as a monophyletic taxon. This topology is significantly better than all the alternatives when evaluated with the RELL (resampling estimated log-likelihood) method.
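The RELL idea mentioned above can be sketched briefly: per-site log-likelihoods are computed once for each candidate topology, sites are then resampled with replacement, and the topology with the highest resampled total is recorded for each replicate. The sketch below is generic and is not the procedure or software used in the study. The function name `rell_support`, the tree labels, and the per-site values are hypothetical.

```python
import random


def rell_support(site_lnl, n_replicates=1000, seed=0):
    """Estimate RELL bootstrap proportions.

    site_lnl maps a tree label to its list of per-site log-likelihoods;
    all lists must have the same length. Likelihoods are not re-estimated
    for each replicate, which is what makes RELL cheap.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = {name: 0 for name in names}
    for _ in range(n_replicates):
        # Draw site indices with replacement and reuse the stored values.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {
            name: sum(site_lnl[name][i] for i in sample) for name in names
        }
        wins[max(names, key=totals.get)] += 1
    return {name: wins[name] / n_replicates for name in names}


# Hypothetical per-site log-likelihoods for two competing topologies.
example = {
    "tree_A": [-1.2, -0.8, -2.0, -1.1, -0.9, -1.5],
    "tree_B": [-1.3, -0.7, -2.4, -1.2, -0.9, -1.6],
}
print(rell_support(example, n_replicates=200))
```

The returned proportions show how often each topology comes out best across replicates. A formal significance statement would need a dedicated topology test on top of this.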

3.
Previous work has shown that it is often essential to account for the variation in rates at different sites in phylogenetic models in order to avoid phylogenetic artifacts such as long branch attraction. In most current models, the gamma distribution is used for the rates-across-sites distributions and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories allowing us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. Likelihood ratio statistical tests and a nonparametric bootstrap confidence-bound estimation procedure based on the discrete estimates are presented that can be used to test the fit of a parametric family. We applied the methodology to several different protein data sets, and found that although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. In cases when the gamma model assumption is in doubt, rate estimates coming from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. An alternative implementation of the gamma distribution is proposed that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard gamma implementation and can provide more accurate estimates of site rates.

4.
Phylogenetic tree reconstruction frequently assumes the homogeneity of the substitution process over the whole tree. To test this assumption statistically, we propose a test based on the sample covariance matrix of the set of substitution rate matrices estimated from pairwise sequence comparison. The sample covariance matrix is condensed into a one-dimensional test statistic $\Delta = \sum_i \ln(1 + \delta_i)$, where the $\delta_i$ are the eigenvalues of the sample covariance matrix. The test does not assume a specific mutational model. It analyses the variation in the estimated rate matrices. The distribution of this test statistic is determined by simulations based on the phylogeny estimated from the data. We study the power of the test under various scenarios and apply the test to X chromosome and mtDNA primate sequence data. Finally, we demonstrate how to include rate variation in the test.

5.
The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made.

6.

Background  

We compared two methods of rooting a phylogenetic tree: the stationary and the nonstationary substitution processes. These methods do not require an outgroup.

7.
Liang LJ  Weiss RE 《Biometrics》2007,63(3):733-741
Phylogenetic modeling is computationally challenging and most phylogeny models fit a single phylogeny to a single set of molecular sequences. Individual phylogenetic analyses are typically performed independently using publicly available software that fits a computationally intensive Bayesian model using Markov chain Monte Carlo (MCMC) simulation. We develop a Bayesian hierarchical semiparametric regression model to combine multiple phylogenetic analyses of HIV-1 nucleotide sequences and estimate parameters of interest within and across analyses. We use a mixture of Dirichlet processes as a prior for the parameters to relax inappropriate parametric assumptions and to ensure the prior distribution for the parameters is continuous. We use several reweighting algorithms for combining completed MCMC analyses to shrink parameter estimates while adjusting for data set-specific covariates. This avoids constructing a large complex model involving all the original data, which would be computationally challenging and would require rewriting the existing stand-alone software.

8.
Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability but is in disagreement with observed data in many situations--one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95 + YpR substitution models, which allows neighbor-dependent effects--including CpG hypermutability--to be taken into account, through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95 + YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of 10 DNA sequences from primate species. Model comparisons within the RN95 + YpR class show the importance of taking into account neighbor-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.

9.
Using real sequence data, we evaluate the adequacy of assumptions made in evolutionary models of nucleotide substitution and the effects that these assumptions have on estimation of evolutionary trees. Two aspects of the assumptions are evaluated. The first concerns the pattern of nucleotide substitution, including equilibrium base frequencies and the transition/transversion-rate ratio. The second concerns the variation of substitution rates over sites. The maximum-likelihood estimate of tree topology appears quite robust to both these aspects of the assumptions of the models, but evaluation of the reliability of the estimated tree by using simpler, less realistic models can be misleading. Branch lengths are underestimated when simpler models of substitution are used, but the underestimation caused by ignoring rate variation over nucleotide sites is much more serious. The goodness of fit of a model is reduced by ignoring spatial rate variation, but unrealistic assumptions about the pattern of nucleotide substitution can lead to an extraordinary reduction in the likelihood. It seems that evolutionary biologists can obtain accurate estimates of certain evolutionary parameters even with an incorrect phylogeny, while systematists cannot get the right tree with confidence even when a realistic, and more complex, model of evolution is assumed.

10.
Molecular evolutionary rates can show significant variation among lineages, complicating the task of estimating substitution rates and divergence times using phylogenetic methods. Accordingly, relaxed molecular clock models have been developed to accommodate such rate heterogeneity, but these often make the assumption of rate autocorrelation among lineages. In this paper, I examine the validity of this assumption.

11.
Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.

12.
The blind use of models of nucleotide substitution in evolutionary analyses is a common practice in the viral community. Typically, a simple model of evolution like the Kimura two-parameter model is used for estimating genetic distances and phylogenies, either because other authors have used it or because it is the default in various phylogenetic packages. Using two statistical approaches to model fitting, hierarchical likelihood ratio tests and the Akaike information criterion, we show that different viral data sets are better explained by different models of evolution. We demonstrate our results with the analysis of HIV-1 sequences from a hierarchy of samples: sequences within individuals, individuals within subtypes, and subtypes within groups. We also examine results for three different gene regions: gag, pol, and env. The Kimura two-parameter model was not selected as the best-fit model for any of these data sets, despite its widespread use in phylogenetic analyses of HIV-1 sequences. Furthermore, the model complexity increased with increasing sequence divergence. Finally, the molecular-clock hypothesis was rejected in most of the data sets analyzed, throwing into question clock-based estimates of divergence times for HIV-1. The importance of models in evolutionary analyses and their repercussions on the derived conclusions are discussed.
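For readers unfamiliar with the Kimura two-parameter model discussed here, a short sketch of its pairwise distance may help. It uses the standard formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the observed proportions of transitions and transversions. The function name `k2p_distance` and the example sequences are hypothetical, and skipping any site with a gap or ambiguity code is a simplifying assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical aligned fragments, for illustration only.
print(k2p_distance("ACGTACGTAC", "ACGTACGTAC"))
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

The model has only two rate parameters and assumes equal base frequencies, which is one reason a likelihood-based comparison against richer models can reject it for divergent data sets.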

13.
The subtyping of 350 isolates of HIV-1, isolated on the territories of 38 subjects of the Russian Federation, was carried out. The analysis was made by the method of the comparative heteroduplex mobility assay, as well as by the determination of the sequence of genes env [correction of ens] (gp 120) and gag (p17-p24). The study revealed that more than 50% of all cases of HIV-1 infection were caused by closely related variants of subtype A virus. The number of cases of HIV-1 infection caused by recombinant virus A/B was not less than 25%. The total number of cases caused by viruses of subtypes C, D, E, F and H was not more than 5%.

14.
Khoja S  Ojwang P  Khan S  Okinda N  Harania R  Ali S 《PloS one》2008,3(9):e3191

Background

Genetic analysis of a viral infection helps in following its spread in a given population, in tracking the routes of infection and, where applicable, in vaccine design. Additionally, sequence analysis of the viral genome provides information about patterns of genetic divergence that may have occurred during viral evolution.

Objective

In this study we have analyzed the subtypes of Human Immunodeficiency Virus-1 (HIV-1) circulating in a diverse sample population of Nairobi, Kenya.

Methodology

69 blood samples were collected from a diverse subject population attending the Aga Khan University Hospital in Nairobi, Kenya. Total DNA was extracted from peripheral blood mononuclear cells (PBMCs), and used in a Polymerase Chain Reaction (PCR) to amplify the HIV gag gene. The PCR amplimers were partially sequenced, and alignment and phylogenetic analysis of these sequences was performed using the Los Alamos HIV Database.

Results

Blood samples from 69 HIV-1 infected subjects from varying ethnic backgrounds were analyzed. Sequence alignment and phylogenetic analysis showed 39 isolates to be subtype A, 13 subtype D, 7 subtype C, 3 subtype AD and CRF01_AE, 2 subtype G and 1 subtype AC and 1 AG. Deeper phylogenetic analysis revealed HIV subtype A sequences to be highly divergent as compared to subtypes D and C.

Conclusion

Our analysis indicates that HIV-1 subtypes in the Nairobi province of Kenya are dominated by a genetically diverse clade A. Additionally, the prevalence of highly divergent, complex subtypes, intersubtypes, and recombinant forms indicates viral mixing in the Kenyan population, possibly as a result of dual infections.

15.
La D  Kihara D 《Proteins》2012,80(1):126-141
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges to the protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution as compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of protein with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interface and nonbinding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is very versatile in the sense that it can be generally applied for predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins.

16.
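Objective: To investigate the emergence of drug-resistant HIV-1 strains and the subtype distribution among patients with human immunodeficiency virus type 1 (HIV-1) infection or acquired immunodeficiency syndrome (AIDS) in the Shanghai area. Methods: Plasma HIV-1 isolates from 33 patients with HIV-1 infection/AIDS underwent genotypic resistance testing against anti-HIV-1 drugs (nucleoside reverse transcriptase inhibitors [NRTIs], non-nucleoside reverse transcriptase inhibitors [NNRTIs], and protease inhibitors [PIs]) and subtype analysis. Results: No PI resistance mutations were detected in the HIV-1 of any of the 33 patients. Among the 10 patients in whom highly active antiretroviral therapy (HAART) had failed or had incompletely suppressed viral replication, resistance mutations were detected in 70% and transitional resistance mutations in 20%. Among the 23 patients who had not received anti-HIV-1 treatment, resistance mutations were detected in 4.3% and transitional resistance mutations in 13%. All transitional resistance mutations were T215S. All 15 HIV-1 infections transmitted through blood products were subtype B. Among the 18 infections transmitted through drug use or sexual contact, subtypes B and CRF01-AE accounted for 39% and 33%, respectively, and subtypes C, D, G, K, and CRF02-AG were also found. Conclusion: Among patients with HIV-1 infection/AIDS in the Shanghai area, those in whom HAART failed or incompletely suppressed replication had a high rate of NRTI and NNRTI resistance mutations. Among infections acquired through drug use or sexual transmission, subtypes B and CRF01-AE predominated, but other, less common subtypes were also present.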

17.
18.
19.
To determine the incidence of human immunodeficiency virus type-1 (HIV-1) subtypes in Fukuoka, Japan, viruses from 41 HIV-1 infected individuals were subtyped. Subtyping by V3-loop enzyme-linked immunosorbent assay (ELISA) showed 31 of the 41 subjects as subtype B (MN type), one as subtype A, one as subtype C, and eight untypable. The subject infected with subtype C was identified as a foreigner; the subtype A subject was Japanese. A phylogenetic analysis of nucleic acid sequences from the env C2-V3 region was also conducted. Genetic subtyping was successful for 25 samples: 23 samples were determined as subtype B, one subtype A and one subtype E. One of the individuals infected with subtype B, as well as the subtype A and subtype E subjects, were not Japanese. This study indicated that subtype B (USA and European type) is still dominant among HIV-1 infections in Fukuoka. Further, no Japanese were subtype E positive, which is increasing in the Kanto region. It is notable, however, that subtype A and subtype C infections, which are rare in Japan, were found in Fukuoka, located far from the metropolitan area of Tokyo.

20.
Choice of a substitution model is a crucial step in the maximum likelihood (ML) method of phylogenetic inference, and investigators tend to prefer complex mathematical models to simple ones. However, when complex models with many parameters are used, the extent of noise in statistical inferences increases, and thus complex models may not produce the true topology with a higher probability than simple ones. This problem was studied using computer simulation. When the number of nucleotides used was relatively large (1000 bp), the HKY+Gamma model showed a smaller d(T) (topological distance between the inferred and the true trees) than the JC and Kimura models. For shorter sequences (300 bp), a simpler model and search algorithm, such as the JC model and SA+NNI search, were found to be as efficient as more complicated searches and models in terms of topological distances, although the topologies obtained under the HKY+Gamma model had the highest likelihood values. The performance of the relatively simple search algorithm SA+NNI was found to be essentially the same as that of the more extensive SA+TBR search under all models studied. Similarly to the conclusions reached by Takahashi and Nei [Mol. Biol. Evol. 17 (2000) 1251], our results indicate that simple models can be as efficient as complex models, and that use of complex models does not necessarily give more reliable trees compared with simple models.
