Similar Documents
1.
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigorous search of the tree space is run under that model to find the maximum-likelihood estimate of the tree (topology and branch lengths) and the maximum-likelihood estimates of the model parameters. In this paper, we propose two extensions to the decision-theoretic (DT) approach that relax the fixed-topology restriction. We also relax the fixed-topology restriction for the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) methods. We compare the performance of the different methods (the relaxed and restricted methods and the likelihood-ratio test [LRT]) using simulated data. This comparison is done by evaluating the relative complexity of the models resulting from each method and by comparing the performance of the chosen models in estimating the true tree. We also compare the methods relative to one another by measuring the closeness of the estimated trees corresponding to the different chosen models under these methods. We show that varying the topology does not have a major impact on model choice. We also show that the outcome of the two proposed extensions is identical and is comparable to that of the BIC, Extended-BIC, and DT. Hence, using the simpler methods to choose a model for analyzing the data is more computationally feasible, with results comparable to those of the more computationally intensive methods. Another outcome of this study is that earlier conclusions about the DT approach are reinforced. That is, LRT, Extended-AIC, and AIC result in more complicated models that do not contribute to the performance of the phylogenetic inference, yet cause a significant increase in the time required for data analysis.
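To make the AIC/BIC contrast concrete, the sketch below scores three substitution models on a fixed topology. It is a minimal illustration, not the authors' procedure: the log-likelihoods in `MODELS` are made-up numbers, branch-length parameters are left out because they are shared by all models on a fixed tree, and the sample size `n` is assumed to be the number of alignment sites.

```python
import math

# Hypothetical maximized log-likelihoods and free-parameter counts
# (substitution + base-frequency + gamma-shape parameters only; branch
# lengths, shared by every model on a fixed topology, are omitted).
MODELS = {
    "JC69": (-5000.0, 0),
    "HKY85": (-4950.0, 4),
    "GTR+G": (-4940.0, 9),
}

def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik

def pick(models: dict, n: int) -> tuple:
    """Return the names of the models preferred by AIC and by BIC."""
    best_aic = min(models, key=lambda m: aic(*models[m]))
    best_bic = min(models, key=lambda m: bic(*models[m], n))
    return best_aic, best_bic

print(pick(MODELS, n=1000))  # ('GTR+G', 'HKY85')
```

With these numbers, GTR+G gains 20 units of -2 ln L over HKY85 at the cost of 5 extra parameters: the AIC penalty (10) is smaller than that gain, but the BIC penalty (5 × ln 1000 ≈ 34.5) is larger, so AIC picks the more complex model, in line with the abstract's observation.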

2.
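The cross-validation statistic CV used in model selection is improved to obtain RCV, a new criterion for judging whether a model is adequate. RCV incorporates the information contained in CV together with the goodness of fit, the number of parameters to be estimated in the model, the sample size, and other quantities, and it offers better stability and discriminating power than AIC, BIC, and CV.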

3.
In recent years, likelihood ratio tests (LRTs) based on DNA and protein sequence data have been proposed for testing various evolutionary hypotheses. Because conducting an LRT requires an evolutionary model of nucleotide or amino acid substitution, which is almost always unknown, it is important to investigate the robustness of LRTs to violations of the assumptions of these evolutionary models. Computer simulation was used to examine the performance of LRTs of the molecular clock, transition/transversion bias, and among-site rate variation under different substitution models. The results showed that when correct models are used, LRTs perform quite well even when the DNA sequences are as short as 300 nt. However, LRTs were found to be biased under incorrect models. The extent of bias varies considerably, depending on, among other things, the hypotheses tested, the substitution models assumed, and the lengths of the sequences used. A preliminary simulation study also suggests that LRTs based on parametric bootstrapping may be more sensitive to substitution models than are standard LRTs. When an assumed substitution model is grossly wrong and a more realistic model is available, LRTs can often reject the wrong model; thus, the performance of LRTs may be improved by using a more appropriate model. On the other hand, many factors of molecular evolution have not been considered in any substitution model built so far, and the possibility that this omission influences LRTs is often overlooked. The dependence of LRTs on substitution models calls for caution in interpreting test results and highlights the importance of clarifying the substitution patterns of genes and proteins and building more realistic models.
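For reference, a standard LRT compares 2(ln L1 - ln L0) with a chi-square distribution whose degrees of freedom equal the difference in free parameters. The sketch below computes that p-value with the standard library only; it covers integer df, and the clock-test log-likelihoods are hypothetical. It assumes the usual asymptotic chi-square null, which, as the abstract stresses, is only trustworthy when the substitution model is adequate.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(df/2, y) = exp(-y) * sum_{k < df/2} y^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1, where P = erfc(sqrt(y)), and add one
    # term for each further 2 degrees of freedom.
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def lrt(lnl_null: float, lnl_alt: float, df: int) -> tuple:
    """Likelihood ratio statistic 2(lnL1 - lnL0) and its chi-square p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical molecular-clock test on 6 sequences: df = 6 - 2 = 4.
stat, p = lrt(lnl_null=-2510.3, lnl_alt=-2503.1, df=4)
print(f"2*dlnL = {stat:.1f}, p = {p:.4f}")
```

For a clock test on s sequences, the unconstrained unrooted tree has 2s - 3 branch lengths and the clock tree has s - 1 node heights, which gives the s - 2 degrees of freedom used above.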

4.
Phylogenetic analyses frequently rely on models of sequence evolution that specify nucleotide substitution rates, nucleotide frequencies, and site-to-site rate heterogeneity. These models can influence hypothesis testing and can affect the accuracy of phylogenetic inferences. Maximum likelihood (ML) methods that simultaneously construct phylogenetic tree topologies and estimate model parameters are computationally intensive and are not feasible on personal computers for sample sizes of 25 or more. Techniques that first construct a tree topology and then use this non-maximized topology to estimate ML substitution rates, however, can quickly arrive at a model of sequence evolution. The accuracy of this two-step estimation technique was tested using simulated data sets with known model parameters. The results showed that for a star-like topology, as is often seen in human immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting topology could produce nucleotide substitution rates that were not statistically different from the true rates. Samples were isolated from 100 HIV-1 subtype B-infected individuals from the United States, and a 620-nt region of the env gene was sequenced for each sample. The sequence data were used to obtain a substitution model of sequence evolution specific to HIV-1 subtype B env by estimating nucleotide substitution rates and site-to-site heterogeneity in these 100 individuals. The method of estimating the model should provide users of large data sets with a way to quickly compute a model of sequence evolution, while the nucleotide substitution model we identified should prove useful in the phylogenetic analysis of HIV-1 subtype B env sequences. Received: 4 October 2000 / Accepted: 1 March 2001

5.
Claeskens G, Consentino F. Biometrics, 2008, 64(4): 1062-1069
SUMMARY: Application of classical model selection methods such as Akaike's information criterion (AIC) becomes problematic when observations are missing. In this article we propose some variations on the AIC that are applicable to missing covariate problems. The method is directly based on the expectation maximization (EM) algorithm and is readily available for EM-based estimation methods, without much additional computational effort. The missing-data AIC criteria are formally derived and shown to work in a simulation study and in an application to data on diabetic retinopathy.

6.
A simple nearly neutral mutation model of protein evolution was studied by computer simulation, assuming a constant population size. In this model, a gene consists of a finite number of codons and there is no recombination within a gene. Each codon has two replacement sites and one silent site. The fitness of a gene was determined multiplicatively by the amino acids specified by its codons (the independent multicodon model). Nucleotide diversity at replacement sites decreases as selection becomes stronger. Nucleotide diversity at silent sites is also reduced as selection intensifies, but the magnitude of the reduction is not a monotone function of the intensity of selection. The dispersion index is close to one. The average values of Tajima's and Fu and Li's statistics are negative, and their absolute values increase as selection intensifies. However, their power to detect selection under the present model was not high unless the number of sites was large or the mutation rate was high. The MK test was shown to detect intermediate selection fairly well. For comparison, the house-of-cards model was also investigated, and its behavior was shown to be more sensitive to changes in population size than that of the independent multicodon model. The relevance of the present model for explaining protein evolution is discussed by comparing its predictions with recent DNA data. Received: 24 May 1999 / Accepted: 17 August 1999
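Since the abstract reports average values of Tajima's statistic, here is a minimal sketch of Tajima's D computed from the number of segregating sites S and the mean pairwise difference π. It assumes equal-length aligned sequences with no gaps or missing data, counts every differing character as a difference, and uses the hypothetical function name `tajimas_d`; it is not the simulation code from the study.

```python
import math
from itertools import combinations

def tajimas_d(seqs: list) -> float:
    """Tajima's D for equal-length aligned sequences without gaps.

    Returns nan when there are no segregating sites.
    """
    n = len(seqs)
    if n < 4:
        # With n = 3, pi always equals S / a1 and the variance is zero.
        raise ValueError("D is undefined for fewer than 4 sequences")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for i in range(length) if len({q[i] for q in seqs}) > 1)
    if s == 0:
        return math.nan
    # pi: average number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Tajima's constants for the variance of (pi - S / a1)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# All three polymorphisms are singletons, so D comes out negative.
print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
```

An excess of rare variants, the pattern expected when selection removes slightly deleterious alleles before they drift to intermediate frequency, pushes π below S/a1 and hence D below zero.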

7.
RNA viruses and retroviruses fix substitutions approximately 1 million-fold faster than their hosts. This diversification could represent inevitable drift under purifying selection, with the majority of substitutions being phenotypically neutral. The alternative is to suppose that most fixed mutations are beneficial to the virus, allowing it to keep ahead of the host and/or host population. Here, the relative sequence diversification of different proteins encoded by viral genomes is found to be linear. The examples encompass a wide variety of retroviruses and RNA viruses. The smoothness of relative divergence extends from quasispeciation following clonal infection, to variation among different isolates of the same virus, to viruses from different species or those associated with different diseases, indicating that the majority of fixed mutations likely reflect drift. This held for both mammalian and plant viruses, indicating that adaptive immunity does not necessarily shape the relative accumulation of amino acid substitutions. Compared with that of their hosts, RNA virus evolution appears conservative. Received: 16 November 1999 / Accepted: 10 March 2000

8.
The Path from the RNA World   Total citations: 1, self-citations: 0, citations by others: 1
We describe a sequential (step-by-step) Darwinian model for the evolution of life from the late stages of the RNA world through to the emergence of eukaryotes and prokaryotes. The starting point is our model, derived from current RNA activity, of the RNA world just prior to the advent of genetically encoded protein synthesis. By focusing on the function of the protoribosome, we develop a plausible model for the evolution of a protein-synthesizing ribosome from a high-fidelity RNA polymerase that incorporated triplets of oligonucleotides. With the standard assumption that, during the evolution of enzymatic activity, catalysis is transferred from RNA → RNP → protein, the first proteins in the "breakthrough organism" (the first to have encoded protein synthesis) would have been nonspecific chaperone-like proteins rather than catalytic proteins. Moreover, because some RNA molecules that predate protein synthesis under this model now occur as introns in some of the very earliest proteins, the model predicts that these particular introns are older than the exons surrounding them (the "introns-first" theory). Many features of the model for the genome organization of the final RNA-world ribo-organism are more prevalent in the eukaryotic genome, and we suggest that the prokaryotic genome organization (a single, circular genome with one center of replication) was derived from a "eukaryotic-like" genome organization (a fragmented linear genome with multiple centers of replication). The steps from the proposed ribo-organism RNA genome → eukaryotic-like DNA genome → prokaryotic-like DNA genome are all relatively straightforward, whereas the transition prokaryotic-like genome → eukaryotic-like genome appears impossible under a Darwinian mechanism of evolution, given the assumption of the transition RNA → RNP → protein. A likely molecular mechanism, "plasmid transfer," is available for the origin of prokaryotic-type genomes from a eukaryotic-like architecture.
Under this model, prokaryotes are considered specialized and derived, with reduced dependence on ssRNA biochemistry. A functional explanation is that prokaryote ancestors underwent selection for thermophily (high temperature) and/or for rapid reproduction (r selection) at least once in their history. Received: 14 January 1997 / Accepted: 19 May 1997

9.
As methods of molecular phylogeny have become more explicit and more biologically realistic following the pioneering work of Thomas Jukes, they have had to relax their initial assumption that rates of evolution are equal at all sites. Distance-matrix and likelihood methods of inferring phylogenies make this assumption; parsimony, when valid, is less limited by it. Nucleotide sequences, including RNA sequences, can show substantial rate variation; protein sequences show rates that vary much more widely. Assuming a prior distribution of rates, such as a gamma or lognormal distribution, has deservedly been popular, but for likelihood methods it leads to computational difficulties. These can be resolved using hidden Markov model (HMM) methods, which approximate the distribution by one with a modest number of discrete rates. Generalized Laguerre quadrature can be used to improve the selection of rates and their probabilities so as to more closely approach the desired gamma distribution. A model based on population genetics is presented that predicts how rates of evolution might vary from locus to locus. Challenges for the future include allowing rates at a given site to vary along the tree, as in the "covarion" model, and allowing them to have correlations that reflect three-dimensional structure rather than position in the coding sequence. Markov chain Monte Carlo likelihood methods may be the only practical way to carry out computations for these models. Received: 8 February 2001 / Accepted: 20 May 2001
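The discrete-rate approximation can be illustrated with the simplest possible case: two sequences under the Jukes-Cantor (JC69) model, where each site's likelihood is averaged over a few rate categories. This is a minimal sketch under stated assumptions, not the method of the paper: the rates and probabilities below are illustrative values with mean 1, not gamma quantiles or generalized Laguerre nodes, and sites are treated as independent (the HMM extension with correlated rates at adjacent sites is not shown). The function names are hypothetical.

```python
import math

def jc69_pair_prob(same: bool, d: float) -> float:
    """Probability of a two-sequence site pattern under JC69.

    d is the expected number of substitutions per site separating the
    sequences; the 0.25 factor is the stationary base frequency.
    """
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 * (0.25 + 0.75 * e) if same else 0.25 * (0.25 - 0.25 * e)

def log_lik_discrete_rates(seq1: str, seq2: str, d: float, rates, probs) -> float:
    """Log-likelihood with each site's rate drawn from discrete categories.

    Each site's likelihood is the probability-weighted average over the
    rate categories; sites are assumed independent.
    """
    total = 0.0
    for a, b in zip(seq1, seq2):
        site = sum(p * jc69_pair_prob(a == b, r * d) for r, p in zip(rates, probs))
        total += math.log(site)
    return total

# Illustrative 4-category rates (mean 1) with equal probabilities.
rates = [0.2, 0.6, 1.1, 2.1]
probs = [0.25] * 4
print(log_lik_discrete_rates("ACGTACGTAC", "ACGTTCGTAA", 0.2, rates, probs))
```

The cost grows only linearly with the number of categories, which is why a modest number of well-chosen rates (for example, chosen by quadrature as the abstract describes) makes rate variation tractable in likelihood calculations.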

10.
Chromosome-terminal complex repeats in the dipteran Chironomus pallidivittatus show rapid concerted evolution, during which there is remarkably efficient homogenization of the repeat units within and between chromosome ends. It has been shown previously that gene conversion is likely to be an important component of these changes. The sequence evolution could result from different processes (exchanges between repeats in the tandem array as well as information transfer between units on different chromosomes) and is therefore difficult to analyze in detail. In this study, the concerted evolution of a region present only once per chromosome, at the junction between the telomeric complex repeats and the subtelomeric DNA, was therefore investigated in the two sibling species C. pallidivittatus and C. tentans. Material from individual microdissected chromosome ends was used, as well as clones from bulk genomic DNA. On the telomeric side of the border, pronounced species-specific sequence differences were observed, the patterns being similar for clones of different origin within each species. Mutations had been transmitted efficiently between chromosomes even when adjoining, more distally located DNA showed great differences in sequence, suggesting that gene conversion had taken place. The evolving telomeric region was bordered proximally by subtelomeric DNA with high evolutionary constancy. More proximally located subtelomeric DNA evolved more rapidly and showed heterogeneity between species and chromosomes. Received: 24 September 1997 / Accepted: 24 November 1997

11.
The origin and diversification of RNA secondary structure were traced using cladistic methods. Structural components were coded as polarized and ordered multistate characters, following a model of character-state transformation derived from considerations in statistical mechanics. Several classes of functional RNA were analyzed, including ribosomal RNA (rRNA). Considerable phylogenetic signal was present in their secondary structure. The intrinsically rooted phylogenies reconstructed from evolved RNA structure mirrored those derived from nucleic acid sequences at all taxonomic levels and grouped organisms in concordance with traditional classification, especially in the archaeal and eukaryal domains. Natural selection therefore appears to operate early in the information flow that originates in sequence and ends in an adapted phenotype. When examining the hierarchical classification of the living world, phylogenetic analysis of the secondary structure of the small and large rRNA subunits reconstructed a universal tree of life that branched into three monophyletic groups corresponding to Eucarya, Archaea, and Bacteria and was rooted in the eukaryotic branch. Ribosomal characters involved in the translational cycle could be easily traced and showed that transfer RNA (tRNA) binding domains in the large rRNA subunit evolved concurrently with the rest of the rRNA molecule. The results suggest that it is equally parsimonious to consider that ancestral unicellular eukaryotes or prokaryotes gave rise to all extant life forms, and they provide a rare insight into the early evolution of nucleic acid and protein biosynthesis. Received: 13 September 2000 / Accepted: 27 August 2001

12.
A higher rate of molecular evolution in rodents than in primates at synonymous sites and, to a lesser extent, at amino acid replacement sites has been reported previously for most nuclear genes examined. Thus in these genes the average ratio of amino acid replacement to synonymous substitution rates in rodents is lower than in primates, an observation at odds with the neutral model of molecular evolution. Under Ohta's mildly deleterious model of molecular evolution, these observations are seen as the consequence of the combined effects of a shorter generation time (driving a higher mutation rate) and a larger effective population size (resulting in more effective selection against mildly deleterious mutations) in rodents. The present study reports the results of a maximum-likelihood analysis of the ratio of amino acid replacements to synonymous substitutions for genes encoded in mitochondrial DNA (mtDNA) in these two lineages. A similar pattern is observed: in rodents this ratio is significantly lower than in primates, again consistent only with the mildly deleterious model. Interestingly, the lineage-specific difference is much more pronounced in mtDNA-encoded than in nuclear-encoded proteins, an observation which is shown to run counter to expectation under Ohta's model. Finally, accepting certain fossil divergence dates, the lineage-specific difference in the amino acid replacement-to-synonymous substitution ratio in mtDNA can be partitioned and is found to be entirely the consequence of a higher mutation rate in rodents. This conclusion is consistent with a replication-dependent model of mutation in mtDNA. Received: 24 September 1999 / Accepted: 18 September 2000

13.
Sequence differences in the tRNA-proline (tRNApro) end of the mitochondrial control-region of three species of Pacific butterflyfishes accumulated 33–43 times more rapidly than did changes within the mitochondrial cytochrome b gene (cytb). Rapid evolution in this region was accompanied by strong transition/transversion bias and large variation in the probability of a DNA substitution among sites. These substitution constraints placed an absolute ceiling on the magnitude of sequence divergence that could be detected between individuals. This divergence "ceiling" was reached rapidly and led to a decay in the relative rate of control-region/cytb evolution. A high rate of evolution in this section of the control-region of butterflyfishes stands in marked contrast to the patterns reported in some other fish lineages. Although the mechanism underlying rate variation remains unclear, all taxa with rapid evolution in the 5′-end of the control-region showed extreme transition biases. By contrast, in taxa with slower control-region evolution, transitions accumulated at nearly the same rate as transversions. More information is needed to understand the relationship between nucleotide bias and the rate of evolution in the 5′-end of the control-region. Despite strong constraints on sequence change, phylogenetic information was preserved in the group of recently differentiated species and supported the clustering of sequences into three major mtDNA groupings. Within these groups, very similar control-region sequences were widely distributed across the Pacific Ocean and were shared between recognized species, indicating a lack of mitochondrial sequence monophyly among species. Received: 30 June 1996 / Accepted: 15 May 1997
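Transition/transversion bias of the kind described here starts from simple counts between aligned sequences. The sketch below tallies raw transitions (A↔G, C↔T) and transversions (purine↔pyrimidine). It is an illustration with a hypothetical function name and toy sequences; it applies no correction for multiple hits, so once divergence approaches the "ceiling" the raw counts will underestimate the true number of changes.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def count_ts_tv(seq1: str, seq2: str) -> tuple:
    """Count transitions and transversions between two aligned sequences.

    Sites with gaps, ambiguity codes, or identical bases are skipped.
    Raw counts only: no correction for multiple hits at the same site.
    """
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv

ts, tv = count_ts_tv("ACGTTAGCAT", "GCATTAACGG")
print(f"ts={ts}, tv={tv}, ts/tv={ts / tv:.1f}")  # ts=4, tv=1, ts/tv=4.0
```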

14.
Membrane proteins move in heterogeneous environments with spatially (and sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion in order to improve our knowledge of membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single-molecule tracking (SMT) trajectories that can determine the model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the a posteriori probability of an observed trajectory under a given model. Information criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the modified AIC (AICc), are used to select the preferred model. The models considered include free Brownian motion and confined motion in second- or fourth-order potentials. We determine the best information criteria for classifying trajectories. We tested the method's limits through simulations matching a large set of experimental conditions and built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium perfringens ε-toxin (CPT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data.
The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to classify and compare different SMT experiments.
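The two-step decision tree can be sketched as follows, assuming each candidate model has already been fitted to a trajectory and has a maximized log-likelihood. This is one plausible reading of the abstract, not the authors' code: the model names (`"free"`, `"harmonic"`, `"quartic"`), parameter counts, and log-likelihood values are all hypothetical, and `n_points` is assumed to be the number of trajectory positions.

```python
import math

def classify_motion(log_liks: dict, n_params: dict, n_points: int) -> str:
    """Two-step choice: BIC for free vs confined, then AIC among potentials.

    log_liks / n_params are keyed by model name; "free" must be present.
    """
    bic = {m: n_params[m] * math.log(n_points) - 2 * ll for m, ll in log_liks.items()}
    aic = {m: 2 * n_params[m] - 2 * ll for m, ll in log_liks.items()}
    confined = [m for m in log_liks if m != "free"]
    # Step 1: does any confined model beat free diffusion under BIC?
    if bic["free"] <= min(bic[m] for m in confined):
        return "free"
    # Step 2: rank the confining potentials with AIC.
    return min(confined, key=aic.get)

# Hypothetical fits for one trajectory of 200 positions.
lls = {"free": -500.0, "harmonic": -480.0, "quartic": -478.0}
ks = {"free": 1, "harmonic": 2, "quartic": 3}
print(classify_motion(lls, ks, n_points=200))  # quartic
```

With these made-up numbers BIC rejects free diffusion, and AIC then prefers the quartic potential by a small margin; ranking the potentials by BIC instead would have selected the harmonic one, which shows why the choice of criterion at each step matters.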

15.
The relationship between the silent substitution rate (Ks) and GC content along the genome is a focal point of the debate about the origin of the isochore structure in vertebrates. Recent estimates of the silent substitution rate showed a positive correlation between Ks and GC content, in contradiction with the predictions of both the regional mutation bias model and the selection or biased gene conversion model. The aim of this paper is to help resolve this contradiction between theoretical studies and data. We analyzed the relationship between Ks and GC content under (1) uniform mutation bias, (2) regional mutation bias, and (3) mutation bias and selection. We report that an increase in Ks with GC content is expected under mutation bias because of either nonequilibrium of the isochore structure or an increasing mutation rate from AT toward GC nucleotides in GC-richer isochores. We show by simulations that CpG deamination tends to increase the mutation rate with GC content in a regional mutation bias model. We also demonstrate that the relationship between Ks and GC under the selectionist or biased gene conversion model is positive under weak selection if the mutation-selection equilibrium GC frequency is less than 0.5. Received: 28 March 2001 / Accepted: 16 May 2001
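One of the mechanisms named above, a higher AT→GC mutation rate in GC-richer isochores, can be checked with a two-state sketch. Assume neutral sites that are either GC or AT, with AT→GC rate u and GC→AT rate v. The equilibrium GC fraction is u/(u + v), and the expected substitution rate is the equilibrium-weighted mutation rate, 2uv/(u + v), which rises with u when v is fixed. This simplified model (hypothetical function names, no CpG effects, no selection) is an illustration, not the paper's simulation.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Stationary GC fraction under mutation alone.

    u: per-site rate of AT -> GC mutation; v: rate of GC -> AT mutation.
    """
    return u / (u + v)

def neutral_sub_rate(u: float, v: float) -> float:
    """Expected substitutions per site per unit time at mutational equilibrium.

    A GC site mutates at rate v and an AT site at rate u, so the average
    is p*v + (1 - p)*u, which simplifies to 2uv / (u + v).
    """
    p = equilibrium_gc(u, v)
    return p * v + (1 - p) * u

# Hypothetical isochores: GC -> AT rate fixed at 1, AT -> GC rate rising.
for u in (0.5, 1.0, 2.0):
    print(f"u={u}: GC*={equilibrium_gc(u, 1.0):.3f}, Ks={neutral_sub_rate(u, 1.0):.3f}")
```

The loop prints GC* = 0.333, 0.500, 0.667 with Ks = 0.667, 1.000, 1.333, so equilibrium GC content and Ks increase together, as the abstract argues.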

16.
17.
How did the "universal" genetic code arise? Several hypotheses have been put forward, and the code has been analyzed extensively by authors looking for clues to selection pressures that might have acted during its evolution. But this approach has been ineffective. Although an impressive number of properties has been attributed to the universal code, it has been impossible to determine whether selection on any of these properties was important in the code's evolution or whether the observed properties arose as a consequence of selection on some other characteristic. Therefore we turned the question around and asked: what would a genetic code look like if it had evolved in response to various different selection pressures? To address this question, we constructed a genetic algorithm. We found first that selecting on a particular measure yields codes that are similar to each other. Second, we found that the universal code is far from minimized with respect to the effects of mutations (or translation errors) on the amino acid compositions of proteins. Finally, we found that the codes that most closely resembled real codes were those generated by selecting on aspects of the code's structure, not those generated by selecting to minimize the effects of amino acid substitutions on proteins. This suggests that the universal genetic code has been selected for a particular structure (one that confers an important flexibility on the evolution of genes and proteins) and that the particular assignments of amino acids to codons are secondary. Received: 29 December 1998 / Accepted: 8 July 1999

18.
The recent availability of genomic sequence information for the class I region of the MHC has provided an opportunity to examine the genomic organization of HLA class I (HLAcI) and PERB11/MIC genes with a view to explaining their evolution from the perspective of extended genomic duplications rather than by simple gene duplications and/or gene conversion events. Analysis of genomic sequence from two regions of the MHC (the alpha- and beta-blocks) revealed that at least 6 PERB11 and 14 HLAcI genes, pseudogenes, and gene fragments are contained within extended duplicated segments. Each segment was searched for shared (paralogous) retroelements using RepeatMasker, so that these could serve as markers of evolution, genetic rearrangements, and evidence of segmental duplications. Shared Alu elements and other retroelements allowed the duplicated segments to be classified into five distinct groups (A to E) that could be further distilled down to an ancient preduplication segment containing an HLA and a PERB11 gene, an endogenous retrovirus (HERV-16), and distinctive retroelements. The breakpoints within and between the different HLAcI segments were found mainly within the PERB11 and HLA genes, HERV-16, and other retroelements, suggesting that the latter have played a major role in duplication and indel events leading to the present organization of PERB11 and HLAcI genes. On the basis of the features contained within the segments, a coevolutionary model premised on tandem duplication of single and multipartite genomic segments is proposed. The model is used to explain the origins and genomic organization of retroelements, HERV-16, DNA transposons, PERB11, and HLAcI genes as distinct segmental combinations within the alpha- and beta-blocks of the human MHC. Received: 5 December 1998 / Accepted: 27 January 1999

19.
Pervasive adaptive evolution in mammalian fertilization proteins   Total citations: 1, self-citations: 0, citations by others: 1
Mammalian fertilization exhibits species specificity, and the proteins mediating sperm-egg interactions evolve rapidly between species. In this study, we use likelihood ratio tests (LRTs) to demonstrate that the evolution of seven genes involved in mammalian fertilization is promoted by positive Darwinian selection. Several of these proteins are sperm proteins that have been implicated in binding the glycoproteins of the mammalian egg coat, the zona pellucida, which were shown previously to be subject to positive selection. Taken together, these represent the major candidates involved in mammalian fertilization, indicating that positive selection is pervasive among mammalian reproductive proteins. A new LRT is implemented to determine whether the dN/dS ratio is significantly greater than one. This is a more refined test of positive selection than the previous LRTs, which only identified whether there was a class of sites with dN/dS > 1 but did not test whether that ratio was significantly greater than one.

20.
The information provided by completely sequenced genomes can yield insights into the multilevel organization of organisms and their evolution. At the lowest level of molecular organization, individual enzymes are formed, often through the assembly of multiple polypeptides. At a higher level, sets of enzymes group into metabolic networks. Much has been learned about the relationships among species from phylogenetic trees comparing individual enzymes. In this article we extend conventional phylogenetic analysis of individual enzymes in different organisms to the organisms' metabolic networks. For this purpose we suggest a method that combines sequence information with information about the underlying reaction networks. A distance between pathways is defined that incorporates distances between substrates and distances between corresponding enzymes. The new analysis is applied to electron-transfer and amino acid biosynthesis networks, yielding a more comprehensive understanding of similarities and differences between organisms. Received: 14 August 2000 / Accepted: 4 January 2001


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417