1.
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigorous search of the tree space is run under that model to find the maximum-likelihood estimate of the tree (topology and branch lengths) and the maximum-likelihood estimates of the model parameters. In this paper, we propose two extensions to the decision-theoretic (DT) approach that relax the fixed-topology restriction. We also relax the fixed-topology restriction for the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) methods. We compare the performance of the different methods (the relaxed, restricted, and the likelihood-ratio test [LRT]) using simulated data. This comparison is done by evaluating the relative complexity of the models resulting from each method and by comparing the performance of the chosen models in estimating the true tree. We also compare the methods relative to one another by measuring the closeness of the estimated trees corresponding to the different chosen models under these methods. We show that varying the topology does not have a major impact on model choice. We also show that the outcome of the two proposed extensions is identical and is comparable to that of the BIC, Extended-BIC, and DT. Hence, using the simpler methods in choosing a model for analyzing the data is more computationally feasible, with results comparable to the more computationally intensive methods. Another outcome of this study is that earlier conclusions about the DT approach are reinforced. That is, LRT, Extended-AIC, and AIC result in more complicated models that do not contribute to the performance of the phylogenetic inference, yet cause a significant increase in the time required for data analysis.
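As a concrete illustration of the AIC and BIC criteria mentioned above, the sketch below ranks candidate substitution models on a single fixed tree using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, with n assumed to be the number of alignment sites. The log-likelihood values in CANDIDATES are hypothetical placeholders rather than results from this study, parameter counts exclude branch lengths, and the sketch implements neither the DT approach nor the relaxed-topology extensions.

```python
import math

# Hypothetical maximized log-likelihoods for one alignment on a fixed tree.
# k counts substitution-model parameters only; branch lengths are shared by all
# models on the same tree, so they shift every score equally and are omitted.
CANDIDATES = {
    # name: (k, maximized ln L)
    "JC69": (0, -5230.4),
    "K80": (1, -5180.9),
    "HKY85": (4, -5171.2),
    "GTR": (8, -5166.0),
}


def aic(k: int, log_l: float) -> float:
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * log_l


def bic(k: int, log_l: float, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (smaller is better)."""
    return k * math.log(n) - 2 * log_l


def best_model(candidates: dict, n_sites: int, criterion: str) -> str:
    """Return the name of the candidate with the lowest AIC or BIC."""
    def score(name: str) -> float:
        k, log_l = candidates[name]
        if criterion == "aic":
            return aic(k, log_l)
        if criterion == "bic":
            return bic(k, log_l, n_sites)
        raise ValueError(f"unknown criterion: {criterion}")
    return min(candidates, key=score)


if __name__ == "__main__":
    n_sites = 1000  # assumed alignment length
    print("AIC picks:", best_model(CANDIDATES, n_sites, "aic"))
    print("BIC picks:", best_model(CANDIDATES, n_sites, "bic"))
```

With these made-up numbers AIC favors GTR while BIC favors K80: at n = 1000 the BIC penalty per parameter is ln 1000 (about 6.9) versus 2 for AIC, which is one reason BIC tends to select simpler models.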
2.
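The cross-validation statistic CV used in model selection is improved to obtain a new criterion, RCV, for judging whether a model is adequate. RCV retains the information contained in CV and also accounts for the goodness of fit, the number of parameters to be estimated, the sample size, and so on; compared with AIC, BIC, and CV, it shows better stability and discriminating power.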
3.
J Zhang. Molecular Biology and Evolution, 1999, 16(6): 868-875
In recent years, likelihood ratio tests (LRTs) based on DNA and protein sequence data have been proposed for testing various evolutionary hypotheses. Because conducting an LRT requires an evolutionary model of nucleotide or amino acid substitution, which is almost always unknown, it becomes important to investigate the robustness of LRTs to violations of assumptions of these evolutionary models. Computer simulation was used to examine performance of LRTs of the molecular clock, transition/transversion bias, and among-site rate variation under different substitution models. The results showed that when correct models are used, LRTs perform quite well even when the DNA sequences are as short as 300 nt. However, LRTs were found to be biased under incorrect models. The extent of bias varies considerably, depending on the hypotheses tested, the substitution models assumed, and the lengths of the sequences used, among other things. A preliminary simulation study also suggests that LRTs based on parametric bootstrapping may be more sensitive to substitution models than are standard LRTs. When an assumed substitution model is grossly wrong and a more realistic model is available, LRTs can often reject the wrong model; thus, the performance of LRTs may be improved by using a more appropriate model. On the other hand, many factors of molecular evolution have not been considered in any substitution models so far built, and the possibility that this omission influences LRTs is often overlooked. The dependence of LRTs on substitution models calls for caution in interpreting test results and highlights the importance of clarifying the substitution patterns of genes and proteins and building more realistic models.
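The LRTs discussed above compare twice the log-likelihood difference between nested models with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below assumes that standard chi-square approximation and computes the upper-tail probability with only the Python standard library; the log-likelihood values in the usage example are hypothetical. It does not address the paper's central point, which is that the test can be biased when the substitution model itself is wrong.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Starts from the closed forms for df = 1 (erfc) and df = 2 (exp) and applies
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def likelihood_ratio_test(log_l_null: float, log_l_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for nested models under the chi-square approximation."""
    stat = max(0.0, 2.0 * (log_l_alt - log_l_null))
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical molecular-clock test on 10 sequences: the unconstrained tree
    # has 2*10 - 3 branch lengths and the clock tree 10 - 1 node heights, so df = 8.
    stat, p = likelihood_ratio_test(log_l_null=-4521.7, log_l_alt=-4512.3, df=8)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```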
4.
Jon P. Anderson, Allen G. Rodrigo, Gerald H. Learn, Yang Wang, Hillard Weinstock, Marcia L. Kalish, Kenneth E. Robbins, Leroy Hood, James I. Mullins. Journal of Molecular Evolution, 2001, 53(1): 55-62
Phylogenetic analyses frequently rely on models of sequence evolution that detail nucleotide substitution rates, nucleotide
frequencies, and site-to-site rate heterogeneity. These models can influence hypothesis testing and can affect the accuracy
of phylogenetic inferences. Maximum likelihood methods of simultaneously constructing phylogenetic tree topologies and estimating
model parameters are computationally intensive, and are not feasible for sample sizes of 25 or greater using personal computers.
Techniques that initially construct a tree topology and then use this non-maximized topology to estimate ML substitution rates,
however, can quickly arrive at a model of sequence evolution. The accuracy of this two-step estimation technique was tested
using simulated data sets with known model parameters. The results showed that for a star-like topology, as is often seen
in human immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting topology could produce nucleotide substitution
rates that were not statistically different from the true rates. Samples were isolated from 100 HIV-1 subtype B-infected individuals
from the United States, and a 620-nt region of the env gene was sequenced for each sample. These sequence data were used to derive a substitution model of sequence evolution specific
to HIV-1 subtype B env by estimating nucleotide substitution rates and site-to-site rate heterogeneity.
The method of estimating the model should provide users of large data sets with a way to quickly compute a model of sequence
evolution, while the nucleotide substitution model we identified should prove useful in the phylogenetic analysis of HIV-1
subtype B env sequences.
Received: 4 October 2000 / Accepted: 1 March 2001
5.
Tachida H. Journal of Molecular Evolution, 2000, 50(1): 69-81
A simple nearly neutral mutation model of protein evolution was studied using computer simulation assuming a constant population
size. In this model, a gene consists of a finite number of codons and there is no recombination within a gene. Each codon
has two replacement sites and one silent site. The fitness of a gene was determined multiplicatively by the amino acids specified by
codons (the independent multicodon model). Nucleotide diversity at replacement sites decreases as selection becomes stronger.
A reduction of nucleotide diversity at silent sites also occurs as selection intensifies but the magnitude of the reduction
is not a monotone function of the intensity of selection. The dispersion index is close to one. The average values of Tajima's
and Fu and Li's statistics are negative, and their absolute values increase as selection intensifies. However, their power
to detect selection under the present model was not high unless the number of sites is large or the mutation rate is high.
The McDonald-Kreitman (MK) test was shown to detect intermediate selection fairly well. For comparison, the house-of-cards model was also investigated
and its behavior was shown to be more sensitive to changes of population size than that of the independent multicodon model.
The relevance of the present model for explaining protein evolution is discussed by comparing its predictions with recent DNA
data.
Received: 24 May 1999 / Accepted: 17 August 1999
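For reference, the sketch below computes Tajima's D from a set of aligned sequences using Tajima's (1989) constants, comparing the mean number of pairwise differences (pi) with Watterson's estimate S/a1. It assumes gap-free sequences of equal length, requires at least four sequences (for three, the variance term vanishes) and at least one segregating site, and is not the simulation code used in the study; Fu and Li's statistics are not implemented.

```python
import itertools
import math


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for aligned, gap-free sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(itertools.combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D compares pi with Watterson's estimate S / a1, scaled by its standard error.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Every segregating site is a singleton: an excess of rare variants.
    sample = ["AAA", "TAA", "ATA", "AAT"]
    print(f"Tajima's D = {tajimas_d(sample):.3f}")
```

In the usage example every segregating site is a singleton, the kind of excess of rare variants expected under purifying selection, and D comes out negative (about -0.75).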
6.
SUMMARY: Application of classical model selection methods such as Akaike's information criterion (AIC) becomes problematic when observations are missing. In this article we propose some variations on the AIC that are applicable to missing covariate problems. The method is directly based on the expectation maximization (EM) algorithm and is readily available for EM-based estimation methods, without much additional computational effort. The missing-data AIC criteria are formally derived and shown to work in a simulation study and by application to data on diabetic retinopathy.
7.
RNA viruses and retroviruses fix substitutions approximately 1 million-fold faster than their hosts. This diversification
could represent an inevitable drift under purifying selection, the majority of substitutions being phenotypically neutral.
The alternative is to suppose that most fixed mutations are beneficial to the virus, allowing it to keep ahead of the host
and/or host population. Here, relative sequence diversification of different proteins encoded by viral genomes is found to
be linear. The examples encompass a wide variety of retroviruses and RNA viruses. The smoothness of relative divergence spans
quasispeciation following clonal infection, to variation among different isolates of the same virus, to viruses from different
species or those associated with different diseases, indicating that the majority of fixed mutations likely reflect drift.
This held for both mammalian and plant viruses, indicating that adaptive immunity does not necessarily shape the relative accumulation
of amino acid substitutions. When compared with that of their hosts, RNA virus evolution appears conservative.
Received: 16 November 1999 / Accepted: 10 March 2000
8.
The Path from the RNA World
We describe a sequential (step by step) Darwinian model for the evolution of life from the late stages of the RNA world through
to the emergence of eukaryotes and prokaryotes. The starting point is our model, derived from current RNA activity, of the
RNA world just prior to the advent of genetically-encoded protein synthesis. By focusing on the function of the protoribosome
we develop a plausible model for the evolution of a protein-synthesizing ribosome from a high-fidelity RNA polymerase that
incorporated triplets of oligonucleotides. With the standard assumption that during the evolution of enzymatic activity, catalysis
is transferred from RNA → RNP → protein, the first proteins in the "breakthrough organism" (the first to have encoded protein
synthesis) would be nonspecific chaperone-like proteins rather than catalytic. Moreover, because some RNA molecules that pre-date
protein synthesis under this model now occur as introns in some of the very earliest proteins, the model predicts these particular
introns are older than the exons surrounding them, the "introns-first" theory. Many features of the model for the genome
organization in the final RNA world ribo-organism are more prevalent in the eukaryotic genome and we suggest that the prokaryotic
genome organization (a single, circular genome with one center of replication) was derived from a "eukaryotic-like" genome
organization (a fragmented linear genome with multiple centers of replication). The steps from the proposed ribo-organism
RNA genome → eukaryotic-like DNA genome → prokaryotic-like DNA genome are all relatively straightforward, whereas the transition
prokaryotic-like genome → eukaryotic-like genome appears impossible under a Darwinian mechanism of evolution, given the assumption
of the transition RNA → RNP → protein. A likely molecular mechanism, "plasmid transfer," is available for the origin of
prokaryotic-type genomes from a eukaryotic-like architecture. Under this model prokaryotes are considered specialized and
derived with reduced dependence on ssRNA biochemistry. A functional explanation is that prokaryote ancestors underwent selection
for thermophily (high temperature) and/or for rapid reproduction (r selection) at least once in their history.
Received: 14 January 1997 / Accepted: 19 May 1997
9.
Joseph Felsenstein. Journal of Molecular Evolution, 2001, 53(4-5): 447-455
As methods of molecular phylogeny have become more explicit and more biologically realistic following the pioneering work
of Thomas Jukes, they have had to relax their initial assumption that rates of evolution were equal at all sites. Distance
matrix and likelihood methods of inferring phylogenies make this assumption; parsimony, when valid, is less limited by it.
Nucleotide sequences, including RNA sequences, can show substantial rate variation; protein sequences show rates that vary
much more widely. Assuming a prior distribution of rates such as a gamma distribution or lognormal distribution has deservedly
been popular, but for likelihood methods it leads to computational difficulties. These can be resolved using hidden Markov
model (HMM) methods which approximate the distribution by one with a modest number of discrete rates. Generalized Laguerre
quadrature can be used to improve the selection of rates and their probabilities so as to more nearly approach the desired
gamma distribution. A model based on population genetics is presented predicting how the rates of evolution might vary from
locus to locus. Challenges for the future include allowing rates at a given site to vary along the tree, as in the "covarion"
model, and allowing them to have correlations that reflect three-dimensional structure, rather than position in the coding
sequence. Markov chain Monte Carlo likelihood methods may be the only practical way to carry out computations for these models.
Received: 8 February 2001 / Accepted: 20 May 2001
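The quadrature idea can be sketched as follows, assuming a gamma distribution of rates with shape alpha and mean 1. Substituting x = alpha * r turns the expectation over rates into an integral with generalized Laguerre weight x^(alpha - 1) * exp(-x), so the quadrature nodes divided by alpha serve as the discrete rates and the normalized weights as their probabilities. The nodes and weights are computed here with the Golub-Welsch eigenvalue method in NumPy; this is an illustrative choice, not necessarily how the paper or any phylogenetics package implements it.

```python
import numpy as np


def laguerre_gamma_rates(alpha: float, n_categories: int):
    """Discrete rates and probabilities approximating a mean-one Gamma(alpha) rate distribution.

    With r = x / alpha, E[f(r)] = (1 / Gamma(alpha)) * integral of x**(alpha - 1) * exp(-x) * f(x / alpha) dx,
    approximated by generalized Gauss-Laguerre quadrature with parameter alpha - 1.
    """
    if alpha <= 0 or n_categories < 1:
        raise ValueError("need alpha > 0 and at least one category")
    a = alpha - 1.0
    k = np.arange(n_categories, dtype=float)
    # Golub-Welsch: symmetric tridiagonal Jacobi matrix of the monic generalized
    # Laguerre recurrence, diagonal 2k + a + 1, off-diagonal sqrt(k * (k + a)).
    diag = 2.0 * k + a + 1.0
    off = np.sqrt(k[1:] * (k[1:] + a))
    jacobi = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    nodes, vectors = np.linalg.eigh(jacobi)
    # Quadrature weights are Gamma(alpha) * v0**2; dividing by Gamma(alpha) leaves
    # probabilities v0**2, which sum to 1 because the eigenvectors have unit length.
    probs = vectors[0, :] ** 2
    rates = nodes / alpha
    return rates, probs


if __name__ == "__main__":
    rates, probs = laguerre_gamma_rates(alpha=0.5, n_categories=4)
    for r, p in zip(rates, probs):
        print(f"rate {r:.4f}  probability {p:.4f}")
```

A site likelihood would then be averaged as the sum over categories of p_i * L(r_i). With n categories the rule reproduces the gamma moments exactly up to order 2n - 1, so even four categories match the mean (1) and variance (1/alpha).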
10.
Weinreich DM. Journal of Molecular Evolution, 2001, 52(1): 40-50
A higher rate of molecular evolution in rodents than in primates at synonymous sites and, to a lesser extent, at amino acid
replacement sites has been reported previously for most nuclear genes examined. Thus in these genes the average ratio of amino
acid replacement to synonymous substitution rates in rodents is lower than in primates, an observation at odds with the neutral
model of molecular evolution. Under Ohta's mildly deleterious model of molecular evolution, these observations are seen as
the consequence of the combined effects of a shorter generation time (driving a higher mutation rate) and a larger effective
population size (resulting in more effective selection against mildly deleterious mutations) in rodents. The present study
reports the results of a maximum-likelihood analysis of the ratio of amino acid replacements to synonymous substitutions for
genes encoded in mitochondrial DNA (mtDNA) in these two lineages. A similar pattern is observed: in rodents this ratio is
significantly lower than in primates, again consistent only with the mildly deleterious model. Interestingly, the lineage-specific
difference is much more pronounced in mtDNA-encoded than in nuclear-encoded proteins, an observation which is shown to run
counter to expectation under Ohta's model. Finally, accepting certain fossil divergence dates, the lineage-specific difference
in amino acid replacement-to-synonymous substitution ratio in mtDNA can be partitioned and is found to be entirely the consequence
of a higher mutation rate in rodents. This conclusion is consistent with a replication-dependent model of mutation in mtDNA.
Received: 24 September 1999 / Accepted: 18 September 2000
11.
Chromosome-terminal complex repeats in the dipteran Chironomus pallidivittatus show rapid concerted evolution, during which the repeat units are homogenized remarkably efficiently within and between
chromosome ends. It has been shown previously that gene conversion is likely to be an important component of these changes.
The sequence evolution could result from different processes (exchanges between repeats in the tandem array as well as information
transfer between units on different chromosomes) and is therefore difficult to analyze in detail. In this study, the concerted
evolution of a region present only once per chromosome, at the junction between the telomeric complex repeats and the subtelomeric
DNA, was therefore investigated in the two sibling species C. pallidivittatus and C. tentans. Material from individual microdissected chromosome ends was used, as well as clones from bulk genomic DNA. On the telomeric
side of the border, pronounced species-specific sequence differences were observed, the patterns being similar for clones of
different origin within each species. Mutations had been transmitted efficiently between chromosomes even when adjoining,
more distally located DNA showed large sequence differences, suggesting that gene conversion had taken place. The evolving
telomeric region was bordered proximally by subtelomeric DNA of high evolutionary constancy. Subtelomeric DNA located further proximally
evolved more rapidly and showed heterogeneity between species and chromosomes.
Received: 24 September 1997 / Accepted: 24 November 1997
12.
Caetano-Anollés G. Journal of Molecular Evolution, 2002, 54(3): 333-345
The origin and diversification of RNA secondary structure were traced using cladistic methods. Structural components were
coded as polarized and ordered multi-state characters, following a model of character state transformation outlined by considerations
in statistical mechanics. Several classes of functional RNA were analyzed, including ribosomal RNA (rRNA). Considerable phylogenetic
signal was present in their secondary structure. The intrinsically rooted phylogenies reconstructed from evolved RNA structure
mirrored those derived from nucleic acid sequence at all taxonomic levels, and grouped organisms in concordance with traditional
classification, especially in the archaeal and eukaryal domains. Natural selection appears therefore to operate early in the
information flow that originates in sequence and ends in an adapted phenotype. When examining the hierarchical classification
of the living world, phylogenetic analysis of secondary structure of the small and large rRNA subunits reconstructed a universal
tree of life that branched in three monophyletic groups corresponding to Eucarya, Archaea, and Bacteria, and was rooted in
the eukaryotic branch. Ribosomal characters involved in the translational cycle could be easily traced and showed that transfer
RNA (tRNA) binding domains in the large rRNA subunit evolved concurrently with the rest of the rRNA molecule. Results suggest
it is equally parsimonious to consider that ancestral unicellular eukaryotes or prokaryotes gave rise to all extant life forms
and provide a rare insight into the early evolution of nucleic acid and protein biosynthesis.
Received: 13 September 2000 / Accepted: 27 August 2001
13.
Sequence differences in the tRNA-proline (tRNApro) end of the mitochondrial control-region of three species of Pacific butterflyfishes accumulated 33–43 times more rapidly
than did changes within the mitochondrial cytochrome b gene (cytb). Rapid evolution in this region was accompanied by strong
transition/transversion bias and large variation in the probability of a DNA substitution among sites. These substitution
constraints placed an absolute ceiling on the magnitude of sequence divergence that could be detected between individuals.
This divergence "ceiling" was reached rapidly and led to a decay in the relative rate of control-region/cytb evolution.
A high rate of evolution in this section of the control-region of butterflyfishes stands in marked contrast to the patterns
reported in some other fish lineages. Although the mechanism underlying rate variation remains unclear, all taxa with rapid
evolution in the 5′-end of the control-region showed extreme transition biases. By contrast, in taxa with slower control-region
evolution, transitions accumulated at nearly the same rate as transversions. More information is needed to understand the
relationship between nucleotide bias and the rate of evolution in the 5′-end of the control-region.
Despite strong constraints on sequence change, phylogenetic information was preserved in the group of recently differentiated
species and supported the clustering of sequences into three major mtDNA groupings. Within these groups, very similar control-region
sequences were widely distributed across the Pacific Ocean and were shared between recognized species, indicating a lack of
mitochondrial sequence monophyly among species.
Received: 30 June 1996 / Accepted: 15 May 1997
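To make the transition/transversion bias and the divergence ceiling concrete, the sketch below counts transitions (A<->G, C<->T) and transversions between two aligned sequences and applies the Kimura two-parameter (K2P) correction. K2P is used here only as a familiar example and is an assumption, not necessarily the model used in the study. The correction becomes undefined once the observed proportions P of transitions and Q of transversions make 1 - 2P - Q or 1 - 2Q non-positive, which is one way to see how saturation caps the divergence that can be measured.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = frozenset("ACGT")


def ts_tv_counts(seq1: str, seq2: str) -> tuple[int, int, int]:
    """Count (transitions, transversions, compared sites) between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv, sites


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance; raises ValueError when it is undefined (saturation)."""
    ts, tv, sites = ts_tv_counts(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: observed differences are saturated")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    a = "ACGTTGCAACGTAGCTAGCA"
    b = "GCGTCGCAACATAGCTAGTA"
    ts, tv, n = ts_tv_counts(a, b)
    print(f"transitions={ts} transversions={tv} sites={n} K2P={k2p_distance(a, b):.4f}")
```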
14.
Membrane proteins move in heterogeneous environments with spatially (and sometimes temporally) varying friction and with biochemical interactions with various partners. Reliably distinguishing different modes of motion is important for improving our knowledge of membrane architecture and for understanding the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single-molecule tracking (SMT) trajectories that determines which model of motion best matches an observed trajectory. The method is based on Bayesian inference to calculate the posterior probability of an observed trajectory under a given model. Information criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the corrected AIC (AICc), are used to select the preferred model. The models considered are free Brownian motion and confined motion in second- or fourth-order potentials. We determine which information criteria are best for classifying trajectories. We tested the limits of the method through simulations matching a large set of experimental conditions and built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it uses the AIC to further classify the confining potential. We apply the method to experimental Clostridium perfringens ε-toxin (CPεT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied to a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPεT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data.
The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to classify and compare different SMT experiments.
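The two-step decision tree described above can be sketched as follows. The sketch assumes that each model has already been fitted to a trajectory and summarized by its maximized log-likelihood and number of free parameters; the model names, parameter counts, and values in the example are hypothetical, and comparing free motion against the best confined model by BIC is an assumption about how step one is carried out. It does not reproduce the Bayesian inference used in the paper.

```python
import math


def aic(log_l: float, k: int) -> float:
    """Akaike information criterion (smaller is better)."""
    return 2 * k - 2 * log_l


def bic(log_l: float, k: int, n: int) -> float:
    """Bayesian information criterion for n observations (smaller is better)."""
    return k * math.log(n) - 2 * log_l


def classify_trajectory(fits: dict, n_points: int) -> str:
    """Two-step choice: BIC for free vs. confined, then AIC among confining potentials.

    `fits` maps a model name to (maximized log-likelihood, number of free parameters)
    and must contain "free" plus at least one model whose name starts with "confined".
    """
    confined = {m: v for m, v in fits.items() if m.startswith("confined")}
    if "free" not in fits or not confined:
        raise ValueError("need a 'free' model and at least one confined model")
    # Step 1 (assumed form): BIC of free motion vs. the best confined model by BIC.
    best_bic_confined = min(bic(*v, n_points) for v in confined.values())
    if bic(*fits["free"], n_points) <= best_bic_confined:
        return "free"
    # Step 2: AIC chooses the shape of the confining potential.
    return min(confined, key=lambda m: aic(*confined[m]))


if __name__ == "__main__":
    # Hypothetical fits for one trajectory of 500 positions.
    fits = {
        "free": (-520.0, 1),
        "confined_2nd_order": (-480.0, 3),
        "confined_4th_order": (-479.5, 4),
    }
    print(classify_trajectory(fits, n_points=500))
```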
15.
Gwenaël Piganeau, Dominique Mouchiroud, Laurent Duret, Christian Gautier. Journal of Molecular Evolution, 2002, 54(1): 129-133
The relationship between the silent substitution rate (Ks) and the GC content along the genome is a focal point of the debate about the origin of the isochore structure in vertebrates.
Recent estimation of the silent substitution rate showed a positive correlation between Ks and GC content, in contradiction with the predictions of both the regional mutation bias model and the selection or biased
gene conversion model. The aim of this paper is to help resolve this contradiction between theoretical studies and data. We
analyzed the relationship between Ks and GC content under (1) uniform mutation bias, (2) a regional mutation bias, and (3) mutation bias and selection. We report
that an increase in Ks with GC content is expected under mutation bias because of either nonequilibrium of the isochore structure or an increasing
mutation rate from AT toward GC nucleotides in GC-richer isochores. We show by simulations that CpG deamination tends to increase
the mutation rate with GC content in a regional mutation bias model. We also demonstrate that the relationship between Ks
and GC under the selectionist or biased gene conversion model is positive under weak selection if the mutation-selection
equilibrium GC frequency is less than 0.5.
Received: 28 March 2001 / Accepted: 16 May 2001
16.
17.
How did the "universal" genetic code arise? Several hypotheses have been put forward, and the code has been analyzed extensively
by authors looking for clues to selection pressures that might have acted during its evolution. But this approach has been
ineffective. Although an impressive number of properties has been attributed to the universal code, it has been impossible
to determine whether selection on any of these properties was important in the code's evolution or whether the observed properties
arose as a consequence of selection on some other characteristic. Therefore we turned the question around and asked, what
would a genetic code look like if it had evolved in response to various different selection pressures? To address this question,
we constructed a genetic algorithm. We found first that selecting on a particular measure yields codes that are similar to
each other. Second, we found that the universal code is far from minimized with respect to the effects of mutations (or translation
errors) on the amino acid compositions of proteins. Finally, we found that the codes that most closely resembled real codes
were those generated by selecting on aspects of the code's structure, not those generated by selecting to minimize the effects
of amino acid substitutions on proteins. This suggests that the universal genetic code has been selected for a particular
structure—a structure that confers an important flexibility on the evolution of genes and proteins—and that the particular
assignments of amino acids to codons are secondary.
Received: 29 December 1998 / Accepted: 8 July 1999
18.
Jerzy K. Kulski, Silvana Gaudieri, Annalise Martin, Roger L. Dawkins. Journal of Molecular Evolution, 1999, 49(1): 84-97
The recent availability of genomic sequence information for the class I region of the MHC has provided an opportunity to
examine the genomic organization of HLA class I (HLAcI) and PERB11/MIC genes with a view to explaining their evolution from
the perspective of extended genomic duplications rather than by simple gene duplications and/or gene conversion events. Analysis
of genomic sequence from two regions of the MHC (the alpha- and beta-blocks) revealed that at least 6 PERB11 and 14 HLAcI
genes, pseudogenes, and gene fragments are contained within extended duplicated segments. Each segment was searched for the
presence of shared (paralogous) retroelements by RepeatMasker in order to use them as markers of evolution, genetic rearrangements,
and evidence of segmental duplications. Shared Alu elements and other retroelements allowed the duplicated segments to be
classified into five distinct groups (A to E) that could be further distilled down to an ancient preduplication segment containing
an HLA gene and a PERB11 gene, an endogenous retrovirus (HERV-16), and distinctive retroelements. The breakpoints within and between
the different HLAcI segments were found mainly within the PERB11 and HLA genes, HERV-16, and other retroelements, suggesting
that the latter have played a major role in duplication and indel events leading to the present organization of PERB11 and
HLAcI genes. On the basis of the features contained within the segments, a coevolutionary model premised on tandem duplication
of single and multipartite genomic segments is proposed. The model is used to explain the origins and genomic organization
of retroelements, HERV-16, DNA transposons, PERB11, and HLAcI genes as distinct segmental combinations within the alpha- and
beta-blocks of the human MHC.
Received: 5 December 1998 / Accepted: 27 January 1999
19.
Pervasive adaptive evolution in mammalian fertilization proteins
Mammalian fertilization exhibits species specificity, and the proteins mediating sperm-egg interactions evolve rapidly between species. In this study, we use likelihood ratio tests (LRTs) to demonstrate that the evolution of seven genes involved in mammalian fertilization is promoted by positive Darwinian selection. Several of these proteins are sperm proteins that have been implicated in binding the zona pellucida glycoproteins of the mammalian egg coat, which were shown previously to be subject to positive selection. Taken together, these represent the major candidates involved in mammalian fertilization, indicating that positive selection is pervasive among mammalian reproductive proteins. A new LRT is implemented to determine whether the d(N)/d(S) ratio is significantly greater than one. This is a more refined test of positive selection than the previous LRTs, which only identified whether there was a class of sites with a d(N)/d(S) ratio > 1 but did not test whether that ratio was significantly greater than one.
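One common way to test whether d(N)/d(S) (omega) is significantly greater than one is a one-sided LRT: fit the codon model once with the relevant omega fixed at 1 and once with omega free, and refer twice the log-likelihood difference to a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. The sketch below assumes that mixture and that the two log-likelihoods come from such fits; it does not fit codon models itself, the values in the example are hypothetical, and it is not presented as the exact test implemented in the paper.

```python
import math


def omega_gt_one_test(log_l_omega_fixed: float, log_l_omega_free: float, omega_hat: float):
    """One-sided LRT of H0: omega = 1 versus H1: omega > 1.

    Assumes 2*dlnL follows a 50:50 mixture of a point mass at 0 and chi-square(1)
    under H0. Returns (statistic, p_value).
    """
    if omega_hat <= 1.0:
        # A fit restricted to omega >= 1 would sit at omega = 1, giving a statistic of 0.
        return 0.0, 1.0
    stat = max(0.0, 2.0 * (log_l_omega_free - log_l_omega_fixed))
    tail_chi2_1 = math.erfc(math.sqrt(stat / 2.0))  # P(chi-square(1) > stat)
    return stat, 0.5 * tail_chi2_1


if __name__ == "__main__":
    # Hypothetical log-likelihoods with omega fixed at 1 vs. estimated (omega_hat = 2.5).
    stat, p = omega_gt_one_test(-8842.6, -8838.1, omega_hat=2.5)
    print(f"2*dlnL = {stat:.2f}, one-sided p = {p:.4f}")
```

Halving the chi-square tail reflects that only departures in one direction (omega above 1) count as evidence for positive selection.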
20.
The information provided by completely sequenced genomes can yield insights into the multi-level organization of organisms
and their evolution. At the lowest level of molecular organization individual enzymes are formed, often through assembly of
multiple polypeptides. At a higher level, sets of enzymes group into metabolic networks. Much has been learned about the relationship
of species from phylogenetic trees comparing individual enzymes. In this article we extend conventional phylogenetic analysis
of individual enzymes in different organisms to the organisms' metabolic networks. For this purpose we suggest a method that
combines sequence information with information about the underlying reaction networks. We define a distance between pathways
that incorporates both distances between substrates and distances between the corresponding enzymes. The new analysis is applied to
electron-transfer and amino acid biosynthesis networks yielding a more comprehensive understanding of similarities and differences
between organisms.
Received: 14 August 2000 / Accepted: 4 January 2001