Similar Articles
20 similar articles found (search time: 15 ms)
2.
Estimation of evolutionary distances between nucleotide sequences (times cited: 11; self-citations: 0; citations by others: 11)
A formal mathematical analysis of the substitution process in nucleotide sequence evolution was done in terms of the Markov process. By using matrix algebra theory, the theoretical foundation of Barry and Hartigan's (Stat. Sci. 2:191–210, 1987) and Lanave et al.'s (J. Mol. Evol. 20:86–93, 1984) methods was provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86–93, 1984), Gojobori et al. (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan (Stat. Sci. 2:191–210, 1987) are preferable to others for phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, Tajima and Nei's (Mol. Biol. Evol. 1:269–285, 1984) method is superior to the others.
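The abstract singles out Tajima and Nei's method for short, divergent sequences without stating it. A minimal sketch of that estimator as it is commonly written, d = -b ln(1 - p/b) with b = ½[1 - Σ g_i² + p²/h] and h = Σ_{i<j} x_ij² / (2 g_i g_j), is shown below; it assumes aligned, ungapped sequences over A, C, G, T only, and the function name is illustrative.

```python
import math
from collections import Counter
from itertools import combinations

BASES = "ACGT"

def tajima_nei_distance(seq1: str, seq2: str) -> float:
    """Tajima-Nei distance for two aligned, ungapped ACGT sequences (sketch)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    if not set(seq1 + seq2) <= set(BASES):
        raise ValueError("only A, C, G, T are supported in this sketch")
    n = len(seq1)
    # Count each unordered pair of differing nucleotides (A/G is the same as G/A).
    diff_pairs = Counter(frozenset((x, y)) for x, y in zip(seq1, seq2) if x != y)
    p = sum(diff_pairs.values()) / n  # proportion of differing sites
    if p == 0:
        return 0.0
    # Nucleotide frequencies pooled over both sequences.
    counts = Counter(seq1 + seq2)
    g = {base: counts[base] / (2 * n) for base in BASES}
    # h = sum over pair types of x_ij^2 / (2 g_i g_j), x_ij = pair frequency.
    h = sum(
        (diff_pairs[frozenset((i, j))] / n) ** 2 / (2 * g[i] * g[j])
        for i, j in combinations(BASES, 2)
        if diff_pairs[frozenset((i, j))]
    )
    b = 0.5 * (1 - sum(v * v for v in g.values()) + p * p / h)
    if p >= b:
        raise ValueError("sequences too divergent: distance is undefined")
    return -b * math.log(1 - p / b)
```

With equal base frequencies and one difference of each of the six pair types, b comes out as 3/4 and the estimate reduces to the Jukes-Cantor value; for `"AAACCG" + "ACGT" * 3` versus `"CGTGTT" + "ACGT" * 3` (p = 1/3) that is about 0.441.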

3.
New equations are derived to estimate the number of amino acid substitutions per site between two homologous proteins from the root mean square (RMS) deviation between two spatial structures and from the fraction of identical residues between two sequences. The equations are based on evolutionary models that analyze predominantly structural changes rather than sequence changes. Evolution of spatial structure is treated as diffusion in an elastic force field: diffusion accounts for structural changes caused by amino acid substitutions, and the elastic force reflects selection, which preserves the protein fold. The derived equations are supported by analysis of protein spatial structures. Received: 21 September 1995 / Accepted: 19 May 1997
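The abstract does not state its equations, so only the input quantity is illustrated here. This is a sketch of the RMS deviation itself, assuming the two structures have already been optimally superimposed and their atoms matched one-to-one; the superposition step (for example Kabsch fitting) is not shown, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMS deviation between matched, already-superimposed atom coordinates.

    No fitting is performed: the caller must superimpose the structures first.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))
```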

4.
Species enter and persist in local communities because of their ecological fit to local conditions, and recently, ecologists have moved from measuring diversity as species richness and evenness to using measures that reflect species' ecological differences. There are two principal approaches for quantifying species' ecological differences: functional (trait‐based) and phylogenetic pairwise distances between species. Both approaches have produced new ecological insights, yet methodological issues and assumptions limit them. Traits and phylogeny may provide different, and perhaps complementary, information about species' differences. To adequately test assembly hypotheses, a framework integrating the information provided by traits and phylogenies is required. We propose an intuitive measure for combining functional and phylogenetic pairwise distances, which provides a useful way to assess how functional and phylogenetic distances contribute to understanding patterns of community assembly. Here, we show that both traits and phylogeny inform community assembly patterns in alpine plant communities across an elevation gradient, because they represent complementary information. Differences in historical selection pressures have produced variation in the strength of the trait‐phylogeny correlation, and as such, integrating traits and phylogeny can enhance the ability to detect assembly patterns across habitats or environmental gradients.
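The abstract does not give the formula of its combined measure. As an illustration only, the sketch below assumes one simple form: each distance matrix is scaled to [0, 1] by its maximum, and the two are mixed as sqrt(a·PD² + (1 - a)·FD²), where the weight `a` is a hypothetical tuning parameter (a = 1 uses phylogeny only, a = 0 traits only). It should not be read as the authors' exact measure.

```python
import math
from typing import Dict, FrozenSet

Pair = FrozenSet[str]  # an unordered pair of species names

def combined_distance(
    phylo: Dict[Pair, float], func: Dict[Pair, float], a: float = 0.5
) -> Dict[Pair, float]:
    """Hypothetical weighted combination of phylogenetic and functional distances."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if phylo.keys() != func.keys():
        raise ValueError("both matrices must cover the same species pairs")
    # Scale each matrix by its maximum so the two are on comparable footing.
    p_max = max(phylo.values())
    f_max = max(func.values())
    return {
        pair: math.sqrt(a * (phylo[pair] / p_max) ** 2 + (1 - a) * (func[pair] / f_max) ** 2)
        for pair in phylo
    }
```

Sweeping `a` from 0 to 1 and asking which value best separates observed communities from null communities is one way such a weighting could be used.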

5.
Summary Operator metrics are explicitly designed to measure evolutionary distances from nucleic acid sequences when substitution rates differ greatly among the organisms being compared, or when substitutions have been extensive. Unlike lengths calculated by the distance matrix and parsimony methods, in which substitutions in one branch of a tree can alter the measured length of another branch, lengths determined by operator metrics are not affected by substitutions outside the branch. In the method, lengths (operator metrics) corresponding to each of the branches of an unrooted tree are calculated. The metric length of a branch reconstructs the number of (transversion) differences between sequences at a tip and a node (or between nodes) of a tree. The theory is general and is fundamentally independent of differences in substitution rates among the organisms being compared. Mathematically, this independence is obtained because the metrics are eigenvectors of fundamental equations that describe the evolution of all unrooted trees. Even under conditions in which both the distance matrix method and a simple parsimony length method are shown to indicate lengths that are an order of magnitude too large or too small, the operator metrics are accurate. Examples using data calculated with evolutionary rates and branchings designed to confuse the measurement of branch lengths and to camouflage the topology of the true tree demonstrate the validity of operator metrics. The method is robust. Operator metric distances are easy to calculate, can be extended to any number of taxa, and provide a statistical estimate of their variances. The utility of the method is demonstrated by using it to analyze the origins and evolution of chloroplasts, mitochondria, and eubacteria.

6.

Background  

The understanding of evolutionary relationships is a fundamental aspect of modern biology, with the phylogenetic tree being a primary tool for describing these associations. However, comparison of trees for the purpose of assessing similarity and the quantification of various biological processes remains a significant challenge.
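The Background does not name a particular comparison method. As a point of reference only, one widely used measure of tree dissimilarity is the Robinson-Foulds distance: the number of bipartitions (splits) found in one unrooted tree but not the other. The sketch below assumes both trees share the same taxa and are written as nested Python tuples; it is not the method of the article above.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees

def clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Collect the leaf set under every internal node; return this node's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves

def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of the unrooted tree, each stored as the side
    that excludes a fixed reference taxon so that equal splits compare equal."""
    found: Set[FrozenSet[str]] = set()
    taxa = clades(tree, found)
    ref = min(taxa)
    result = set()
    for clade in found:
        side = taxa - clade if ref in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # drop empty and single-leaf splits
            result.add(side)
    return result

def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))
```

Because splits ignore where a tree is rooted, `((("A","B"),"C"),("D","E"))` and `(("A","B"),"C",("D","E"))` are at distance 0, while swapping B and C gives distance 2.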

7.
The morphospecies of the genus Paramecium have several mating type groups, so-called syngens, composed of cells of complementary mating types. The Paramecium aurelia complex is composed of 15 sibling species, each of which was given species status from a former syngen. To increase our understanding of the evolutionary relationships among the syngens and sibling species of the genus Paramecium, we investigated the gene sequences of cytosol-type hsp70 from 7 syngens of Paramecium caudatum and 15 sibling species of P. aurelia. Molecular phylogenetic trees indicated that the P. aurelia complex could be divided into four lineages and that the individual sibling species could be separated from one another. However, we did not find any obvious genetic distance among the syngens of P. caudatum, which could only be separated into two closely related groups. These results indicate that the concept of syngens in P. caudatum differs markedly from that in the P. aurelia complex. In addition, we discuss the relationships between these species and two others, Paramecium jenningsi and Paramecium multimicronucleatum, which were once classified as varieties of P. aurelia.

8.
Over the past ten years, a large number of publications have described hundreds of quantitative trait loci (QTL) in livestock species. To facilitate the comparison of QTL results across experiments, the Animal QTL database (QTLdb) was developed as a public repository for all published QTL information. The QTLdb was originally developed to serve the porcine genomics community (where it was previously known as PigQTLdb). We have further developed the QTLdb to house QTL data from multiple species, including but not limited to cattle, chickens, and pigs. Tools have also been developed to align QTL maps against consensus linkage maps, radiation hybrid (RH) maps, BAC fingerprinted contig (FPC) maps, single nucleotide polymorphism (SNP) location maps, and human maps. In addition, we have expanded the database with research tools that allow "private" preliminary QTL data to be entered and compared against all public data. This allows researchers to visualize their data before publication and compare it with published results to aid interpretation. To serve this purpose, the database curator/editor tools also include functions that allow registered users to enter their own QTL data, use the QTLdb tools for data analysis, and use the QTLdb as a publishing tool (URL: ).

9.
A protein alignment scoring system sensitive at all evolutionary distances (times cited: 1; self-citations: 0; citations by others: 1)
Summary Protein sequence alignments generally are constructed with the aid of a substitution matrix that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a log-odds matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may be derived, a substitution matrix adapted to detecting relationships at any chosen evolutionary distance can be constructed. Because in a database search it generally is not known a priori what evolutionary distances will characterize the similarities found, it is necessary to employ an appropriate range of matrices in order not to overlook potential homologies. This paper formalizes this concept by defining a scoring system that is sensitive at all detectable evolutionary distances. The statistical behavior of this scoring system is analyzed, and it is shown that for a typical protein database search, estimating the originally unknown evolutionary distance appropriate to each alignment costs slightly over two bits of information, or somewhat less than a factor of five in statistical significance. A much greater cost may be incurred, however, if only a single substitution matrix, corresponding to the wrong evolutionary distance, is employed.
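The log-odds reading of a substitution matrix can be made concrete. The sketch below computes s(a, b) = ln(q_ab / (p_a p_b)) / λ from target pair frequencies q_ab and background frequencies p_a; with λ = ln 2 the scores are in bits. The two-letter alphabet and its frequencies are hypothetical toy values, not data from the paper.

```python
import math
from typing import Dict, Tuple

def log_odds_scores(
    target: Dict[Tuple[str, str], float],
    background: Dict[str, float],
    lam: float = math.log(2),
) -> Dict[Tuple[str, str], float]:
    """s(a, b) = ln(q_ab / (p_a * p_b)) / lam; lam = ln 2 gives scores in bits."""
    return {
        (a, b): math.log(q / (background[a] * background[b])) / lam
        for (a, b), q in target.items()
    }

def expected_score(
    scores: Dict[Tuple[str, str], float], background: Dict[str, float]
) -> float:
    """Expected score of a random (unrelated) pair under the background model."""
    return sum(background[a] * background[b] * s for (a, b), s in scores.items())

# Hypothetical toy alphabet: identities are enriched relative to chance.
p = {"A": 0.5, "B": 0.5}
q = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
scores = log_odds_scores(q, p)
```

Matching pairs score positive and mismatches negative, and the expected score over random pairs is negative, which local-alignment statistics require. Target frequencies derived for a different evolutionary distance would give a different matrix, which is why the paper argues for a range of matrices.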

10.
Allozyme genetic distances were estimated for ten species of akodontine rodents, with the oryzomyine Oligoryzomys longicaudatus used as an outgroup to assess plesiomorphic character states. Twenty-six loci were analysed. Distribution patterns of allele frequencies were analysed with both phenetic (UPGMA) and cladistic (PAUP) techniques. Allozyme analysis confirmed the monophyly of the Akodontini and, within the tribe, the distinctiveness of the genus Oxymycterus. Genetic divergence among the eight species of Akodon was small compared with most known rodent species. Phenetic and phylogenetic analyses of Bolomys obscurus and species of Akodon agreed with previous chromosomal work but not with morphological evidence. The general lack of allozymic differentiation among members of the Akodontini suggests that in this group molecular divergence is unrelated to speciation. Molecular clock estimates calibrated by fossils showed that generic divergence within the Akodontini began no later than the late Miocene and that the Akodontini diverged from the Oryzomyini within the Miocene.
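UPGMA, the phenetic method named above, repeatedly joins the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. A minimal sketch follows, assuming a symmetric distance matrix given as a dict of dicts; node heights are half the joining distance, which presumes a clock-like (ultrametric) matrix.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree", float]]  # leaf, or (left, right, height)

def upgma(dist: Dict[str, Dict[str, float]]) -> Tree:
    """UPGMA clustering of a symmetric distance matrix (dict of dicts)."""
    clusters = {name: (name, 1) for name in dist}  # label -> (subtree, size)
    d = {a: dict(row) for a, row in dist.items()}  # working copy of distances
    while len(clusters) > 1:
        # Join the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        height = d[a][b] / 2
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        new = f"({a},{b})"
        d[new] = {}
        # UPGMA rule: size-weighted average of the distances to the old clusters.
        for c in clusters:
            d[new][c] = d[c][new] = (n_a * d[a][c] + n_b * d[b][c]) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b, height), n_a + n_b)
    tree, _ = next(iter(clusters.values()))
    return tree
```

For three taxa with d(A,B) = 2 and d(A,C) = d(B,C) = 6, this returns `(("A", "B", 1.0), "C", 3.0)`: A and B join at height 1, and C joins them at height 3.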

13.
Tekaia F, Yeramian E. Gene. 2012;492(1):199–211.
The proper detection of orthologs is crucial for evolutionary studies of genes and species. Despite considerable effort, the methodological situation remains largely unsettled, and the "quest for orthologs" is still an ongoing task in large-scale genome comparisons. Here, we introduce a simple operational framework for the detection and classification of orthologs. The framework relies on well-established principles, optimizes their implementation for the purposes considered, and chains components into coherent procedures: 1) We take advantage of the efficiency and simplicity of Reciprocal Best Hit (RBH) detection while remedying, by design, its limitation to 1:1 detections. The procedure is based on partitioning the Reciprocal Best Hits and then merging partitions that include members of the same paralogous classes ("SuperPartitions of Orthologs" (SPOs)). 2) We then use the conservation profiles of the resulting clusters to detect SPOs that contain duplicated members. Based on accepted evolutionary principles, such members can be further tagged as in-paralogs (co-orthologs) or out-paralogs. The method is illustrated and validated by extensive genomic analyses. The performance of the overall approach is characterized in global terms for three sets of species (Chlamydiae, Mycobacteria, Aspergilli), showing that at least 75% of the sets of orthologs contain at most one protein from a given species. The sets including more than one protein from a given species are shown to contain in-paralogs in proportions varying from 28% to 58%. The characterizations also show that the large majority of SPOs are associated with ancestral motifs and are accordingly not prone to chaining effects that might be triggered by multi-domain proteins. Further, the SPO formulation is compared with other similarity-based ortholog detection methods.
Beyond core common results, significant differences are observed between the methods, which can largely be accounted for on conceptual grounds by the different merging schemes involved. Such comparisons highlight a major advantage of the SPO approach: the proper clustering of associated paralogs, which often appear to be spuriously dispatched into distinct orthologous classes. Finally, the perspectives for future applications and elaborations of SPO-based compositional analyses are discussed.
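The first step described above, finding Reciprocal Best Hits and partitioning them, can be sketched as follows. This is a simplified illustration, not the authors' SPO procedure: it takes precomputed similarity scores (for example BLAST bit scores, assumed to be available), keeps pairs that are each other's best hit between two species, and groups them into connected components with union-find. The `species:gene` naming convention is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def species(gene: str) -> str:
    """Hypothetical naming convention: 'species:gene_id'."""
    return gene.split(":", 1)[0]

def best_hits(scores: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], str]:
    """Map (query, target species) -> highest-scoring subject in that species."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if species(query) == species(subject):
            continue  # only between-species hits are considered here
        key = (query, species(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: subject for key, (subject, _) in best.items()}

def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> Set[Tuple[str, str]]:
    best = best_hits(scores)
    pairs = set()
    for (query, _), subject in best.items():
        # Keep the pair only if the subject's best hit in the query's species is the query.
        if best.get((subject, species(query))) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs

def partition(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Connected components of the RBH graph, found with union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())
```

A component that contains two genes from the same species, such as `sp2:a` and `sp2:b` linked through different reciprocal partners, is the kind of cluster that the conservation-profile step would then examine for in-paralogs.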

14.
Since the initial work of Jukes and Cantor (1969), a number of procedures have been developed to estimate the expected number of nucleotide substitutions corresponding to a given observed level of nucleotide differentiation under particular evolutionary models. Unlike the proportion of differing sites, the expected number of substitutions that would have occurred grows linearly with time and has therefore had great appeal as an evolutionary distance. Recently, however, a number of authors have tried to develop improved statistical approaches for generating and evaluating evolutionary distances (Schoniger and von Haeseler 1993; Goldstein and Pollock 1994; Tajima and Takezaki 1994). These studies clearly show that the estimated number of nucleotide substitutions is generally not the best estimator for reconstructing phylogenetic relationships, because there is often a large error associated with estimating this number. Therefore, even though its expectation is correct (i.e., on average the expected number of substitutions is proportional to time, but see Tajima 1993), it is not expected to be as useful as estimators designed to have a lower variance.
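The variance problem is easy to see with the Jukes-Cantor (1969) model. Its distance is d = -(3/4) ln(1 - 4p/3), and its standard large-sample variance is p(1 - p) / [n(1 - 4p/3)²] for n sites. The sketch below uses only these textbook formulas. The denominator shrinks as p approaches 3/4, so the error grows much faster than the distance itself.

```python
import math

def jc_distance(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from the p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def jc_variance(p: float, n_sites: int) -> float:
    """Large-sample variance of the Jukes-Cantor distance for n_sites sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return p * (1 - p) / (n_sites * (1 - 4 * p / 3) ** 2)
```

For 500 sites, moving from p = 0.1 to p = 0.5 multiplies the variance by almost 19, which illustrates why lower-variance estimators can give better trees even though they are biased.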

15.
This paper deals with phylogenetic inference when the variability of substitution rates across sites (VRAS) is modeled by a gamma distribution. We show that underestimating VRAS, which results in underestimated evolutionary distances between sequences, usually improves the topological accuracy of phylogenetic tree inference by distance-based methods, especially when the molecular clock holds. We propose a method to estimate the gamma shape parameter value that is best suited to tree topology inference, given the sequences at hand. This method is based on the pairwise evolutionary distances between sequences and allows one to reconstruct the phylogeny of a large number of taxa (more than 1,000). Simulation results show that topological accuracy is greatly improved when using the gamma shape parameter value given by our method, compared with the true (unknown) value that was used to generate the data. Furthermore, when VRAS is high, the topological accuracy of our distance-based method is better than that of a maximum likelihood approach. Finally, a data set of Maoricicada species sequences is analyzed, which confirms the advantage of our method.
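The link between the gamma shape parameter and distance size can be shown with the gamma-corrected Jukes-Cantor distance, d = (3/4)·a·[(1 - 4p/3)^(-1/a) - 1], where a is the shape parameter. This is a standard textbook model chosen for illustration, and the paper's own estimation procedure is not reproduced. A larger a means less rate variation, so "underestimating VRAS" corresponds to using a larger a, which yields shorter distances; as a grows without bound the formula approaches the plain Jukes-Cantor distance.

```python
import math

def jc_distance(p: float) -> float:
    """Plain Jukes-Cantor distance (equal rates at all sites)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def gamma_jc_distance(p: float, shape: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates across sites."""
    if shape <= 0:
        raise ValueError("shape must be positive")
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return 0.75 * shape * ((1 - 4 * p / 3) ** (-1 / shape) - 1)
```

For a fixed p, `gamma_jc_distance(p, 0.5)` exceeds `gamma_jc_distance(p, 2.0)`, and both exceed `jc_distance(p)`. Choosing the shape parameter therefore directly rescales the whole distance matrix that a method such as neighbor joining receives.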

16.
Phylogenetic methods that use matrices of pairwise distances between sequences (e.g., neighbor joining) will only give accurate results when the initial estimates of the pairwise distances are accurate. For many different models of sequence evolution, analytical formulae are known that give estimates of the distance between two sequences as a function of the observed numbers of substitutions of various classes. These are often of a form that we call "log transform formulae". Errors in these distance estimates become larger as the time $t$ since divergence of the two sequences increases. For long times, the log transform formulae can sometimes give divergent distance estimates when applied to finite sequences. We show that these errors become significant when $t \approx \tfrac{1}{2}\,|\lambda_{\max}|^{-1}\log N$, where $\lambda_{\max}$ is the eigenvalue of the substitution rate matrix with the largest absolute value and $N$ is the sequence length. Various likelihood-based methods have been proposed to estimate the values of parameters in rate matrices. If rate matrix parameters are known with reasonable accuracy, it is possible to use the maximum likelihood method to estimate evolutionary distances while keeping the rate parameters fixed. We show that errors in distances estimated in this way only become significant when $t \approx \tfrac{1}{2}\,|\lambda_1|^{-1}\log N$, where $\lambda_1$ is the eigenvalue of the substitution rate matrix with the smallest nonzero absolute value. The accuracy of likelihood-based distance estimates is therefore much higher than that of estimates based on log transform formulae, particularly when a large range of timescales is involved in the rate matrix (e.g., when the ratio of transition to transversion rates is large). We discuss several practical ways of estimating the rate matrix parameters before distance calculation and hence of increasing the accuracy of distance estimates.
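The two thresholds can be compared for a concrete rate matrix. The sketch assumes the Kimura two-parameter (K2P) model, with transition rate α and transversion rate β per nucleotide. Its rate matrix has the closed-form eigenvalues 0, -4β and -2(α + β) (the last one twice), so no linear-algebra library is needed. The natural logarithm is assumed for log N. Because the abstract's criteria are stated only approximately, the outputs indicate orders of magnitude, not exact cut-offs.

```python
import math
from typing import List, Tuple

def k2p_eigenvalues(alpha: float, beta: float) -> List[float]:
    """Eigenvalues of the K2P rate matrix (transition alpha, transversion beta)."""
    return [0.0, -4 * beta, -2 * (alpha + beta), -2 * (alpha + beta)]

def error_onset_times(eigenvalues: List[float], n_sites: int) -> Tuple[float, float]:
    """Approximate divergence times at which distance errors become significant.

    Returns (log-transform formula threshold, fixed-rate ML threshold), i.e.
    0.5 * ln(N) / |lambda_max| and 0.5 * ln(N) / |lambda_1|.
    """
    nonzero = [abs(x) for x in eigenvalues if abs(x) > 1e-12]
    lam_max, lam_1 = max(nonzero), min(nonzero)
    log_n = math.log(n_sites)
    return 0.5 * log_n / lam_max, 0.5 * log_n / lam_1

t_log, t_ml = error_onset_times(k2p_eigenvalues(alpha=10.0, beta=1.0), n_sites=1000)
```

With a transition/transversion rate ratio of 10, |λ_max| = 22 and |λ_1| = 4, so the likelihood-based estimate stays reliable about 5.5 times longer than the log transform formula. That is the "large range of timescales" case the abstract highlights.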

17.
Halachev MR, Loman NJ, Pallen MJ. PLoS ONE. 2011;6(12):e28388.
Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a "divide and conquer" approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. 
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.

18.

Background  

Tissue microarray (TMA) technology has been developed to facilitate large, genome-scale molecular pathology studies. This technique provides a high-throughput method for analyzing a large cohort of clinical specimens in a single experiment, thereby permitting the parallel analysis of molecular alterations (at the DNA, RNA, or protein level) in thousands of tissue specimens. Because a single TMA experiment can generate a vast quantity of data, a systematic approach is required for storing and analyzing such data.

20.
Myrmecochorous dispersal distances: a world survey (times cited: 13; self-citations: 0; citations by others: 13)
Abstract. Myrmecochorous dispersal distances are reviewed. The seed dispersal curve generated by ants shows a characteristic peak at short distances and a long tail, a shape suited to low densities of safe sites. The global mean distance is 0.96 m (n = 2524), with a range of 0.01–77 m. Data have been broken down by geography (Northern hemisphere vs. Southern hemisphere), taxonomy (ant subfamilies), and ecology (vegetation: sclerophyllous vs. mesophyllous). Although a statistical difference exists between dispersal curves from the Northern hemisphere and the Southern hemisphere, this may be an artefact of the lack of data on mesophyllous myrmecochores from this hemisphere. The four ant subfamilies also show numerical differences, but these could not be subjected to statistical analysis. A difference in the shape of the dispersal curve between sclerophyllous and mesophyllous myrmecochores has also been detected. We hypothesize that this difference is related to the myrmecological communities of the two types of vegetation: dispersing ants from sclerophyllous vegetation would have lower nest densities and/or larger foraging areas than dispersing ants from mesic environments.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417