Similar Literature
2.
Accuracy of phylogenetic trees estimated from DNA sequence data   Times cited: 4 (self-citations: 1; by others: 3)
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
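
The abstract contrasts uncorrected (p) and corrected (d) distances without naming the correction. As an illustration only, the sketch below assumes the Jukes-Cantor (JC69) correction, one common choice; the sequences are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected p: proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC69 correction is undefined (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two hypothetical aligned sequences differing at 2 of 10 sites
a = "ACGTACGTAC"
b = "ACGTTCGTAA"
p = p_distance(a, b)
d = jc69_distance(p)
print(f"p = {p:.3f}, d = {d:.4f}")  # d > p because unseen multiple hits are corrected for
```

The gap between d and p widens as p grows, which is why corrected distances matter most when sequences are highly divergent.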

3.
It is frequently true that molecular sequences do not evolve in a strictly clocklike manner. Instead, substitution rate may vary for a number of reasons, including changes in selection pressure and effective population size, as well as changes in mean generation time. Here we present two new methods for estimating stepwise changes in substitution rates when serially sampled molecular sequences are available. These methods are based on multiple rates with dated tips (MRDT) models and allow different rates to be estimated for different intervals of time. These intervals may correspond to the sampling intervals or to a priori-defined intervals that are not coincident with the times the serial samples are obtained. Two methods for obtaining estimates of multiple rates are described. The first is an extension of the phylogeny-based maximum-likelihood estimation procedure introduced by Rambaut. The second is a new parameterization of the pairwise distance least-squares procedure used by Drummond and Rodrigo. The utility of these methods is demonstrated on a genealogy of HIV sequences obtained at five different sampling times from a single patient over a period of 34 months.

4.
Terminal restriction fragment length polymorphism (T-RFLP) is a culture-independent method of obtaining a genetic fingerprint of the composition of a microbial community. Comparisons of the utility of different methods of (i) including peaks, (ii) computing the difference (or distance) between profiles, and (iii) performing statistical analysis were made by using replicated profiles of eubacterial communities. These samples included soil collected from three regions of the United States, soil fractions derived from three agronomic field treatments, soil samples taken from within one meter of each other in an alfalfa field, and replicate laboratory bioreactors. Cluster analysis by Ward's method and by the unweighted-pair group method using arithmetic averages (UPGMA) were compared. Ward's method was more effective at differentiating major groups within sets of profiles; UPGMA had a slightly reduced error rate in clustering of replicate profiles and was more sensitive to outliers. Most replicate profiles were clustered together when relative peak height or Hellinger-transformed peak height was used, in contrast to raw peak height. Redundancy analysis was more effective than cluster analysis at detecting differences between similar samples. Redundancy analysis using Hellinger distance was more sensitive than that using Euclidean distance between relative peak height profiles. Analysis of Jaccard distance between profiles, which considers only the presence or absence of a terminal restriction fragment, was the most sensitive in redundancy analysis, and was equally sensitive in cluster analysis, if all profiles had cumulative peak heights greater than 10,000 fluorescence units. It is concluded that T-RFLP is a sensitive method of differentiating between microbial communities when the optimal statistical method is used for the situation at hand. 
It is recommended that hypothesis testing be performed by redundancy analysis of Hellinger-transformed data and that exploratory data analysis be performed by cluster analysis using Ward's method to find natural groups or by UPGMA to identify potential outliers. Analyses can also be based on Jaccard distance if all profiles have cumulative peak heights greater than 10,000 fluorescence units.
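
A minimal sketch of the two profile distances discussed above, assuming profiles are already aligned so that each list position is the same T-RF: the Hellinger distance taken as the Euclidean distance between square-rooted relative peak heights, and the Jaccard distance on presence/absence. The peak heights and the `threshold` parameter are hypothetical; the redundancy analysis itself is not shown.

```python
import math


def hellinger_transform(profile):
    """Square root of each peak's relative height (peak height / total)."""
    total = sum(profile)
    return [math.sqrt(x / total) for x in profile]


def hellinger_distance(p, q):
    """Euclidean distance between Hellinger-transformed profiles."""
    hp, hq = hellinger_transform(p), hellinger_transform(q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(hp, hq)))


def jaccard_distance(p, q, threshold=0.0):
    """1 - |A & B| / |A | B| on presence/absence of T-RFs (height > threshold)."""
    a = {i for i, x in enumerate(p) if x > threshold}
    b = {i for i, x in enumerate(q) if x > threshold}
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


# Hypothetical peak heights (fluorescence units) at four aligned T-RF positions
s1 = [4000, 3000, 0, 3000]
s2 = [2000, 3000, 1000, 0]
print(round(hellinger_distance(s1, s2), 4), jaccard_distance(s1, s2))
```

Because Jaccard ignores peak heights entirely, it depends on reliably detecting small peaks, which is consistent with the abstract's caveat about a minimum cumulative peak height.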

6.
The bacterial, archaeal, and eukaryal diversity in fecal samples from ten Koreans was analyzed and compared by using the PCR-fingerprinting method denaturing gradient gel electrophoresis (DGGE). All bacterial sequences belonged to the phyla Firmicutes and Bacteroidetes, which are known to dominate the human intestine. Most of the archaeal sequences belonged to the methane-producing archaea, but several sequences related to halophilic archaea were also unexpectedly detected. A small number of eukaryal sequences were also detected by DGGE analysis; these were related to fungi and stramenopiles (Blastocystis hominis). In the bacterial and archaeal DGGE analyses, all ten samples had one and two prominent bands, respectively, but many individual-specific bands were also observed. However, only five of the ten samples had small eukaryal DGGE bands, and none of these bands was observed in all five samples. Cluster analysis using the unweighted pair group method with arithmetic averages (UPGMA) revealed that the archaeal and bacterial communities in the ten samples showed relatively high relatedness (average Dice coefficients of 68.9% and 59.2%, respectively), whereas the eukaryal communities showed low relatedness (39.6%).
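
The relatedness values above are average Dice coefficients. A minimal sketch, assuming each DGGE lane has already been scored as a set of hypothetical band-position IDs, computes the pairwise Dice similarity and its mean over all lane pairs as a percentage:

```python
from itertools import combinations


def dice_similarity(bands_a: set, bands_b: set) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) on band presence/absence."""
    if not bands_a and not bands_b:
        return 1.0
    return 2 * len(bands_a & bands_b) / (len(bands_a) + len(bands_b))


def mean_pairwise_dice_percent(lanes):
    """Average Dice similarity over all pairs of lanes, as a percentage."""
    pairs = list(combinations(lanes, 2))
    return 100 * sum(dice_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical DGGE lanes: each set holds IDs of band positions scored as present
lanes = [{1, 2, 3, 4}, {2, 3, 5}, {1, 2, 3}]
print(round(mean_pairwise_dice_percent(lanes), 1))
```

Band matching across lanes, which this sketch takes as given, is usually the harder step in practice.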

7.
Determining the longitudinal molecular evolution of hepatitis B virus (HBV) is difficult due to HBV's genomic complexity and the need to study paired samples collected over long periods of time. In this study, serial samples were collected from eight hepatitis B virus e antigen-negative asymptomatic carriers of HBV genotype B in 1979 and 2004, thus providing a 25-year period to document the long-term molecular evolution of HBV. The rate, nature, and distribution of mutations that emerged over 25 years were determined by phylogenetic and linear regression analysis of full-length HBV genome sequences. Nucleotide hypervariability was observed within the polymerase and pre-S/S overlap region and within the core gene. The calculated mean number of nucleotide substitutions/site/year (7.9 × 10^-5) was slightly higher than published estimates (1.5 × 10^-5 to 5 × 10^-5). Nucleotide changes in the quasispecies population did not significantly alter the molecular evolutionary rate, based on linear regression analysis of evolutionary distances among serial clone pre-S region sequences. Therefore, the directly amplified or dominant sequence was sufficient to estimate the putative molecular evolutionary rate for these long-term serial samples. On average, the ratio of synonymous (dS) to nonsynonymous (dN) substitutions was highest for the polymerase-coding region and lowest for the core-coding region. The low dS/dN ratios observed within the core suggest that selection occurs within this gene region, possibly as an immune evasion strategy. The results of this study suggest that HBV sequence divergence may occur more rapidly than previously estimated, in a host immune phase-dependent manner.
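
A rate in substitutions/site/year can be read off as the slope of evolutionary distance against sampling time. The sketch below is a plain ordinary-least-squares version of that idea on hypothetical data; it is not the authors' exact procedure. With only two time points (1979 and 2004), the slope reduces to the distance divided by 25 years.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (here: distance per year)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: years since the first sample vs. substitutions/site
# separating each later sequence from the earliest one
years = [0, 5, 10, 25]
distances = [0.0, 0.0004, 0.0008, 0.0020]
rate = ols_slope(years, distances)
print(f"{rate:.2e} substitutions/site/year")
```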

8.
We present in this paper a simple method for estimating the mutation rate per site per year which also yields an estimate of the length of a generation when mutation rate per site per generation is known. The estimator, which takes advantage of DNA polymorphisms in longitudinal samples, is unbiased under a number of population models, including population structure and variable population size over time. We apply the new method to a longitudinal sample of DNA sequences of the env gene of human immunodeficiency virus type 1 (HIV-1) from a single patient and obtain 1.62 × 10^-2 as the mutation rate per site per year for HIV-1. Using an independent data set to estimate the mutation rate per generation, we obtain 1.8 days as the length of a generation of HIV-1, which agrees well with recent estimates based on viral load data. Our estimate of generation time differs considerably from a recent estimate by Rodrigo et al. when the same mutation rate per site per generation is used. Some factors that may contribute to the difference among different estimators are discussed.
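
The conversion from the two rates to a generation length is simple arithmetic: generations per year equal the per-year rate divided by the per-generation rate, so one generation lasts (per-generation rate / per-year rate) years. The abstract does not report its per-generation rate, so the value below is hypothetical, chosen only to show the arithmetic.

```python
def generation_time_days(mu_per_generation, mu_per_year, days_per_year=365.0):
    """Generation length in days: (mu per generation / mu per year) years, converted to days."""
    return mu_per_generation / mu_per_year * days_per_year


mu_year = 1.62e-2   # per site per year, as reported in the abstract
mu_gen = 8.0e-5     # HYPOTHETICAL per-site, per-generation rate (not given in the abstract)
print(round(generation_time_days(mu_gen, mu_year), 2))
```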

9.
Xu Liye, Li Yu. 生物信息学 (Chinese Journal of Bioinformatics), 2007, 5(4): 160-162
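For a given set of DNA or protein sequences, the binary phylogenetic tree built by the UPGMA algorithm may not be unique: its exact topology depends on the order in which the sequences are input, a phenomenon usually called "tied trees". We propose an improved UPGMA algorithm, the unweighted group method using arithmetic averages (UMGMA), to resolve the non-uniqueness of UPGMA trees. When the UPGMA tree is unique, UMGMA produces the same tree as UPGMA; when it is not unique, UMGMA produces a single multifurcating tree that does not depend on the input order of the sequences. The algorithm also has an adjustable tolerance parameter that controls the main branching structure of the resulting tree, which is valuable for bringing out the overall outline of large phylogenetic trees.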
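
To illustrate the tied-trees problem, here is a minimal sketch of standard UPGMA (topology only, no branch lengths) that reports whether a tie at the minimum distance was encountered. It does not implement UMGMA, whose details the abstract does not give; the function name and the deterministic tie-breaking rule are assumptions for illustration.

```python
from itertools import combinations


def upgma_topology(names, dist, tol=1e-12):
    """Standard UPGMA returning (newick, tie_seen), topology only.

    dist maps frozenset({x, y}) -> distance for every pair of names.
    tie_seen is True if more than one pair sat at the minimum distance at
    some merge step. This is conservative: ties between disjoint pairs are
    also flagged even though they do not change the topology.
    """
    size = {n: 1 for n in names}  # cluster label (a Newick fragment) -> leaf count
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    tie_seen = False
    while len(size) > 1:
        best = min(d.values())
        minimal = [p for p, v in d.items() if v - best <= tol]
        tie_seen = tie_seen or len(minimal) > 1
        # Break ties deterministically: lexicographically smallest pair of labels
        a, b = min(tuple(sorted(p)) for p in minimal)
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # Size-weighted average of the merged clusters' distances to c
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        size[merged] = size.pop(a) + size.pop(b)
    (root,) = size
    return root + ";", tie_seen


D = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 2.0,  # ties with A-B, so another tie-break could give ((A,C),B)
    frozenset(("B", "C")): 4.0,
}
print(upgma_topology(["A", "B", "C"], D))
```

Here the tie between A-B and A-C means the binary tree depends on an arbitrary choice, which is the situation in which the abstract's method returns a multifurcation instead.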

10.
Wang J. Genetics, 2006, 173(3): 1679-1692
A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that no mutations have occurred since the admixture event. As a result, these estimators may fail to deliver an estimate or give rather poor estimates when admixture is ancient and thus mutations are not negligible. A previous molecular estimator based its inference of admixture proportions on the average coalescent times between pairs of genes taken from within and between populations. In this article I propose an estimator that considers the entire genealogy of all of the sampled genes and infers admixture proportions from the numbers of segregating sites in DNA sequence samples. By considering the genealogy of all sequences rather than pairs of sequences, this new estimator also allows the joint estimation of other interesting parameters in the admixture model, such as admixture time, divergence time, population size, and mutation rate. Comparative analyses of simulated data indicate that the new coalescent estimator generally yields better estimates of admixture proportions than the previous molecular estimator, especially when the parental populations are not highly differentiated. It also gives reasonably accurate estimates of other admixture parameters. A human mtDNA sequence data set was analyzed to demonstrate the method, and the analysis results are discussed and compared with those from previous studies.
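
The estimator above works from numbers of segregating sites. As background only (this is not Wang's admixture estimator), the sketch below counts segregating sites in an aligned sample and computes Watterson's classic single-population estimator of theta from them. It assumes gap-free aligned sequences and at least two sequences; the sample is hypothetical.

```python
def segregating_sites(seqs):
    """Number of aligned columns with more than one distinct base (assumes no gaps)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


sample = ["ACGT", "ACGA", "TCGA"]  # hypothetical aligned sequences
print(segregating_sites(sample), round(watterson_theta(sample), 3))
```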

11.
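The mtDNA ND1 gene sequences of 31 samples were determined, covering seven geographic populations of Oedaleus asiaticus (Bienko), one closely related species, and two outgroup species. Sequence homology was compared and nucleotide composition was calculated, with Myrmeleotettix palpalis (Gomphoceridae) and Angaracris barabensis (Catantopidae) as outgroups...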

12.
Ren F, Ogishima S, Tanaka H. Gene, 2003, 317(1-2): 89-95
A new method for reconstructing phylogenetic relationships of within-host (patient) viral evolution from noncontemporaneous samples is presented. This method has two important features: noncontemporaneous viral samples can be dealt with by a simple computing algorithm, and both neutral and adaptive evolution patterns occurring during the process of viral evolution can be estimated. In our previous study, we proposed a preliminary formulation of this algorithm that was based on the maximum likelihood method. However, that preliminary formulation was difficult to use because the calculation of the likelihood required an extremely large amount of time and the number of possible tree topologies increased exponentially according to the increase in the number of viral variants. In this paper, we propose another new algorithm, referred to as a distance-based sequential-linking algorithm, in which the neighbor-joining method is employed for reconstruction of the longitudinal phylogenetic tree from serial viral samples. This algorithm is applied to a longitudinal data set of the env gene (V3 region) of human immunodeficiency virus type 1 (HIV-1) obtained over 7 years after the infection of a single patient. The results suggest that this method can successfully reconstruct a longitudinal phylogenetic tree from noncontemporaneous viral samples within a reasonable calculation time. This revised method proved to be a useful tool for estimating the dynamic process of within-host viral evolution.
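
The sequential-linking algorithm relies on neighbor-joining. The sketch below shows only one standard NJ step, not the authors' algorithm: it selects the pair minimizing the Q-criterion and computes the two limb lengths to the new internal node. The 5-taxon distance matrix is a small illustrative example.

```python
def nj_join_step(labels, D):
    """One neighbor-joining step.

    Picks the pair minimizing Q(i, j) = (n - 2) * D[i][j] - R_i - R_j,
    where R_i is the row sum of D, and returns the pair, its Q value,
    and the limb lengths from each member to the new internal node.
    """
    n = len(labels)
    if n < 3:
        raise ValueError("need at least three taxa")
    R = [sum(row) for row in D]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - R[i] - R[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    limb_i = D[i][j] / 2 + (R[i] - R[j]) / (2 * (n - 2))
    limb_j = D[i][j] - limb_i
    return labels[i], labels[j], q, limb_i, limb_j


labels = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_join_step(labels, D))  # ('a', 'b', -50, 2.0, 3.0)
```

A full NJ run would replace the joined pair with the new node, recompute its distances to the remaining taxa, and repeat until three nodes remain.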

13.
Simple sequence repeat (SSR) markers generated from expressed sequence tag (EST) sequences represent useful tools for genotyping and their development is relatively easy because of the public availability of EST databases. We report design and application of EST–SSRs to assess the level of genetic diversity among thirty-five asparagus cultivars and to fingerprint DePaoli, a new variety released by University of California, Riverside. DNA was isolated from bulks of pooled cladophylls coming from five plants of each variety to reduce the number of DNA extractions and PCR reactions. Allele frequencies were estimated from the intensity of the bands in two bulks and two individual plant samples for each variety. Although asparagus varieties derive from a limited germplasm pool, eight EST–SSR loci differentiated all of the analyzed cultivars. Moreover, UPGMA (unweighted pair group method with arithmetic mean) and neighbor-joining trees, as well as principal components analysis separated the cultivars into clusters corresponding to the geographical areas where they originated.

14.
Accuracy of estimated phylogenetic trees from molecular data   Times cited: 2 (self-citations: 0; by others: 2)
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. By contrast, when the coefficient of variation of branch length is small, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.

15.
Principal coordinate analysis (PCO) was applied to the comparison of protein sequences. A similarity matrix was derived from a dataset containing 21 c-type cytochrome sequences and this was analysed using PCO to produce a plot of the first three principal axes. The relationships indicated by this plot are considered in conjunction with those derived by cluster analysis using the UPGMA method, and the advantages offered by a non-hierarchical method of sequence comparison are discussed.

16.
Longitudinal samples of DNA sequences are DNA sequences sampled from the same population at different time points. For fast-evolving organisms such as RNA viruses, such samples have increasingly been used to study the evolutionary process in action. Longitudinal samples provide some interesting new summary statistics of genetic variation, such as the frequency of mutations of size i in one sample and size j in another, the average number of mutations accumulated since the common ancestor of two sequences drawn from different samples, and the numbers of private, shared, and fixed mutations within samples. To make the results broadly applicable, this study used a general two-sample model, which assumes that two longitudinal samples were taken from the same measurably evolving population. Inspired by HIV studies, we also studied a two-sample-two-stage model, a special case of the two-sample model which assumes that a treatment applied after the first sampling instantaneously changes the population size. We derived formulas for statistical properties (expectations, variances, and covariances) of these new summary statistics under the two models. Potential applications of these results are discussed.

17.
Summary: The effects of temporal (among different branches of a phylogeny) and spatial (among different nucleotide sites within a gene) nonuniformities of nucleotide substitution rates on the construction of phylogenetic trees from nucleotide sequences are addressed. Spatial nonuniformity may be estimated by using Shannon's (1948) entropy formula to measure the Relative Nucleotide Variability (RNV) at each nucleotide site in an aligned set of sequences; this is demonstrated by a comparative analysis of 5S rRNAs. New methods of constructing phylogenetic trees are proposed that augment the Unweighted Pair-Group Using Arithmetic Averages (UPGMA) algorithm by estimating and compensating for both spatial and temporal nonuniformity in substitution rates. These methods are evaluated by computer simulations of 5S rRNA evolution that include both kinds of nonuniformities. It was found that the proposed Reference Ratio Method improved both the ability to reconstruct the correct topology of a tree and also the estimation of branch lengths as compared to UPGMA. A previous method (Farris et al. 1970; Klotz et al. 1979; Li 1981) was found to be less successful in reconstructing topologies when there is high probability of multiple mutations at some sites. Phylogenetic analyses of 5S rRNA sequences support the endosymbiotic origins of both chloroplasts and mitochondria, even though the latter exhibit an accelerated rate of nucleotide substitution. Phylogenetic trees also reveal an adaptive radiation within the eubacteria and another within the eukaryotes for the origins of most major phyla within each group during the Precambrian era.
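
The abstract measures per-site variability with Shannon's entropy formula but does not give the exact RNV normalization, so the sketch below simply computes the raw Shannon entropy (in bits) of each alignment column, ignoring gaps; the maximum for four nucleotides is 2 bits. The alignment is hypothetical.

```python
import math
from collections import Counter


def site_entropies(alignment, gap="-"):
    """Shannon entropy (bits) of each aligned column, ignoring gap characters."""
    result = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        n = sum(counts.values())
        h = 0.0
        for k in counts.values():
            p = k / n
            h -= p * math.log2(p)
        result.append(h)
    return result


aln = ["ACGU", "ACGA", "ACUA", "ACUC"]  # hypothetical aligned RNA fragments
print(site_entropies(aln))  # [0.0, 0.0, 1.0, 1.5]
```

Dividing each value by 2 would rescale it to the range 0 to 1, one plausible normalization, though the abstract does not say which one was used.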

18.
A molecular method based on PCR-restriction fragment length polymorphism (RFLP) analysis of internal transcribed spacer (ITS) ribosomal DNA sequences was designed to rapidly identify fungal species, with members of the genus Pleurotus as an example. Based on the results of phylogenetic analysis of ITS sequences from Pleurotus, a PCR-RFLP endonuclease autoscreening (PRE Auto) program was developed to screen restriction endonucleases for discriminating multiple sequences from different species. The PRE Auto program analyzes the endonuclease recognition sites and calculates the sizes of the fragments in the sequences that are imported into the program in groups according to species recognition. Every restriction endonuclease is scored through the calculation of the average coefficient for the sequence groups and the average coefficient for the sequences within a group, and then virtual electrophoresis maps for the selected restriction enzymes, based on the results of the scoring system, are displayed for the rapid determination of the candidate endonucleases. A total of 85 haplotypes representing 151 ITS sequences were used for the analysis, and 2,992 restriction endonucleases were screened to find the candidates for the identification of species. This method was verified by an experiment with 28 samples representing 12 species of Pleurotus. The results of the digestion by the restriction enzymes showed the same patterns of DNA fragments anticipated by the PRE Auto program, apart from those for four misidentified samples. ITS sequences from 14 samples (of which nine sequences were obtained in this study), including four originally misidentified samples, confirmed the species identities revealed by the PCR-RFLP analysis. The method developed here can be used for the identification of species of other living microorganisms.
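
The core computation behind a virtual digest is locating recognition sites and turning cut positions into fragment sizes. The hypothetical helper below sketches that step for one enzyme on a linear sequence; it is not the PRE Auto program and does not score enzymes. It searches the top strand only, which suffices for palindromic sites such as EcoRI's.

```python
def digest_fragment_sizes(seq, site, cut_offset):
    """Fragment lengths from cutting a LINEAR sequence at every occurrence of
    `site`, `cut_offset` bases into the site on the top strand.
    Top strand only: non-palindromic sites would also need a reverse-complement search."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# EcoRI recognizes GAATTC and cuts between G and A (G^AATTC)
print(digest_fragment_sizes("AAGAATTCTTTTGAATTCGG", "GAATTC", 1))  # [3, 10, 7]
```

Running such a helper per enzyme across grouped sequences gives the fragment patterns that a scoring scheme like the one described could then compare within and between species.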

19.
Cerrado is a savanna ecosystem of central and southeastern Brazil. Many woody species of cerrado have thick layers of cork. The present work aimed to characterize, by GC/MS analysis, the constituents of n-hexane extracts from the cork of common species from cerrado. Cork samples from 31 individuals, corresponding to 14 species and six families, were analyzed. Similarities and differences were noticed between cork and cuticular waxes regarding profiles of lipophilic constituents. The distribution of cork constituents was analyzed using the UPGMA clustering method and the Dice coefficient. All clusters in the resulting dendrogram comprise individuals from the same species, suggesting that the distribution of lipophilic cork constituents is useful for species characterization and possibly also for species identification, resembling results commonly obtained with molecular markers. Seven samples of Bignoniaceae, corresponding to two genera and seven species, emerged in a common cluster, in an arrangement in accordance with the recent segregation of Tabebuia species to a new genus Handroanthus. The markers analyzed were not efficient for characterizing other families.

20.
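Comparison of rDNA ITS sequences of Pseudostellaria heterophylla from different production areas   Times cited: 14 (self-citations: 2; by others: 14)
A pair of primers, 18SP1 and 26SP2, was used for PCR amplification and sequencing of the ITS region of Pseudostellaria heterophylla (Taizishen) [Pseudostellaria heterophylla (Miq.) Pax ex Pax et Hoffm.] collected from 14 production areas. Sequence analysis showed that, across the 14 areas, the ITS1 fragment was 219-222 bp long, the ITS2 fragment 235-236 bp, and the 5.8S fragment 155-157 bp. The ITS sequences from four areas (Yixing, Jiangsu; Mageng, Jurong, Jiangsu; Laoyingshan, Nanjing, Jiangsu; and Liyang, Jiangsu) were identical, whereas those from the other 10 areas showed varying degrees of variation, with 1 to 17 variable bases (including the 5.8S coding region). A phylogenetic tree reconstructed with UPGMA illustrates the degree of variation at the molecular level and provides a basis for using ITS sequence differences to distinguish Pseudostellaria heterophylla from different production areas.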


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号