1.

Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree

Gadagkar SR, Rosenberg MS, Kumar S 《Journal of experimental zoology. Part B. Molecular and developmental evolution》2005,304(1):64-74

Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
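The site, gene, and combined bootstrap approaches named in the abstract differ in what they treat as the unit of resampling. The Python sketch below illustrates that difference only. It assumes each gene is an alignment given as a list of equal-length rows, with taxa in the same order in every gene. All function names are hypothetical. The abstract does not spell out the authors' "combined" procedure, so the two-level version here (resample genes, then resample sites within each picked gene) is an assumption.

```python
import random

def concatenate(genes):
    """Join per-gene alignments (rows in the same taxon order) into one super-gene."""
    n_taxa = len(genes[0])
    return ["".join(gene[t] for gene in genes) for t in range(n_taxa)]

def site_bootstrap(genes, rng):
    """Resample columns of the concatenated alignment, ignoring gene boundaries."""
    concat = concatenate(genes)
    length = len(concat[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in cols) for row in concat]

def gene_bootstrap(genes, rng):
    """Resample whole genes with replacement, then concatenate them."""
    picked = [rng.choice(genes) for _ in genes]
    return concatenate(picked)

def combined_bootstrap(genes, rng):
    """Assumed two-level scheme: resample genes, then sites within each picked gene."""
    picked = []
    for gene in (rng.choice(genes) for _ in genes):
        length = len(gene[0])
        cols = [rng.randrange(length) for _ in range(length)]
        picked.append(["".join(row[c] for c in cols) for row in gene])
    return concatenate(picked)

if __name__ == "__main__":
    genes = [["AAAA", "AAAT", "AATT"], ["CCG", "CCG", "CGG"]]  # 2 genes, 3 taxa
    print(site_bootstrap(genes, random.Random(1)))
```

Each replicate would then be passed to a tree-building method such as NJ (not shown). The frequency of a clade across replicates gives its bootstrap support.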

2.

Incomplete Lineage Sorting Is Common in Extant Gibbon Genera

Jeffrey D. Wall, Sung K. Kim, Francesca Luca, Lucia Carbone, Alan R. Mootnick, Pieter J. de Jong, Anna Di Rienzo 《PloS one》2013,8(1)

We sequenced reduced representation libraries by means of Illumina technology to generate over 1.5 Mb of orthologous sequence from a representative of each of the four extant gibbon genera (Nomascus, Hylobates, Symphalangus, and Hoolock). We used these data to assess the evolutionary relationships between the genera by evaluating the likelihoods of all possible bifurcating trees involving the four taxa. Our analyses provide weak support for a tree with Nomascus and Hylobates as sister taxa and with Hoolock and Symphalangus as sister taxa, though bootstrap resampling suggests that other phylogenetic scenarios are also possible. This uncertainty is due to short internal branch lengths and extensive incomplete lineage sorting across taxa. The true phylogenetic relationships among gibbon genera will likely require a more extensive whole-genome sequence analysis.
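A standard multispecies-coalescent result shows why short internal branches produce incomplete lineage sorting. The sketch below uses the simpler three-taxon case with one sampled lineage per species; the four-genus gibbon case has more gene-tree outcomes. Here t is the internal branch length in coalescent units (generations divided by 2Ne for a diploid population). The function names are hypothetical.

```python
import math

def p_concordant(t):
    """Probability that a gene tree matches the rooted species tree ((A,B),C)
    under the multispecies coalescent, one lineage sampled per species.
    t is the internal branch length in coalescent units."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)

def p_each_discordant(t):
    """Probability of each of the two gene-tree topologies that mismatch."""
    return math.exp(-t) / 3.0

if __name__ == "__main__":
    for t in (0.0, 0.1, 1.0, 3.0):
        print(t, round(p_concordant(t), 3))
```

With t = 0.1, p_concordant returns about 0.40. Most loci would then disagree with the species tree, which is consistent with the weak support reported in the abstract.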

3.

Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences: II. Four taxa without a molecular clock

Andrey Zharkikh, Wen-Hsiung Li 《Journal of molecular evolution》1992,35(4):356-366

The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences were studied by considering model trees of three taxa with an outgroup. The cases of constant and varying rates of nucleotide substitution were compared. From sequences obtained by simulation, phylogenetic trees were constructed by using the maximum parsimony (MP) and neighbor joining (NJ) methods. The effectiveness and consistency of the MP method were studied in terms of proportions of informative sites. The results of simulation showed that bootstrap estimation of the confidence level for an inferred phylogeny can be used even under unequal rates of evolution, provided the rate differences are not so large that the MP method becomes misleading. The condition under which the MP method becomes misleading (inconsistent) is more stringent for slowly evolving sequences than for rapidly evolving ones, and it also depends on the length of the internal branch. If the rate differences are large enough that the MP method becomes consistently misleading, bootstrap estimation will reinforce an erroneous conclusion about the topology. Similar conclusions apply to the NJ method with uncorrected distances. The NJ method with corrected distances performs poorly when the sequences are short but can avoid the inconsistency problem if the sequences are long and the distances can be estimated accurately.

4.

Convergence in ascospore discharge mechanism among pyrenomycete fungi based on 18S ribosomal RNA gene sequence (total citations: 3; self-citations: 0; citations by others: 3)

M L Berbee, J W Taylor 《Molecular phylogenetics and evolution》1992,1(1):59-71

Fungi of the class Pyrenomycetes (Ascomycotina) form a morphological series ranging from those that shoot ascospores (sexual spores) forcibly from the ascus (spore sac) to fungi that ooze ascospores or have no obvious mechanism for ascospore release. Did forcible ascospore discharge evolve within these pyrenomycetes, or has it been lost in the group? We determined the sequences of the 18S ribosomal RNA gene from three fungi and used these, along with six sequences from our previous work and three sequences from GenBank, to infer the phylogeny of 12 ascomycetes with various ascospore discharge mechanisms. The 1720 base pairs of sequence data per fungus yielded 361 variable sites, 198 phylogenetically informative sites, and a single most parsimonious tree requiring 562 nucleotide changes. The tree shows that the capacity to shoot ascospores into the air has been lost or, less probably, gained repeatedly and independently. Species lacking forcible ascospore discharge are intercalated among three lineages of species with forcible discharge. In this tree, seven of the nine internal branches appeared in 95% or more of 500 bootstrap replicates. A tree uniting the fungi with forcible ascospore discharge into a monophyletic group required 45 additional steps and fit the data significantly less well than the most parsimonious tree, based on a maximum likelihood test. Two of the fungi whose sequences we determined, Pseudallescheria boydii and Sporothrix schenckii, are not closely related to one another, even though both are human pathogens and both are from pyrenomycete lineages lacking forcible ascospore discharge. Using the well-resolved, most parsimonious tree, we inferred base substitution patterns in the 18S rRNA. The transition-to-transversion ratio was 1.9. Of all substitutions (across the 12 possible substitution types), 29% were from U to C.
At sites corresponding to yeast stem positions, A to G transitions were frequent, perhaps compensating for some of the U to C changes and maintaining secondary structure base pairing (A to G:U to C = 3:4). In loop or bulge positions without secondary structure base pairing, U to C transitions were still frequent, but A to G transitions were rare (A to G:U to C = 1:5).
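The ratios in the abstract come from classifying each inferred substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. The Python sketch below counts directed substitutions between two aligned sequences. These might be an ancestor-descendant pair reconstructed on the tree, though the authors' reconstruction step is not shown. T is read as U, and gaps or ambiguous bases are skipped. The function names are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

def substitutions(ancestor, descendant):
    """Count directed substitutions (from_base, to_base) between two aligned
    sequences; T is treated as U and non-ACGU characters are skipped."""
    counts = Counter()
    anc = ancestor.upper().replace("T", "U")
    des = descendant.upper().replace("T", "U")
    for a, b in zip(anc, des):
        if a in "ACGU" and b in "ACGU" and a != b:
            counts[(a, b)] += 1
    return counts

def ts_tv_ratio(counts):
    """Transitions divided by transversions for a Counter of substitutions."""
    ts = sum(n for (a, b), n in counts.items()
             if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    tv = sum(counts.values()) - ts
    return ts / tv if tv else float("inf")

if __name__ == "__main__":
    c = substitutions("AUGC", "GCGA")
    print(c, ts_tv_ratio(c), c[("U", "C")] / sum(c.values()))
```

Summing these counts over all branches of the tree, and splitting sites into stem and loop classes, would give tallies of the kind the abstract reports.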

5.

Estimating effective population size from samples of sequences: a bootstrap Monte Carlo integration method.

J Felsenstein 《Genetical research》1992,60(3):209-220

We would like to use maximum likelihood to estimate parameters such as the effective population size Ne or, if we do not know mutation rates, the product 4Neμ of effective population size and mutation rate per site. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population, it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets, for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. This assumption rests on a theorem that has not been proven but seems likely to hold if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, Ne or 4Neμ. The method requires at least 100 times the computational effort needed to estimate a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.

6.

Statistical measures of uncertainty for branches in phylogenetic trees inferred from molecular sequences by using model-based methods

Wróbel B 《Journal of applied genetics》2008,49(1):49-67

In recent years, the emphasis of theoretical work on phylogenetic inference has shifted from developing new tree inference methods to developing methods that measure the statistical support for the topologies. This paper reviews 3 approaches to assigning support values to branches in trees obtained from the analysis of molecular sequences: the bootstrap, Bayesian posterior probabilities for clades, and interior branch tests. In some circumstances these methods give different answers. This should not be surprising, because their assumptions differ. For example, the interior branch tests assume that a given topology is true and only consider whether a particular branch length is longer than zero. If a tree is incorrect, a wrong branch may still have a nonzero length (low bootstrap or Bayesian support may indicate this). If the substitution model is oversimplified, the length of a branch may be overestimated, and the Bayesian support for the branch may be inflated. The bootstrap, on the other hand, approximates the variance of the data under the real model of sequence evolution, because it involves direct resampling from the data. Thus a discrepancy between the Bayesian support and the bootstrap support may signal model inaccuracy. In practice, use of all 3 methods is recommended, and if discrepancies are observed, their potential origins should be analyzed carefully.

7.

Interior-branch and bootstrap tests of phylogenetic trees (total citations: 19; self-citations: 3; citations by others: 16)

Sitnikova T, Rzhetsky A, Nei M 《Molecular biology and evolution》1995,12(2):319-333

We have compared statistical properties of the interior-branch and bootstrap tests of phylogenetic trees when the neighbor-joining tree-building method is used. For each interior branch of a predetermined topology, the interior-branch and bootstrap tests provide the confidence values PC and PB, respectively. These values indicate the extent of statistical support for the sequence cluster generated by the branch. In phylogenetic analysis these two values are often interpreted in the same way, and if PC and PB are high (say, ≥ 0.95), the sequence cluster is regarded as reliable. We have shown that PC is in fact the complement of the P-value used in the standard statistical test, but PB is not. In fact, the bootstrap test usually underestimates the extent of statistical support for species clusters. The relationship between the confidence values obtained by the two tests varies with both the topology and the expected branch lengths of the true (model) tree. The most conspicuous difference between PC and PB is observed when the true tree is starlike, and the difference tends to grow as the number of sequences in the tree increases. The reason is that the bootstrap test tends to become progressively more conservative as the number of sequences increases. Unlike the bootstrap, the interior-branch test has the same statistical properties irrespective of the number of sequences used when a predetermined tree is considered. Therefore, the interior-branch test appears to be preferable to the bootstrap test as long as unbiased estimators of evolutionary distances are used. However, when the interior-branch test is applied to a tree estimated from a given data set, PC may overestimate statistical confidence. For this case, we developed a method for computing a modified version (P'C) of the PC value and showed that P'C tends to give a conservative estimate of statistical confidence, though it is not as conservative as PB.
In this paper we have introduced a model in which evolutionary distances between sequences follow a multivariate normal distribution. This model allowed us to study the relationships between the two tests analytically.
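The abstract states that PC is the complement of a standard one-sided P-value. The Python sketch below illustrates this under an assumption in the spirit of the authors' normal model: the interior branch length estimate b_hat is approximately normally distributed with a known standard error se. It tests H0: b = 0 against H1: b > 0. The function name and inputs are hypothetical. Computing se itself, from the variances and covariances of the distances, is not shown.

```python
from statistics import NormalDist

def interior_branch_pc(b_hat, se):
    """Confidence value P_C for an interior branch under a normal approximation.
    Tests H0: b = 0 against H1: b > 0; P_C = 1 - one-sided P-value."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = b_hat / se
    p_value = 1.0 - NormalDist().cdf(z)  # one-sided upper-tail P-value
    return 1.0 - p_value

if __name__ == "__main__":
    # A branch estimated at 1.645 standard errors above zero gives P_C near 0.95.
    print(round(interior_branch_pc(0.0329, 0.02), 3))
```

The bootstrap value PB has no such direct relationship to a P-value, which is the contrast the abstract draws.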

8.

Tie trees generated by distance methods of phylogenetic reconstruction (total citations: 2; self-citations: 0; citations by others: 2)

Takezaki N 《Molecular biology and evolution》1998,15(6):727-737

In examining genetic data in recent publications, Backeljau et al. showed cases in which two or more different trees (tie trees) were constructed from a single data set by the neighbor-joining (NJ) method and the unweighted pair group method with arithmetic mean (UPGMA). However, it is still unclear how often and under what conditions tie trees are generated, so I examined these problems by computer simulation. Examination of cases in which tie trees occur shows that they can appear when no substitutions occur along some interior branch(es) of a tree. Even when some substitutions occur along interior branches, tie trees can appear by chance if parallel or backward substitutions occur at some sites. The simulation results showed that tie trees occur relatively frequently for sequences with low divergence levels or with small numbers of sites. For such data, UPGMA sometimes produced tie trees quite frequently, whereas tie trees for the NJ method were generally rare. In the simulation, bootstrap values for clusters that differed among tie trees (tie clusters) were mostly low (< 60%). With a small probability, relatively high bootstrap values (at most 70%-80%) appeared for tie clusters. The bias in bootstrap values caused by the input order of sequences can be avoided if, in each bootstrap replication, one of the alternative paths in the cycles of making an NJ or UPGMA tree is chosen at random.
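The remedy proposed at the end of the abstract can be pictured as the pair-selection step of a clustering cycle. When several pairs tie for the best score, one is chosen at random instead of whichever pair comes first in the input order. The Python sketch below shows only that step. For UPGMA the score would be the distance; for NJ it would be the NJ pair-selection criterion (often written Q). The function name and the tolerance parameter are hypothetical.

```python
import random

def pick_pair(scores, rng, tol=1e-12):
    """Return the pair with the smallest clustering score, choosing uniformly
    at random among ties instead of taking the first pair in input order.
    scores maps (taxon_i, taxon_j) -> score."""
    best = min(scores.values())
    ties = sorted(pair for pair, s in scores.items() if s - best <= tol)
    return rng.choice(ties)

if __name__ == "__main__":
    scores = {("a", "b"): 0.1, ("c", "d"): 0.1, ("a", "c"): 0.3}
    rng = random.Random(0)
    print([pick_pair(scores, rng) for _ in range(5)])
```

Sorting the tied pairs makes the result depend only on the random generator, not on the input order of the sequences.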

9.

Branch support via resampling: an empirical study

John V. Freudenstein, Jerrold I. Davis 《Cladistics : the international journal of the Willi Hennig Society》2010,26(6):643-656

The success of resampling approaches to branch support depends on the effectiveness of the underlying tree searches. Two primary factors are identified as key: the depth of tree search and the number of trees saved per resampling replicate. Two datasets were explored for a range of search parameters using jackknifing. Greater depth of tree search tends to increase support values because shorter trees conflict less with each other, while increasing numbers of trees saved tends to reduce support values because of conflict that reduces structure in the replicate consensus. Although a relatively small amount of branch swapping will achieve near‐accurate values for a majority of clades, some clades do not yield accurate values until more extensive searches are performed. This means that in order to maximize the accuracy of resampling analyses, one should employ as extensive a search strategy as possible, and save as many trees per replicate as possible. Strict consensus summary of resampling replicates is preferable to frequency‐within‐replicates summary because it is a more conservative approach to the reporting of replicate results. Jackknife analysis is preferable to bootstrap because of its closer relationship to the original data. © The Willi Hennig Society 2010.
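One way to read "closer relationship to the original data" is that a jackknife pseudoreplicate deletes characters but never duplicates them. Every replicate is therefore a subset of the observed matrix. The Python sketch below builds such a replicate from a dictionary mapping taxa to character strings. The deletion probability of about e^-1 ≈ 0.37 is an assumed default, not a value taken from the abstract, and the function name is hypothetical.

```python
import random

def jackknife_replicate(matrix, rng, p_delete=0.37):
    """Build one jackknife pseudoreplicate by deleting each character (column)
    independently with probability p_delete. No column is duplicated, so the
    replicate is always a subset of the original data."""
    n_chars = len(next(iter(matrix.values())))
    keep = [c for c in range(n_chars) if rng.random() >= p_delete]
    return {taxon: "".join(row[c] for c in keep) for taxon, row in matrix.items()}

if __name__ == "__main__":
    matrix = {"t1": "0101101", "t2": "0011100", "t3": "1111000"}
    print(jackknife_replicate(matrix, random.Random(42)))
```

The tree searches run on each replicate would follow the abstract's advice: search as extensively as possible and save many trees per replicate.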

10.

The bootstrapped ordination re-examined

Valério DePatta Pillar 《Journal of Vegetation Science》1999,10(6):895-902

A method is described to determine the number of significant dimensions in a metric ordination of a sample. The method is probabilistic, based on bootstrap resampling. An iterative algorithm takes bootstrap samples with replacement from the sample. For each bootstrap sample, it computes ordination coordinates and, after Procrustean adjustment, the correlation between observed and bootstrap ordination scores. It compares this correlation with the same statistic generated in a parallel bootstrapped ordination of randomly permuted data; over many iterations, this comparison yields a probability. The method is assessed in principal coordinates analysis of simulated data sets with varying numbers of variables and correlation levels and with uniform or patterned correlation structure. The results suggest that the method is more reliable than other available methods in recovering the true intrinsic dimensionality. Examples with grassland data illustrate its utility.

11.

Relative efficiencies of the maximum-parsimony and distance-matrix methods of phylogeny construction for restriction data (total citations: 4; self-citations: 0; citations by others: 4)

J Lin, M Nei 《Molecular biology and evolution》1991,8(3):356-365

The relative efficiencies of the maximum-parsimony (MP), UPGMA, and neighbor-joining (NJ) methods in obtaining the correct tree (topology) for restriction-site and restriction-fragment data were studied by computer simulation. In this simulation, six DNA sequences of 16,000 nucleotides were assumed to evolve following a given model tree. The recognition sequences of 20 different six-base restriction enzymes were used to identify the restriction sites of the DNA sequences generated. The restriction-site data and restriction-fragment data thus obtained were used to reconstruct a phylogenetic tree, and the tree obtained was compared with the model tree. This process was repeated 300 times. The results obtained indicate that when the rate of nucleotide substitution is constant the probability of obtaining the correct tree (Pc) is generally higher in the NJ method than in the MP method. However, if we use the average topological deviation from the model tree (dT) as the criterion of comparison, the NJ and MP methods are nearly equally efficient. When the rate of nucleotide substitution varies with evolutionary lineage, the NJ method is better than the MP method, whether Pc or dT is used as the criterion of comparison. With 500 nucleotides and when the number of nucleotide substitutions per site was very small, restriction-site data were, contrary to our expectation, more useful than sequence data. Restriction-fragment data were less useful than restriction-site data, except when the sequence divergence was very small. UPGMA seems to be useful only when the rate of nucleotide substitution is constant and sequence divergence is high.
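The step in which restriction sites are identified from simulated sequences can be sketched as a scan for six-base recognition sequences, followed by presence/absence coding. The three recognition sequences in the Python code below are the real ones for EcoRI, BamHI, and HindIII. Using only these three is an illustrative choice; the study used 20 enzymes. The sketch assumes ungapped, aligned sequences, as in a substitution-only simulation, so that equal positions are homologous. Function names are hypothetical.

```python
# Recognition sequences of three common six-base restriction enzymes
# (illustrative subset; the study used 20 enzymes).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

def site_positions(seq, motif):
    """Start positions where the recognition sequence occurs in seq."""
    seq = seq.upper()
    k = len(motif)
    return {i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif}

def restriction_site_matrix(seqs, enzymes=ENZYMES):
    """Presence (1) / absence (0) coding of every (enzyme, position) site
    found in any sequence. Assumes ungapped, aligned sequences."""
    found = {taxon: {(name, pos) for name, motif in enzymes.items()
                     for pos in site_positions(s, motif)}
             for taxon, s in seqs.items()}
    sites = sorted(set().union(*found.values()))
    matrix = {taxon: [1 if site in f else 0 for site in sites]
              for taxon, f in found.items()}
    return sites, matrix

if __name__ == "__main__":
    seqs = {"t1": "GAATTCAAGCTT", "t2": "GAATTCAAGCTA"}
    print(restriction_site_matrix(seqs))
```

The resulting 0/1 matrix could be analyzed by MP directly, or converted to distances for NJ or UPGMA.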

12.

Theoretical foundation of the minimum-evolution method of phylogenetic inference (total citations: 26; self-citations: 5; citations by others: 21)

Rzhetsky A, Nei M 《Molecular biology and evolution》1993,10(5):1073-1095

The minimum-evolution (ME) method of phylogenetic inference is based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one. In the past this assumption has been used without mathematical proof. Here we present the theoretical basis of this method by showing that the expectation of the sum of branch length estimates for the true tree is smallest among all possible trees, provided that the evolutionary distances used are statistically unbiased and that the branch lengths are estimated by the ordinary least-squares method. We also present simple mathematical formulas for computing branch length estimates and their standard errors for any unrooted bifurcating tree, with the least-squares approach. As a numerical example, we have analyzed mtDNA sequence data obtained by Vigilant et al. and have found the ME tree for 95 human and 1 chimpanzee (outgroup) sequences. The tree was somewhat different from the neighbor-joining tree constructed by Tamura and Nei, but there was no statistically significant difference between them.
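For four taxa, the ordinary least-squares branch lengths have a simple closed form, which makes the ME criterion easy to illustrate. Each of the three unrooted topologies gets its OLS branch lengths, and the topology with the smallest total length is kept. The Python sketch below is limited to the quartet case; the abstract's general formulas cover any unrooted bifurcating tree. The distances are given as a symmetric nested dictionary, and the function names are hypothetical.

```python
def quartet_ols(d, split):
    """OLS branch-length estimates for the unrooted quartet ((i,j),(k,l)).
    d is a symmetric nested dict of pairwise distances."""
    (i, j), (k, l) = split
    cross = d[i][k] + d[i][l] + d[j][k] + d[j][l]
    interior = cross / 4 - (d[i][j] + d[k][l]) / 2
    external = {
        i: d[i][j] / 2 + (d[i][k] + d[i][l] - d[j][k] - d[j][l]) / 4,
        j: d[i][j] / 2 + (d[j][k] + d[j][l] - d[i][k] - d[i][l]) / 4,
        k: d[k][l] / 2 + (d[i][k] + d[j][k] - d[i][l] - d[j][l]) / 4,
        l: d[k][l] / 2 + (d[i][l] + d[j][l] - d[i][k] - d[j][k]) / 4,
    }
    return external, interior

def min_evolution_quartet(d, taxa):
    """Return (split, tree length) with the smallest sum of OLS branch lengths."""
    w, x, y, z = taxa
    splits = [((w, x), (y, z)), ((w, y), (x, z)), ((w, z), (x, y))]

    def tree_length(split):
        external, interior = quartet_ols(d, split)
        return sum(external.values()) + interior

    best = min(splits, key=tree_length)
    return best, tree_length(best)

if __name__ == "__main__":
    # Additive distances from ((A,B),(C,D)) with branches A=.1 B=.2 C=.3 D=.4, interior=.5
    pairs = {("A", "B"): 0.3, ("C", "D"): 0.7, ("A", "C"): 0.9,
             ("A", "D"): 1.0, ("B", "C"): 1.0, ("B", "D"): 1.1}
    d = {t: {} for t in "ABCD"}
    for (p, q), v in pairs.items():
        d[p][q] = d[q][p] = v
    print(min_evolution_quartet(d, ("A", "B", "C", "D")))
```

With these additive distances, the true split has total length 1.5. Each of the two wrong splits has length 1.75 and a negative interior estimate (-0.25).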

13.

Quartet-mapping, a generalization of the likelihood-mapping procedure (total citations: 5; self-citations: 0; citations by others: 5)

K Nieselt-Struwe, A von Haeseler 《Molecular biology and evolution》2001,18(7):1204-1219

Likelihood-mapping (LM) was suggested as a method of displaying the phylogenetic content of an alignment. However, the statistical properties of the method have not been studied. Here we analyze the special case of a four-species tree generated under a range of evolutionary models and compare the results with those of a natural extension of the likelihood-mapping approach, geometry-mapping (GM), which is based on the method of statistical geometry in sequence space. The methods are compared in their abilities to indicate the correct topology. The performance of both methods in detecting the star topology is especially explored. Our results show that LM tends to reject a star tree more often than GM. When assumptions about the evolutionary model of the maximum-likelihood reconstruction are not matched by the true process of evolution, LM shows a tendency to favor one tree, whereas GM correctly detects the star tree, with a statistical significance of >0.95 for all models, except for very short outer branch lengths. LM, on the other hand, reconstructs the correct bifurcating tree with a probability of >0.95 for most branch length combinations, even under models with varying substitution rates. The parameter domain for which GM recovers the true tree is much smaller. When the exterior branch lengths exceed an analytically derived threshold value that depends on the tree shape (rather than the evolutionary model), GM reconstructs a star tree rather than the true tree. We suggest a combined approach of LM and GM for the evaluation of starlike trees. This approach offers the possibility of testing for significantly positive interior branch lengths without extensive statistical and computational efforts.
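In likelihood mapping, as commonly described, the likelihoods of the three possible topologies for a quartet are converted into weights that sum to 1. Those weights are used as barycentric coordinates of a point inside a triangle. Points near a corner indicate a well-resolved quartet; points near the center indicate a starlike one. The Python sketch below computes the weights and the point. It does not reproduce the exact partition of the triangle into regions used in the paper. The function name and corner placement are hypothetical.

```python
import math

# Triangle corners, one per quartet topology.
CORNERS = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))

def likelihood_map(log_likelihoods):
    """Convert the log-likelihoods of the three quartet topologies into
    weights p_i = L_i / (L_1 + L_2 + L_3) and the matching triangle point."""
    if len(log_likelihoods) != 3:
        raise ValueError("expected three quartet log-likelihoods")
    top = max(log_likelihoods)
    w = [math.exp(ll - top) for ll in log_likelihoods]  # shift avoids underflow
    total = sum(w)
    p = [x / total for x in w]
    x = sum(pi * cx for pi, (cx, _) in zip(p, CORNERS))
    y = sum(pi * cy for pi, (_, cy) in zip(p, CORNERS))
    return p, (x, y)

if __name__ == "__main__":
    print(likelihood_map([-1000.0, -1000.0, -1000.0]))  # starlike: center
    print(likelihood_map([-1000.0, -1012.0, -1015.0]))  # resolved: near a corner
```

Plotting the points for many quartets drawn from an alignment gives the overall picture that LM is meant to display.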

14.

Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores

Clarke GD, Beiko RG, Ragan MA, Charlebois RL 《Journal of bacteriology》2002,184(8):2072-2080

Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.

15.

Spurious 99% bootstrap and jackknife support for unsupported clades

Simmons MP, Freudenstein JV 《Molecular phylogenetics and evolution》2011,61(1):177-191

Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees are held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses but apply to all phylogenetic analyses. First, when two or more optimal trees are found in a given pseudoreplicate, they should be summarized using the strict-consensus rather than the frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches while holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥ 50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.
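The first recommendation contrasts two ways of summarizing several equally optimal trees found within one pseudoreplicate. The Python sketch below represents each tree, as a simplifying assumption, by the set of its clades, with each clade a frozenset of taxon names. Frequency-within-replicates credits a clade with the fraction of a replicate's trees that contain it. Strict consensus credits a clade only if every tree in the replicate contains it. Function names are hypothetical.

```python
from collections import Counter

def frequency_within_replicates(replicates):
    """Support = average, over replicates, of the fraction of that
    replicate's optimal trees containing the clade."""
    support = Counter()
    for trees in replicates:
        counts = Counter(clade for tree in trees for clade in tree)
        for clade, n in counts.items():
            support[clade] += n / len(trees)
    return {c: s / len(replicates) for c, s in support.items()}

def strict_consensus_support(replicates):
    """Support = fraction of replicates whose strict consensus (the clades
    shared by all of the replicate's optimal trees) contains the clade."""
    support = Counter()
    for trees in replicates:
        support.update(frozenset.intersection(*trees))
    return {c: n / len(replicates) for c, n in support.items()}

if __name__ == "__main__":
    AB, ABC, ABD, CD, BCD = (frozenset(s) for s in ("AB", "ABC", "ABD", "CD", "BCD"))
    trees = [frozenset({AB, ABC}), frozenset({AB, ABD}), frozenset({CD, BCD})]
    replicates = [trees] * 4
    print(frequency_within_replicates(replicates)[AB])   # about 0.67
    print(strict_consensus_support(replicates).get(AB, 0))  # 0
```

In this contrived case, clade AB receives about 67% support under frequency-within-replicates, although no replicate's data unambiguously supports it. Strict consensus gives it none.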

16.

Evolutionary Relationships Among the Eukaryotic Crown Taxa Taking into Account Site-to-Site Rate Variation in 18S rRNA

Yves Van de Peer, Rupert De Wachter 《Journal of molecular evolution》1997,45(6):619-630

In this study we constructed a bootstrapped distance tree of 500 small subunit ribosomal RNA sequences from organisms belonging to the so-called crown of eukaryote evolution. Taking into account the substitution rate of the individual nucleotides of the rRNA sequence alignment, our results suggest that (1) animals, true fungi, and choanoflagellates share a common origin: the branch joining these taxa is highly supported by bootstrap analysis (bootstrap support [BS] > 90%), (2) stramenopiles and alveolates are sister groups (BS = 75%), (3) within the alveolates, dinoflagellates and apicomplexans share a common ancestor (BS > 95%), while in turn they both share a common origin with the ciliates (BS > 80%), and (4) within the stramenopiles, heterokont algae, hyphochytriomycetes, and oomycetes form a monophyletic grouping well supported by bootstrap analysis (BS > 85%), preceded by the well-supported successive divergence of labyrinthulomycetes and bicosoecids. On the other hand, many evolutionary relationships between crown taxa are still obscure on the basis of 18S rRNA. The branching order between the animal-fungal-choanoflagellate clade and the chlorobionts, the alveolates and stramenopiles, red algae, and several smaller groups of organisms remains largely unresolved. When among-site rate variation is not considered, the inferred tree topologies are inferior to those where the substitution rate spectrum for the 18S rRNA is taken into account. This is primarily indicated by the erroneous branching of fast-evolving sequences. Moreover, when different substitution rates among sites are not considered, the animals no longer appear as a monophyletic grouping in most distance trees. Received: 11 June 1997 / Accepted: 21 July 1997

17.

Genetic and phylogenetic studies of Chinese native sheep breeds (Ovis aries) based on mtDNA D-loop sequences

《Small Ruminant Research》2007,73(2-3):232-236

To determine the genetic diversity and origin of Chinese sheep, we analyzed 83 complete mtDNA D-loop sequences from nine Chinese sheep breeds and one foreign breed, together with nine sheep and cattle sequences available from GenBank. Sequence length varied considerably, from 1103 to 1225 bp. Haplotype diversity was 92.7% and nucleotide diversity was 3.058%. The mean nucleotide composition of the 83 sequences was 32.9% A, 29.8% T, 22.9% C, and 14.4% G. The NJ phylogenetic tree (1000 bootstrap replicates) revealed three distinct major domestic sheep lineages (termed lineages A–C) in the 10 breeds, indicating that Chinese native sheep breeds derive from three different maternal sources. The mismatch distribution analysis gave Fs values of −25.15, −12.28, and −8.60 for lineages A–C, respectively (P < 0.01), suggesting that at least one population expansion event occurred in the demographic history of Chinese sheep breeds.
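The two diversity statistics reported in the abstract have standard definitions. Haplotype diversity is h = n/(n-1) × (1 - Σ x_i²), where x_i is the frequency of haplotype i among the n sequences. Nucleotide diversity π is the mean number of pairwise differences per site. The Python sketch below assumes aligned, equal-length sequences without gaps. The real D-loop data vary in length, so gap handling would be needed and is not shown. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's haplotype (gene) diversity: n/(n-1) * (1 - sum of squared
    haplotype frequencies)."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi), assuming aligned,
    equal-length, ungapped sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))

if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AAAT", "AATT"]
    print(haplotype_diversity(sample), nucleotide_diversity(sample))
```

For this toy sample, h is about 0.833 and π is 0.25. Multiplying by 100 gives the percentage form used in the abstract.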

18.

Linkage of Viral Sequences among HIV-Infected Village Residents in Botswana: Estimation of Linkage Rates in the Presence of Missing Data

Nicole Bohme Carnegie, Rui Wang, Vladimir Novitsky, Victor De Gruttola 《PLoS computational biology》2014,10(1)

Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link across groups defined by viral load (low/high) and antiretroviral treatment (ART) status, using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop estimates of the probability that a sequence from one group links to at least one sequence from another group, under the assumption of independence across pairs. Furthermore, a resampling approach is developed that accounts for correlation across pairs, with diagnostics for assessing the reliability of the method. Sequences were obtained for 65% of subjects with high viral load (HVL, n = 117), 54% of subjects with low viral load but not on ART (LVL, n = 180), and 45% of subjects on ART (ART, n = 126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40–100% and changes the relative rates between groups. Bias in inferences regarding HIV viral linkage that arises from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data.
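Under the independence assumption stated in the abstract, suppose each of m pairs links with probability p. The chance that a sequence links to at least one of m sequences in another group is then 1 - (1 - p)^m. The Python sketch below computes that quantity. It also estimates p as the fraction of observed pairwise distances at or below a threshold. That threshold rule is a hypothetical stand-in, because the abstract does not say how linkage is defined from the distances. The authors' missing-data adjustment and resampling correction are not reproduced.

```python
def estimate_pair_rate(distances, threshold):
    """Hypothetical linkage rule: fraction of observed pairwise distances
    at or below a threshold."""
    if not distances:
        raise ValueError("no observed pairs")
    return sum(d <= threshold for d in distances) / len(distances)

def p_links_to_any(p_pair, m):
    """P(a sequence links to at least one of m sequences in another group),
    assuming each of the m pairs links independently with probability p_pair."""
    if not 0.0 <= p_pair <= 1.0:
        raise ValueError("p_pair must be a probability")
    return 1.0 - (1.0 - p_pair) ** m

if __name__ == "__main__":
    p = estimate_pair_rate([0.01, 0.05, 0.2, 0.3], threshold=0.05)
    print(p, p_links_to_any(p, 2))
```

Because the result grows with m, the group size used strongly affects the group-wise rate. This is one reason the handling of unsequenced subjects matters.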

19.

Performance of the relative-rate test under nonstationary models of nucleotide substitution (total citations: 1; self-citations: 0; citations by others: 1)

N J Tourasse, W H Li 《Molecular biology and evolution》1999,16(8):1068-1078

Relative-rate tests have previously been developed to compare the substitution rates of two sequences or two groups of sequences. These tests usually assume that the process of nucleotide substitution is stationary and the same for all lineages, i.e., uniform. In this study, we conducted simulations to assess the performance of relative-rate tests when the molecular-clock (MC) hypothesis is true (i.e., there is no rate difference between lineages) but the stationarity and uniformity assumptions are violated. Kimura's and bias-corrected LogDet distances were used. We found that the computation of the variances and covariances of LogDet distances had to be modified, because the constraint that the frequencies of the 16 nucleotide pair types sum to 1 must be imposed. Comparing the rates of two single sequences (Wu and Li's test) or of two groups of sequences (Li and Bousquet's test) gave similar results. When the sequences are long (≥ 500 nt), the test based on LogDet distances and their appropriate variances and covariances is appropriate even when the substitution process is not stationary and/or not uniform; that is, at the 5% significance level, the test rejects the MC hypothesis in about 5% of the simulation replicates. In contrast, if the sequences are short (≤ 200 bases) and highly divergent, the LogDet test is very conservative because the variances of the distances are overestimated. When the uniformity assumption is violated, the relative-rate test based on Kimura's distances can be severely misleading because of differences in base composition between sequences. However, if the uniformity assumption holds, so that base frequencies remain similar among sequences, the rejection rate is close to 5%, especially with short sequences. Under such conditions, the test using Kimura's distances performs better than the LogDet test. The reason seems to be that Kimura's distances depend on only two parameters and are therefore less affected than LogDet distances by a reduction in the number of sites.
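The two distances and the relative-rate comparison described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: `k2p_distance` applies Kimura's two-parameter formula to the proportions of transitions (P) and transversions (Q); `logdet_distance` builds the 4 × 4 matrix of the 16 nucleotide-pair frequencies (normalized to sum to 1) and uses the paralinear form of LogDet, without the bias correction used in the study. The statistic compares d13 and d23 against an outgroup (sequence 3), in the spirit of Wu and Li's test, but its standard deviation is estimated here by bootstrapping alignment columns, a swapped-in shortcut rather than the analytical variances and covariances the abstract refers to. All function names are hypothetical.

```python
import math
import random
import statistics

import numpy as np

BASES = "ACGT"
PURINES = {"A", "G"}


def k2p_distance(s1: str, s2: str) -> float:
    """Kimura two-parameter distance: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    n = len(s1)
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        # purine<->purine or pyrimidine<->pyrimidine changes are transitions
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    p, q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def logdet_distance(s1: str, s2: str) -> float:
    """Paralinear/LogDet distance from the 4x4 divergence matrix F (no bias correction)."""
    index = {b: i for i, b in enumerate(BASES)}
    f = np.zeros((4, 4))
    for a, b in zip(s1, s2):
        f[index[a], index[b]] += 1
    f /= f.sum()  # the 16 nucleotide-pair frequencies sum to 1
    px, py = f.sum(axis=1), f.sum(axis=0)  # base composition of each sequence
    # raises ValueError if det(F) <= 0 (e.g., a base is missing or sequences are saturated)
    return -0.25 * (math.log(np.linalg.det(f)) - 0.5 * math.log(np.prod(px) * np.prod(py)))


def relative_rate_z(s1, s2, s3, distance=k2p_distance, n_boot=200, seed=0):
    """z-score for d13 - d23 with sequence 3 as outgroup; SD from a site bootstrap (assumption)."""
    rng = random.Random(seed)
    n = len(s1)
    observed = distance(s1, s3) - distance(s2, s3)
    replicates = []
    for _ in range(n_boot):
        cols = [rng.randrange(n) for _ in range(n)]  # resample alignment columns
        b1, b2, b3 = ("".join(s[i] for i in cols) for s in (s1, s2, s3))
        replicates.append(distance(b1, b3) - distance(b2, b3))
    return observed / statistics.stdev(replicates)
```

Under the molecular-clock hypothesis, d13 - d23 has expectation 0, so |z| > 1.96 corresponds roughly to rejection at the 5% level. Passing `distance=logdet_distance` switches the test to LogDet distances; the sketch does not guard against the undefined case where det(F) is not positive.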

20.

Substitution Model of Sequence Evolution for the Human Immunodeficiency Virus Type 1 Subtype B gp120 Gene over the C2-V5 Region

Jon P. Anderson Allen G. Rodrigo Gerald H. Learn Yang Wang Hillard Weinstock Marcia L. Kalish Kenneth E. Robbins Leroy Hood James I. Mullins 《Journal of molecular evolution》2001,53(1):55-62

Phylogenetic analyses frequently rely on models of sequence evolution that specify nucleotide substitution rates, nucleotide frequencies, and site-to-site rate heterogeneity. These models can influence hypothesis testing and can affect the accuracy of phylogenetic inferences. Maximum likelihood (ML) methods that simultaneously construct phylogenetic tree topologies and estimate model parameters are computationally intensive and are not feasible on personal computers for sample sizes of 25 or more. Techniques that first construct a tree topology and then use this non-maximized topology to estimate ML substitution rates, however, can quickly arrive at a model of sequence evolution. The accuracy of this two-step estimation technique was tested using simulated data sets with known model parameters. The results showed that for a star-like topology, as is often seen in human immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting topology could produce nucleotide substitution rates that were not statistically different from the true rates. Samples were isolated from 100 HIV-1 subtype B-infected individuals from the United States, and a 620-nt region of the env gene was sequenced for each sample. The sequence data were used to obtain a substitution model of sequence evolution specific to HIV-1 subtype B env by estimating nucleotide substitution rates and site-to-site rate heterogeneity. The method of estimating the model should provide users of large data sets with a way to quickly compute a model of sequence evolution, while the nucleotide substitution model we identified should prove useful in the phylogenetic analysis of HIV-1 subtype B env sequences.

Received: 4 October 2000 / Accepted: 1 March 2001
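The simulation side of the two-step check described in this abstract can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes a general time-reversible (GTR) parameterization, in which six exchangeabilities (ordered here as AC, AG, AT, CG, CT, GT, an arbitrary choice) and four base frequencies define the rate matrix, and it generates a star-like alignment with known parameters, the kind of data against which estimated rates could be compared. Site-to-site rate heterogeneity and the ML estimation step itself are omitted, and all function names are made up for the example.

```python
import numpy as np

BASES = "ACGT"
# Exchangeability order assumed for this sketch: AC, AG, AT, CG, CT, GT
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]


def gtr_rate_matrix(exch, freqs):
    """GTR rate matrix with q_ij = s_ij * pi_j, scaled to one expected substitution per unit time."""
    pi = np.asarray(freqs, dtype=float)
    s = np.zeros((4, 4))
    for (i, j), rate in zip(PAIRS, exch):
        s[i, j] = s[j, i] = rate
    q = s * pi  # broadcasting multiplies column j by pi_j
    np.fill_diagonal(q, -q.sum(axis=1))  # rows sum to zero
    q /= -np.dot(pi, np.diag(q))  # mean rate at equilibrium becomes 1
    return q


def transition_matrix(q, freqs, t):
    """P(t) = exp(Qt) via the symmetric form D^(1/2) Q D^(-1/2), valid because GTR is reversible."""
    d = np.sqrt(np.asarray(freqs, dtype=float))
    sym = d[:, None] * q / d[None, :]
    w, v = np.linalg.eigh(sym)
    exp_sym = (v * np.exp(w * t)) @ v.T
    return exp_sym / d[:, None] * d[None, :]


def simulate_star(n_taxa, n_sites, branch_length, exch, freqs, seed=0):
    """Simulate a star-like alignment: every leaf evolves independently from one root sequence."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(freqs, dtype=float)
    p = transition_matrix(gtr_rate_matrix(exch, pi), pi, branch_length)
    p = np.clip(p, 0.0, None)  # remove tiny negative round-off
    cum = np.cumsum(p / p.sum(axis=1, keepdims=True), axis=1)
    root = rng.choice(4, size=n_sites, p=pi)
    seqs = []
    for _ in range(n_taxa):
        u = rng.random(n_sites)
        # the new state is the first column whose cumulative probability reaches u
        leaf = np.minimum((u[:, None] > cum[root]).sum(axis=1), 3)
        seqs.append("".join(BASES[k] for k in leaf))
    return seqs
```

For instance, `simulate_star(100, 620, 0.05, exch, freqs)` would mimic the sample size and fragment length mentioned in the abstract; the branch length 0.05 is an arbitrary placeholder, not a value from the study. The eigendecomposition route avoids a general matrix exponential because a reversible rate matrix is similar to a symmetric one.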