In the early stage of infection, human immunodeficiency virus (HIV)-1 predominantly uses the CCR5 coreceptor for host cell entry. The subsequent emergence of HIV variants that use the CXCR4 coreceptor in roughly half of all infections is associated with an accelerated decline of CD4+ T-cells and a faster rate of progression to AIDS. The presence of a ‘fitness valley’ separating CCR5- and CXCR4-using genotypes is postulated to be a biological determinant of whether the HIV coreceptor switch occurs. Using phylogenetic methods to reconstruct the evolutionary dynamics of HIV within hosts enables us to discriminate between competing models of this process. We have developed a phylogenetic pipeline for the molecular clock analysis, ancestral reconstruction, and visualization of deep sequence data. These data were generated by next-generation sequencing of HIV RNA extracted from longitudinal serum samples (median 7 time points) from 8 untreated subjects with chronic HIV infections (Amsterdam Cohort Studies on HIV-1 infection and AIDS). We used the known dates of sampling to directly estimate rates of evolution and to map ancestral mutations to a reconstructed timeline in units of days. HIV coreceptor usage was predicted from reconstructed ancestral sequences using the geno2pheno algorithm. We determined that the first mutations contributing to CXCR4 use emerged about 16 (per subject range 4 to 30) months before the earliest predicted CXCR4-using ancestor, which preceded the first positive cell-based assay of CXCR4 usage by 10 (range 5 to 25) months. CXCR4 usage arose in multiple lineages within 5 of 8 subjects, and ancestral lineages following alternate mutational pathways before going extinct were common. We observed highly patient-specific distributions and time-scales of mutation accumulation, implying that the role of a fitness valley is contingent on the genotype of the transmitted variant.
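The idea of using known sampling dates to estimate a rate of evolution can be illustrated with a root-to-tip regression. This is only a minimal sketch, not the pipeline described in the abstract: the function name and the toy numbers are hypothetical, and an ordinary least-squares fit stands in for a full molecular clock analysis.

```python
def root_to_tip_rate(days, distances):
    """Least-squares fit of root-to-tip distance against sampling day.

    days: sampling time of each sequence, in days since the first sample
    distances: root-to-tip genetic distance (substitutions per site)
    Returns (rate per site per day, estimated day of the root).
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, distances))
    slope = sxy / sxx                    # evolutionary rate
    intercept = mean_y - slope * mean_x  # distance at day 0
    root_day = -intercept / slope        # day at which the distance is zero
    return slope, root_day


# Hypothetical toy data: four samples taken 200 days apart.
days = [0, 200, 400, 600]
dists = [0.010, 0.014, 0.018, 0.022]
rate, root_day = root_to_tip_rate(days, dists)
```

The slope is the rate in substitutions per site per day, and the x-intercept gives a rough date for the common ancestor on the same day scale.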



Coreceptor switch from CCR5 to CXCR4 is associated with HIV disease progression. The molecular and evolutionary mechanisms underlying the CCR5 to CXCR4 switch are the focus of intense recent research. We studied the HIV-1 tropism dynamics in relation to coreceptor usage, the nature of quasispecies from ultra deep sequencing (UDPS) data and their phylogenetic relationships.


Here, we characterized C2-V3-C3 sequences of HIV obtained from 19 patients followed up for 54 to 114 months using UDPS, with further genotyping and phylogenetic analysis for coreceptor usage. HIV quasispecies diversity and variability as well as HIV plasma viral load were measured longitudinally and their relationship with the HIV coreceptor usage was analyzed. The longitudinal UDPS data were submitted to phylogenetic analysis and sampling times and coreceptor usage were mapped onto the trees obtained.


Although a temporal viral genetic structuring was evident, the persistence of several viral lineages evolving independently over the course of infection was statistically supported, indicating a complex scenario for the evolution of viral quasispecies. HIV X4-using variants were present in most of our patients, with differing inter- and intra-patient predominance as a component of the quasispecies, even on antiretroviral therapy. The viral populations from some of the patients studied displayed evidence of the evolution of X4 variants through fitness valleys, whereas for other patients the data favored a gradual mode of emergence.


CXCR4 usage can emerge independently, in multiple lineages, over the course of HIV infection. The mode of emergence, i.e. gradual or through fitness valleys, seems to depend on both virus and patient factors. Furthermore, our analyses suggest that, besides becoming dominant after population-level switches, minor proportions of X4 viruses might exist throughout the infection, perhaps even at its early stages. The fate of these minor variants might depend on both viral and host factors.

matK gene, which is located in the chloroplast genome and evolves more quickly than the rbcL gene. A total of 31 species representing 31 of the 59 genera in the family were examined in this study. We also used 21 species from another ten families of Asparagales, four species from three families of Liliales and Acorus as outgroups. We obtained partial sequences of matK with lengths of 1,109–1,148 bp, corresponding to positions 230 to 1,343 of the Oryza sativa matK gene. The pairwise percentage sequence divergence ranged from 0 to 19.1% for all the species examined except Acorus, and 0 to 4.6% within Amaryllidaceae. Two methods of phylogenetic analysis, the Maximum Parsimony and Neighbor-Joining methods, were used. The trees obtained from these two analyses were fundamentally consistent. In both trees, the Amaryllidaceae sensu Dahlgren et al. formed a well-supported monophyletic clade with 100% bootstrap support. Amaryllidaceae were included in the Asparagales; however, its phylogenetic position within the Asparagales was not clearly resolved. Judging from the NJ tree, Agapanthus might be a sister group of the Amaryllidaceae, although bootstrap support for this was low. Character-state mapping was used to infer a center of origin and the biogeographic history of Amaryllidaceae. The result supports the hypothesis that the family evolved in Africa and subsequently spread to other continents, further suggesting that South America is the center of secondary diversification. Received 6 January 1999/ Accepted in revised form 8 April 1999

Regulatory networks play a central role in cellular behavior and decision making. Learning these regulatory networks is a major task in biology, and devising computational methods and mathematical models for this task is a major endeavor in bioinformatics. Boolean networks have been used extensively for modeling regulatory networks. In this model, the state of each gene can be either ‘on’ or ‘off’, and the next state of a gene is updated, synchronously or asynchronously, according to a Boolean rule that is applied to the current state of the entire system. Inferring a Boolean network from a set of experimental data entails two main steps: first, the experimental time-series data are discretized into Boolean trajectories, and then, a Boolean network is learned from these Boolean trajectories. In this paper, we consider three methods for data discretization, including a new one we propose, and three methods for learning Boolean networks, and study the performance of all nine possible combinations on four regulatory systems of varying dynamic complexity. We find that employing the right combination of methods for data discretization and network learning results in Boolean networks that capture the dynamics well and provide predictive power. Our findings are in contrast to a recent survey that placed Boolean networks on the low end of the “faithfulness to biological reality” and “ability to model dynamics” spectra. Further, contrary to the common argument in favor of Boolean networks, we find that a relatively large number of time points in the time-series data is required to learn good Boolean networks for certain data sets. Last but not least, while methods have been proposed for inferring Boolean networks, as discussed above, publicly available implementations of them are still missing. Here, we make our implementation of the methods publicly available as open source at http://bioinfo.cs.rice.edu/.
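The two ingredients named in the abstract, discretizing a time series into Boolean values and updating all genes synchronously from the current state, can be sketched as follows. This is an illustration under stated assumptions: the mean-threshold discretization is a common simple choice and is not claimed to be one of the three methods studied in the paper, and the three-gene rule set is hypothetical.

```python
def discretize(series):
    """Binarize one gene's time series: 1 if the value exceeds the series mean."""
    threshold = sum(series) / len(series)
    return [1 if v > threshold else 0 for v in series]


def synchronous_step(state, rules):
    """Apply every gene's Boolean rule to the same current state."""
    return {gene: int(rule(state)) for gene, rule in rules.items()}


# Hypothetical three-gene network (rules invented for illustration).
rules = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B together activate C
}

state = {"A": 1, "B": 0, "C": 0}
trajectory = [state]
for _ in range(4):
    state = synchronous_step(state, rules)
    trajectory.append(state)
```

An asynchronous scheme would instead update one gene at a time, so the order of updates becomes part of the model.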

The GenBank database contains essentially all of the nucleotide sequence data generated for published molecular systematic studies, but for the majority of taxa these data remain sparse. GenBank has value for phylogenetic methods that leverage data–mining and rapidly improving computational methods, but the limits imposed by the sparse structure of the data are not well understood. Here we present a tree representing 13,093 land plant genera—an estimated 80% of extant plant diversity—to illustrate the potential of public sequence data for broad phylogenetic inference in plants, and we explore the limits to inference imposed by the structure of these data using theoretical foundations from phylogenetic data decisiveness. We find that despite very high levels of missing data (over 96%), the present data retain the potential to inform over 86.3% of all possible phylogenetic relationships. Most of these relationships, however, are informed by small amounts of data—approximately half are informed by fewer than four loci, and more than 99% are informed by fewer than fifteen. We also apply an information theoretic measure of branch support to assess the strength of phylogenetic signal in the data, revealing many poorly supported branches concentrated near the tips of the tree, where data are sparse and the limiting effects of this sparseness are stronger. We argue that limits to phylogenetic inference and signal imposed by low data coverage may pose significant challenges for comprehensive phylogenetic inference at the species level. Computational requirements provide additional limits for large reconstructions, but these may be overcome by methodological advances, whereas insufficient data coverage can only be remedied by additional sampling effort. 
We conclude that public databases have exceptional value for modern systematics and evolutionary biology, and that a continued emphasis on expanding taxonomic and genomic coverage will play a critical role in developing these resources to their full potential.

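The taxonomic circumscription and phylogenetic relationships of Theaceae were investigated through sequence analysis of the mitochondrial matR gene. The results show that the two core groups of the traditional Theaceae, subfamily Theoideae (or Camellioideae) and subfamily Ternstroemioideae, are not sister groups; Theoideae is a strongly supported monophyletic group, whereas Ternstroemioideae did not form a monophyletic group. Three distinct clades can be recognized within Theoideae: the basal clade consists of Stewartia and Hartia; Schima, Franklinia, and Gordonia form a second clade, which is sister to a third clade comprising Camellia, Pyrenaria, Parapyrenaria, Tutcheria, Polyspora, and Apterosperma. These results strongly support the concepts of Theaceae sensu stricto (comprising only Theoideae) and Gordonia sensu stricto proposed by Prince and Parks and other authors, as well as the division of the family into three tribes (Stewartieae, Gordonieae, and Theeae). This study, however, resolves the relationships among the three tribes more clearly: Stewartieae is the most basal clade, and Theeae and Gordonieae are more closely related to each other. We also consider that whether Ternstroemiaceae (or Ternstroemioideae) is monophyletic deserves further study.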

Hepatitis C virus (HCV) remains a challenging public health problem worldwide. The identification of viral variants establishing de novo infections and definition of the phenotypic requirements for transmission would facilitate the design of preventive strategies. We explored the transmission of HCV variants in three cases of acute hepatitis following needlestick accidents. We used single-genome amplification of glycoprotein E1E2 gene sequences to map the genetic bottleneck upon transmission accurately. We found that infection was likely established by a single variant in two cases and six variants in the third case. Studies of donor samples showed that the transmitted variant E1E2 amino acid sequences were identical or closely related to those of variants from the donor virus populations. The transmitted variants harbored a common signature site at position 394, within hypervariable region 1 of E2, together with additional signature amino acids specific to each transmission pair. Surprisingly, these E1E2 variants conferred no greater capacity for entry than the E1E2 derived from nontransmitted variants in lentiviral pseudoparticle assays. Mutants escaping the antibodies of donor sera did not predominate among the transmitted variants either. The fitness parameters affecting the selective outgrowth of HCV variants after transmission in an immunocompetent host may thus be more complex than those suggested by mouse models. Human antibodies directed against HCV envelope effectively cross-neutralized the lentiviral particles bearing E1E2 derived from transmitted variants. These findings provide insight into the molecular mechanisms underlying HCV transmission and suggest that viral entry is a potential target for the prevention of HCV infection.

Diversity and phylogenetic relationships of New Zealand representatives of the red algal order Gelidiales have been examined using rbcL sequence data. Extensive field collections have been made from throughout the New Zealand region. Six genera have been reported previously from New Zealand (Capreolia, Gelidium, Pterocladia, Pterocladiella, Pterocladiastrum, Ptilophora). This research has revealed species with very restricted local distributions, as well as several undescribed, cryptic taxa. The common and widespread Gelidium caulacantheum is confirmed to be more closely related to Capreolia than to other species of Gelidium. The generic concept of Capreolia, based on life history characters, will need to be modified to accommodate additional species possessing “Gelidium” life histories. A species endemic to New Zealand, Gelidium ceramoides, has been found to differ significantly from all other members of the Gelidiales and requires reclassification in another genus and order. Examination of field collections and herbarium specimens, together with molecular sequence data, has led us to conclude that specimens previously placed in the genera Ptilophora and Pterocladiastrum belong within Pterocladia lucida.

Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests were comprehensively evaluated using simulation studies. STAR was also implemented to analyze a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies. 
In summary, for sequencing studies, STAR is an important tool for detecting secondary-trait RV associations.

Next generation sequencing of mitochondrial DNA (mtDNA) facilitates studies into the metabolic characteristics of production animals and their relation to production traits. Sequence analysis of mtDNA from pure-bred swine with highly disparate production characteristics (Mangalica Blonde, Mangalica Swallow-bellied, Meishan, Turopolje, and Yorkshire) was initiated to evaluate the influence of mtDNA polymorphisms on mitochondrial function. Herein, we report the complete mtDNA sequences of five Sus scrofa breeds and evaluate their position within the phylogeny of domestic swine. Phenotypic traits of Yorkshire, Mangalica Blonde, and Swallow-belly swine are presented to demonstrate their metabolic characteristics. Our data support the division of European and Asian breeds noted previously and confirm European ancestry of Mangalica and Turopolje breeds. Furthermore, mtDNA differences between breeds suggest function-altering changes in proteins involved in oxidative phosphorylation such as ATP synthase 6 (MT-ATP6), cytochrome oxidase I (MT-CO1), cytochrome oxidase III (MT-CO3), and cytochrome b (MT-CYB), supporting the hypothesis that mtDNA polymorphisms contribute to differences in metabolic traits between swine breeds. Our sequence data form the basis for future research into the roles of mtDNA in determining production traits in domestic animals. Additionally, such studies should provide insight into how mtDNA haplotype influences the extreme adiposity observed in Mangalica breeds.

Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197–200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P ~ 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, the likelihood ratio test regains power, and it too rejects the evolutionary model with P ≪ 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.
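For readers unfamiliar with the G statistic mentioned in this abstract, the generic form is G = 2 Σ O ln(O / E), comparing observed counts O with the counts E expected under a model. The sketch below computes only this generic quantity; the function name and example counts are hypothetical, and it does not reproduce the tree-versus-multinomial comparison or the marginalization to parsimony counts described by the authors.

```python
import math


def g_statistic(observed, expected):
    """Log-likelihood ratio statistic G = 2 * sum(O * ln(O / E)).

    Categories with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Hypothetical counts for two site-pattern categories.
g_same = g_statistic([10, 20], [10, 20])   # observed equals expected
g_diff = g_statistic([30, 10], [20, 20])   # observed departs from expected
```

Under the null hypothesis G is approximately chi-square distributed, with degrees of freedom determined by the number of categories and fitted parameters.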

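Objective: To construct a phylogenetic tree from ITS sequences of nuclear ribosomal DNA (nrDNA) in lichens and to explore DNA barcodes for lichens. Methods: Lichens from the Wudalianchi scenic area in Heilongjiang were used as material. ITS sequences were amplified by PCR with specific primers, the PCR products were sequenced directly, and a phylogenetic tree of the ITS sequences was built with MEGA 4.0. Results: Phylogenetic analysis gave a consistency index (CI) of 0.5356 and a retention index (RI) of 0.6602. The mean Kimura 2-parameter (K2P) genetic distance was 0.030 among samples of the same genus (treated as intraspecific) and 0.600 among samples of different genera (treated as interspecific); the interspecific distance exceeded the intraspecific distance. Conclusion: Based on the K2P genetic distances among lichen samples, the nrDNA ITS region has some reference value for the classification and identification of closely related lichen genera, and we suggest it as a candidate barcode fragment for lichen identification.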
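The K-2-P (Kimura 2-parameter) distance used in this abstract corrects the raw proportion of differences by treating transitions (P) and transversions (Q) separately: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below is a minimal stand-alone implementation for two aligned sequences; the function name and the rule of skipping gapped or ambiguous sites are assumptions made here, and it is not claimed to match the exact settings used in MEGA 4.0.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical aligned fragments differing by one transition in ten sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Averaging such pairwise distances within and between groups gives the kind of intra- versus inter-group comparison reported in the results.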

Gene set methods aim to assess the overall evidence of association of a set of genes with a phenotype, such as disease or a quantitative trait. Multiple approaches for gene set analysis of expression data have been proposed. They can be divided into two types: competitive and self-contained. Benefits of self-contained methods include that they can be used for genome-wide, candidate gene, or pathway studies, and have been reported to be more powerful than competitive methods. We therefore investigated ten self-contained methods that can be used for continuous, discrete and time-to-event phenotypes. To assess the power and type I error rate for the various previously proposed and novel approaches, an extensive simulation study was completed in which the scenarios varied according to: number of genes in a gene set, number of genes associated with the phenotype, effect sizes, correlation between expression of genes within a gene set, and the sample size. In addition to the simulated data, the various methods were applied to a pharmacogenomic study of the drug gemcitabine. Simulation results demonstrated that overall Fisher's method and the global model with random effects have the highest power for a wide range of scenarios, while the analysis based on the first principal component and Kolmogorov-Smirnov test tended to have the lowest power. The methods investigated here are likely to play an important role in identifying pathways that contribute to complex traits.
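Fisher's method, named in this abstract as one of the most powerful approaches, combines k per-gene p-values into the statistic X = -2 Σ ln p, which follows a chi-square distribution with 2k degrees of freedom when the tests are independent. The sketch below is a minimal stand-alone version with a hypothetical function name. It assumes independent p-values, which the abstract does not guarantee (gene expression within a set is often correlated), so significance in practice may have to be assessed differently.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p) is chi-square with 2k degrees of freedom under the
    null. For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!, so no external library is needed.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-gene p-values for a two-gene set.
combined = fisher_combined_pvalue([0.05, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the tail formula.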

To aid in future efforts to accurately reconstruct the vertebrate tree, a quantitative measure of phylogenetic informativeness was applied to nucleotide and amino acid sequences for a set of 11 genes. We identified orthologues and assembled published fossil-calibrated divergence times between taxa that had been sequenced for each gene. Rates of molecular evolution for each site were estimated to characterize the molecular evolutionary pattern of genes and to calculate the phylogenetic informativeness. The fast-evolving gene albumin yielded the highest informativeness over the period from 60 million years ago to 500 million years ago. In contrast, calmodulin yielded the lowest informativeness, presumably because functional constraint minimized substitutions in the amino acid sequence. The gene c-myc showed an intermediate level of informativeness. The nucleotide sequence of cytochrome b showed extremely high utility for recent epochs, but low utility for times before 100 million years ago. We ranked nine other genes for their utility during the epochs of the divergence of the muroid rodents, early placental mammals, early vertebrates, and early metazoa, yielding results consistent with, but more precise than, previous studies. Interestingly, DNA sequence always exceeded amino acid sequence in informativeness over all time scales, yet support values were at best moderately higher. For epochs not subject to strong phylogenetic conflict due to convergence, we advocate gleaning the additional power of the threefold increase in number of characters that is present for DNA sequences over resorting to the less noisy but less informative amino acid sequences.

HIV RNA viral load (VL) is a pivotal outcome variable in studies of HIV infected persons. We propose and investigate two frameworks for analyzing VL: (1) a single-measure VL (SMVL) per participant and (2) repeated measures of VL (RMVL) per participant. We compared these frameworks using a cohort of 720 HIV patients in care (4,679 post-enrollment VL measurements). The SMVL framework analyzes a single VL per participant, generally captured within a “window” of time. We analyzed three SMVL methods where the VL binary outcome is defined as suppressed or not suppressed. The omit-participant method uses an 8-month “window” (-6/+2 months) around month 24 to select the participant’s VL closest to month 24 and removes participants from the analysis without a VL in the “window”. The set-to-failure method expands on the omit-participant method by including participants without a VL within the “window” and analyzes them as not suppressed. The closest-VL method analyzes each participant’s VL measurement closest to month 24. We investigated two RMVL methods: (1) repeat-binary classifies each VL measurement as suppressed or not suppressed and estimates the proportion of participants suppressed at month 24, and (2) repeat-continuous analyzes VL as a continuous variable to estimate the change in VL across time, and geometric mean (GM) VL and proportion of participants virally suppressed at month 24. Results indicated the RMVL methods have more precision than the SMVL methods, as evidenced by narrower confidence intervals for estimates of proportion suppressed and risk ratios (RR) comparing demographic strata. The repeat-continuous method had the most precision and provides more information than other considered methods. We generally recommend using the RMVL framework when there are repeated VL measurements per participant because it utilizes all available VL data, provides additional information, has more statistical power, and avoids the subjectivity of defining a “window.”
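The difference between the omit-participant and set-to-failure methods is easiest to see on a small example. The sketch below follows the window described in the abstract (-6/+2 months around month 24, that is, months 18 to 26). The suppression threshold of 200 copies/mL, the function names, and the four-person cohort are hypothetical and are not taken from the study.

```python
TARGET_MONTH = 24
WINDOW = (18, 26)        # -6/+2 months around month 24
SUPPRESSED_BELOW = 200   # copies/mL; hypothetical threshold, not from the source


def closest_in_window(measurements):
    """Return the (month, vl) pair closest to month 24 inside the window, or None."""
    in_window = [m for m in measurements if WINDOW[0] <= m[0] <= WINDOW[1]]
    if not in_window:
        return None
    return min(in_window, key=lambda m: abs(m[0] - TARGET_MONTH))


def omit_participant(cohort):
    """Proportion suppressed among participants with a VL in the window."""
    picks = [closest_in_window(ms) for ms in cohort.values()]
    picks = [p for p in picks if p is not None]
    return sum(p[1] < SUPPRESSED_BELOW for p in picks) / len(picks)


def set_to_failure(cohort):
    """Proportion suppressed, counting participants without a windowed VL as failures."""
    picks = [closest_in_window(ms) for ms in cohort.values()]
    suppressed = sum(p is not None and p[1] < SUPPRESSED_BELOW for p in picks)
    return suppressed / len(picks)


# Hypothetical cohort: participant -> list of (month, VL in copies/mL).
cohort = {
    "p1": [(12, 5000), (23, 50)],   # suppressed at month 23
    "p2": [(20, 40), (25, 900)],    # month 25 is closest, not suppressed
    "p3": [(6, 300), (30, 40)],     # no measurement in the window
    "p4": [(24, 10000)],            # not suppressed
}
```

In this toy cohort the omit-participant estimate is 1/3 and the set-to-failure estimate is 1/4, because the participant without a windowed measurement moves from being excluded to being counted as a failure.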

A preliminary analysis of phylogenetic relationships in Cymbidium based on nrDNA ITS sequence data (in English)
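The existing classification of Cymbidium is based on gross morphological characters, especially the number of pollinia and the degree of fusion between the labellum and the column. On this basis the genus is divided into three subgenera: subgenus Cymbidium, subgenus Cyperorchis, and subgenus Jensoa. In this study, PCR amplification and direct sequencing were used to analyze the nuclear DNA ITS region of 27 species and 3 cultivars of Cymbidium, together with 3 outgroups. The ITS phylogenetic tree produced by maximum parsimony analysis indicates that all three subgenera may be unnatural groups. Subgenus Cyperorchis appears polyphyletic, with C. dayanum of subgenus Cymbidium nested within it; subgenus Jensoa is paraphyletic, with one of its members, C. lancifolium, diverging to form the most basal branch of the genus; subgenus Cymbidium is polyphyletic, splitting into several branches that group with the other two subgenera. Because site variation in the ITS sequences of Cymbidium is low, none of the main branches recovered by maximum parsimony received strong bootstrap support, and relationships among sections within each subgenus remain unclear. New data are needed to study the phylogenetic relationships of Cymbidium.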

Rooted phylogenetic networks are primarily used to represent conflicting evolutionary information and to describe reticulate evolutionary events in phylogeny. Many methods have been presented for constructing rooted phylogenetic networks. Of these, the methods based on the decomposition property of networks and on the incompatibility graph (such as CASS, LNETWORK, and BIMLR) are more efficient than other available methods. This paper discusses and compares these methods on both practical and artificial datasets, in terms of running time and the effectiveness of the constructed phylogenetic networks. The results show that LNETWORK can construct much simpler networks than the others.

