Similar literature
 20 similar documents found (search time: 15 ms)
1.
Absolute fast converging phylogenetic reconstruction methods are provably guaranteed to recover the true tree with high probability from sequences that grow only polynomially in the number of leaves, once the edge lengths are bounded arbitrarily from above and below. Only a few methods have been determined to be absolute fast converging; these have all been developed in just the last few years, and most are polynomial time. In this paper, we compare pre-existing fast converging methods as well as some new polynomial time methods that we have developed. Our study, based upon simulating evolution under a wide range of model conditions, establishes that our new methods outperform both neighbor joining and the previous fast converging methods, returning very accurate large trees, when these other methods do poorly.  相似文献   

2.
《Genomics》2021,113(2):728-739
Candida albicans and non-albicans Candida spp. are a major cause of systemic mycoses. Antifungal drugs such as azoles and polyenes are not effective at eradicating Candida infection owing to their fungistatic nature or low bioavailability. Here, we have adopted a comprehensive computational workflow for identification, prioritization and validation of targets from proteomes of Candida albicans and Candida tropicalis. The protocol involves identification of essential drug-target candidates using subtractive genomics, protein-protein interaction network properties and systems biology based methods. The essentiality of the novel metabolic and non-metabolic targets was established by performing in silico gene knockouts, under aerobic as well as anaerobic conditions, and in vitro drug inhibition assays, respectively. Deletion of twelve genes that are involved in amino acid, secondary metabolite, and carbon metabolism showed zero growth in the metabolic model under simulated conditions. The algorithm used in this study can be downloaded from http://pbit.bicnirrh.res.in/offline.php and executed locally.

3.
MOTIVATION: The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences to be simultaneously screened for biological activity. Clearly, certain choices of probability distributions will be more successful in yielding functional sequences. However, since the number of sequences is exponential in protein length, computational optimization of the distribution is difficult. RESULTS: In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. The corresponding probabilistic graphical model is constructed, and we apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are far from optimal, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small scale sub-problems, BP attains identical results to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the distributions predicted significantly differ from the experimental data. These findings, along with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future.  相似文献   
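The Boltzmann formulation in the abstract above can be illustrated with a minimal sketch. It computes exact positional marginals by brute-force enumeration, which is only feasible for very short sequences; belief propagation is what the authors apply to approximate such marginals at scale. The function names, the two-letter alphabet, and the toy energy function are all hypothetical and are not taken from the paper.

```python
import itertools
import math


def boltzmann_marginals(alphabet, length, energy, kT=1.0):
    """Exact per-position marginals under P(s) proportional to exp(-E(s)/kT).

    Enumerates every sequence, so the cost is len(alphabet) ** length.
    """
    weights = {
        seq: math.exp(-energy(seq) / kT)
        for seq in itertools.product(alphabet, repeat=length)
    }
    z = sum(weights.values())  # partition function
    marginals = [{a: 0.0 for a in alphabet} for _ in range(length)]
    for seq, w in weights.items():
        for pos, residue in enumerate(seq):
            marginals[pos][residue] += w / z
    return marginals


def toy_energy(seq):
    # Hypothetical energy: each adjacent pair of identical residues lowers E by 1.
    return -sum(1.0 for a, b in zip(seq, seq[1:]) if a == b)


marginals = boltzmann_marginals("AG", 3, toy_energy)
```

Because the toy energy is symmetric under swapping the two residues, every positional marginal comes out uniform; a realistic energy function would break that symmetry.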

4.
Phylogenetic trees underlie our understanding of yeast evolution and are also proving instrumental in the development of a more robust yeast classification system based upon natural (i.e. evolutionary) relationships. In an effort to refine/improve taxonomic resolution, recent studies have focused on the use of multigene rather than single gene sequencing. Nevertheless, searches to determine 'the tree' remain problematic, as they can often overlook conflicts in the dataset. In such instances, phylogenetic networks such as neighbor-nets and consensus networks can provide a more useful and indeed more informative alternative means of analysis. In this study, we have used the latter two phylogenetic network techniques to reanalyze the multigene sequence dataset of Kurtzman & Robnett, which was used to redefine the taxonomy of the family Saccharomycetaceae. Results from our analyses show that, in general, established clades are robust. However, they also reveal conflict between mitochondrial- and nuclear-encoded genes and indicate the existence of complex patterns of hybridization and introgression not detected in the original study. These patterns are discussed in relation to how they may impact upon the current classification of this group of yeasts.  相似文献   

5.
To understand patterns and processes of the diversification of life, we require an accurate understanding of taxon interrelationships. Recent studies have suggested that analyses of morphological character data using the Bayesian and maximum likelihood Mk model provide phylogenies of higher accuracy compared to parsimony methods. This has proved controversial, particularly studies simulating morphology‐data under Markov models that assume shared branch lengths for characters, as it is claimed this leads to bias favouring the Bayesian or maximum likelihood Mk model over parsimony models which do not explicitly make this assumption. We avoid these potential issues by employing a simulation protocol in which character states are randomly assigned to tips, but datasets are constrained to an empirically realistic distribution of homoplasy as measured by the consistency index. Datasets were analysed with equal weights and implied weights parsimony, and the maximum likelihood and Bayesian Mk model. We find that consistent (low homoplasy) datasets render method choice largely irrelevant, as all methods perform well with high consistency (low homoplasy) datasets, but the largest discrepancies in accuracy occur with low consistency datasets (high homoplasy). In such cases, the Bayesian Mk model is significantly more accurate than alternative models and implied weights parsimony never significantly outperforms the Bayesian Mk model. When poorly supported branches are collapsed, the Bayesian Mk model recovers trees with higher resolution compared to other methods. As it is not possible to assess homoplasy independently of a tree estimate, the Bayesian Mk model emerges as the most reliable approach for categorical morphological analyses.  相似文献   

6.
The use of fluorescent protein tags has had a huge impact on cell biological studies in virtually every experimental system. Incorporation of coding sequence for fluorescent proteins such as green fluorescent protein (GFP) into genes at their endogenous chromosomal position is especially useful for generating GFP-fusion proteins that provide accurate cellular and subcellular expression data. We tested modifications of a transposon-based protein trap screening procedure in Drosophila to optimize the rate of recovering useful protein traps and their analysis. Transposons carrying the GFP-coding sequence flanked by splice acceptor and donor sequences were mobilized, and new insertions that resulted in production of GFP were captured using an automated embryo sorter. Individual stocks were established, GFP expression was analyzed during oogenesis, and insertion sites were determined by sequencing genomic DNA flanking the insertions. The resulting collection includes lines with protein traps in which GFP was spliced into mRNAs and embedded within endogenous proteins or enhancer traps in which GFP expression depended on splicing into transposon-derived RNA. We report a total of 335 genes associated with protein or enhancer traps and a web-accessible database for viewing molecular information and expression data for these genes.  相似文献   

7.
Exploring the plant transcriptome through phylogenetic profiling
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous number of expressed sequence tags (ESTs) exist for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger number of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than in Arabidopsis, whereas the opposite seems true for small gene families.

8.
The ability to translate vast amounts of information, as obtained from lipidomic analysis, into the knowledge and understanding of biological phenomena is an important challenge faced by the lipidomics community. While many of the informatics and computational tools from other domains such as bioinformatics and metabolomics are also applicable to lipidomics data processing and analysis, new solutions and strategies are needed for the studies of lipidomes at the systems level. This is due to the enormous functional and structural diversity of lipids, as well as their complex regulation at multiple spatial and temporal scales. In order to better understand the lipidomes at the physiological level, lipids need to be modeled not only at the level of biological pathways but also at the level of the biophysical systems they are part of, such as cellular membranes or lipoprotein particles. Herein the current state, recent advances and new opportunities in the field of lipid bioinformatics are reviewed.

9.
Recent advances in mass spectrometry (MS)-based techniques for lipidomic analysis have empowered us with the tools that afford studies of lipidomes at the systems level. However, these techniques pose a number of challenges for lipidomic raw data processing, lipid informatics, and the interpretation of lipidomic data in the context of lipid function and structure. Integration of lipidomic data with other systemic levels, such as genomic or proteomic, in the context of molecular pathways and biophysical processes provides a basis for the understanding of lipid function at the systems level. The present report, based on the limited literature, is an update on a young but rapidly emerging field of lipid informatics and related pathway reconstruction strategies.  相似文献   

10.
Vasco DA 《Genetics》2008,179(2):951-963
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.  相似文献   

11.
Aim: At broad geographical scales, species richness is a product of three basic processes: speciation, extinction and migration. However, determining which of these processes predominates is a major challenge. Whilst palaeontological studies can provide information on speciation and extinction rates, data are frequently lacking. Here we use a recent dated phylogenetic tree of mammals to explore the relative importance of these three processes in structuring present‐day richness gradients. Location: The global terrestrial biosphere. Methods: We combine macroecological data with phylogenetic methods more typically used in community ecology to describe the phylogenetic history of regional faunas. Using simulations, we explore two simple phylogenetic metrics, the mean and variance in the pairwise distances between taxa, and describe their relationship to phylogenetic tree topology. We then use these two metrics to characterize the evolutionary relationships among mammal species assemblages across the terrestrial biome. Results: We show that the mean and variance in the pairwise distances describe phylogenetic tree topology well, but are less sensitive to phylogenetic uncertainty than more direct measures of tree shape. We find the phylogeny for South American mammals is imbalanced and ‘stemmy’ (long branches towards the root), consistent with recent diversification within evolutionarily disparate lineages. In contrast, the phylogeny for African mammals is balanced and ‘tippy’ (long branches towards the tips), more consistent with the slow accumulation of diversity over long times, reflecting the Old World origin of many mammal clades. Main conclusions: We show that phylogeny can accurately capture biogeographical processes operating at broad spatial scales and over long time periods. Our results support inferences from the fossil record – that the New World tropics are a diversity cradle whereas the Old World tropics are a museum of old diversity.
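The two metrics named in the abstract above, the mean and variance of pairwise distances between taxa, can be sketched in a few lines. This is an illustration only: the nested-dictionary layout, the function name, the toy distances, and the use of the population variance (rather than the sample variance) are assumptions, not details given in the abstract.

```python
from itertools import combinations
from statistics import mean, pvariance


def pairwise_distance_stats(dist, taxa):
    """Mean and population variance of distances over all unordered taxon pairs.

    `dist[a][b]` must hold the distance for every pair (a, b) with a listed
    before b in `taxa`.
    """
    values = [dist[a][b] for a, b in combinations(taxa, 2)]
    return mean(values), pvariance(values)


# Hypothetical distances for a balanced four-taxon tree ((A,B),(C,D)).
dist = {
    "A": {"B": 2.0, "C": 6.0, "D": 6.0},
    "B": {"C": 6.0, "D": 6.0},
    "C": {"D": 2.0},
}
mpd, vpd = pairwise_distance_stats(dist, ["A", "B", "C", "D"])
```

In practice the distances would come from the branch lengths of a dated phylogeny restricted to the species present in one regional assemblage.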

12.
The further evolution of molecularly imprinted polymer science and technology necessitates the development of robust predictive tools capable of handling the complexity of molecular imprinting systems. A combination of the rapid growth in computer power over the past decade and significant software developments have opened new possibilities for simulating aspects of the complex molecular imprinting process. We present here a survey of the current status of the use of in silico-based approaches to aspects of molecular imprinting. Finally, we highlight areas where ongoing and future efforts should yield information critical to our understanding of the underlying mechanisms sufficient to permit the rational design of molecularly imprinted polymers.  相似文献   

13.
The remarkable conservation of protein structure, compared with that of sequences, suggests that in the course of evolution, residue substitutions which tend to destabilize a particular structure must be compensated by other substitutions that confer greater stability on that structure. Several approaches have been designed to detect correlated changes in a set of homologous sequences. However, most of them do not take into account the phylogeny of the sequences, and it has been shown that their detection power is weak. It remains unclear whether coevolution could be a general process at the level of amino acids of proteins. In the present study, we analyze the phylogenetic reconstruction of 15 sets of homologous proteins to assess, under different conditions, whether a significant amount of coevolving sites can be detected. Two criteria are used to detect significantly cosubstituting sites. One criterion corresponds to that of Shindyalov, Kolchanov, and Sander. The second one is based on intensive simulations of evolution of protein sequences along a phylogeny to estimate the significance of the number of observed cosubstitutions for pairs of sites. Our results show an important sensitivity of the detection of cosubstituting sites to the model used for the phylogenetic reconstruction. Not considering the uncertainty associated with the reconstructed data might lead to detecting numerous false-positive pairs of sites. Finally, significant amounts of coevolving pairs could be found only when substitutions affecting the physicochemical properties of the amino acids were considered. Such results suggest evidence of a cosubstitution mechanism in protein evolution. However, the identification of nonambiguous coevolving sites is still unresolved.  相似文献   

14.
Phylogenetic comparative methods (PCMs) can be used to study evolutionary relationships and trade-offs among species traits. Analysts using PCM may want to (1) include latent variables, (2) estimate complex trait interdependencies, (3) predict missing trait values, (4) condition predicted traits upon phylogenetic correlations and (5) estimate relationships as slope parameters that can be compared with alternative regression methods. The Comprehensive R Archive Network (CRAN) includes well-documented software for phylogenetic linear models (phylolm), phylogenetic path analysis (phylopath), phylogenetic trait imputation (Rphylopars) and structural equation models (sem), but none of these can simultaneously accomplish all five analytical goals. We therefore introduce a new package phylosem for phylogenetic structural equation models (PSEM) and summarize features and interface. We also describe new analytical options, where users can specify any combination of Ornstein-Uhlenbeck, Pagel's-δ and Pagel's-λ transformations for species covariance. For the first time, we show that PSEM exactly reproduces estimates (and standard errors) for simplified cases that are feasible in sem, phylopath, phylolm and Rphylopars and demonstrate the approach by replicating a well-known case study involving trade-offs in plant energy budgets.  相似文献   
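As an illustration of one of the covariance transformations mentioned in the abstract above, Pagel's λ is conventionally defined as multiplying the off-diagonal entries of the species covariance matrix by λ while leaving the diagonal unchanged. The sketch below uses plain Python lists and a hypothetical function name; it does not reproduce the interface of phylosem, which is an R package.

```python
def pagel_lambda(cov, lam):
    """Scale the off-diagonal (shared-history) covariances by lam.

    lam = 1 returns the matrix unchanged; lam = 0 yields a diagonal matrix,
    i.e. species treated as independent.
    """
    n = len(cov)
    return [
        [cov[i][j] if i == j else lam * cov[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical covariance for three species; A and B share one unit of history.
cov = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]
half_signal = pagel_lambda(cov, 0.5)
```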

15.
Polyhydroxybutyrate (PHB) is a sustainable bioplastic produced by bacteria that is a potential replacement for conventional plastics. This study delivers an integrated experimental and computational modeling approach to decipher metabolic factors controlling PHB production and offers engineering design strategies to boost production. In the metabolically robust Rhodopseudomonas palustris CGA009, PHB production significantly increased when grown on the carbon- and electron-rich lignin breakdown product p-coumarate (C9H8O3) compared to virtually no PHB titer from acetate (C2H3NaO2). The maximum yield did not improve further when grown on coniferyl alcohol (C10H12O3), but comparison of the PHB profiles showed that coniferyl alcohol's higher carbon content resulted in a higher rate of PHB production. Combined experimental results revealed that cytoplasmic space may be a limiting factor for maximum PHB titer. In order to obtain a systems-level understanding of factors driving PHB yield, a model-driven investigation was performed. The model yielded several engineering design strategies including utilizing reduced, high molecular weight substrates that bypass the thiolase reaction (phaA). Based on these strategies, utilization of butyrate was predicted and subsequently validated to produce PHB. Model analysis also explained why nitrogen starvation was not essential for PHB production and revealed that renewable and abundant lignin aromatics are ideal candidates for PHB production. Most importantly, the generality of the derived design rules allows them to be applied to any PHB-producing microbe with similar metabolic features.  相似文献   

16.
Yu Yun, Jermaine Christopher, Nakhleh Luay 《BMC Genomics》2016,17(10):784-124

Background

Phylogenetic networks are leaf-labeled graphs used to model and display complex evolutionary relationships that do not fit a single tree. There are two classes of phylogenetic networks: Data-display networks and evolutionary networks. While data-display networks are very commonly used to explore data, they are not amenable to incorporating probabilistic models of gene and genome evolution. Evolutionary networks, on the other hand, can accommodate such probabilistic models, but they are not commonly used for exploration.

Results

In this work, we show how to turn evolutionary networks into a tool for statistical exploration of phylogenetic hypotheses via a novel application of Gibbs sampling. We demonstrate the utility of our work on two recently available genomic data sets, one from a group of mosquitos and the other from a group of modern birds. We demonstrate that our method allows the use of evolutionary networks not only for explicit modeling of reticulate evolutionary histories, but also for exploring conflicting treelike hypotheses. We further demonstrate the performance of the method on simulated data sets, where the true evolutionary histories are known.

Conclusion

We introduce an approach to explore phylogenetic hypotheses over evolutionary phylogenetic networks using Gibbs sampling. The hypotheses could involve reticulate and non-reticulate evolutionary processes simultaneously as we illustrate on mosquito and modern bird genomic data sets.

17.
SUMMARY: GeneContent is a software system to infer the genome phylogeny based on an additive genome distance that can be estimated from the extended gene content data, which contains the genome-wide information (absence of a gene family, presence as single copy or presence as duplicates) across multiple species. GeneContent can also be used to explore the genome-wide evolutionary pattern of gene loss and proliferation. AVAILABILITY: Distribution packages of GeneContent for both Microsoft Windows and Linux operating systems are available at http://xgu.zool.iastate.edu CONTACT: xgu@iastate.edu.  相似文献   

18.
It has long been recognized that phylogenetic trees are more unbalanced than those generated by a Yule process. Recently, the degree of this imbalance has been quantified using the large set of phylogenetic trees available in the TreeBASE data set. In this article, a more precise analysis of imbalance is undertaken. Trees simulated under a range of models are compared with trees from TreeBASE and two smaller data sets. Several simple models can match the amount of imbalance measured in real data. Most of them also match the variance of imbalance among empirical trees to a remarkable degree. Statistics are developed to measure balance and to distinguish between trees with the same overall imbalance. The match between models and data for these statistics is investigated. In particular, age-dependent (Bellman-Harris) branching processes are studied in detail. It remains difficult to separate the process of macroevolution from biases introduced by sampling. The lessons for phylogenetic analysis are clearer. In particular, the use of the usual proportional-to-distinguishable-arrangements (uniform) prior on tree topologies in Bayesian phylogenetic analysis is not recommended.
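To make the comparison in the abstract above concrete, the following sketch grows a tree under a Yule process (each step splits a uniformly chosen leaf) and measures its imbalance. The abstract does not say which imbalance statistic the authors use; the Colless index shown here is one common choice and is an assumption, as are the function names and the tree representation.

```python
import random


def simulate_yule(n_leaves, rng):
    """Grow a tree by repeatedly splitting a uniformly chosen leaf.

    Returns a dict mapping each internal node id to its two child ids;
    the root has id 0.
    """
    children = {}
    leaves = [0]
    next_id = 1
    while len(leaves) < n_leaves:
        parent = leaves.pop(rng.randrange(len(leaves)))
        children[parent] = (next_id, next_id + 1)
        leaves.extend((next_id, next_id + 1))
        next_id += 2
    return children


def colless(children, root=0):
    """Sum over internal nodes of |leaves below left child - leaves below right child|."""
    total = 0

    def count_leaves(node):
        nonlocal total
        if node not in children:
            return 1
        left, right = (count_leaves(c) for c in children[node])
        total += abs(left - right)
        return left + right

    count_leaves(root)
    return total


yule_tree = simulate_yule(20, random.Random(1))
yule_imbalance = colless(yule_tree)
```

Repeating the simulation many times gives a null distribution of imbalance against which empirical trees could be compared.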

19.
We have developed a rapid parsimony method for reconstructing ancestral nucleotide states that allows calculation of initial branch lengths that are good approximations to optimal maximum-likelihood estimates under several commonly used substitution models. Use of these approximate branch lengths (rather than fixed arbitrary values) as starting points significantly reduces the time required for iteration to a solution that maximizes the likelihood of a tree. These branch lengths are close enough to the optimal values that they can be used without further iteration to calculate approximate maximum-likelihood scores that are very close to the "exact" scores found by iteration. Several strategies are described for using these approximate scores to substantially reduce times needed for maximum-likelihood tree searches.  相似文献   
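The abstract above does not specify which parsimony algorithm is used. As a generic illustration of parsimony-based ancestral state reconstruction, the sketch below implements the bottom-up pass of Fitch's algorithm for a single alignment site; it is a swapped-in standard technique, not the authors' method, and the tree encoding (nested tuples with leaf names as strings) is hypothetical.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass for one site on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right); `states` maps leaf
    names to observed nucleotides. Returns (candidate states at this node,
    minimum number of substitutions in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra substitution needed.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("a", "b"), ("c", "d"))
root_states, changes = fitch(tree, {"a": "A", "b": "A", "c": "G", "d": "G"})
```

Counting the substitutions inferred across all sites is one simple way such a reconstruction could seed starting values before likelihood optimization.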

20.
MOTIVATION: The computation of large phylogenetic trees with statistical models such as maximum likelihood or Bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree, or a tree which is topologically closer to the true tree, more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity, the size of trees which can be computed on a biologist's PC workstation within reasonable time is limited to trees containing approximately 100 taxa. RESULTS: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and Bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data, it clearly outperforms both programs on all real data alignments used in terms of speed and final likelihood values. AVAILABILITY AND SUPPLEMENTARY INFORMATION: RAxML-III, including all alignments and final trees mentioned in this paper, is freely available as open source code at http://wwwbode.cs.tum/~stamatak CONTACT: stamatak@cs.tum.edu.
