Found 20 similar articles (search time: 0 ms)
1.
Identifying potential protein interactions is of great importance in understanding the topologies of cellular networks, which is much needed and valued in current systematic biological studies. The development of computational methods to predict protein-protein interactions has been spurred on by the massive sequencing efforts of the genomic revolution. Among these methods is phylogenetic profiling, which assumes that proteins under similar evolutionary pressures, and hence with similar phylogenetic profiles, might be functionally related. Here, we introduce a method for inferring functional linkages between proteins from their evolutionary scenarios. The term evolutionary scenario refers to a series of events that occurred in speciation over time, which can be reconstructed given a phylogenetic profile and a species tree. Common evolutionary pressures on two proteins can then be inferred by comparing their evolutionary scenarios, which is a direct indication of their functional linkage. This scenario method has proven to perform better than the classical phylogenetic profile method when applied to the same test set. In addition, the predictions of the two methods are found to be fairly different, suggesting the possibility of merging them in order to achieve better performance. We analyzed the influence of the topology of the phylogenetic tree on the performance of this method, and found it to be robust to perturbations in the topology of the tree. However, if a completely random tree is incorporated, performance declines significantly. The evolutionary scenario method was used for inferring functional linkages in 67 species, and 40,006 linkages were predicted. We examine our predictions for budding yeast and find that almost all predicted linkages are supported by further evidence.
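For orientation, the classical phylogenetic profile comparison that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration only: the Jaccard score, the protein names, and the six-species profiles are assumptions chosen for the example, and the sketch does not implement the evolutionary scenario method itself.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two binary presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    # Species in which both proteins have a homolog.
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Species in which at least one of the proteins has a homolog.
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six species (1 = homolog present, 0 = absent).
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],
    "protC": [0, 1, 1, 1, 0, 1],
}

if __name__ == "__main__":
    print(jaccard(profiles["protA"], profiles["protB"]))
    print(jaccard(profiles["protA"], profiles["protC"]))
```

Proteins with identical profiles, such as the hypothetical protA and protB, score highest and would be proposed as functionally linked. A score of this kind ignores the species tree, which is the information the scenario method adds.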
2.
3.
MOTIVATION: Co-evolution is a powerful mechanism for understanding protein function. Prior work in this area has shown that, because of functional constraints, co-evolving proteins are more likely to share the same function than those that do not co-evolve. Many of the efforts founded on this observation, however, are at the level of entire sequences, implicitly assuming that the complete protein sequence follows a single evolutionary trajectory. Since it is well known that a domain can exist in various contexts, this assumption is not valid for numerous multi-domain proteins. Motivated by these observations, we introduce a novel technique called Coevolutionary-Matrix that captures co-evolution between regions of two proteins. Instead of using existing domain information, the method exploits residue-level conservation to identify co-evolving regions that might correspond to domains. RESULTS: We show that the Coevolutionary-Matrix method can detect a greater number of known functional associations for the Escherichia coli proteins when compared with earlier implementations of phylogenetic profiles. Furthermore, co-evolving regions of proteins detected by our method enable us to make hypotheses about their specific functions, many of which are supported by existing biochemical studies.
4.
Inferring admixture proportions from molecular data (total citations: 19; self-citations: 2; citations by others: 17)
We derive here two new estimators of admixture proportions based on a coalescent approach that explicitly takes into account molecular information as well as gene frequencies. These estimators can be applied to any type of molecular data (such as DNA sequences, restriction fragment length polymorphisms [RFLPs], or microsatellite data) for which the extent of molecular diversity is related to coalescent times. Monte Carlo simulation studies are used to analyze the behavior of our estimators. We show that one of them (mY) appears suitable for estimating admixture from molecular data because of its absence of bias and relatively low variance. We then compare it to two conventional estimators that are based on gene frequencies. mY proves to be less biased than conventional estimators over a wide range of situations and especially for microsatellite data. However, its variance is larger than that of conventional estimators when parental populations are not very differentiated. The variance of mY becomes smaller than that of conventional estimators only if parental populations have been kept separated for about N generations and if the mutation rate is high. Simulations also show that several loci should always be studied to achieve a drastic reduction of variance and that, for microsatellite data, the mean square error of mY rapidly becomes smaller than that of conventional estimators if enough loci are surveyed. We apply our new estimator to the case of admixed wolflike Canid populations tested for microsatellite data.
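As a point of reference for the "conventional estimators that are based on gene frequencies", the sketch below shows a textbook least-squares estimate of the contribution of one parental population from allele frequencies alone. It is an illustrative assumption, not the mY estimator of the abstract, and the function name and the frequencies are invented for the example.

```python
def admixture_least_squares(p1, p2, ph):
    """Least-squares estimate of the contribution of parental population 1.

    p1, p2: allele frequencies in the two parental populations.
    ph:     frequencies of the same alleles in the hybrid population.
    Model assumed here: ph = m * p1 + (1 - m) * p2 for every allele.
    """
    num = sum((h - b) * (a - b) for a, b, h in zip(p1, p2, ph))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        # Identical parental frequencies carry no information about m.
        raise ValueError("parental populations are not differentiated")
    return num / den


if __name__ == "__main__":
    # Hypothetical frequencies; the hybrid is built with a true m of 0.25.
    parent1 = [0.8, 0.1, 0.6]
    parent2 = [0.2, 0.5, 0.3]
    hybrid = [0.25 * a + 0.75 * b for a, b in zip(parent1, parent2)]
    print(admixture_least_squares(parent1, parent2, hybrid))
```

The denominator shrinks as the parental populations become more similar, which gives some intuition for why estimates are unstable when the parental populations are weakly differentiated.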
5.
Predicting functional linkages from gene fusions with confidence (total citations: 1; self-citations: 0; citations by others: 1)
Pairs of genes that function together in a pathway or cellular system can sometimes be found fused together in another organism as a Rosetta Stone protein--a fusion protein whose separate domains are homologous to the two functionally-related proteins. The finding of such a Rosetta Stone protein allows the prediction of a functional linkage between the component proteins. The significance of these deduced functional linkages, however, varies depending on the prevalence of each of the two domains. Here, we develop a statistical measure for the significance of predicted functional linkages, and test this measure for proteins of E. coli on a functional benchmark based on the KEGG database. By applying this statistical measure, proteins can be linked with over 70% accuracy. Using the Rosetta Stone method and this scoring scheme, we find all significant functional linkages for proteins of E. coli, P. horikoshii and S. cerevisiae, and measure the extent of the resulting protein networks.
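The dependence of significance on domain prevalence can be illustrated with a hypergeometric tail probability. The abstract does not state which statistic is used, so the choice of the hypergeometric model, the function name, and all counts below are assumptions made purely for illustration.

```python
from math import comb


def fusion_p_value(total, with_a, with_b, fused):
    """Chance probability that at least `fused` proteins carry both domains.

    total:  number of proteins searched for fusions (hypothetical count).
    with_a: proteins containing a homolog of domain A.
    with_b: proteins containing a homolog of domain B.
    fused:  proteins observed to contain both domains.
    """
    upper = min(with_a, with_b)
    # Sum the hypergeometric probabilities for fused, fused + 1, ..., upper.
    tail = sum(
        comb(with_a, k) * comb(total - with_a, with_b - k)
        for k in range(fused, upper + 1)
    )
    return tail / comb(total, with_b)


if __name__ == "__main__":
    # One observed fusion between two rare domains is unlikely by chance...
    print(fusion_p_value(1000, 5, 5, 1))
    # ...whereas the same observation for two promiscuous domains is not.
    print(fusion_p_value(1000, 200, 200, 1))
```

Under this model a single fusion event between two rare domains yields a small p-value, while the same event between two widespread domains is close to uninformative.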
6.
The advent of whole-genome sequencing has led to methods that infer protein function and linkages. We have combined four such algorithms (phylogenetic profile, Rosetta Stone, gene neighbor and gene cluster) in a single database--Prolinks--that spans 83 organisms and includes 10 million high-confidence links. The Proteome Navigator tool allows users to browse predicted linkage networks interactively, providing accompanying annotation from public databases. The Prolinks database and the Proteome Navigator tool are available for use online at http://dip.doe-mbi.ucla.edu/pronav.
7.
8.
9.
Inferring admixture proportions from molecular data: extension to any number of parental populations (total citations: 10; self-citations: 0; citations by others: 10)
The relative contribution of two parental populations to a hybrid group (the admixture proportions) can be estimated using not only the frequencies of different alleles, but also the degree of molecular divergence between them. In this paper, we extend this possibility to the case of any number of parental populations. The newly derived multiparental estimator is tested by Monte Carlo simulations and by generating artificial hybrid groups by pooling mtDNA samples from human populations. The general properties (including the variance) of the two-parental estimator seem to be retained by the multiparental estimator. When mixed human populations are considered and hypervariable single-locus data are analyzed (mtDNA control region), errors in the estimated contributions appear reasonably low only when highly differentiated parental populations are involved. Finally, the method applied to the hybrid Canary Island population points to a much lower female contribution from Spain than has previously been estimated.
10.
11.
Inferring functional relationships of proteins from local sequence and spatial surface patterns (total citations: 2; self-citations: 0; citations by others: 2)
We describe a novel approach for inferring functional relationships of proteins by detecting sequence and spatial patterns of protein surfaces. Well-formed concave surface regions in the form of pockets and voids are examined to identify similarity relationships that might be directly related to protein function. We first exhaustively identify and measure analytically all 910,379 surface pockets and interior voids on 12,177 protein structures from the Protein Data Bank. The similarity of patterns of residues forming pockets and voids is then assessed in sequence, in spatial arrangement, and in orientational arrangement. Statistical significance in the form of E and p-values is then estimated for each of the three types of similarity measurements. Our method is fully automated without human intervention and can be used without input of query patterns. It does not assume any prior knowledge of functional residues of a protein, and can detect similarity based on surface patterns small and large. It also tolerates, to some extent, conformational flexibility of functional sites. We show with examples that this method can detect functional relationships with specificity for members of the same protein family and superfamily, as well as remotely related functional surfaces from proteins of different fold structures. We envision that this method can be used for discovering novel functional relationships of protein surfaces, for functional annotation of protein structures with unknown biological roles, and for further inquiries on evolutionary origins of structural elements important for protein function.
12.
Shakhnovich BE. Bioinformatics (Oxford, England), 2006, 22(14): e440-e445
Analysis of increasingly saturated sequence databases has shown that gene family sizes are highly skewed, with many families being small and few containing many, far-diverged homologs. Additionally, recently published results have identified a structural determinant of mutational plasticity: designability, which correlates strongly with gene family size. In this paper, we explore the possible links between the two observations, exploring the possible effect of designability on duplication and divergence. We show that designability has the inverse of the expected relationship with strength of selection. More designable domains, which should have more mutational plasticity, evolve slower. However, we also present evidence that recently duplicated genes have variable probability of locus fixation correlated with strength of selection. As expected, paralogs under stronger evolutionary pressure have a lower failure rate. Finally, we show that probability of pseudogene formation from gene duplication can be directly tied to designability and functional flexibility of the family. We present evidence that gene families with higher designability have diverged farther because of lower probability of pseudogenization. Additionally, mutational plasticity may play an integral role by influencing pseudogenization rate. Either way, we show that considering the failure rate of duplications is integral in understanding the determinants and dynamics of molecular evolution.
14.
Background
The genetic code is redundant, meaning that most amino acids can be encoded by more than one codon. Highly expressed genes tend to use optimal codons to increase the accuracy and speed of translation. Thus, codon usage biases provide a signature of the relative expression levels of genes, which can, uniquely, be quantified across the domains of life.
Results
Here we describe a general statistical framework to exploit this phenomenon and to systematically associate genes with environments and phenotypic traits through changes in codon adaptation. By inferring evolutionary signatures of translation efficiency in 911 bacterial and archaeal genomes while controlling for confounding effects of phylogeny and inter-correlated phenotypes, we linked 187 gene families to 24 diverse phenotypic traits. A series of experiments in Escherichia coli revealed that 13 of 15, 19 of 23, and 3 of 6 gene families with changes in codon adaptation in aerotolerant, thermophilic, or halophilic microbes, respectively, confer specific resistance to hydrogen peroxide, heat, and high salinity. Further, we demonstrate experimentally that changes in codon optimality alone are sufficient to enhance stress resistance. Finally, we present evidence that multiple genes with altered codon optimality in aerobes confer oxidative stress resistance by controlling the levels of iron and NAD(P)H.
Conclusions
Taken together, these results provide experimental evidence for a widespread connection between changes in translation efficiency and phenotypic adaptation. As the number of sequenced genomes increases, this novel genomic context method for linking genes to phenotypes based on sequence alone will become increasingly useful.
15.
Plant functional response traits, which consistently respond to the environment, are useful for identifying drivers of vegetation change, particularly in response to disturbance gradients. Similarly, functional diversity indices have proven useful for investigating processes governing community assembly, particularly patterns of functional convergence/divergence. This study investigated the functional ecology of biodiverse, seminatural coastal grasslands (Scottish machair) at the national scale. We examined temporal shifts in functional response traits and functional diversity metrics using a series of null model, multivariate and regression analyses. The aim was to link temporal shifts in traits and diversity metrics to environmental variables with which to gauge the contribution of landuse change to plant functional composition and processes governing plant assembly. We observed significant shifts in the composition of 8 out of 12 functional response traits at the national scale, whereas at the regional scale all traits displayed at least one significant shift. Ordination of response traits found PC axis 1 (accounting for 39% of the variation) to be positively correlated to vegetation height and negatively correlated to specific leaf area, similar to that expected along a disturbance gradient. Significant changes in functional diversity indices were also observed at both national and regional scales, with varying convergence/divergence patterns observed across individual regions. We found functional richness (t = 4.87, p < 0.001) and divergence (t = 9.3, p < 0.001) to increase along PC axis 1, suggesting greater convergence and lower divergence along a disturbance gradient. This study demonstrates the potential for using functional diversity indices in combination with response traits as a sensitive method for detecting landuse change and its impacts on biodiversity.
We conclude that landuse change, particularly management declines and intensification, is a major driver of change in the functional composition and functional diversity of machair grasslands, influencing convergence/divergence patterns and, subsequently, community assembly processes.
16.
Background
The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods which aim to identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks, i.e. restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.
Results
Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images which are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screening dataset using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons. Our method tackled the three aforementioned tasks effectively, with accuracy in the range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. We compared the SVM-based method with our method under different conditions using various datasets, and our method consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
Conclusion
We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, variation of starting conditions including the number and composition of existing phenotypes, and datasets from different screens. Based on our findings, the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
17.
Inferring speciation times under an episodic molecular clock (total citations: 5; self-citations: 0; citations by others: 5)
We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms.
18.
19.
20.
MOTIVATION: Inferring species phylogenies with a history of gene losses and duplications is a challenging and important task in computational biology. This problem can be solved by duplication-loss models in which the primary step is to reconcile a rooted gene tree with a rooted species tree. Most modern methods of phylogenetic reconstruction (from sequences) produce unrooted gene trees. This limitation leads to the problem of transforming an unrooted gene tree into a rooted tree, and then reconciling the rooted trees. The main questions are 'What is the biological interpretation of the choice of rooting?', 'Can we efficiently find the optimal rootings?', 'Is the optimal rooting unique?'. RESULTS: In this paper we present a model for reconciling an unrooted gene tree with a rooted species tree, which is based on choosing a rooting that has minimal reconciliation cost. Our analysis leads to the surprising property that all the minimal rootings have identical distributions of gene duplications and gene losses in the species tree. It implies, in our opinion, that the concept of an optimal rooting is very robust, and thus biologically meaningful. It also has nice computational properties. We present a linear time and space algorithm for computing optimal rooting(s). This algorithm was used in two different ways to reconstruct the optimal species phylogeny of five known yeast genomes from approximately 4700 gene trees. Moreover, we determined locations (history) of all gene duplications and gene losses in the final species tree. It is interesting to notice that the top five species trees are the same for both methods. AVAILABILITY: Software and documentation are freely available from http://bioputer.mimuw.edu.pl/~gorecki/urec
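The idea of choosing a rooting with minimal reconciliation cost can be sketched by brute force, as below. This is not the linear-time algorithm of the abstract: it tries every rooting, uses the standard LCA mapping, and counts duplications only (losses are omitted for brevity). The nested-tuple tree encoding, the function names, and the toy trees are assumptions made for illustration.

```python
def leaves(tree):
    """Set of species labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def clusters(species_tree):
    """All clades (leaf sets) of the rooted species tree."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    left, right = species_tree
    return [leaves(species_tree)] + clusters(left) + clusters(right)


def lca(clades, species):
    """Smallest clade containing `species`, i.e. the LCA mapping target."""
    return min((c for c in clades if species <= c), key=len)


def duplications(gene_tree, clades):
    """Count duplication nodes of a rooted gene tree under LCA mapping."""
    if isinstance(gene_tree, str):
        return 0
    here = lca(clades, leaves(gene_tree))
    kids = [lca(clades, leaves(child)) for child in gene_tree]
    # A node is a duplication if it maps to the same clade as a child.
    dup = 1 if here in kids else 0
    return dup + sum(duplications(child, clades) for child in gene_tree)


def _below(sub, rest):
    """Yield rootings on every edge inside `sub`; `rest` is everything else."""
    if isinstance(sub, str):
        return
    a, b = sub
    for child, sibling in ((a, b), (b, a)):
        new_rest = (sibling, rest)
        yield (child, new_rest)
        yield from _below(child, new_rest)


def rootings(tree):
    """All rooted versions of the unrooted tree underlying `tree`."""
    left, right = tree
    yield tree
    yield from _below(left, right)
    yield from _below(right, left)


if __name__ == "__main__":
    species_tree = (("a", "b"), "c")
    # Gene leaves are labelled by species; species "a" has two gene copies.
    gene_tree = (("a", "b"), ("a", "c"))
    clades = clusters(species_tree)
    costs = [duplications(r, clades) for r in rootings(gene_tree)]
    print(costs, min(costs))
```

In this toy case several rootings attain the minimum, which illustrates why the question of whether the optimal rooting is unique arises at all.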