20 similar documents found (search time: 15 ms)
1.
2.
Background
Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which functional characterization techniques provide only partial information. For such proteins, genome context can provide further information about their functional context.
3.
GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomic analysis of shotgun DNA sequences. The program includes:
- a simple but effective algorithm, a modification of the Bayesian method, to predict the most probable genomic origins of sequences at different taxonomical ranks, on the basis of genome databases;
- a function to generate genomic profiles of reference sequences with tri-, tetra-, penta-, and hexa-nucleotide motifs for setting a user-defined database;
- two different formats (tabular- and tree-based summaries) to display taxonomic predictions with improved analytical methods; and
- effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information.
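The two core ideas in the list above, k-mer genomic profiles of reference sequences and Bayesian assignment of a read to its most probable origin, can be sketched as follows. This is a minimal Python sketch assuming plain multinomial (naive Bayes) scoring with a uniform prior and add-one smoothing; GSTaxClassifier's own modification of the Bayesian method is not described in the abstract, and the function names and toy reference sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_profile(seq: str, k: int, pseudocount: float = 1.0) -> dict[str, float]:
    """Smoothed relative frequency of every possible k-mer (4**k of them) in seq.

    Windows containing characters other than A/C/G/T (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    denom = total + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / denom for m in kmers}


def log_likelihood(query: str, profile: dict[str, float], k: int) -> float:
    """Sum of log k-mer probabilities of the query under one reference profile."""
    query = query.upper()
    score = 0.0
    for i in range(len(query) - k + 1):
        p = profile.get(query[i:i + k])
        if p is not None:  # skip windows with ambiguous bases
            score += math.log(p)
    return score


def classify(query: str, references: dict[str, str], k: int = 4) -> tuple[str, dict[str, float]]:
    """Return the reference with the highest log-likelihood (uniform prior) and all scores."""
    scores = {
        name: log_likelihood(query, kmer_profile(seq, k), k)
        for name, seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


refs = {"AT_rich": "ATATATTAAT" * 20, "GC_rich": "GCGCGGCCGC" * 20}
best, scores = classify("ATTATAATAT", refs, k=3)
```

A user-defined database covering tri- to hexanucleotide motifs could, under the same assumptions, store `{k: kmer_profile(seq, k) for k in range(3, 7)}` for each reference; assigning reads at several taxonomic ranks would additionally require mapping each reference to the NCBI taxonomy, which this sketch does not attempt.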
4.
Myers CL Robson D Wible A Hibbs MA Chiriac C Theesfeld CL Dolinski K Troyanskaya OG 《Genome biology》2005,6(13):R114
We have developed a general probabilistic system for query-based discovery of pathway-specific networks through the integration of diverse genome-wide data. This framework was validated by accurately recovering known networks for 31 biological processes in Saccharomyces cerevisiae and by experimentally verifying predictions for the process of chromosomal segregation. Our system, bioPIXIE, is a public, comprehensive system for the integration, analysis, and visualization of biological network predictions for S. cerevisiae, and it is freely accessible on the World Wide Web.
5.
Marsden RL Ranea JA Sillero A Redfern O Yeats C Maibaum M Lee D Addou S Reeves GA Dallman TJ Orengo CA 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2006,361(1467):425-440
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.
6.
A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the "ortholog conjecture"). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.
7.
A probabilistic neural network approach for modeling and classification of bacterial growth/no-growth data (total citations: 3; self-citations: 0; citations by others: 3)
In this paper, we propose using probabilistic neural networks (PNNs) to classify bacterial growth/no-growth data and to model the probability of growth. The PNN approach combines Bayes' theorem of conditional probability with Parzen's method for estimating the probability density functions of random variables. Unlike other neural network training paradigms, PNNs are characterized by high training speed and by their ability to produce confidence levels for their classification decisions. As a practical application of the proposed approach, PNNs were investigated for their ability to classify the growth/no-growth state of a pathogenic Escherichia coli R31 in response to temperature and water activity. A comparison was also carried out with the most frequently used traditional statistical method, logistic regression, and with a multilayer feedforward artificial neural network (MFANN) trained by error backpropagation. The PNN-based models were found to outperform linear and nonlinear logistic regression and the MFANN both in classification accuracy and in the ease with which the models are developed.
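The PNN decision rule described above (a Parzen-window density estimate for each class, combined with class priors through Bayes' theorem) can be sketched in a few lines. This minimal Python version assumes an isotropic Gaussian kernel with a single smoothing parameter `sigma`; the function names, the value of `sigma`, and the toy feature values (scaled temperature and water activity) are illustrative assumptions, not data or settings from the study.

```python
import math


def gaussian_kernel(x: tuple[float, ...], center: tuple[float, ...], sigma: float) -> float:
    """Unnormalized isotropic Gaussian kernel; the omitted constant is shared by all classes."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def pnn_predict(x, training, sigma=0.5):
    """Classify x with a probabilistic neural network.

    training: list of (feature_tuple, label) pairs; each training example acts as
    one pattern-layer unit. Returns (predicted_label, posterior probabilities).
    """
    kernel_sums: dict[str, float] = {}
    class_sizes: dict[str, int] = {}
    for features, label in training:
        kernel_sums[label] = kernel_sums.get(label, 0.0) + gaussian_kernel(x, features, sigma)
        class_sizes[label] = class_sizes.get(label, 0) + 1

    n = len(training)
    # Bayes' theorem: prior (class frequency) times Parzen density (mean kernel value)
    joint = {
        label: (class_sizes[label] / n) * (kernel_sums[label] / class_sizes[label])
        for label in kernel_sums
    }
    z = sum(joint.values())
    if z == 0.0:
        raise ValueError("all kernel values underflowed; try a larger sigma")
    posterior = {label: value / z for label, value in joint.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical scaled (temperature, water activity) observations.
training = [((0.9, 0.9), "growth"), ((0.8, 0.95), "growth"), ((0.85, 0.8), "growth"),
            ((0.1, 0.2), "no-growth"), ((0.2, 0.1), "no-growth"), ((0.15, 0.3), "no-growth")]
label, posterior = pnn_predict((0.85, 0.9), training, sigma=0.2)
```

"Training" here amounts to storing the examples, which is one reason PNNs train quickly; the posterior dictionary plays the role of the confidence level attached to each decision, and `sigma` is the main parameter left to tune.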
8.
In the emerging field of RNA-based nanotechnology there is a need for automation of the structure design process. Our goal is to develop computer methods to aid this process. Towards that end, we created the RNA junction database, a repository of RNA junctions, i.e. internal, multi-branch and kissing loops with emanating stem stubs, extracted from the larger RNA structures stored in the PDB database. These junctions can be used as building blocks for nanostructures. Two programs developed in our laboratory, NanoTiler and RNA2D3D, can combine such building blocks with idealized fragments of A-form helices to produce desired 3D nanostructures. Initially, the building blocks are treated as rigid objects and the resulting geometry is tested against the design objectives. Experimental data, however, show that RNA adapts its shape to the constraints of larger structural contexts. Therefore, we are adding analysis of the flexibility of our building blocks to the full design process. Here we present an example of RNA-based nanostructure design, emphasizing the need to characterize the structural flexibility of the building blocks in order to induce ring closure in the automated exploration. We focus on the use of kissing loops (KL) in nanostructure design, since they have been shown to play an important role in RNA self-assembly. Using an experimentally proven system, the RNA tectosquare, we show that considering the flexibility of the KLs as well as distortions of helical regions may be necessary to achieve a realistic design.
9.
Palani Kirubakaran Muthusamy Karthikeyan Kh. Dhanachandra Singh Selvaraman Nagamani Kumpati Premkumar 《Journal of molecular modeling》2013,19(1):407-419
Overexpression of T-lymphokine–activated killer cell–originated protein kinase (TOPK) has been associated with leukemia, myeloma tumors, and various other cancers. The function and regulatory mechanism of TOPK in tumor cells remain unclear. Structural studies that could reveal the regulatory mechanism have been challenging because no crystal structure of TOPK is available. Hence, in this study, the 3D structure of the TOPK protein was constructed using multiple templates. The quality and reliability of the generated model were checked, and molecular dynamics (MD) was used to refine the model. The APBS method was employed to compute the electrostatic potential surface of the modeled protein, and the optimum pH for protein stability was found to be 3.4, which will further aid mechanistic hypotheses about TOPK. The active site of TOPK was identified from the available literature, and high-throughput virtual screening (HTVS) was employed to identify lead molecules. The expected binding modes of the protein-ligand complexes were reproduced in the MD simulation, indicating that the complex is relatively stable. The pharmacokinetic properties of the lead molecules are also within an acceptable range. TOPK acts as a substrate for CDK1, and protein-protein docking and dynamics studies were carried out to analyze the effect of the Thr9Ala mutation of TOPK on formation of the two-protein complex. The results show that the wild-type complex is more stable than the mutant complex. Such atomic-level structural information not only reveals the modes of action of TOPK inhibitors but also provides a novel starting point for structure-based design of TOPK inhibitors.
10.
11.
Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome

Bergman CM Pfeiffer BD Rincón-Limas DE Hoskins RA Gnirke A Mungall CJ Wang AM Kronmiller B Pacleb J Park S Stapleton M Wan K George RA de Jong PJ Botas J Rubin GM Celniker SE 《Genome biology》2002,3(12):research0086.1-862
Background
It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined.
Results
We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences.
Conclusions
Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.
12.
13.
Foote M 《Biology letters》2012,8(1):135-138
The distribution of species among genera and higher taxa has largely untapped potential to reveal among-clade variation in rates of origination and extinction. The probability distribution of the number of species within a genus is modelled with a stochastic, time-homogeneous birth-death model having two parameters: the rate of species extinction, μ, and the rate of genus origination, γ, each scaled as a multiple of the rate of within-genus speciation, λ. The distribution is more sensitive to γ than to μ, although μ affects the size of the largest genera. The species : genus ratio depends strongly on both γ and μ, and so is not a good diagnostic of evolutionary dynamics. The proportion of monotypic genera, however, depends mainly on γ, and so may provide an index of the genus origination rate. Application to living marine molluscs of New Zealand shows that bivalves have a higher relative rate of genus origination than gastropods. This is supported by the analysis of palaeontological data. This concordance suggests that analysis of living taxonomic distributions may allow inference of macroevolutionary dynamics even without a fossil record.
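The birth-death model above can also be explored numerically. The following minimal Python sketch is a forward simulation, not Foote's analytical treatment: it assumes that every species independently speciates within its genus at rate λ = 1, goes extinct at rate μ, and founds a new monotypic genus at rate γ, and it stops once the clade reaches a fixed number of species. Because all species share the same total event rate, elapsed time can be ignored and each step simply picks a species uniformly at random. The function names and the stopping rule are hypothetical.

```python
import random


def simulate_genera(mu: float, gamma: float, max_species: int = 2000, seed: int = 0) -> list[int]:
    """Simulate species counts per genus; returns [] if the clade goes extinct."""
    rng = random.Random(seed)
    lam = 1.0  # within-genus speciation rate; mu and gamma are multiples of it
    total_rate = lam + mu + gamma
    genera = [1]  # start with one genus holding one species
    n_species = 1
    while 0 < n_species < max_species:
        # A uniformly random species lies in genus g with probability size(g) / n_species.
        g = rng.choices(range(len(genera)), weights=genera)[0]
        u = rng.random() * total_rate
        if u < lam:  # speciation within genus g
            genera[g] += 1
            n_species += 1
        elif u < lam + mu:  # extinction of one species in genus g
            genera[g] -= 1
            n_species -= 1
            if genera[g] == 0:
                genera.pop(g)
        else:  # origination of a new, initially monotypic genus
            genera.append(1)
            n_species += 1
    return genera


def summarize(genera: list[int]) -> dict[str, float]:
    """Species:genus ratio and proportion of monotypic genera."""
    if not genera:
        return {"species_per_genus": 0.0, "monotypic_fraction": 0.0}
    return {
        "species_per_genus": sum(genera) / len(genera),
        "monotypic_fraction": sum(1 for s in genera if s == 1) / len(genera),
    }


stats = summarize(simulate_genera(mu=0.2, gamma=0.5, seed=42))
```

Repeating the run over a grid of `gamma` values at fixed `mu`, and then over `mu` at fixed `gamma`, is one way to examine how the species:genus ratio and the monotypic proportion respond to each parameter.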
14.
Background
Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as a convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying the topic structure of subject areas. However, LDA reveals only a simple topic structure and does not relate it to metadata such as authors, publication dates, and journals.
Methods
In this paper, we adopt Tang et al.’s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model incorporates papers, authors, and conferences into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words. For the analysis, we used PubMed to collect data from forty-six bioinformatics journals in the MEDLINE database. We conducted a time-series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics.
Results
We analyze the ACT model results in each period. Additionally, for a further integrated analysis, we conduct a time-series analysis of the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented.
Conclusion
The results of our analysis imply that over time the field of bioinformatics has become more interdisciplinary, with a steady increase in peripheral fields such as conceptual, mathematical, and systems biology. These results are confirmed by an integrated analysis of topic distributions as well as top-ranked keyphrases, authors, and journals.
15.
Jacques Poncet 《Lethaia: An International Journal of Palaeontology and Stratigraphy》1989,22(4):425-429
Saccaminopsis was transferred from the Foraminiferida to the Algae by Skompski in 1986 (Acta Geologica Polonica 36). A study of the early void-filling cementation within the 'chambers' led to the identification of a fossilized organic wall. Interpreted as the fossil organic wall of the stem cell of a dasycladacean alga, its presence reinforces the new taxonomic assignment of Saccaminopsis. □ Foraminiferida, Algae, cementation, Middle Carboniferous, Algerian Sahara.
16.
17.
An algorithm for predicting the exon-intron structure of higher eukaryotic genes is proposed. The algorithm is based on comparing genomic sequences of homologous genes from different species and exploits the fact that protein-coding sequences evolve more slowly than noncoding regions. Unlike existing comparison methods, the proposed algorithm, a modified version of spliced alignment, compares amino acid sequences rather than nucleotide sequences, which increases its sensitivity. Conservation of the exon-intron structure of the compared genes is not assumed. The algorithm is implemented in the program Pro-Gen. Testing demonstrated that it can be successfully applied to the prediction of vertebrate genes and, in some cases, to more distant comparisons (e.g., vertebrates versus insects or nematodes). Thus, the program can be used to predict human genes by comparison with genes of model organisms: mouse, fugu, Drosophila, and nematode. The algorithm overcomes deficiencies of existing methods, both statistical (insufficient reliability) and similarity-based (inapplicability to completely new genes).
18.
Derrick K Rollins Dongmei Zhai Alrica L Joe Jack W Guidarelli Abhishek Murarka Ramon Gonzalez 《BMC bioinformatics》2006,7(1):377
Background:
The high-dimensional data produced by functional genomic (FG) studies make it difficult to visualize relationships between gene products and experimental conditions (i.e., assays). Although dimensionality reduction methods such as principal component analysis (PCA) have been very useful, their application to identifying assay-specific signatures has been limited by the lack of appropriate methodologies. This article proposes a new and powerful PCA-based method for the identification of assay-specific gene signatures in FG studies.
19.
1. Patterns in phytoplankton diversity in lakes and their relationships with environmental gradients have traditionally been based on taxonomic analyses and indices, even though measures of functional diversity (FD) might be expected to be more responsive to such gradients. 2. We assessed the influence of water column physical structure, and other components of the overall environment, on lake phytoplankton diversity using two taxonomically based indices [species richness (S) and the Shannon index (H’)] and an FD index, to determine whether these different measures respond in similar ways to habitat structure. The study encompassed 45 lakes in Eastern Canada, within two lake districts [the Eastern Townships Region (ETR) and Laurentians Region (LR)] that vary in geology and landscape and in lake morphometry and chemistry. 3. Across all lakes, S and H’ were higher in lakes with greater vertical temperature heterogeneity and higher susceptibility to wind mixing. In addition, H’ declined with total phosphorus concentration. FD was related only to maximum lake depth, a variable that integrates many other habitat features. 4. Further insight into the factors affecting phytoplankton diversity was obtained by contrasting the two regions. The taxonomically based diversity measures differed little between the regions, while FD was higher in the ETR, where more trait variants were present and more evenly distributed amongst species. Whereas the factors driving S did not differ between the regions, we found region‐dependent patterns in the relationships of H’ and FD with maximum lake depth: both indices decreased with maximum depth in the region with lakes more exposed to wind (ETR) but increased in the more hilly landscape where lakes are more sheltered from wind mixing (LR). 5. Our study demonstrates that, for phytoplankton communities, an FD index can show simpler and stronger responses to environmental drivers than a taxonomically based index, while shedding further light on the functional traits that are important in particular lake categories.
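For readers unfamiliar with the two taxonomic indices, species richness S is the number of taxa present, and the Shannon index is H' = -Σ pᵢ ln pᵢ, where pᵢ is the proportion of the total abundance belonging to taxon i. The minimal Python sketch below assumes natural logarithms and a dictionary of abundances per taxon (the taxon names are hypothetical); the functional diversity index used in the study is not specified in the abstract and is not reproduced here.

```python
import math


def species_richness(abundances: dict[str, float]) -> int:
    """S: number of taxa with a positive abundance."""
    return sum(1 for a in abundances.values() if a > 0)


def shannon_index(abundances: dict[str, float]) -> float:
    """H' = -sum(p_i * ln p_i) over taxa with positive abundance."""
    present = [a for a in abundances.values() if a > 0]
    total = sum(present)
    if total == 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in present)


even = {"taxon_a": 25, "taxon_b": 25, "taxon_c": 25, "taxon_d": 25}
uneven = {"taxon_a": 97, "taxon_b": 1, "taxon_c": 1, "taxon_d": 1}
```

Both example communities have S = 4, but H' is ln 4 for the even one and much lower for the uneven one, because H' also reflects evenness. This is one reason S and H' need not respond identically to the same environmental gradient.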