
1.

Assessing the functional structure of genomic data

Huttenhower C Troyanskaya OG 《Bioinformatics (Oxford, England)》2008,24(13):i330-i338


2.

Inferring functional modules of protein families with probabilistic topic models

Sebastian GA Konietzny Laura Dietz Alice C McHardy 《BMC bioinformatics》2011,12(1):141

Background

Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context.

3.

GSTaxClassifier: a genomic signature based taxonomic classifier for metagenomic data analysis

Fahong Yu Yijun Sun Li Liu William Farmerie 《Bioinformation》2009,4(1):46-49

GSTaxClassifier (Genomic Signature based Taxonomic Classifier) is a program for metagenomics analysis of shotgun DNA sequences. The program includes

a simple but effective algorithm, a modification of the Bayesian method, to predict the most probable genomic origins of sequences at different taxonomical ranks, on the basis of genome databases;
a function to generate genomic profiles of reference sequences with tri-, tetra-, penta-, and hexa-nucleotide motifs for setting a user-defined database;
two different formats (tabular- and tree-based summaries) to display taxonomic predictions with improved analytical methods; and
effective ways to retrieve, search, and summarize results by integrating the predictions into the NCBI tree-based taxonomic information.

GSTaxClassifier takes nucleotide sequences as input and, using a modified Bayesian model, evaluates the genomic signatures between metagenomic query sequences and reference genome databases. Simulation studies on numerical data sets showed that GSTaxClassifier could serve as a useful program for metagenomics studies. It is freely available at http://helix2.biotech.ufl.edu:26878/metagenomics/.
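
The idea of scoring a query against k-mer profiles of reference genomes can be illustrated with a minimal sketch. This is a plain naive Bayes classifier with a uniform prior and add-one smoothing, not GSTaxClassifier's actual modified Bayesian method, which the abstract does not specify. The function names, the toy reference sequences, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import Counter
from math import log

def kmer_counts(seq, k):
    """Count overlapping k-mers in an uppercase DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def build_profile(seq, k):
    """Return (log-probability table, log-probability of an unseen k-mer)."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values()) + 4 ** k  # add-one smoothing over all 4^k k-mers
    default = log(1 / total)
    return {kmer: log((n + 1) / total) for kmer, n in counts.items()}, default

def classify(query, profiles, k):
    """Return the reference label with the highest log-likelihood (uniform prior)."""
    query_counts = kmer_counts(query, k)

    def score(profile):
        table, default = profile
        return sum(n * table.get(kmer, default) for kmer, n in query_counts.items())

    return max(profiles, key=lambda label: score(profiles[label]))

# Toy reference "genomes" (hypothetical, for illustration only).
references = {
    "AT_rich": "ATATTTAAATATTTATAAATTTATATAAAT",
    "GC_rich": "GCGGCCGCGCCGGGCGCCGCGGCGCGCCGG",
}
K = 3
profiles = {label: build_profile(seq, K) for label, seq in references.items()}

print(classify("TTTAAATATA", profiles, K))
```

A real classifier would use longer motifs (the abstract mentions tri- to hexa-nucleotides), both strands, and reference profiles organised by taxonomic rank.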

4.

Discovery of biological networks from diverse functional genomic data (total citations: 1; self-citations: 0; citations by others: 1)

Myers CL Robson D Wible A Hibbs MA Chiriac C Theesfeld CL Dolinski K Troyanskaya OG 《Genome biology》2005,6(13):R114

We have developed a general probabilistic system for query-based discovery of pathway-specific networks through integration of diverse genome-wide data. This framework was validated by accurately recovering known networks for 31 biological processes in Saccharomyces cerevisiae and experimentally verifying predictions for the process of chromosomal segregation. Our system, bioPIXIE, a public, comprehensive system for integration, analysis, and visualization of biological network predictions for S. cerevisiae, is freely accessible over the worldwide web.

5.

Exploiting protein structure data to explore the evolution of protein function and biological complexity

Marsden RL Ranea JA Sillero A Redfern O Yeats C Maibaum M Lee D Addou S Reeves GA Dallman TJ Orengo CA 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2006,361(1467):425-440

New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.

6.

Testing the ortholog conjecture with comparative functional genomic data from mammals

Nehrt NL Clark WT Radivojac P Hahn MW 《PLoS computational biology》2011,7(6):e1002073

A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the "ortholog conjecture"). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.

7.

A probabilistic neural network approach for modeling and classification of bacterial growth/no-growth data (total citations: 3; self-citations: 0; citations by others: 3)

Hajmeer M Basheer I 《Journal of microbiological methods》2002,51(2):217-226

In this paper, we propose to use probabilistic neural networks (PNNs) for classification of bacterial growth/no-growth data and modeling the probability of growth. The PNN approach combines both Bayes' theorem of conditional probability and Parzen's method for estimating the probability density functions of the random variables. Unlike other neural network training paradigms, PNNs are characterized by high training speed and their ability to produce confidence levels for their classification decision. As a practical application of the proposed approach, PNNs were investigated for their ability in classification of growth/no-growth state of a pathogenic Escherichia coli R31 in response to temperature and water activity. A comparison with the most frequently used traditional statistical method based on logistic regression and multilayer feedforward artificial neural network (MFANN) trained by error backpropagation was also carried out. The PNN-based models were found to outperform linear and nonlinear logistic regression and MFANN in both the classification accuracy and the ease with which PNN-based models are developed.
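
The combination of Bayes' theorem with Parzen density estimation described above can be sketched in a few lines. This is a generic PNN under stated assumptions (Gaussian kernels with one shared smoothing parameter `sigma`, equal class priors), not the authors' model. The training points are invented, rescaled (temperature, water activity) pairs used only to make the example runnable.

```python
import math

def pnn_predict(x, training, sigma=0.5):
    """Return (predicted_label, posteriors) for feature vector x.

    training maps each class label to a list of feature vectors.
    Equal class priors are assumed; the Gaussian normalising constant is
    omitted because it is identical for every class and cancels out.
    """
    densities = {}
    for label, points in training.items():
        # Parzen estimate: average of Gaussian kernels centred on training points.
        total = 0.0
        for p in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2 * sigma ** 2))
        densities[label] = total / len(points)
    norm = sum(densities.values())
    posteriors = {label: d / norm for label, d in densities.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Made-up training data: (scaled temperature, scaled water activity).
training = {
    "growth": [(0.8, 0.9), (0.7, 0.8), (0.9, 0.85)],
    "no_growth": [(0.2, 0.1), (0.3, 0.2), (0.1, 0.15)],
}

label, posteriors = pnn_predict((0.75, 0.8), training)
print(label, posteriors)
```

There is no iterative training step: the "network" simply stores the training points, which is why PNNs train quickly, and the normalised class densities serve as the confidence levels mentioned in the abstract.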

8.

Use of RNA structure flexibility data in nanostructure modeling

Kasprzak W Bindewald E Kim TJ Jaeger L Shapiro BA 《Methods (San Diego, Calif.)》2011,54(2):239-250

In the emerging field of RNA-based nanotechnology there is a need for automation of the structure design process. Our goal is to develop computer methods for aiding in this process. Towards that end, we created the RNA junction database, which is a repository of RNA junctions, i.e. internal, multi-branch and kissing loops with emanating stem stubs, extracted from the larger RNA structures stored in the PDB database. These junctions can be used as building blocks for nanostructures. Two programs developed in our laboratory, NanoTiler and RNA2D3D, can combine such building blocks with idealized fragments of A-form helices to produce desired 3D nanostructures. Initially, the building blocks are treated as rigid objects and the resulting geometry is tested against the design objectives. Experimental data, however, shows that RNA accommodates its shape to the constraints of larger structural contexts. Therefore we are adding analysis of the flexibility of our building blocks to the full design process. Here we present an example of RNA-based nanostructure design, putting emphasis on the need to characterize the structural flexibility of the building blocks to induce ring closure in the automated exploration. We focus on the use of kissing loops (KL) in nanostructure design, since they have been shown to play an important role in RNA self-assembly. By using an experimentally proven system, the RNA tectosquare, we show that considering the flexibility of the KLs as well as distortions of helical regions may be necessary to achieve a realistic design.

9.

In silico structural and functional analysis of the human TOPK protein by structure modeling and molecular dynamics studies

Palani Kirubakaran Muthusamy Karthikeyan Kh. Dhanachandra Singh Selvaraman Nagamani Kumpati Premkumar 《Journal of molecular modeling》2013,19(1):407-419

Overexpression of T-lymphokine–activated killer cell–originated protein kinase (TOPK) has been associated with leukemia, myeloma tumors and various other cancers. The function and regulatory mechanism of TOPK in tumor cells remain unclear. Structural studies that could reveal the regulatory mechanism have been a challenge because of the unavailability of TOPK’s crystal structure. Hence, in this study, the 3D structure of the TOPK protein has been constructed by using multiple templates. The quality and reliability of the generated model were checked, and the molecular dynamics method was utilized to refine the model. The APBS method was employed to determine the electrostatic potential surface of the modeled protein, and it was found that the optimum pH for protein stability is 3.4, which will further help in forming mechanistic hypotheses about the TOPK protein. The active site of TOPK was identified from the available literature, and HTVS was employed to identify the lead molecules. The expected binding modes of the protein-ligand complexes were reproduced in the MD simulation, which indicates that the complex is relatively stable. The pharmacokinetic properties of the lead molecules are also within the acceptable range. TOPK acts as a substrate for CDK1, and protein-protein docking and dynamics studies were carried out to analyze the effect of the Thr⁹Ala mutation of TOPK on the formation of the two-protein complex. The results show that the wild-type complex is more stable than the mutant. Such structural information at the atomic level not only exhibits the action modes of TOPK inhibitors but also furnishes a novel starting point for structure-based drug design of TOPK inhibitors.

10.

Catching genomic rearrangements in the act: Integrating DNA breakage models and functional genomics data

Ignacio Maeso 《BioEssays : news and reviews in molecular, cellular and developmental biology》2015,37(5):470-471


11.

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome

Bergman CM Pfeiffer BD Rincón-Limas DE Hoskins RA Gnirke A Mungall CJ Wang AM Kronmiller B Pacleb J Park S Stapleton M Wan K George RA de Jong PJ Botas J Rubin GM Celniker SE 《Genome biology》2002,3(12):research0086.1-862

Background

It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined.

Results

We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences.

Conclusions

Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.

12.

Evolutionary dynamics of taxonomic structure

Michael Foote 《Biology letters》2012,8(6):1070


13.

Evolutionary dynamics of taxonomic structure

Foote M 《Biology letters》2012,8(1):135-138

The distribution of species among genera and higher taxa has largely untapped potential to reveal among-clade variation in rates of origination and extinction. The probability distribution of the number of species within a genus is modelled with a stochastic, time-homogeneous birth-death model having two parameters: the rate of species extinction, μ, and the rate of genus origination, γ, each scaled as a multiple of the rate of within-genus speciation, λ. The distribution is more sensitive to γ than to μ, although μ affects the size of the largest genera. The species : genus ratio depends strongly on both γ and μ, and so is not a good diagnostic of evolutionary dynamics. The proportion of monotypic genera, however, depends mainly on γ, and so may provide an index of the genus origination rate. Application to living marine molluscs of New Zealand shows that bivalves have a higher relative rate of genus origination than gastropods. This is supported by the analysis of palaeontological data. This concordance suggests that analysis of living taxonomic distributions may allow inference of macroevolutionary dynamics even without a fossil record.

14.

Analyzing the field of bioinformatics with the multi-faceted topic modeling technique

Go Eun Heo Keun Young Kang Min Song Jeong-Hoon Lee 《BMC bioinformatics》2017,18(7):251

Background

Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as a convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying the topic structure of subject areas. However, rather than revealing the topic structure in relation to metadata such as authors, publication date, and journals, LDA displays only the simple topic structure.

Methods

In this paper, we adopt Tang et al.’s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model is capable of incorporating the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words. For analysis, we used PubMed to collect forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics.

Results

We analyze the ACT Model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented.

Conclusion

The results of our analysis imply that, over time, the field of bioinformatics has become more interdisciplinary, with a steady increase in peripheral fields such as conceptual, mathematical, and systems biology. These results are confirmed by integrated analysis of topic distribution as well as top-ranked keyphrases, authors, and journals.


15.

New data about the taxonomic position of Saccaminopsis

JACQUES PONCET 《Lethaia: An International Journal of Palaeontology and Stratigraphy》1989,22(4):425-429

Saccaminopsis was transferred from the Foraminiferida to the Algae by Skompski in 1986 (Acta Geologica Polonica 36). A study of the early void-filling cementation within 'chambers' led to the identification of a fossilized organic wall, the presence of which, viewed as the fossil organic wall of the stem-cell of a dasycladacean alga, reinforces the new taxonomic attribution of Saccaminopsis. □ Foraminiferida, Algae, cementation, Middle Carboniferous, Algerian Sahara.

16.

Integrative analysis of genomic, functional and protein interaction data predicts long-range enhancer-target gene interactions

Rödelsperger C Guo G Kolanczyk M Pletschacher A Köhler S Bauer S Schulz MH Robinson PN 《Nucleic acids research》2011,39(7):2492-2502


17.

Prediction of the exon-intron structure by comparison of genomic sequences

P. S. Novichkov M. S. Gelfand A. A. Mironov 《Molecular Biology》2000,34(2):200-206

An algorithm for prediction of the exon-intron structure of higher eukaryotic genes is suggested. The algorithm is based on comparison of genomic sequences of homologous genes from different species. It uses the fact that protein-coding sequences evolve slower than noncoding regions. Unlike the existing comparison methods, the proposed algorithm, which is a modified version of splicing alignment, compares not nucleotide but amino acid sequences, which increases its sensitivity. Conservation of the exon-intron structures of the compared genes is not assumed. The algorithm is implemented in the program Pro-Gen. The testing of the algorithm demonstrated that it can be successfully applied to prediction of vertebrate genes, and in some cases, for more distant comparisons (e.g., vertebrates and insects or nematodes). Thus, the program can be used for prediction of human genes by comparison with genes of model organisms: mouse, fugu, drosophila, and nematode. The algorithm overcomes deficiencies of the existing methods, both statistical (insufficient reliability) and similarity-based (inapplicability to completely new genes).

18.

A novel data mining method to identify assay-specific signatures in functional genomic studies

Derrick K Rollins Dongmei Zhai Alrica L Joe Jack W Guidarelli Abhishek Murarka Ramon Gonzalez 《BMC bioinformatics》2006,7(1):377

Background:

The highly dimensional data produced by functional genomic (FG) studies makes it difficult to visualize relationships between gene products and experimental conditions (i.e., assays). Although dimensionality reduction methods such as principal component analysis (PCA) have been very useful, their application to identify assay-specific signatures has been limited by the lack of appropriate methodologies. This article proposes a new and powerful PCA-based method for the identification of assay-specific gene signatures in FG studies.

19.

Patterns in taxonomic and functional diversity of lake phytoplankton

MARIA L. LONGHI BEATRIX E. BEISNER 《Freshwater Biology》2010,55(6):1349-1366

1. Patterns in phytoplankton diversity in lakes and their relationships with environmental gradients have been traditionally based on taxonomic analyses and indices, even though measures of functional diversity (FD) might be expected to be more responsive to such gradients.
2. We assessed the influence of water column physical structure, and other components of the overall environment, on lake phytoplankton diversity using two taxonomically based indices [species richness (S) and the Shannon index (H’)] and a FD index, to determine whether these different measures respond in similar ways to habitat structure. The study encompassed 45 lakes in Eastern Canada, within two lake districts [the Eastern Townships Region (ETR) and Laurentians Region (LR)] that vary in geology and landscape and in lake morphometry and chemistry.
3. Across all lakes, S and H’ were higher in lakes having greater vertical temperature heterogeneity and higher susceptibility to wind mixing. In addition, H’ declined with total phosphorus concentration. FD was only related to maximum lake depth, a variable that integrates many other habitat features.
4. Further insight into the factors affecting phytoplankton diversity was obtained by contrasting the two regions. The taxonomically based diversity measures differed little between the regions, while FD was higher in the ETR where more trait variants were present and more evenly distributed amongst species. Whereas factors driving S did not differ between the regions, we found region‐dependent patterns in the relationships of H’ and FD with maximum lake depth: both indices decreased with maximum depth in the region with lakes more exposed to wind (ETR) but increased in the more hilly landscape where lakes are more sheltered from wind mixing (LR).
5. Our study demonstrates that, for phytoplankton communities, a FD index can show simpler and stronger responses to environmental drivers than a taxonomically based index, while shedding further light onto the functional traits that are important in particular lake categories.
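
For readers unfamiliar with the two taxonomically based indices named in this abstract, the following minimal sketch computes species richness (S) and the Shannon index (H’) from a list of per-species abundances. The natural logarithm is an assumption, since the abstract does not state the base used, and the function names and abundance values are hypothetical.

```python
import math

def species_richness(abundances):
    """S: number of species with non-zero abundance."""
    return sum(1 for n in abundances if n > 0)

def shannon_index(abundances):
    """H' = -sum(p_i * ln p_i), where p_i is the relative abundance of species i."""
    total = sum(abundances)
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)

even_community = [10, 10, 10, 10]   # four equally abundant species
skewed_community = [37, 1, 1, 1]    # same richness, one dominant species

print(species_richness(even_community), shannon_index(even_community))
print(species_richness(skewed_community), shannon_index(skewed_community))
```

Both toy communities have the same richness, but the evenly distributed one has the higher H’, which is why the two indices can respond differently to the same environmental gradient.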

20.

Discovering hotspots in functional genomic data superposed on 3D chromatin configuration reconstructions

Daniel Capurso Henrik Bengtsson Mark R. Segal 《Nucleic acids research》2016,44(5):2028-2035
